Content:
Presentation type:
ESSI – Earth & Space Science Informatics

EGU24-2658 | ECS | Orals | MAL17-ESSI | ESSI Division Outstanding Early Career Scientist Award Lecture

X-informatics at the center of scientific discovery: Detecting biosignatures, predicting mineral occurrences, and characterizing planetary kinds.  

Anirudh Prabhu, Shaunna Morrison, Robert Hazen, Michael L. Wong, Grethe Hystad, Henderson J. Cleaves II, Ahmed Eleish, George Cody, Vasundhara Gatne, Jose P Chavez, Xiaogang Ma, and Peter Fox and the Mineral Informatics Team

Data Science and Informatics methods have been at the center of many recent scientific discoveries and have opened up new frontiers in many areas of scientific inquiry. In this talk, I will take you through some of the most recent and exciting discoveries we've made and how informatics methods played a central role in these discoveries.

First, we will look at our work on data-driven biosignature detection, specifically how we combine pyrolysis-gas chromatography-mass spectrometry and machine learning to build an agnostic molecular biosignature detection model. 

Next, we will talk about how we used association analysis to predict the locations of as-yet-unknown mineral deposits on Earth and potentially Mars. These advances hold the potential to unlock new avenues of economic growth and sustainable development.

Finally, we will set our sights on exoplanets—celestial bodies orbiting distant stars. The discovery of thousands of exoplanets in recent years has fueled the quest to understand their formation, composition, and potential habitability. We develop informatics approaches to better understand, classify and predict the occurrence of exoplanets by embracing the complexity and multidimensionality of exoplanets and their host stars.

How to cite: Prabhu, A., Morrison, S., Hazen, R., Wong, M. L., Hystad, G., Cleaves II, H. J., Eleish, A., Cody, G., Gatne, V., Chavez, J. P., Ma, X., and Fox, P. and the Mineral Informatics Team: X-informatics at the center of scientific discovery: Detecting biosignatures, predicting mineral occurrences, and characterizing planetary kinds, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-2658, https://doi.org/10.5194/egusphere-egu24-2658, 2024.

EGU24-4562 | Orals | MAL17-ESSI | Ian McHarg Medal Lecture

The central role of geoscience data standards in generating new knowledge 

Francois Robida

The earth sciences are first and foremost observational sciences, based on data collected on the planet over generations. It is from these data that interpretations, concepts and models are produced. Describing data at the point of production and conserving them have always been concerns for scientists, so that the data can be reused and results reproduced. This has been achieved by adopting common rules and standards, for example for indicating the geographical coordinates of an observation or the units of measurement used.

Today's scientific challenges, first and foremost the climate challenge, require the mobilisation of different scientific disciplines, often with different languages and practices.

The establishment of data infrastructures on an international scale means that researchers can use computer protocols to access considerable sources of data from their own and other disciplines. Digital tools such as AI make it possible to make machines 'reason' about data to produce new knowledge.

All these factors make it critical for both humans and machines to be able to 'understand' the data used. This understanding necessarily requires the adoption of common reference systems on an international scale and across all disciplines. These standards are based on a common 'vision' produced by the scientific community (and updated as knowledge evolves), resulting in vocabularies and ontologies shared by the community.

This presentation will look at the ecosystem for producing and maintaining standards for the geosciences and some of the issues involved in the relationship between scientists and the production and use of standards.

How to cite: Robida, F.: The central role of geoscience data standards in generating new knowledge, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-4562, https://doi.org/10.5194/egusphere-egu24-4562, 2024.

ESSI1 – Next-Generation Analytics for Scientific Discovery: Data Science, Machine Learning, AI

EGU24-1651 | ECS | Orals | ESSI1.1

AtmoRep: large scale representation learning for atmospheric dynamics 

Ilaria Luise, Christian Lessig, Martin Schultz, and Michael Langguth

The atmosphere affects humans in a multitude of ways, from loss of life due to adverse weather to long-term social and economic impacts. Very recently, AI-based models have shown tremendous potential in reducing the computational costs of numerical weather prediction. However, they lack the versatility of conventional models. The team has therefore recently introduced AtmoRep, a first probabilistic foundation model of atmospheric dynamics for multi-purpose applications [Lessig 2023]. Through large-scale representation learning, AtmoRep encapsulates a general description of atmospheric dynamics based on the ERA5 reanalysis. Following the principles of in-context learning from natural language processing, adapted here to Earth system science, domain applications such as forecasting and downscaling can be performed without any task-specific training. The model has therefore been applied as the backbone for several tasks, from weather forecasting to downscaling, spatio-temporal interpolation and data-driven precipitation forecasting. After fine-tuning, AtmoRep achieves skill competitive with Pangu-Weather [Bi 2023] for short-term forecasting and substantially exceeds the AI-based competitor [Stengel 2020] for downscaling.

 

The model has been conceived as a flexible stack of Transformers, one for each field, coupled through cross-attention to ensure a plug-and-play architecture and allow the dynamic integration of new fields without the need for retraining from scratch. The main innovation is a newly developed statistical loss, which generalises the concept of cross-entropy from classification problems. The model is therefore fully probabilistic, and each application comes with a well-calibrated set of ensemble members whose spread is correlated with the variability of the system, as demonstrated, e.g., for forecasting by inspecting the CRPS score or the error-to-spread ratios (see [Lessig 2023]).
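
To make the coupling idea concrete, the following is a minimal PyTorch sketch of how two field-specific token streams could be tied together with cross-attention; the module, dimensions and names are illustrative assumptions, not AtmoRep's actual implementation.

```python
import torch
import torch.nn as nn

class CrossFieldBlock(nn.Module):
    """Illustrative coupling of two field-specific token streams via cross-attention."""
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x_field: torch.Tensor, x_other: torch.Tensor) -> torch.Tensor:
        # Self-attention within one field (e.g. temperature tokens)
        h, _ = self.self_attn(x_field, x_field, x_field)
        x = self.norm1(x_field + h)
        # Cross-attention to tokens of another field (e.g. wind), which is what
        # lets a new field be plugged in without retraining everything from scratch
        h, _ = self.cross_attn(x, x_other, x_other)
        return self.norm2(x + h)

# Toy usage: batch of 2, 64 space-time tokens per field, 256-dim embeddings
temperature = torch.randn(2, 64, 256)
wind = torch.randn(2, 64, 256)
coupled = CrossFieldBlock()(temperature, wind)   # shape (2, 64, 256)
```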

 

In addition, the flexible nature of the model allows fine-tuning on different data types. To demonstrate this, the precipitation forecasting skill of AtmoRep has been fine-tuned on real radar data, using the Radklim dataset as a proxy for accurate total precipitation rates. Using Radklim as ground truth, diagnostic scores such as the RMSE and the FBI (Frequency Bias Indicator) indicate unequivocally that, after fine-tuning, the AtmoRep model outperforms ERA5 in terms of accuracy in both spatial coverage and intensity.

 

In terms of future plans, we are currently working to extend the model to longer-term weather forecasts, up to medium-range forecasting. Furthermore, we are integrating the downscaling and forecasting steps using the CERRA 5 km resolution reanalysis over Europe, so as to achieve multi-resolution coarse-to-fine predictions beyond quarter-degree resolution in the coming months.

AtmoRep represents a step forward towards building solid and skilful multi-purpose approaches, and the present work is, in our opinion, only a first step towards the possibilities enabled by the methodology.

 

[Lessig 2023] Lessig et al., AtmoRep: A stochastic model of atmosphere dynamics using large scale representation learning. arXiv:2308.13280, 2023.

[Bi 2023] K. Bi et al., “Accurate medium-range global weather forecasting with 3d neural networks,” Nature, 2023.

[Stengel 2020] K. Stengel et al., “Adversarial super-resolution of climatological wind and solar data,” Proceedings of the National Academy of Sciences, vol. 117, 2020.

How to cite: Luise, I., Lessig, C., Schultz, M., and Langguth, M.: AtmoRep: large scale representation learning for atmospheric dynamics, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-1651, https://doi.org/10.5194/egusphere-egu24-1651, 2024.

EGU24-1760 | ECS | Orals | ESSI1.1

EarthPT: a foundation model for Earth Observation 

Michael Smith, Luke Fleming, and James Geach

We introduce EarthPT -- an Earth Observation (EO) pretrained transformer. EarthPT is a 700 million parameter decoding transformer foundation model trained in an autoregressive self-supervised manner and developed specifically with EO use cases in mind. We demonstrate that EarthPT is an effective forecaster that can accurately predict pixel-level surface reflectances across the 400-2300 nm range well into the future. For example, forecasts of the evolution of the Normalised Difference Vegetation Index (NDVI) have a typical error of approximately 0.05 (over a natural range of -1 to 1) at the pixel level over a five-month test horizon, outperforming simple phase-folded models based on historical averaging. We also demonstrate that embeddings learnt by EarthPT hold semantically meaningful information and could be exploited for downstream tasks such as highly granular, dynamic land use classification. Excitingly, we note that the abundance of EO data provides us with -- in theory -- quadrillions of training tokens. Therefore, if we assume that EarthPT follows neural scaling laws akin to those derived for Large Language Models (LLMs), there is currently no data-imposed limit to scaling EarthPT and other similar 'Large Observation Models'.
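
As a point of reference for the baseline mentioned above, here is a small NumPy sketch of the standard NDVI formula and a "phase-folded" forecast built from historical averages; the synthetic time series and sampling interval are assumptions for illustration only.

```python
import numpy as np

def ndvi(nir: np.ndarray, red: np.ndarray) -> np.ndarray:
    """Normalised Difference Vegetation Index, bounded to [-1, 1]."""
    return (nir - red) / (nir + red + 1e-9)

# Hypothetical per-pixel NDVI time series sampled ~36 times per year for 3 years
rng = np.random.default_rng(0)
t = np.arange(36 * 3)
series = 0.4 + 0.3 * np.sin(2 * np.pi * t / 36) + 0.05 * rng.standard_normal(t.size)

# "Phase-folded" baseline: average the same phase of the seasonal cycle over the
# historical years and use it as the forecast for the held-out test year.
history, test = series[:-36], series[-36:]
baseline = history.reshape(-1, 36).mean(axis=0)
print("baseline MAE:", np.abs(baseline - test).mean())
```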

EarthPT is available under the MIT licence.

How to cite: Smith, M., Fleming, L., and Geach, J.: EarthPT: a foundation model for Earth Observation, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-1760, https://doi.org/10.5194/egusphere-egu24-1760, 2024.

EGU24-3202 | Orals | ESSI1.1 | Highlight

Foundation Models for Science: Potential, Challenges, and the Path Forward 

Manil Maskey, Rahul Ramachandran, Tsengdar Lee, Kevin Murphy, Sujit Roy, Muthukumaran Ramasubramanian, Iksha Gurung, and Raghu Ganti

Foundation models (FMs) signify a major shift in AI: large-scale machine learning models pre-trained on wide-ranging datasets. These models act as flexible starting points, ready to be fine-tuned for various specialized tasks. Distinct from traditional models designed for narrow objectives, foundation models apply their broad pre-training to learn patterns across data, enhancing their adaptability and efficiency in diverse domains. This approach minimizes the necessity for extensive, task-specific labeled datasets and prolonged training periods. A single foundation model can be tailored for many scientific applications, often outperforming traditional models in some tasks, even when labeled data is scarce.

 

Addressing the right array of complex scientific challenges using AI FMs requires interdisciplinary teams from various groups and organizations. No single research group or institution can independently muster the necessary resources or expertise to construct useful AI FMs. Thus, collaborative efforts are essential, combining diverse skills, resources, and viewpoints to create more comprehensive solutions. The right blend of domain-specific expertise and a broad understanding of various AI subfields is crucial to ensure the versatility and adaptability of foundation models. Moreover, the scientific community must develop a wide array of use cases, labeled datasets, and benchmarks to evaluate these models effectively across different scenarios, so that they are accepted and widely utilized within science.

 

Building Foundation Models for science demands fostering collaboration among a diverse spectrum of research groups to ensure this broad range of perspectives. This strategy should include stakeholders like individual researchers, academic and government institutions, and tech companies. Embedding this collaboration within the principles of open science is therefore vital. Open science calls for transparent research, open sharing of findings, promoting reproducibility by making methodologies and data accessible, and providing tools researchers can freely use, modify, and distribute. Encouraging community collaboration in model pre-training and development leads to more robust and functional FMs. Guaranteeing open access to datasets, models, and fine-tuning code enables researchers to validate findings and build upon previous work, thus reducing redundancy in data collection and cultivating a culture of shared knowledge and progress.

How to cite: Maskey, M., Ramachandran, R., Lee, T., Murphy, K., Roy, S., Ramasubramanian, M., Gurung, I., and Ganti, R.: Foundation Models for Science: Potential, Challenges, and the Path Forward, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-3202, https://doi.org/10.5194/egusphere-egu24-3202, 2024.

EGU24-5769 | Orals | ESSI1.1

Exploring Transfer Learning Using Segment Anything Model in Optical Remote Sensing 

Mohanad Albughdadi, Vasileios Baousis, Tolga Kaprol, Armagan Karatosun, and Claudio Pisa

In the realm of remote sensing, where labeled datasets are scarce, leveraging pre-trained models via transfer learning offers a compelling solution. This study investigates the efficacy of the Segment Anything Model (SAM), a foundational computer vision model, in the domain of optical remote sensing tasks, specifically focusing on image classification and semantic segmentation.

The scarcity of labeled data in remote sensing poses a significant challenge for machine learning development. Transfer learning, a technique utilizing pre-trained models like SAM, circumvents this challenge by leveraging existing data from related domains. SAM, developed and trained by Meta AI, serves as a foundational model for prompt-based image segmentation. It was trained on over 1 billion masks from 11 million images, giving it robust zero-shot and few-shot capabilities. SAM's architecture comprises image encoder, prompt encoder, and mask decoder components, all geared towards swift and accurate segmentation for various prompts, ensuring real-time interactivity and handling ambiguity.

Two distinct use cases leveraging SAM-based models in the domain of optical remote sensing are presented, representing two critical tasks: image classification and semantic segmentation. Through comprehensive analysis and comparative assessments, various model architectures, including linear and convolutional classifiers, SAM-based adaptations, and UNet for semantic segmentation, are examined. Experiments contrast model performance across different dataset splits and varying training data sizes. The SAM-based models use a linear, a convolutional, or a ViT decoder classifier on top of the SAM encoder.
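
A minimal PyTorch sketch of the frozen-encoder-plus-linear-head pattern described above is given below; the toy encoder stands in for the SAM image encoder, and the embedding size and pooling are assumptions rather than the exact configuration used in the study.

```python
import torch
import torch.nn as nn

class FrozenEncoderClassifier(nn.Module):
    """Linear classification head on top of a frozen image encoder
    (a stand-in for the SAM ViT encoder; the encoder here is a placeholder)."""
    def __init__(self, encoder: nn.Module, embed_dim: int, n_classes: int = 10):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():   # freeze the pre-trained weights
            p.requires_grad = False
        self.head = nn.Linear(embed_dim, n_classes)   # only the head is trained

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            feats = self.encoder(x)             # (B, C, H', W') feature map
        pooled = feats.mean(dim=(-2, -1))       # global average pooling
        return self.head(pooled)

# Toy stand-in encoder: any module producing a (B, 256, h, w) feature map works
toy_encoder = nn.Sequential(nn.Conv2d(3, 256, kernel_size=16, stride=16))
model = FrozenEncoderClassifier(toy_encoder, embed_dim=256, n_classes=10)
logits = model(torch.randn(2, 3, 224, 224))     # EuroSAT has 10 land-cover classes
```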

Use Case 1: Image Classification with EuroSAT Dataset

The EuroSAT dataset, comprising 27,000 labeled image patches from Sentinel-2 satellite images across ten distinct land cover classes, serves as the testing ground for image classification tasks. SAM-ViT models consistently demonstrate high accuracy, ranging between 89% and 93% on various sizes of training datasets. These models outperform baseline approaches, exhibiting resilience even with limited training data. This use case highlights SAM-ViT's effectiveness in accurately categorizing land cover classes despite data limitations.

Use Case 2: Semantic Segmentation with Road Dataset

In the semantic segmentation domain, the study focuses on the Road dataset, evaluating SAM-based models, particularly SAM-CONV, against the benchmark UNet model. SAM-CONV showcases remarkable superiority, achieving F1-scores and Dice coefficients exceeding 0.84 and 0.82, respectively. Its exceptional performance in pixel-level labeling emphasizes its robustness in delineating roads from surrounding environments, surpassing established benchmarks and demonstrating its applicability in fine-grained analysis.
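
For reference, the Dice coefficient reported above can be computed for binary masks as in the following short sketch (standard definition; the helper name is ours).

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, target: np.ndarray, eps: float = 1e-9) -> float:
    """Dice = 2|A∩B| / (|A| + |B|) for binary masks (e.g. road vs. background)."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return float((2 * inter + eps) / (pred.sum() + target.sum() + eps))
```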

In conclusion, SAM-driven transfer learning methods hold promise for robust remote sensing analysis. SAM-ViT excels in image classification, while SAM-CONV demonstrates superiority in semantic segmentation, paving the way for their practical use in real-world remote sensing applications despite limited labeled data availability.

How to cite: Albughdadi, M., Baousis, V., Kaprol, T., Karatosun, A., and Pisa, C.: Exploring Transfer Learning Using Segment Anything Model in Optical Remote Sensing, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-5769, https://doi.org/10.5194/egusphere-egu24-5769, 2024.

Remote sensing image scene classification annotates semantic categories for image areas covering multiple land cover types, reflecting the spatial aggregation of relevant social resources among feature objects; it is one of the more challenging remote sensing interpretation tasks for algorithms that must understand the images. Extracting scene semantic information from images with deep neural networks is currently one of the most active research directions. Compared to other algorithms, deep neural networks can better capture the semantic information in images and achieve higher classification accuracy in applications such as urban planning. In recent years, multi-modal models represented by image-text models have achieved satisfactory performance in downstream tasks. The introduction of "multi-modal" approaches in remote sensing research should not be limited to the use of multi-source data; more importantly, it concerns the encoding of diverse data and the deep features extracted from very large data volumes. Therefore, in this paper, based on an image-text matching model, we establish a multi-modal scene classification model (Fig. 1) for high spatial resolution aerial images, in which image features dominate and text facilitates the representation of the image features. The algorithm first employs self-supervised learning of the visual model to align the expression domain of image features obtained from training on natural images with that of our particular dataset, which helps improve the feature extraction of the visual model on aerial survey images. The features generated by the pre-trained image encoding model and the text encoding model are further aligned, and some of the parameters in the image encoder are iteratively updated during training. A classifier is designed at the end of the model to perform the scene classification task. Experiments show that our algorithm significantly improves scene classification of aerial survey images compared to single visual models. The proposed model obtained precision and recall above 90% on the test dataset, drawn from the high spatial resolution aerial survey image dataset we built with 27 categories (Fig. 2).

Fig 1. Diagram of the proposed model structure. Blue boxes are associated with the image, green boxes with the text, and red boxes with both image and text.

Fig 2. Samples in our high spatial resolution aerial survey images dataset.
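
To illustrate the kind of image-dominant image-text coupling described above, the sketch below projects pre-computed image and text embeddings into a shared space and feeds them to a scene classifier; all dimensions and the fusion rule are assumptions and not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class ImageTextSceneClassifier(nn.Module):
    """Illustrative image-dominant fusion of image and text embeddings followed by
    a scene classifier; sizes and the concatenation-based fusion are assumptions."""
    def __init__(self, img_dim=768, txt_dim=512, shared_dim=256, n_classes=27):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, shared_dim)   # align image features ...
        self.txt_proj = nn.Linear(txt_dim, shared_dim)   # ... and text features in one space
        self.classifier = nn.Sequential(nn.Linear(2 * shared_dim, 128), nn.ReLU(),
                                        nn.Linear(128, n_classes))

    def forward(self, img_feat: torch.Tensor, txt_feat: torch.Tensor) -> torch.Tensor:
        z_img = self.img_proj(img_feat)
        z_txt = self.txt_proj(txt_feat)
        return self.classifier(torch.cat([z_img, z_txt], dim=-1))

model = ImageTextSceneClassifier()
logits = model(torch.randn(4, 768), torch.randn(4, 512))   # 27 scene categories
```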

How to cite: He, L., Lin, Y., and Song, Y.: A multi-modal high spatial resolution aerial imagery scene classification model with visual enhancement, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-6107, https://doi.org/10.5194/egusphere-egu24-6107, 2024.

EGU24-10914 | Posters virtual | ESSI1.1

Efficient adaptation of Foundation Models for Visual Grounding Remote Sensing task 

Ali J. Ghandour, Hasan Moughnieh, Mohammad Hasan Zahweh, Hasan Nasrallah, Mustafa Shukor, Cristiano Nattero, and Paolo Campanella

Foundation models have demonstrated impressive proficiency across multiple domains, including language, vision, and multi-modal applications, establishing new standards for efficiency and adaptability. In the context of localization-based foundational models, the core strength of such models is their ability to precisely recognize and locate a diverse set of objects in wide-area scenes. This precision is particularly vital in the Remote Sensing (RS) field. The multimodality aspect of these models becomes pivotal in RS, as they can process and interpret complex data, allowing for more comprehensive aerial and satellite image analysis.

Multimodality has emerged as a crucial and dynamic area in recent AI developments, finding diverse applications such as image captioning and visual question answering. More related to traditional visual tasks, Visual Grounding (VG) stands out, involving the localization of objects based on textual descriptions. Unlike conventional approaches that train models on predefined and fixed lists of objects, VG allows a model to locate any entity in an image based on diverse textual descriptions, enabling open-vocabulary predictions. Despite notable efforts in developing powerful VG models to solve general benchmarks, there is a need for more exploration into transferring these models to the remote sensing context.

This paper addresses this gap by delving into the task of visual grounding for remote sensing. Our initial exploration reveals that utilizing general pretrained foundational models for RS yields suboptimal performance. After recognizing these limitations, our work systematically investigates various parameter-efficient tuning techniques to fine-tune these models for RS visual grounding applications. The insights and methodologies presented in this paper provide valuable guidance for researchers seeking to adapt pretrained models to the RS domain efficiently. This adaptation marks a substantial advancement in the field, offering a significant stride toward enhancing the applicability of visual grounding in remote sensing scenarios.

How to cite: Ghandour, A. J., Moughnieh, H., Zahweh, M. H., Nasrallah, H., Shukor, M., Nattero, C., and Campanella, P.: Efficient adaptation of Foundation Models for Visual Grounding Remote Sensing task, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-10914, https://doi.org/10.5194/egusphere-egu24-10914, 2024.

EGU24-11145 | ECS | Posters on site | ESSI1.1

Preliminary analysis of the potentialities of the Segment Anything Model (SAM) in the segmentation of Sentinel-2 imagery for water reservoir monitoring 

Filippo Bocchino, Germana Sergi, Roberta Ravanelli, and Mattia Crespi

Water reservoirs play a crucial role in the supply of freshwater, agricultural irrigation, hydroelectric power generation, and various industrial applications. However, their existence is increasingly threatened by water stress, due to growing water demand, water pollution, and impacts of climate change, including intensified and prolonged droughts. To address this challenge, a sustainable management of water resources is essential, relying on continuous and accurate monitoring of water reservoirs. Modern Earth Observation technologies offer an effective, frequent, and cost-efficient means for monitoring water basins. 

This study focuses on evaluating the potential of the Segment Anything Model (SAM) network (Kirillov et al., 2023), released by Meta AI in April 2023, for segmenting water reservoirs through the processing of satellite images. SAM aims to serve as a foundational segmentation model capable of generalising its segmentation abilities in a zero-shot manner across diverse tasks. Unlike traditional supervised learning, zero-shot learning enables a model to recognize objects or features it has never seen during training. Notably, SAM's application to satellite imagery, a type of imagery on which it was not specifically trained, poses a unique challenge.

In this work, SAM was applied to Sentinel-2 multispectral imagery using a "prompt click" approach, where a water-class pixel was pre-selected for each input image. Google Earth Engine facilitated temporal aggregation of Sentinel-2 images over the period of interest (from 01/01/2019 to 31/12/2019), creating four RGB median images, one for each three-month period. SAM was independently applied to each of these four sub-periods.
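
A possible Earth Engine (Python API) sketch of the quarterly median compositing described above is shown below; the collection id, band names and area of interest are assumptions for illustration, not the exact processing used in the study.

```python
import ee

ee.Initialize()

# Assumed Sentinel-2 L2A collection id and RGB band names; AOI roughly around Genoa.
aoi = ee.Geometry.Point(8.93, 44.41).buffer(5000)
quarters = [("2019-01-01", "2019-04-01"), ("2019-04-01", "2019-07-01"),
            ("2019-07-01", "2019-10-01"), ("2019-10-01", "2020-01-01")]

medians = []
for start, end in quarters:
    composite = (ee.ImageCollection("COPERNICUS/S2_SR_HARMONIZED")
                 .filterBounds(aoi)
                 .filterDate(start, end)
                 .select(["B4", "B3", "B2"])   # RGB bands
                 .median()                      # per-pixel temporal median
                 .clip(aoi))
    medians.append(composite)
```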

Validation was carried out in the Genoa port area to minimise the influence of temporal water level variations, which in turn produce water area changes. Indeed, the use of a port area made it possible to consider a single reference mask for the different sub-periods analysed, greatly simplifying the validation procedure. 

The validation phase revealed SAM’s superior performance in coastlines with regular shapes and undisturbed water (Fig. 1 and Tab. 1), while port areas, characterised by irregular shapes, higher activity and turbidity, yielded less satisfactory results (Fig. 2 and Tab. 2). 

In conclusion, this study highlighted SAM's limitations, primarily related to the specific nature of satellite images, vastly different from the training data. These limitations include SAM's training on three-band (R, G, B) and 8-bit images: the former made it impossible to use all 13 bands of the Sentinel-2 multispectral images, and the latter required reducing the radiometric resolution of the Sentinel-2 images (from 16 bit to 8 bit), both resulting in information loss. Despite these limitations, SAM demonstrated effective segmentation capabilities, especially in simpler and less disturbed coastal areas, comparable to water segmentation algorithms based on spectral indices. Future improvements could be achieved by fine-tuning on satellite images and applying SAM to high-resolution ones.
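
The 16-bit to 8-bit reduction mentioned above could, for example, be done with a simple percentile stretch as in the following sketch; the stretch limits are an assumption, and any such rescaling discards radiometric information.

```python
import numpy as np

def to_uint8(band16: np.ndarray, lo_pct: float = 2, hi_pct: float = 98) -> np.ndarray:
    """Percentile-stretch a 16-bit Sentinel-2 band to 8 bit for SAM input.
    The 2-98 percentile limits are an illustrative choice."""
    lo, hi = np.percentile(band16, [lo_pct, hi_pct])
    scaled = np.clip((band16.astype(np.float32) - lo) / (hi - lo), 0, 1)
    return (scaled * 255).astype(np.uint8)
```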

Fig.1

Tab.1

Fig.2

Tab.2

 

References:

Kirillov, A., Mintun, E., Ravi, N., Mao, H., Rolland, C., Gustafson, L., Xiao, T., Whitehead, S., Berg, A. C., Lo, W.-Y. et al.,2023. Segment anything. arXiv preprint arXiv:2304.02643.

How to cite: Bocchino, F., Sergi, G., Ravanelli, R., and Crespi, M.: Preliminary analysis of the potentialities of the Segment Anything Model (SAM) in the segmentation of Sentinel-2 imagery for water reservoir monitoring, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-11145, https://doi.org/10.5194/egusphere-egu24-11145, 2024.

EGU24-11514 | ECS | Orals | ESSI1.1

Towards Foundation Models for Earth Observation; Benchmarking Datasets and Performance on Diverse Downstream Tasks 

Anna Jungbluth, Matt Allen, Francisco Dorr, Joseph Gallego-Mejia, Laura Martínez-Ferrer, Freddie Kalaitzis, and Raúl Ramos-Pollán

Satellite-based Earth Observation (EO) is crucial for monitoring land changes and natural hazards on a global scale. In addition to optical imagery, synthetic aperture radar (SAR) technology has proven indispensable, since radar pulses can penetrate clouds and detect millimeter changes on the ground surface. While SAR polarimetry data is easily available (e.g. via Google Earth Engine), interferometric products are harder to obtain due to complex pre-processing requirements. 

In general, using the information contained in EO data (both optical and SAR) for specific downstream tasks often requires specialized analysis pipelines that are not easily accessible to the scientific community. In the context of applying machine learning to EO, self-supervised learning (SSL) - machine learning models that learn features in data without being provided with explicit labels - offers great potential to fully leverage the wealth and complexity of the available data.

In this work, we apply self-supervised learning techniques to create pre-trained models that can leverage the features learned from unlabelled EO data for a variety of downstream tasks. More specifically, we pre-train our models on optical imagery (Sentinel-2) or SAR data (Sentinel-1), and fine-tune our models to predict local events (e.g. fires, floods) and annual land characteristics (e.g. vegetation percentage, land cover, biomass). We compare a number of state-of-the-art SSL techniques (MAE [1], DINO [2], VICReg [3], CLIP [4]) that have shown great performance on standard image or text based tasks. By adapting these models to our use case, we demonstrate the potential of SSL for EO, and show that self-supervised pre-training strongly reduces the requirement for labels.
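
As one concrete example of the SSL objectives compared here, the following is a minimal sketch of a VICReg-style loss (Bardes et al. [3]) combining invariance, variance and covariance terms; the weights follow the paper's defaults and the code is illustrative, not the training pipeline used in this work.

```python
import torch
import torch.nn.functional as F

def vicreg_loss(z1, z2, sim_w=25.0, var_w=25.0, cov_w=1.0, eps=1e-4):
    """VICReg-style objective: invariance + variance + covariance terms.
    z1, z2 are (batch, dim) embeddings of two augmented views of the same tiles."""
    n, d = z1.shape
    sim = F.mse_loss(z1, z2)                                   # invariance term
    std1 = torch.sqrt(z1.var(dim=0) + eps)
    std2 = torch.sqrt(z2.var(dim=0) + eps)
    var = F.relu(1 - std1).mean() + F.relu(1 - std2).mean()    # keep per-dim std above 1
    z1c, z2c = z1 - z1.mean(0), z2 - z2.mean(0)
    cov1 = (z1c.T @ z1c) / (n - 1)
    cov2 = (z2c.T @ z2c) / (n - 1)
    off_diag = lambda m: m - torch.diag(torch.diag(m))
    cov = (off_diag(cov1) ** 2).sum() / d + (off_diag(cov2) ** 2).sum() / d   # decorrelate dims
    return sim_w * sim + var_w * var + cov_w * cov

loss = vicreg_loss(torch.randn(256, 128), torch.randn(256, 128))
```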

In addition to the pre-trained models, we provide global benchmarking datasets of EO input data and associated downstream tasks ready for use in machine learning pipelines. Our data contains 25+ TB of co-registered and aligned tiles, covering South America, the US, Europe, and Asia. By comparing how well our pre-trained models perform on unseen data (both regionally and temporally), we investigate the generalizability of SSL techniques for EO research. With this, our work provides a first step towards creating EO foundation models that can predict anything, anywhere on Earth.

 

1. He, K. et al. Masked Autoencoders Are Scalable Vision Learners. (2021).

2. Caron, M. et al. Emerging Properties in Self-Supervised Vision Transformers. (2021).

3. Bardes, A., Ponce, J. & LeCun, Y. VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning. (2021).

4. Radford, A. et al. Learning Transferable Visual Models From Natural Language Supervision. (2021).

How to cite: Jungbluth, A., Allen, M., Dorr, F., Gallego-Mejia, J., Martínez-Ferrer, L., Kalaitzis, F., and Ramos-Pollán, R.: Towards Foundation Models for Earth Observation; Benchmarking Datasets and Performance on Diverse Downstream Tasks, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-11514, https://doi.org/10.5194/egusphere-egu24-11514, 2024.

EGU24-17934 | Orals | ESSI1.1

The PhilEO Geospatial Foundation Model Suite 

Bertrand Le Saux, Casper Fibaek, Luke Camilleri, Andreas Luyts, Nikolaos Dionelis, Giacomo Donato Cascarano, Leonardo Bagaglini, and Giorgio Pasquali

Foundation Models (FMs) are the latest big advancement in AI that build upon Deep Learning. They have the ability to analyse large volumes of unlabeled Earth Observation (EO) data by learning at scale, identifying complex patterns and trends that may be difficult or even impossible to detect through traditional methods. These models can then be used as a base to create powerful applications that automatically identify, classify, and analyse features in EO data, unlocking the full potential of AI in EO like never before, providing a paradigm shift in the field.

The field of geospatial FMs is blooming with milestones such as Seasonal Contrast (SeCo) [1] or Prithvi [2]. We present the PhilEO Suite: a dataset (the PhilEO Globe), a series of models (the PhilEO Pillars), and an evaluation testbed (the PhilEO Bench).

In particular, the PhilEO Bench [3] is a novel framework to evaluate the performance of the numerous EO FM propositions on a unified set of downstream tasks. Indeed, there is now a need to assess them with respect to their expected qualities in terms of generalisation, universality, label efficiency, and ease of deriving specialised models. The PhilEO Bench comprises a fair testbed, independent of external factors, and a novel 400 GB global, stratified Sentinel-2 dataset containing labels for the three downstream tasks of building density estimation, road segmentation, and land cover classification.

 

References

[1] Oscar Manas, et al., “Seasonal Contrast: Unsupervised pre-training from uncurated remote sensing data,” in Proc. ICCV, 2021.

[2] Johannes Jakubik, Sujit Roy, et al., “Foundation Models for Generalist Geospatial Artificial Intelligence,” arxiv:2310.18660, 2023.

[3] Casper Fibaek, Luke Camilleri, Andreas Luyts, Nikolaos Dionelis, and Bertrand Le Saux, “PhilEO Bench: Evaluating Geo-Spatial Foundation Models,” arXiv:2401.04464, 2024.

How to cite: Le Saux, B., Fibaek, C., Camilleri, L., Luyts, A., Dionelis, N., Cascarano, G. D., Bagaglini, L., and Pasquali, G.: The PhilEO Geospatial Foundation Model Suite, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-17934, https://doi.org/10.5194/egusphere-egu24-17934, 2024.

EGU24-18331 | Posters on site | ESSI1.1

Downscaling with the foundation model AtmoRep 

Michael Langguth, Christian Lessig, Martin Schultz, and Ilaria Luise

In recent years, deep neural networks (DNNs) for enhancing the resolution of meteorological data, known as statistical downscaling, have surpassed previously developed classical statistical methods on several validation metrics. The prevailing approach for DNN downscaling is to train deep learning models in an end-to-end manner. However, foundation models trained on very large datasets in a self-supervised way have proven to provide new state-of-the-art (SOTA) results for various applications in natural language processing and computer vision.


To investigate the benefit of foundation models in Earth science applications, we deploy the large-scale representation model for atmospheric dynamics AtmoRep (Lessig et al., 2023) for statistical downscaling of the 2 m temperature over Central Europe. AtmoRep has been trained on almost 40 years of ERA5 data from 1979 to 2017 and has shown promising skill in several intrinsic and downstream applications. By extending AtmoRep's encoder-decoder with a tail network for downscaling, we super-resolve the coarse-grained 2 m temperature field from ERA5 data (Δx = 25 km) to attain the high spatial resolution (Δx = 6 km) of the COSMO REA6 dataset. Different coupling approaches between the core and tail network (e.g. with and without fine-tuning the core model) are tested and analyzed in terms of accuracy and computational efficiency. Preliminary results show that downscaling with a task-specific extension of the foundation model AtmoRep can improve the downscaled product in terms of standard evaluation metrics such as the RMSE compared to a task-specific deep learning model. However, deficiencies in the spatial variability of the downscaled product are also revealed, highlighting the need for future work to focus especially on target data that exhibit a high degree of spatial variability and intrinsic uncertainty, such as precipitation.
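
A minimal sketch of what a downscaling tail network could look like is given below; the layer sizes, the 4x upsampling factor (roughly 25 km to 6 km) and the PixelShuffle design are illustrative assumptions, not AtmoRep's actual tail network.

```python
import torch
import torch.nn as nn

class DownscalingTail(nn.Module):
    """Illustrative tail network mapping coarse-grid encoder features to a finer
    2 m temperature grid; all sizes are assumptions."""
    def __init__(self, in_ch: int = 256, scale: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.GELU(),
            nn.Conv2d(64, scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),          # rearranges channels into a finer grid
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.net(features)

coarse = torch.randn(1, 256, 40, 40)          # encoder features on the coarse grid
fine_t2m = DownscalingTail()(coarse)          # shape (1, 1, 160, 160)
```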

How to cite: Langguth, M., Lessig, C., Schultz, M., and Luise, I.: Downscaling with the foundation model AtmoRep, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-18331, https://doi.org/10.5194/egusphere-egu24-18331, 2024.

EGU24-18852 | ECS | Orals | ESSI1.1

DeepFeatures: Remote sensing beyond spectral indices 

Martin Reinhardt, Karin Mora, Gunnar Brandt, Tejas Morbagal Harish, David Montero, Chaonan Ji, Teja Kattenborn, Francesco Martinuzzi, Clemens Mosig, and Miguel D. Mahecha

Terrestrial surface processes exhibit distinctive spectral signatures captured by optical satellites. Despite the development of over two hundred spectral indices (SIs), current studies often narrow their focus to individual SIs, overlooking the broader context of land surface processes. This project seeks to understand the holistic features of Sentinel-2 based SIs and their relationships with human impact and overall land surface dynamics. To address this, we propose an AI-driven approach that synthesises SIs derived from Sentinel data through dimension reduction, yielding interpretable latent variables describing the system comprehensively. Our goals are to (i) reduce the number of SIs and (ii) compute a few latent variables representing spatio-temporal dynamics, which culminate in a Feature Data Cube. This fully descriptive cube reduces computational costs, facilitating diverse applications. We plan to demonstrate its efficacy in land cover classification, standing deadwood detection, and terrestrial gross primary production estimation. The presentation outlines the project's implementation strategy, confronts methodological challenges, and extends an invitation to the remote sensing and machine learning community to collaborate on pressing environmental challenges. The project DeepFeatures is funded by ESA’s AI4Science activity. Website: https://rsc4earth.de/project/deepfeatures/ 
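
As a rough illustration of the intended dimension reduction, the sketch below collapses a hypothetical stack of spectral indices into a few latent variables with PCA; the array sizes are invented and PCA merely stands in for whatever method the project ultimately adopts.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical stack of spectral indices: (time, y, x, n_indices)
si_stack = np.random.rand(24, 100, 100, 50).astype(np.float32)
t, y, x, n_si = si_stack.shape

# Flatten space-time and reduce the ~50 indices to a few latent variables
pca = PCA(n_components=5)
latent = pca.fit_transform(si_stack.reshape(-1, n_si))
feature_cube = latent.reshape(t, y, x, 5)     # a small "Feature Data Cube"
print(pca.explained_variance_ratio_.round(3))
```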

How to cite: Reinhardt, M., Mora, K., Brandt, G., Morbagal Harish, T., Montero, D., Ji, C., Kattenborn, T., Martinuzzi, F., Mosig, C., and Mahecha, M. D.: DeepFeatures: Remote sensing beyond spectral indices, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-18852, https://doi.org/10.5194/egusphere-egu24-18852, 2024.

EGU24-21146 | Posters on site | ESSI1.1

Segment Anything Model (SAM) for Automatic Crater Detection 

Iraklis Giannakis, Anshuman Bhardwaj, Lydia Sam, and Georgis Leontidis

Impact craters, resulting from the collision of meteorites, asteroids, or comets with planetary surfaces, manifest as circular-elliptical depressions with diverse sizes and shapes influenced by various factors. These morphological features play a crucial role in planetary exploration, offering insights into the geological composition and structure of celestial bodies. Beyond their scientific importance, craters may also hold valuable natural resources, such as frozen water in the Moon's permanently shadowed craters. Furthermore, understanding craters’ spatial distribution is pivotal for terrain-relative navigation and for selecting future landing sites.

Manual crater mapping through visual inspection is an impractical and laborious process, often unattainable for large-scale investigations. Moreover, manual crater mapping is susceptible to human errors and biases, leading to potential disagreements of up to 40%. In order to tackle these issues, semi-automatic crater detection algorithms (CDA) have been developed to mitigate human biases, and to enable large-scale and real-time crater detection and mapping.

The majority of CDAs are based on machine learning (ML) and data-driven methods. ML-based CDAs are trained in a supervised manner using specific datasets that were manually labelled. Because of that, existing ML-based CDAs are constrained to specific data types according to the type of their training data. This makes current ML-based CDAs unstable and impractical, since applying an ML scheme to a different type of data requires acquiring and labelling a new training set and subsequently using it to train a new ML scheme, or fine-tuning an existing one.

In this study, we describe a universal approach [1] for crater identification based on the Segment Anything Model (SAM), a foundational computer vision and image segmentation model developed by Meta AI [2]. SAM was trained with over 1 billion masks and is capable of segmenting various data types (e.g., photos, DEM, spectra, gravity) from different celestial bodies (e.g., Moon, Mars) and measurement setups. The segmentation output undergoes further classification into crater and non-crater based on geometric indices assessing the circular and elliptical attributes of the investigated mask. The proposed framework proves effective across different datasets from various planetary bodies and measurement configurations. The outcomes of this study underline the potential of foundational segmentation models in planetary science. Foundational models tuned for planetary data can provide universal classifiers, contributing towards an automatic scheme for identifying, detecting and mapping various morphological and geological targets on different celestial bodies.
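
One simple geometric index of the kind mentioned above is circularity, 4πA/P², which equals 1 for a perfect disc; the sketch below computes it for a binary segmentation mask (illustrative only, not the exact indices used in [1]).

```python
import numpy as np
from skimage import measure

def circularity(mask: np.ndarray) -> float:
    """Circularity 4*pi*A / P^2 of a binary mask; close to 1 for disc-like (crater-like) masks."""
    props = measure.regionprops(mask.astype(int))
    if not props:
        return 0.0
    area, perim = props[0].area, props[0].perimeter
    return float(4 * np.pi * area / (perim ** 2 + 1e-9))

# Toy example: a filled disc scores close to 1, elongated shapes score much lower
yy, xx = np.mgrid[:64, :64]
disc = (yy - 32) ** 2 + (xx - 32) ** 2 < 20 ** 2
print(round(circularity(disc), 2))
```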

 

References

[1] Giannakis, I., Bhardwaj, A., Sam, L., and Leontidis, G. (2024). A Flexible Deep Learning Crater Detection Scheme Using Segment Anything Model (SAM), Icarus.

[2] Kirillov, A., et al. (2023). Segment Anything, arXiv:2304.02643

How to cite: Giannakis, I., Bhardwaj, A., Sam, L., and Leontidis, G.: Segment Anything Model (SAM) for Automatic Crater Detection, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-21146, https://doi.org/10.5194/egusphere-egu24-21146, 2024.

EGU24-22461 | Posters on site | ESSI1.1

Pretraining a foundation model using MODIS observations of the earth’s atmosphere 

Valentine Anantharaj, Takuya Kurihana, Gabriele Padovani, Ankur Kumar, Aristeidis Tsaris, Udaysankar Nair, Sandro Fiore, and Ian Foster

The earth and atmospheric sciences research community has an unprecedented opportunity to exploit the vast amount of data available from earth observation (EO) satellites and earth system models (ESM). Smaller and cheaper satellites with reduced operational costs have made a variety of EO data affordable, and technological advances have made the data accessible to a wide range of stakeholders, especially the scientific community (EY, 2023). The NASA ESDS program alone is expected to host 320 PB of data by 2030 (NASA ESDS, 2023). The ascent and application of artificial intelligence foundation models (FM) can be attributed to the availability of large volumes of curated data, accessibility to extensive compute resources and the maturity of deep learning architectures, especially the transformer (Bommasani et al., 2021). 

Developing a foundation model involves pretraining a suitable deep learning architecture with large amounts of data, often via self-supervised learning (SSL) methods. The pretrained models can then be adapted to downstream tasks via fine-tuning, requiring less data than task-specific models. Large language models (LLMs) are likely the most common type of foundation model encountered by the general public. Vision transformers (ViTs) adapt the underlying transformer architecture for image and image-like data (Dosovitskiy et al., 2020), such as EO data and ESM simulation output. We are in the process of pretraining a ViT model for the earth's atmosphere using a select few bands of 1-km Level-1B MODIS radiances and brightness temperatures, MOD021KM and MYD021KM from the NASA Terra and Aqua satellites respectively. We are using 200 million image chips of size 128x128 pixels. We are pretraining two ViT models of 100 million and 400 million parameters respectively. The pretrained models will be finetuned for cloud classification and evaluated against AICCA (Kurihana et al., 2022). We will discuss our experiences involving data and computing, and present preliminary results.
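
For illustration, the sketch below cuts a MODIS-like granule into non-overlapping 128x128 chips of the kind used for pretraining; the number of bands and the simple edge handling are assumptions, not the actual preprocessing pipeline.

```python
import numpy as np

def make_chips(scene: np.ndarray, size: int = 128) -> np.ndarray:
    """Cut a (bands, H, W) granule into non-overlapping size x size chips,
    discarding partial edges -- a simplified stand-in for the actual chipping."""
    b, h, w = scene.shape
    h2, w2 = (h // size) * size, (w // size) * size
    return (scene[:, :h2, :w2]
            .reshape(b, h2 // size, size, w2 // size, size)
            .transpose(1, 3, 0, 2, 4)           # (rows, cols, bands, size, size)
            .reshape(-1, b, size, size))

# Nominal 1-km MODIS granule grid (2030 x 1354 pixels); 5 bands assumed for illustration
granule = np.random.rand(5, 2030, 1354).astype(np.float32)
print(make_chips(granule).shape)                # (150, 5, 128, 128)
```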

 

References

Bommasani R, Hudson DA, Adeli E, Altman R, Arora S, et al: On the opportunities and risks of foundation models. CoRR abs/2108.07258. https://arxiv.org/abs/2108.07258, 2021. 

Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S. and Uszkoreit, J.: An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020.

Ernst & Young (EY): How can the vantage of space give you strategic advantage on Earth? https://www.ey.com/en_gl/technology/how-can-the-vantage-of-space-give-you-strategic-advantage-on-earth, 2023. Accessed 10 January 2024.

Kurihana, Takuya, Elisabeth J. Moyer, and Ian T. Foster: AICCA: AI-Driven Cloud Classification Atlas. Remote Sensing 14, no. 22: 5690. https://doi.org/10.3390/rs14225690, 2022.

NASA MODIS: MODIS - Level 1B Calibrated Radiances. DOI: 10.5067/MODIS/MOD021KM.061 and DOI: 10.5067/MODIS/MYD021KM.061

NASA ESDS: Earthdata Cloud Evolution https://www.earthdata.nasa.gov/eosdis/cloud-evolution. Accessed 10 January 2024.

Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I: Attention is all you need. Adv Neural Inf Process Syst 30, 2017.





How to cite: Anantharaj, V., Kurihana, T., Padovani, G., Kumar, A., Tsaris, A., Nair, U., Fiore, S., and Foster, I.: Pretraining a foundation model using MODIS observations of the earth’s atmosphere, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-22461, https://doi.org/10.5194/egusphere-egu24-22461, 2024.

EGU24-1198 | ECS | PICO | ESSI1.3

Satellite-Driven Traffic Volume Estimation: Harnessing Hybrid Machine Learning for Sustainable Urban Planning and Pollution Control 

Bilal Aslam, Toby Hocking, Pawlok Dass, Anna Kato, and Kevin Gurney

As cities grow and more cars occupy the roads, greenhouse gas emissions and air pollution in urban areas are increasing. To better understand these emissions and pollution, and to support effective urban environmental mitigation, an accurate estimation of traffic volume is crucial. This study delves into the application of hybrid machine learning models to estimate and predict traffic volume by utilizing satellite data and other datasets in both the USA and Europe. The research investigates the predictive capabilities of machine learning models employing freely accessible global datasets, including Sentinel-2, night-time light data, population, and road density. Neural network, nearest-neighbours, random forest and XGBoost regression models were employed for traffic volume prediction, and their accuracy was enhanced using a hyperparameter-tuned K-fold cross-validation technique. Model accuracy, evaluated through Mean Percentage Error (MPE%) and R-square, revealed that the XGBoost regression model performed best, yielding an R2 of 0.81 and an MPE of 13%. The low error (and therefore high accuracy), as well as the model's versatility, allows its application worldwide for traffic volume computation using readily available datasets. Machine learning models, particularly the XGBoost regression model, prove valuable for on-road traffic volume prediction, offering a dataset applicable to town planning, urban transportation, and combating urban air pollution.
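
A minimal sketch of the cross-validated XGBoost regression setup described above is given below; the synthetic predictor table, the parameter grid and the particular MPE definition used are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from xgboost import XGBRegressor

# Hypothetical predictor table: night-time lights, population, road density, ...
rng = np.random.default_rng(1)
X = rng.random((500, 6))
y = 200 + 1000 * X[:, 0] + 500 * X[:, 1] + 50 * rng.standard_normal(500)  # synthetic traffic volume

grid = GridSearchCV(
    XGBRegressor(objective="reg:squarederror"),
    param_grid={"n_estimators": [200, 500], "max_depth": [4, 6], "learning_rate": [0.05, 0.1]},
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    scoring="r2",
)
grid.fit(X, y)
pred = grid.predict(X)
mpe = np.mean(np.abs(pred - y) / np.abs(y)) * 100   # one common percentage-error definition
print(grid.best_params_, round(grid.best_score_, 2), round(mpe, 1))
```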

How to cite: Aslam, B., Hocking, T., Dass, P., Kato, A., and Gurney, K.: Satellite-Driven Traffic Volume Estimation: Harnessing Hybrid Machine Learning for Sustainable Urban Planning and Pollution Control, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-1198, https://doi.org/10.5194/egusphere-egu24-1198, 2024.

Long-term satellite-based imagery provides fundamental data support for identifying and analyzing land surface dynamics. Although moderate-spatial-resolution data, such as those from the Moderate Resolution Imaging Spectroradiometer (MODIS), are widely used for large-scale regional studies, their limited availability before 2000 restricts their use in long-term investigations. To reconstruct retrospective MODIS-like data, this study proposes a novel deep learning-based model, named the Land-Cover-assisted SpatioTemporal Fusion model (LCSTF). LCSTF leverages medium-grained spatial class features from Landcover300m and temporal seasonal fluctuations from the Global Inventory Modelling and Mapping Studies (GIMMS) NDVI3g time series data to generate 500-meter MODIS-like data from 1992 to 2010 over the continental United States. The model also implements the Long Short-Term Memory (LSTM) sensor-bias correction method to mitigate systematic differences between sensors. Validation against actual MODIS images confirms the model's ability to produce accurate MODIS-like data. Additionally, when assessed with Landsat data prior to 2000, the model demonstrates excellent performance in reconstructing retrospective data. The developed model and the reconstructed biweekly MODIS-like dataset offer significant potential for extending the temporal coverage of moderate-spatial-resolution data, enabling comprehensive long-term and large-scale studies of land surface dynamics.

How to cite: Zhang, Z., Xiong, Z., Pan, X., and Xin, Q.: Developing a land-cover-assisted spatiotemporal fusion model for producing pre-2000 MODIS-like data over the continental United States, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-2021, https://doi.org/10.5194/egusphere-egu24-2021, 2024.

EGU24-4445 | PICO | ESSI1.3

Fully Differentiable Physics-informed Lagrangian Convolutional Neural Network for Precipitation Nowcasting 

Peter Pavlík, Martin Výboh, Anna Bou Ezzeddine, and Viera Rozinajová

The task of precipitation nowcasting is often perceived as a computer vision problem. It is analogous to next-frame video prediction - i.e. processing consecutive radar precipitation map frames and predicting the future ones. This makes convolutional neural networks (CNNs) a great fit for this task. In recent years, CNNs have become the de facto state-of-the-art model for precipitation nowcasts.

However, a pure machine learning model has difficulty accurately capturing the underlying patterns in the data. Since the data behaves according to known physical laws, we can incorporate this knowledge to train more accurate and trustworthy models.

We present a double U-Net model, combining a continuity-constrained Lagrangian persistence U-Net with an advection-free U-Net dedicated to capturing the precipitation growth and decay. In contrast to previous works, the combined model is fully differentiable, allowing us to fine-tune these models together in a data-driven way. We examine the learned Lagrangian mappings, along with a thorough quantitative and qualitative evaluation. The results of the evaluation will be provided in the presentation.
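
The composition of the two networks can be sketched as below, with tiny convolutional stacks standing in for the actual U-Nets; the additive growth/decay correction is an assumption used only to illustrate how the combined model remains end-to-end differentiable.

```python
import torch
import torch.nn as nn

class DoubleUNet(nn.Module):
    """Sketch of the combined idea: one network for Lagrangian persistence (advection),
    one for growth/decay, composed so that gradients flow through both."""
    def __init__(self):
        super().__init__()
        self.advection = nn.Sequential(nn.Conv2d(4, 16, 3, padding=1), nn.ReLU(),
                                       nn.Conv2d(16, 1, 3, padding=1))
        self.growth_decay = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                                          nn.Conv2d(16, 1, 3, padding=1))

    def forward(self, radar_frames: torch.Tensor) -> torch.Tensor:
        advected = self.advection(radar_frames)          # Lagrangian persistence forecast
        return advected + self.growth_decay(advected)    # additive growth/decay correction

model = DoubleUNet()
forecast = model(torch.randn(2, 4, 64, 64))   # 4 past radar frames -> next frame
```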

How to cite: Pavlík, P., Výboh, M., Bou Ezzeddine, A., and Rozinajová, V.: Fully Differentiable Physics-informed Lagrangian Convolutional Neural Network for Precipitation Nowcasting, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-4445, https://doi.org/10.5194/egusphere-egu24-4445, 2024.

EGU24-10278 | ECS | PICO | ESSI1.3

Assessing the area of applicability of spatial prediction models through a local data point density approach 

Fabian Schumacher, Christian Knoth, Marvin Ludwig, and Hanna Meyer

Machine learning is frequently used in the field of earth and environmental sciences to produce spatial or spatio-temporal predictions of environmental variables based on limited field samples - increasingly even on a global scale and far beyond the extent of available training data. Since new geographic space often goes along with new environmental properties, the spatial applicability and transferability of models are often questionable. Predictions should be constrained to environments that exhibit properties the model has been enabled to learn.

Meyer and Pebesma (2021) have made a first proposal to estimate the area of applicability (AOA) of spatial prediction models. Their method derives a dissimilarity index (DI) from the distance - in the predictor space - of a prediction data point to the nearest reference data point. Prediction locations with a DI larger than the DI values observed through cross-validation during model training are considered outside of the AOA. As a consequence, the AOA is defined as the area where the model has been enabled to learn about relationships between predictors and target variables and where, on average, the cross-validation performance applies. The method, however, is only based on the distance - in the predictor space - to the nearest reference data point. Hence, a single data point in an environment may define a model as "applicable" in this environment. Here we suggest extending this approach by considering the density of reference data points in the predictor space, as we assume that this is highly decisive for the prediction quality.

We suggest extending the methodology with a newly developed local data point density (LPD) approach based on the given concepts of the original method to allow for a better assessment of the applicability of a model. The LPD is a quantitative measure for a new data point that indicates how many similar (in terms of predictor values) reference data points have been included in the model training, assuming a positive relationship between LPD values and prediction performance. A reference data point is considered similar if it defines a new data point as being within the AOA, i.e. the model is considered applicable for the corresponding prediction location. We implemented the LPD approach in the R package CAST. Here we explain the method and show its applicability in simulation studies as well as real-world applications.
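
A simplified sketch of the LPD idea follows: for each prediction location, count how many training samples lie within the AOA distance threshold in predictor space. The scaling and weighting of the original DI (Meyer and Pebesma, 2021) and the actual CAST implementation are omitted here, so this is only an illustration of the concept.

```python
import numpy as np
from scipy.spatial.distance import cdist

def lpd(train_X: np.ndarray, new_X: np.ndarray, threshold: float) -> np.ndarray:
    """Simplified local data point density: number of training points closer (in
    predictor space) than the AOA distance threshold; 0 means outside the AOA."""
    d = cdist(new_X, train_X)            # pairwise distances in predictor space
    return (d <= threshold).sum(axis=1)  # larger values = denser training support

rng = np.random.default_rng(0)
train = rng.normal(size=(200, 4))        # 200 reference samples, 4 predictors
new = rng.normal(size=(5, 4), scale=2)   # new prediction locations
print(lpd(train, new, threshold=1.0))
```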

Reference:

Meyer, H. and Pebesma, E. 2021. 'Predicting into unknown space? Estimating the area of applicability of spatial prediction models.' Methods in Ecology and Evolution 12: 1620–1633. doi: 10.1111/2041-210X.13650.

How to cite: Schumacher, F., Knoth, C., Ludwig, M., and Meyer, H.: Assessing the area of applicability of spatial prediction models through a local data point density approach, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-10278, https://doi.org/10.5194/egusphere-egu24-10278, 2024.

With the rapid growth in global trade, the demand for efficient route planning and resource utilization in logistics and transportation mirrors the Travelling Salesman Problem (TSP). The TSP refers to finding the shortest possible route through N destinations, visiting each destination once and returning to the starting point. The computational complexity of the TSP increases exponentially with the number of destinations, so that finding an exact solution is not practical for larger instances. It has long been a challenging optimization problem, prompting the development of various methodologies to seek more efficient solutions, with recent research turning especially towards metaheuristics. Therefore, this research proposes an optimization algorithm implementing a Swarm Intelligence-based method for solving the TSP, providing an approximate solution. The proposed algorithm is evaluated by comparing its performance, in terms of solution quality and computation time, to well-known optimization methods, namely the Genetic Algorithm and Ant Colony Optimization. 47 cities and 50 landmarks in the U.S. are selected as the destinations for two experimental datasets, respectively, with geospatial data retrieved from the Google Maps Platform API. The experimental results suggest that the proposed algorithm computes a near-optimal solution with the shortest computation time among the three optimization methods. Solving the TSP efficiently contributes significantly to route planning for transportation and logistics. By shortening travelling time, optimizing resource utilization, and minimizing fuel and energy consumption, this research further aligns with the global goal of carbon reduction for transportation and logistics systems.
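
For context, the sketch below is a minimal Ant Colony Optimization for the TSP, i.e. one of the well-known methods the proposed algorithm is compared against; it is illustrative only and not the proposed swarm-intelligence algorithm.

```python
import numpy as np

def aco_tsp(dist, n_ants=20, n_iter=100, alpha=1.0, beta=3.0, rho=0.5, seed=0):
    """Minimal Ant Colony Optimization for the TSP (illustrative baseline)."""
    rng = np.random.default_rng(seed)
    n = len(dist)
    tau = np.ones((n, n))                        # pheromone levels
    eta = 1.0 / (dist + np.eye(n))               # heuristic desirability (1/distance)
    best_tour, best_len = None, np.inf
    for _ in range(n_iter):
        tours = []
        for _ in range(n_ants):
            tour = [rng.integers(n)]
            while len(tour) < n:
                i = tour[-1]
                mask = np.ones(n, bool)
                mask[tour] = False               # exclude already visited cities
                p = (tau[i] ** alpha) * (eta[i] ** beta) * mask
                tour.append(rng.choice(n, p=p / p.sum()))
            length = sum(dist[tour[k], tour[(k + 1) % n]] for k in range(n))
            tours.append((tour, length))
            if length < best_len:
                best_tour, best_len = tour, length
        tau *= (1 - rho)                         # pheromone evaporation
        for tour, length in tours:               # deposit pheromone on used edges
            for k in range(n):
                tau[tour[k], tour[(k + 1) % n]] += 1.0 / length
    return best_tour, best_len

cities = np.random.default_rng(1).random((15, 2))
dist = np.linalg.norm(cities[:, None] - cities[None, :], axis=-1)
print(aco_tsp(dist)[1])
```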

How to cite: Wong, K. T.: Solving the Travelling Salesman Problem for Efficient Route Planning through Swarm Intelligence-Based Optimization, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-13452, https://doi.org/10.5194/egusphere-egu24-13452, 2024.

Land surface temperature (LST) is a critical parameter for understanding the physical properties of the boundary between the earth's surface and the atmosphere, and it has a significant impact on various research areas, including agriculture, climate, hydrology, and the environment. However, thermal infrared remote sensing is often hindered by clouds and aerosols, resulting in gaps in LST data products that limit their practical application. Therefore, reconstruction of cloud-covered thermal infrared LST is vital for the measurement of physical properties of the land surface at regional and global scales. In this paper, a novel reconstruction method for Moderate Resolution Imaging Spectroradiometer (MODIS) LST data with a 1-km spatial resolution is proposed, based on a spatiotemporal consistency constraint network (STCCN) model fusing reanalysis and thermal infrared data. Firstly, a new spatio-temporal consistency loss function was developed to minimize the discrepancies between the reconstructed LST and the actual LST, using a non-local reinforced convolutional neural network. Secondly, ERA5 surface net solar radiation (SSR) data was applied as one of the important network inputs; it characterizes the influence of the Sun on surface warming and corrects the LST reconstruction results. The experimental results show that (1) the STCCN model can precisely reconstruct cloud-covered LST, with a coefficient of determination (R) of 0.8973 and a mean absolute error (MAE) of 0.8070 K; (2) with the introduction of ERA5 SSR data, the MAE of the reconstructed LST decreases by 17.15% while R remains close, indicating that it is necessary and beneficial to consider the effects of radiation data on LST; (3) the analysis of spatial and temporal adaptability indicates that the proposed method exhibits strong resilience and flexibility in accommodating variations across different spatial and temporal scales, suggesting its potential for effective and reliable application in different scenarios; and (4) compared with SURFRAD station observations, the reconstructed R ranges from 0.8 to 0.9 and the MAE ranges from 1 to 3 K, demonstrating the high effectiveness and validity of the proposed model for reconstructing regional cloud-covered LST.

How to cite: Gong, Y., Li, H., and Li, J.: STCCN: A spatiotemporal consistency constraint network for all-weather MODIS LST reconstruction by fusing reanalysis and thermal infrared data, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-13833, https://doi.org/10.5194/egusphere-egu24-13833, 2024.

EGU24-16841 | PICO | ESSI1.3

Photovoltaic Farms Mapping using openEO Platform 

Mohammad Alasawedah, Michele Claus, Alexander Jacob, Patrick Griffiths, Jeroen Dries, and Stefaan Lippens

Mapping photovoltaic farms (PV farms) is essential for establishing valid policies regarding natural resources management and clean energy. As evidenced by the recent COP28 summit, where almost 120 global leaders pledged to triple the world’s renewable energy capacity before 2030, it is crucial to make these mapping efforts scalable and reproducible. Recently, there have been efforts towards the global mapping of PV farms [1], but these were limited to fixed time periods of the analyzed satellite imagery and were not openly reproducible. Building on this effort, we propose the use of openEO [2] User Defined Processes (UDP) implemented in the openEO platform for mapping solar farms using Sentinel-2 imagery, emphasizing the four foundational FAIR data principles: Findability, Accessibility, Interoperability, and Reusability. The UDPs encapsulate the entire PV farm mapping workflow, from data preprocessing and analysis to model training and prediction. The use of openEO UDPs enables easy reuse and parametrization for future PV farm mapping.

Open-source data is used to construct the training dataset, leveraging OpenStreetMap (OSM) to gather PV farm polygons across different countries. Different filtering techniques are involved in the creation of the training set, in particular land cover and terrain filters. To ensure model robustness, we leveraged the temporal resolution of Sentinel-2 L2A data and utilized openEO to create a reusable workflow that simplifies data access in the cloud, allowing training samples to be collected efficiently over Europe. This workflow includes preprocessing steps such as cloud masking, gap filling and outlier filtering, as well as feature extraction. Considerable effort is put into generating the best possible training samples, ensuring an optimal starting point for the subsequent steps. After compiling the training dataset, we conducted a statistical discrimination analysis of different pixel-level models to determine the most effective one. Our goal is to compare time-series machine learning (ML) models like InceptionTime, which uses 3D data as input, with tree-based models like Random Forest (RF), which employs 2D data along with feature engineering. An openEO process graph is then constructed to organize and automate the execution of the inference phase, encapsulating all necessary processes from preprocessing to prediction. Finally, the process graph is transformed into a UDP that others can reuse for replicable PV farm mapping, from single-farm to country scale. The use of the openEO UDP enables the workflow to be replicated for new temporal assessments of PV farm distribution. The UDP process for PV farm mapping is integrated with the ESA Green Transition Information Factory (GTIF, https://gtif.esa.int/), providing the ability for streamlined and FAIR-compliant updates of related energy infrastructure mapping efforts.
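As a rough illustration of the openEO-based data access this workflow builds on, the sketch below uses the openEO Python client to load Sentinel-2 L2A data and reduce a time window to a cloud-robust median composite. The backend URL, collection name, bands, extent and dates are illustrative assumptions, method names may differ slightly across client versions, and this is a simplified stand-in for the preprocessing encapsulated in the authors' UDP, not the UDP itself.

```python
# Hedged sketch of an openEO (Python client) preprocessing step: load Sentinel-2
# L2A, build a temporal median composite, and download it for model training or
# inference. All parameters below are illustrative assumptions.
import openeo

connection = openeo.connect("openeo.dataspace.copernicus.eu").authenticate_oidc()

cube = connection.load_collection(
    "SENTINEL2_L2A",
    spatial_extent={"west": 11.2, "south": 46.4, "east": 11.5, "north": 46.6},
    temporal_extent=["2023-05-01", "2023-09-30"],
    bands=["B02", "B03", "B04", "B08"],
)

# Temporal median as a simple cloud-robust composite (the actual UDP also does
# cloud masking, gap filling and outlier filtering before feature extraction).
composite = cube.reduce_dimension(dimension="t", reducer="median")

composite.download("s2_median_composite.nc")  # input for PV-farm classification
```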

[1] Kruitwagen, L., et al. A global inventory of photovoltaic solar energy generating units. Nature 598, 604–610 (2021). https://doi.org/10.1038/s41586-021-03957-7 

[2] Schramm, M, et al. The openEO API–Harmonising the Use of Earth Observation Cloud Services Using Virtual Data Cube Functionalities. Remote Sens. 2021, 13, 1125. https://doi.org/10.3390/rs13061125 

How to cite: Alasawedah, M., Claus, M., Jacob, A., Griffiths, P., Dries, J., and Lippens, S.: Photovoltaic Farms Mapping using openEO Platform, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-16841, https://doi.org/10.5194/egusphere-egu24-16841, 2024.

EGU24-17458 | PICO | ESSI1.3

Spatially explicit active learning for crop-type mapping from satellite image time series 

Mariana Belgiu, Beatrice Kaijage, and Wietske Bijker

The availability of sufficient annotated samples is one of the main challenges of the supervised methods used to classify crop types from remote sensing images. Generating a large number of annotated samples is a time-consuming and expensive task. Active Learning (AL) is one of the solutions that can be used to optimize sample annotation, resulting in an efficiently trained supervised method with less effort. Unfortunately, most of the developed AL methods do not account for the spatial information inherent in remote sensing images. We propose a novel spatially explicit AL method that uses a semi-variogram to identify and discard spatially adjacent and, consequently, redundant samples. It was evaluated using Random Forest (RF) and Sentinel-2 Satellite Image Time Series (SITS) in two study areas from the Netherlands and Belgium. In the Netherlands, the spatially explicit AL selected a total of 97 samples as relevant for the classification task, which led to an overall accuracy of 80%, while the traditional AL method selected a total of 169 samples, achieving an accuracy of 82%. In Belgium, the spatially explicit AL selected 223 samples and obtained an overall accuracy of 60%, compared to the traditional AL, which selected 327 samples and yielded an accuracy of 63%. We concluded that the developed AL method helped RF achieve good performance mostly for classes consisting of individual crops with a relatively distinctive growth pattern, such as sugar beets or cereals. Aggregated classes such as ‘fruits and nuts’, however, represented a challenge. The proposed AL method reveals that accounting for spatial information is an efficient solution to map target crops, since it facilitates high accuracy with a low number of samples and, consequently, lower computational cost and less time and money spent on annotation.
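The following is a hedged sketch of the general idea of semi-variogram-based redundancy filtering: estimate the autocorrelation range from the samples, then discard candidates that lie within that range of an already-selected sample. The range estimator, thresholds and synthetic data are assumptions for illustration, not the authors' exact algorithm.

```python
# Hedged sketch: estimate an empirical semivariogram range and greedily discard
# candidate samples that are closer to an already-kept sample than that range,
# so spatially redundant (autocorrelated) points are not annotated twice.
import numpy as np
from scipy.spatial.distance import cdist

def empirical_semivariogram(coords, values, n_lags=15):
    d = cdist(coords, coords)
    g = 0.5 * (values[:, None] - values[None, :]) ** 2
    iu = np.triu_indices_from(d, k=1)
    bins = np.linspace(0, d[iu].max(), n_lags + 1)
    lag_centers = 0.5 * (bins[:-1] + bins[1:])
    gamma = np.array([g[iu][(d[iu] >= lo) & (d[iu] < hi)].mean()
                      for lo, hi in zip(bins[:-1], bins[1:])])
    return lag_centers, gamma

def approx_range(lag_centers, gamma, frac=0.95):
    """First lag where the semivariance reaches ~95% of its sill (crude estimate)."""
    sill = np.nanmax(gamma)
    return lag_centers[np.argmax(gamma >= frac * sill)]

def spatial_filter(coords, min_dist):
    """Greedily keep samples that are at least `min_dist` apart."""
    kept = []
    for i, c in enumerate(coords):
        if all(np.linalg.norm(c - coords[j]) >= min_dist for j in kept):
            kept.append(i)
    return kept

rng = np.random.default_rng(0)
coords = rng.uniform(0, 1000, size=(200, 2))   # candidate sample locations (m)
values = rng.normal(size=200)                  # e.g. a spectral feature at each location
lags, gamma = empirical_semivariogram(coords, values)
keep = spatial_filter(coords, min_dist=approx_range(lags, gamma))
print(f"kept {len(keep)} of {len(coords)} candidates")
```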

How to cite: Belgiu, M., Kaijage, B., and Bijker, W.: Spatially explicit active learning for crop-type mapping from satellite image time series, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-17458, https://doi.org/10.5194/egusphere-egu24-17458, 2024.

EGU24-19275 | PICO | ESSI1.3

Artificial Intelligence Reconstructs Historical Climate Extremes 

Étienne Plésiat, Robert Dunn, Markus Donat, Thomas Ludwig, and Christopher Kadow

The year 2023 represents a significant milestone in climate history: it was confirmed by the Copernicus Climate Change Service (C3S) as the warmest calendar year in global temperature data records since 1850. With a deviation of 1.48 ºC from the 1850-1900 pre-industrial level, 2023 largely surpasses 2016, 2019 and 2020, the years previously identified as the warmest on record. As expected, this sustained warmth leads to an increase in the frequency and intensity of Extreme Events (EE), with dramatic environmental and societal consequences.

To assess the evolution of these EE and establish adaptation and mitigation strategies, it is crucial to evaluate the trends of extreme indices (EI). However, the observational climate data that are commonly used for the calculation of these indices frequently contain missing values, resulting in partial and inaccurate EI. As we delve deeper into the past, this issue becomes more pronounced due to the scarcity of historical measurements.

To circumvent the lack of information, we are using a deep learning technique based on a U-Net made of partial convolutional layers [1]. The model is trained with Earth system model data from CMIP6 and has the capability to reconstruct large and irregular regions of missing data using minimal computational resources. This approach has shown its ability to outperform traditional statistical methods such as kriging by learning intricate patterns in climate data [2].

In this study, we have applied our technique to the reconstruction of gridded land surface EI from an intermediate product of the HadEX3 dataset [3]. This intermediate product is obtained by combining station measurements without interpolation, resulting in numerous missing values that vary in both space and time. These missing values significantly affect the calculation of the long-term linear trend (1901-2018), especially if we consider solely the grid boxes containing values for the whole time period. The trend calculated for the TX90p index, which measures the monthly (or annual) frequency of warm days (defined as the percentage of days where the daily maximum temperature is above the 90th percentile), is presented for the European continent on the left panel of the figure. It illustrates the resulting amount of missing values, indicated by the gray pixels. With our AI method, we have been able to reconstruct the TX90p values for all the time steps and calculate the long-term trend shown on the right panel of the figure. The reconstructed dataset is being prepared for the community in the framework of the H2020 CLINT project [4] for further detection and attribution studies.
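For readers unfamiliar with partial convolutions, the sketch below shows a minimal PyTorch layer in the spirit of Liu et al. [1]: the convolution is evaluated on valid pixels only, re-normalised by local mask coverage, and the mask is propagated so gaps shrink layer by layer. It is simplified (e.g. the bias term is not handled separately as in the original paper) and is not the authors' trained model; all tensors are toy placeholders.

```python
# Minimal, simplified partial-convolution layer (PyTorch) in the spirit of [1].
import torch
import torch.nn as nn
import torch.nn.functional as F

class PartialConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding, bias=True)
        # fixed all-ones kernel used to count valid pixels under the window
        self.register_buffer("ones", torch.ones(1, 1, kernel_size, kernel_size))
        self.win = kernel_size * kernel_size
        self.padding = padding

    def forward(self, x, mask):
        # x: (B, C, H, W) data with gaps; mask: (B, 1, H, W), 1 = observed, 0 = missing
        with torch.no_grad():
            valid = F.conv2d(mask, self.ones, padding=self.padding)
            new_mask = (valid > 0).float()
            scale = self.win / valid.clamp(min=1.0)
        out = self.conv(x * mask) * scale * new_mask
        return out, new_mask

# Toy usage: one layer applied to a gappy extreme-index field
x = torch.randn(1, 1, 64, 64)
mask = (torch.rand(1, 1, 64, 64) > 0.3).float()
layer = PartialConv2d(1, 8)
y, new_mask = layer(x, mask)
```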

[1] Liu G. et al., Lecture Notes in Computer Science, 11215, 19-35 (2018)
[2] Kadow C. et al., Nat. Geosci., 13, 408-413 (2020)
[3] Dunn R. J. H. et al., J. Geophys. Res. Atmos., 125, 1 (2020)
[4] https://climateintelligence.eu/

How to cite: Plésiat, É., Dunn, R., Donat, M., Ludwig, T., and Kadow, C.: Artificial Intelligence Reconstructs Historical Climate Extremes, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-19275, https://doi.org/10.5194/egusphere-egu24-19275, 2024.

EGU24-19394 | PICO | ESSI1.3

Comparing the Role of Spatially and Temporally capable Deep Learning Architectures in Rainfall Estimation: A Case Study over North East India 

Aditya Handur-Kulkarni, Shanay Mehta, Ayush Ghatalia, and Ritu Anilkumar

The northeastern states of India face heavy-precipitation-related disasters such as floods and landslides every monsoon. Further, the region's economy is predominantly dependent on agriculture. Thus, accurate prediction of rainfall plays a vital role in planning and disaster management programs in the region. Existing methods used for rainfall estimation include Automatic Weather Stations, which provide real-time rainfall measurements at specific locations; however, these are point-based estimates. For distributed measurements, satellite-based estimation can be used. While these methods provide vital information on the spatial distribution of precipitation, they face the caveat that they provide only real-time estimates. Numerical weather forecast models provide forecasting capability by simulating the atmosphere's physical processes and assimilating observational data from various sources, including weather stations and satellites. However, these models are extremely complex and require immense computational power, and their veracity is limited by the available computing architecture. Recently, a host of data-driven models, including random forest regression, support vector machine regression and deep learning architectures, have been used to provide distributed rainfall forecasts. However, the relative performance of such models over orographically complex terrain has not been ascertained via a systematic study. Through this study, we aim to systematically assess the role of convolutional and recurrent neural network architectures in estimating rainfall. We have used rainfall data from the ERA5 Land reanalysis dataset together with the following additional meteorological variables that can impact rainfall: dew point temperature, skin temperature, amount of solar radiation, wind components, surface pressure and total precipitation. The data, aggregated on a daily scale and spanning three decades, were selected for this study. We have used the following neural network architectures: a U-Net modified for regression, representing convolutional neural networks, and a Long Short-Term Memory (LSTM) architecture, representing recurrent neural networks. Various settings of each architecture, such as the number of layers, optimizers and initialization, are validated to assess their performance on rainfall estimation. The developed rainfall estimation models were validated and evaluated using rigorous statistical metrics, such as the root mean square error (RMSE) and the coefficient of determination (R-squared). The results of this research are expected to provide valuable insights for local governments, farmers, and other stakeholders in the northeastern states of India. Moreover, the study's methodology can be extended to other regions facing similar climate challenges, thus contributing to advancements in the field of rainfall estimation and climate modelling.
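To make the contrast between the two architecture families concrete, the toy sketch below shows how a convolutional network maps a multi-variable meteorological grid to a rainfall grid for one day, while an LSTM maps a per-pixel time series of the same variables to a rainfall value. The layer sizes and variable count are illustrative assumptions, not the study's actual U-Net or LSTM configurations.

```python
# Hedged sketch contrasting spatially capable (CNN) and temporally capable (LSTM)
# regressors for rainfall estimation (toy dimensions).
import torch
import torch.nn as nn

n_vars = 7  # e.g. dew point, skin temperature, radiation, wind (u, v), pressure, ...

spatial_net = nn.Sequential(          # CNN: (B, n_vars, H, W) -> (B, 1, H, W)
    nn.Conv2d(n_vars, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, 3, padding=1),
)

class TemporalNet(nn.Module):         # LSTM: (B, T, n_vars) -> (B, 1) per pixel
    def __init__(self, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_vars, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)
    def forward(self, x):
        out, _ = self.lstm(x)
        return self.head(out[:, -1])  # rainfall estimate from the last time step

print(spatial_net(torch.randn(2, n_vars, 64, 64)).shape)   # torch.Size([2, 1, 64, 64])
print(TemporalNet()(torch.randn(2, 30, n_vars)).shape)     # torch.Size([2, 1])
```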

How to cite: Handur-Kulkarni, A., Mehta, S., Ghatalia, A., and Anilkumar, R.: Comparing the Role of Spatially and Temporally capable Deep Learning Architectures in Rainfall Estimation: A Case Study over North East India, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-19394, https://doi.org/10.5194/egusphere-egu24-19394, 2024.

EGU24-19531 | PICO | ESSI1.3

Gradient-Based Optimisers Versus Genetic Algorithms in Deep Learning Architectures: A Case Study on Rainfall Estimation Over Complex Terrain 

Yash Bhisikar, Nirmal Govindaraj, Venkatavihan Devaki, and Ritu Anilkumar

Rainfall is a crucial factor that affects planning processes at various scales, ranging from agricultural activities at the village or residence level to governmental initiatives in the domains of water resource management, disaster preparedness, and infrastructural planning. Thus, a reliable estimate of rainfall and a systematic assessment of variations in rainfall patterns are urgently needed. Recently, several studies have attempted to predict rainfall over various locations using deep learning architectures, including but not limited to artificial neural networks, convolutional neural networks, recurrent neural networks, or a combination of these. However, a major challenge in the estimation of rainfall is its chaotic nature, especially the interplay of spatio-temporal components over orographically complex terrain. For complex computer vision challenges, studies have suggested that population-search-driven optimisation techniques such as genetic algorithms may be used as an alternative to traditional gradient-based techniques such as Adam, Adadelta and SGD. Through this study, we aim to extend this hypothesis to the case of rainfall estimation. We integrate population-search-based techniques, namely genetic algorithms, to optimise a convolutional neural network architecture built using PyTorch. We have chosen North-East India as the study area, as it receives significant monsoon rainfall and is impacted by undulating terrain that adds complexity to rainfall estimation. We have used 30 years of rainfall data from the ERA5 Land daily reanalysis dataset, with a spatial resolution of 11,132 m, for the months of June, July, August and September. Additionally, datasets of the following meteorological variables that can impact rainfall were utilised as input features: dew point temperature, skin temperature, net incoming short-wave radiation received at the surface, wind components and surface pressure. All the datasets are aggregated to daily time steps. Several configurations of the U-Net architecture, varying the number of hidden layers, initialisation techniques and optimisation algorithms, have been used to identify the best configuration for the estimation of rainfall over North-East India. Genetic algorithms were used in initialisation and optimisation, via the PyGAD library, to assess the ability of population-search heuristics. The developed rainfall prediction models were validated at different time steps (0-day, 1-day, 2-day and 3-day latency) on a 7:1:2 train, validation and test dataset split, using evaluation metrics such as the root mean square error (RMSE) and the coefficient of determination (R-squared). The evaluation was performed on a pixel-by-pixel basis as well as an image-by-image basis in order to take magnitude and spatial correlations into consideration. Our study emphasises the importance of considering alternative optimisers and hyperparameter tuning approaches for complex earth observation challenges such as rainfall prediction.
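The study itself optimises a PyTorch CNN via the PyGAD library; as a stand-alone illustration of the population-search idea, the hedged NumPy toy below evolves the weights of a small linear model with selection, crossover and mutation, using negative RMSE as the fitness. All data and hyperparameters are made up for illustration.

```python
# Hedged toy sketch of a genetic algorithm as a gradient-free optimiser (NumPy).
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))                       # toy predictors
true_w = np.array([1.5, -2.0, 0.5, 0.0, 3.0])
y = X @ true_w + rng.normal(scale=0.1, size=200)    # toy "rainfall" target

def fitness(w):
    return -np.sqrt(np.mean((X @ w - y) ** 2))      # negative RMSE

pop = rng.normal(size=(40, 5))                      # initial population of weight vectors
for generation in range(200):
    scores = np.array([fitness(w) for w in pop])
    parents = pop[np.argsort(scores)[-10:]]         # keep the 10 fittest individuals
    children = []
    while len(children) < len(pop) - len(parents):
        a, b = parents[rng.integers(10, size=2)]
        cut = rng.integers(1, 5)
        child = np.concatenate([a[:cut], b[cut:]])  # one-point crossover
        child += rng.normal(scale=0.1, size=5)      # Gaussian mutation
        children.append(child)
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(w) for w in pop])]
print("best RMSE:", -fitness(best))
```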

How to cite: Bhisikar, Y., Govindaraj, N., Devaki, V., and Anilkumar, R.: Gradient-Based Optimisers Versus Genetic Algorithms in Deep Learning Architectures: A Case Study on Rainfall Estimation Over Complex Terrain, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-19531, https://doi.org/10.5194/egusphere-egu24-19531, 2024.

EGU24-20025 | ECS | PICO | ESSI1.3

Vineyard detection from multitemporal Sentinel-2 images with a Transformer model 

Weiying Zhao, Alexey Unagaev, and Natalia Efremova

This study introduces an innovative method for vineyard detection by integrating advanced machine learning techniques with high-resolution satellite imagery, particularly focusing on the use of preprocessed multitemporal Sentinel-2 images combined with a Transformer-based model.

We collected a series of Sentinel-2 images over an entire seasonal cycle from eight distinct locations in Oregon, United States, all within similar climatic zones. The training and validation datasets contain 403,612 and 100,903 samples, respectively. To reduce the cloud effect, we used the monthly median band values derived from initially cloud-filtered images. The multispectral (12 bands) and multiscale (10 m, 20 m, and 60 m) time series were effective in capturing both the phenological patterns of the land covers and the overall management activities.

The Transformer model, primarily recognized for its successes in natural language processing tasks, was adapted to our time series identification scenario, and the detection task was recast as a binary classification problem. Our findings demonstrate that the Transformer model significantly surpasses traditional 1D convolutional neural networks (CNNs) in detecting vineyards across 16 new areas within similar climatic zones, achieving an accuracy of 87.77% and an F1 score of 0.876. In the majority of these new test locations, the accuracy exceeded 92%, except for two areas that experienced significant cloud interference and presented numerous missing values in their time series data. The model proved capable of differentiating between land covers with similar characteristics during various stages of growth throughout the season. Compared with attention LSTM and BiLSTM models, it has fewer trainable parameters while achieving similar performance. The model was especially adept at handling temporal variations, elucidating the dynamic changes in vineyard phenology over time. This research underscores the potential of combining advanced machine learning techniques with high-resolution satellite imagery for crop type detection and suggests broader applications in land cover classification tasks. Future research will pay more attention to the missing-value problem.
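The sketch below shows what a Transformer-encoder classifier for monthly band time series can look like in PyTorch: each sample is a sequence of monthly median band values and the output is a single vineyard / non-vineyard logit. The layer counts, embedding size and sequence length are toy assumptions, not the authors' exact architecture.

```python
# Hedged sketch of a Transformer-encoder binary classifier for Sentinel-2
# band time series (toy dimensions).
import torch
import torch.nn as nn

class TimeSeriesTransformer(nn.Module):
    def __init__(self, n_bands=12, d_model=64, n_heads=4, n_layers=2, seq_len=12):
        super().__init__()
        self.embed = nn.Linear(n_bands, d_model)
        self.pos = nn.Parameter(torch.zeros(1, seq_len, d_model))  # learned positions
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, 1)

    def forward(self, x):              # x: (batch, months, bands)
        h = self.encoder(self.embed(x) + self.pos)
        return self.head(h.mean(dim=1)).squeeze(-1)   # mean-pool over time

model = TimeSeriesTransformer()
x = torch.randn(8, 12, 12)             # 8 pixels, 12 months, 12 bands
loss = nn.BCEWithLogitsLoss()(model(x), torch.randint(0, 2, (8,)).float())
loss.backward()
```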

How to cite: Zhao, W., Unagaev, A., and Efremova, N.: Vineyard detection from multitemporal Sentinel-2 images with a Transformer model, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-20025, https://doi.org/10.5194/egusphere-egu24-20025, 2024.

EGU24-22153 | PICO | ESSI1.3

Spatial cross-validation of wheat yield estimations using remote sensing and machine learning 

Keltoum Khechba, Mariana Belgiu, Ahmed Laamrani, Qi Dong, Alfred Stein, and Abdelghani Chehbouni

Integration of Machine Learning (ML) with remote sensing data has been successfully used to create detailed agricultural yield maps at both local and global scales. Despite this advancement, a critical issue often overlooked is the presence of spatial autocorrelation in the geospatial data used for training and validating ML models. Usually, random cross-validation (CV) methods are employed, which fail to account for this aspect. This study aimed to assess wheat yield estimations using both random and spatial CV. In contrast to random CV, where the data are split randomly, spatial CV involves splitting the data based on spatial locations, ensuring that spatially close data points are grouped together, either entirely in the training or in the test set, but not both. Conducted in Northern Morocco during the 2020-2021 agricultural season, our research uses Sentinel-1 and Sentinel-2 satellite images as input variables, as well as 1329 field data locations, to estimate wheat yield. Three ML models were employed: Random Forest, XGBoost, and Multiple Linear Regression. Spatial CV was employed across varying spatial scales: the province level represents a predefined administrative division, while grid2 and grid1 are equally sized spatial blocks of 20x20 km and 10x10 km, respectively. Our findings show that when estimating yield with random CV, all models achieve higher accuracies (R² = 0.58 and RMSE = 840 kg ha-1 for the XGBoost model) compared to the performance reported when using spatial CV. The 10x10 km spatial CV led to the highest R² value of 0.23 and an RMSE of 1140 kg ha-1 for the XGBoost model, followed by the 20x20 km grid-based strategy (R² = 0.11 and RMSE = 1227 kg ha-1 for the XGBoost model). Province-based spatial CV resulted in the lowest accuracy, with an R² value of 0.032 and an RMSE of 1282 kg ha-1. These results confirm that spatial CV is essential in preventing overoptimistic model performance. The study further highlights the importance of selecting an appropriate CV method to ensure realistic and reliable results in wheat yield predictions, as increased accuracy can deviate from real-world conditions due to the effects of spatial autocorrelation.
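A hedged sketch of grid-based spatial cross-validation is shown below: samples are assigned to 10x10 km blocks from their coordinates, and scikit-learn's GroupKFold keeps each block entirely in either the training or the test fold, unlike plain random KFold. The synthetic data, block size and model settings are illustrative assumptions, not the study's setup.

```python
# Hedged sketch: random vs. grid-based spatial cross-validation with scikit-learn.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GroupKFold, KFold, cross_val_score

rng = np.random.default_rng(1)
n = 1000
xy = rng.uniform(0, 200_000, size=(n, 2))            # sample coordinates in metres
X = rng.normal(size=(n, 10))                         # e.g. S1/S2 features
y = X[:, 0] * 500 + rng.normal(scale=300, size=n)    # toy yield (kg/ha)

block = 10_000                                        # 10 km spatial blocks
groups = (xy[:, 0] // block).astype(int) * 1000 + (xy[:, 1] // block).astype(int)

model = RandomForestRegressor(n_estimators=100, random_state=0)
r2_random = cross_val_score(model, X, y, cv=KFold(5, shuffle=True, random_state=0),
                            scoring="r2")
r2_spatial = cross_val_score(model, X, y, cv=GroupKFold(5), scoring="r2", groups=groups)
print("random CV R2:", r2_random.mean(), " spatial CV R2:", r2_spatial.mean())
```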

How to cite: Khechba, K., Belgiu, M., Laamrani, A., Dong, Q., Stein, A., and Chehbouni, A.: Spatial cross-validation of wheat yield estimations using remote sensing and machine learning, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-22153, https://doi.org/10.5194/egusphere-egu24-22153, 2024.

The aim of this study, conducted in Tavira, Portugal, is to show the ability to determine depths without relying on in-situ data. To achieve this goal, a model previously trained with depth data and multispectral images from 2018 was used. This model enables depth determination for any period, provided multispectral images are available.

For this study, CubeSat images from the PlanetScope constellation, with a spatial resolution of 3.0 m and four spectral bands (blue, green, red, and near-infrared), were used. Corrections due to tidal height were obtained through modeled data provided by the Portuguese Hydrographic Institute for the tide gauge of Faro – Olhão. In-situ depths were obtained through the Digital Elevation Model of Reference (MDER) from the Coastal Monitoring Program of Continental Portugal of the Portuguese Environmental Agency.

The model used to determine depths was previously obtained using the Random Forest (RF) algorithm, trained with a set of reflectances from 15 images acquired between August and October 2018 by the PlanetScope constellation, and a set of depths from the MDER, referring to October 2018.

This RF model allowed depth determination for a set of 7 images from the same constellation, acquired between August and October 2019. The results were corrected for tidal height to obtain all values in relation to the Hydrographic Zero reference. The Savitzky-Golay filter was applied to smooth the results, and the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm was then applied to eliminate outliers. Finally, the median depth value was determined, resulting in a bathymetric surface morphologically similar to the MDER (2019).
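The hedged sketch below illustrates this post-processing chain for a single pixel's depth predictions from several image dates: Savitzky-Golay smoothing, DBSCAN-based outlier removal, then the median depth. The filter window, DBSCAN parameters and depth values are illustrative assumptions, not the study's settings.

```python
# Hedged per-pixel sketch of the smoothing / outlier-removal / median chain.
import numpy as np
from scipy.signal import savgol_filter
from sklearn.cluster import DBSCAN

depths = np.array([5.1, 5.3, 9.8, 5.2, 5.0, 5.4, 5.2])   # metres, one value per image

smoothed = savgol_filter(depths, window_length=5, polyorder=2)

labels = DBSCAN(eps=0.5, min_samples=2).fit_predict(smoothed.reshape(-1, 1))
inliers = smoothed[labels != -1]                          # DBSCAN labels outliers as -1

final_depth = np.median(inliers if inliers.size else smoothed)
print(round(float(final_depth), 2), "m")
```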

This final surface was compared with the 2019 MDER by computing the differences between the two surfaces (residuals), for which the respective statistics were calculated (mean, median, standard deviation, and histogram). A vertical profile between 0.0 and 10.0 meters of depth was also generated. The statistics of the differences reveal a median of 0.5 meters, a mean of 0.7 meters, and a standard deviation of 1.3 meters. The histogram of the differences between the two surfaces follows a normal distribution, with its center located at the median value, which is offset from zero.

The results obtained in this study are promising for obtaining depths in coastal regions through multispectral images without the need for in-situ data. However, we are aware that improving the current model is important to reduce the median and standard deviation of the differences between the determined depth and the reference. Enhancing the model will lead to more accurate results, enabling the determination of seasonal variations and changes caused by extreme events or climate alterations without in-situ data.

How to cite: Santos, R. and Quartau, R.: Predicting bathymetry in shallow regions using a machine learning model and a time series of PlanetScope images, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-22165, https://doi.org/10.5194/egusphere-egu24-22165, 2024.

EGU24-1396 | Orals | ESSI1.4

Development of an approach based on historical Landsat data for delineating Canadian flood zones at different return periods 

Karem Chokmani, Haythem Zidi, Anas El Alem, and Jasmin Gill-Fortin

The study addresses the need for flood risk anticipation and planning through the development of a flood zone mapping approach for different return periods, in order to best protect populations. Today, traditional methods are too costly, too slow or too demanding in their requirements to be applied over large areas. As part of a project funded by the Canadian Space Agency, Geosapiens and the Institut National de la Recherche Scientifique set themselves the goal of designing an automatic process to generate water presence maps for different return periods at a resolution of 30 m, based on the historical database of Landsat missions from 1982 to the present day. This involved the design, implementation and training of a deep learning model based on the U-Net architecture for the detection of water pixels in Landsat imagery. The resulting maps were used as the basis for applying a frequency analysis model to fit a probability of occurrence function for the presence of water at each pixel. The frequency analysis data were then used to obtain maps of water occurrence for different return periods, such as 2, 5 and 20 years.
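As a highly simplified illustration of the per-pixel frequency-analysis idea, the sketch below uses the empirical annual probability of water detection and assigns a pixel to the T-year zone when that probability is at least 1/T. The actual project fits a probability-of-occurrence function; the synthetic stack and thresholds here are assumptions for illustration only.

```python
# Hedged, simplified sketch: empirical water-occurrence probability per pixel and
# return-period zones derived from it.
import numpy as np

rng = np.random.default_rng(0)
# toy stack of annual water/no-water maps (years, height, width) from classified imagery
water = rng.random((40, 50, 50)) < 0.15

p_annual = water.mean(axis=0)                   # empirical probability of water per pixel
return_periods = [2, 5, 20]
zones = {T: p_annual >= 1.0 / T for T in return_periods}

for T in return_periods:
    print(f"{T}-year zone covers {zones[T].mean():6.2%} of pixels")
```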

How to cite: Chokmani, K., Zidi, H., El Alem, A., and Gill-Fortin, J.: Development of an approach based on historical Landsat data for delineating Canadian flood zones at different return periods, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-1396, https://doi.org/10.5194/egusphere-egu24-1396, 2024.

EGU24-2378 | Orals | ESSI1.4

Applications of GeoAI in Extracting National Value-Added Products from Historical Airborne Photography 

Mozhdeh Shahbazi, Mikhail Sokolov, Ella Mahoro, Victor Alhassan, Evangelos Bousias Alexakis, Pierre Gravel, and Mathieu Turgeon-Pelchat

The Canadian National Air Photo Library (NAPL) comprises millions of historical airborne photographs spanning more than 100 years. Historical photographs are rich chronicles of countrywide geospatial information. They can be used for creating long-term time series and supporting various analytics, such as monitoring expansion/shrinkage rates of built-up areas, measuring forest structure change, measuring thinning and retreat rates of glaciers, and determining rates of erosion at coastlines. Various technical solutions have been developed at Natural Resources Canada (NRCan) to generate analysis-ready mapping products from the NAPL.

Photogrammetric Processing with a Focus on Automated Georeferencing of Historical Photos: The main technical challenge of photogrammetric processing is identifying reference observations, such as ground control points (GCP). Reference observations are crucial for accurately georeferencing historical photos and ensuring the spatial alignment of historical and modern mapping products. This is critical for creating time series and performing multi-temporal change analytics. In our workflow, GCPs are identified by automatically matching historical images to modern optical satellite/airborne ortho-rectified images. In the matching process, we first use convolutional neural networks (D2Net) for joint feature detection and description in the intensity space. Then, we convert the intensity images to phase congruency maps, which show less sensitivity to nonlinear radiometric differences between the images, and we extract an additional set of features using the FAST detector and describe them using the radiation-invariant feature transform (RIFT). Feature-matching outliers are detected and removed via random sample consensus (RANSAC), enforcing a homographic transformation between corresponding images. The remaining control points are manually verified through a graphical interface built as a QGIS plugin. The verified control points are then used in a bundle block adjustment, where the exterior orientation parameters of the historical images and the intrinsic calibration parameters of the cameras are refined, followed by dense matching and the generation of digital elevation models and ortho-rectified mosaics using conventional photogrammetric approaches. These solutions are implemented using our in-house libraries as well as the MicMac open-source software. During the presentation, examples of the generated products and their quality will be demonstrated.
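The RANSAC-with-homography filtering step can be sketched with OpenCV as below, using an ORB detector as a simple stand-in for the D2Net and RIFT features of the actual workflow; the two images are synthetic (a noise image and a translated copy), so everything here is an illustrative assumption rather than the NRCan pipeline.

```python
# Hedged sketch of RANSAC-filtered feature matching under a homography model (OpenCV).
import cv2
import numpy as np

rng = np.random.default_rng(0)
img1 = (rng.random((400, 400)) * 255).astype(np.uint8)
shift = np.float32([[1, 0, 15], [0, 1, 7]])                 # known translation
img2 = cv2.warpAffine(img1, shift, (400, 400))

orb = cv2.ORB_create(nfeatures=2000)                        # stand-in detector/descriptor
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
matches = sorted(matches, key=lambda m: m.distance)[:500]

if len(matches) >= 4:
    src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    # Homography estimated with RANSAC; `inliers` flags the surviving control points
    H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, ransacReprojThreshold=3.0)
    print("inlier ratio:", inliers.mean())
    print("estimated homography:\n", np.round(H, 2))
```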

Deep Colourization, Super Resolution and Semantic Segmentation: Considering that the NAPL mostly contains grayscale photos, their visual appeal and interpretability are lower than those of modern colour images. In addition, the automated extraction of colour-sensitive features from them, e.g. water bodies, is more complicated than from colour images. In this regard, we have developed fully automated approaches to colourize historical ortho-rectified mosaics based on image-to-image translation models. During the presentation, the performance of a variety of solutions, such as conditional generative adversarial networks (GAN), encoder-decoder networks, vision transformers, and probabilistic diffusion models, will be compared. In addition, using a customized GAN, we improve the spatial resolution of historical images that were scanned from printed photos at low resolution (as opposed to being scanned directly from film rolls at high resolution). Our semantic segmentation models, initially trained on optical satellite and airborne imagery, are also adapted to historical air photos for extracting water bodies, road networks, building outlines, and forested areas. The performance of these models on historical photos will be demonstrated during the presentation.

How to cite: Shahbazi, M., Sokolov, M., Mahoro, E., Alhassan, V., Bousias Alexakis, E., Gravel, P., and Turgeon-Pelchat, M.: Applications of GeoAI in Extracting National Value-Added Products from Historical Airborne Photography, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-2378, https://doi.org/10.5194/egusphere-egu24-2378, 2024.

The National Institute of Geographic and Forest Information (IGN) has developed Artificial Intelligence (AI) models that describe land cover at the pixel level from IGN aerial images. This is part of the production process for the Large-Scale Land Cover and Land Use Reference (OCS GE). This contribution is threefold:

Methodology: the training strategy and the use of these models will be reviewed by focusing on i) the selection of the task performed by the models, ii) the approach for choosing and producing learning samples and iii) the training strategy to generalize to the scale of Metropolitan France. The evaluation of the models using various metrics will also be discussed. Visuals will be provided to illustrate the quality of the results. Furthermore, we will explain how AI products are incorporated into the production of the OCS GE.

Continuous improvement: the models are continuously improved, particularly through the FLAIR (French Land cover from Aerospace ImageRy) challenges addressed to the scientific community. The FLAIR#1 and FLAIR#2 challenges dealt with model generalization and domain adaptation, as well as data fusion, i.e., how to develop an AI model that can process very high spatial resolution images (e.g., IGN aerial acquisitions) and satellite image time series (Sentinel-2 images) as input. We will review both the implementation of the challenges and the results obtained, leveraging convolutional and attention-based models, ensembling methods and pseudo-labelling. As the AI model for land cover goes far beyond the context of OCS GE production, additional experiments outside of the challenges will be discussed, allowing the development of additional AI models to process other modalities (very high spatial resolution satellite images, historical images, etc.).

Open access: all source code and data, including AI land cover predictions maps, are openly distributed. These resources are distributed via the challenges and as products (CoSIA: Land Cover by Artificial Intelligence) by a dedicated platform, which is of interest for AI users and non-specialists including users from the geoscience and remote sensing community.

How to cite: Garioud, A.: Artificial intelligence for country-scale land cover description., EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-6477, https://doi.org/10.5194/egusphere-egu24-6477, 2024.

EGU24-7635 | Posters on site | ESSI1.4

iMagine, AI-supported imaging data and services for ocean and marine science 

Ilaria Fava, Alvaro Lopez Garcia, Dick Schaap, Tjerk Krijer, Gergely Sipos, and Valentin Kozlov

Aquatic ecosystems are vital in regulating climate and providing resources, but they face threats from global change and local stressors. Understanding their dynamics is crucial for sustainable use and conservation. The iMagine AI Platform offers a suite of AI-powered image analysis tools for researchers in aquatic sciences, facilitating a better understanding of scientific phenomena and applying AI and ML for processing image data.

The platform supports the entire machine learning cycle, from model development to deployment, leveraging data from underwater platforms, webcams, microscopes, drones, and satellites, and utilising distributed resources across Europe. With a serverless architecture and DevOps approach, it enables easy sharing and deployment of AI models. Four providers within the pan-European EGI federation power the platform, offering substantial computational resources for image processing.

Five use cases focus on image analytics services, which will be available to external researchers through Virtual Access. Additionally, three new use cases are developing AI-based image processing services, and two external use cases are being kickstarted through recent Open Calls. The iMagine Competence Centre aids use case teams in model development and deployment, resulting in various models hosted on the iMagine AI Platform, including third-party models like YOLOv8.

Operational best practices derived from the platform providers and use case developers cover data management, quality control, integration, and FAIRness. These best practices aim to harmonise approaches across Research Infrastructures and will be disseminated through various channels, benefitting the broader European and international scientific communities.

How to cite: Fava, I., Lopez Garcia, A., Schaap, D., Krijer, T., Sipos, G., and Kozlov, V.: iMagine, AI-supported imaging data and services for ocean and marine science, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-7635, https://doi.org/10.5194/egusphere-egu24-7635, 2024.

EGU24-8420 | Orals | ESSI1.4

Bayesian model averaging of AI models for the high resolution mapping of the forest canopy height 

Nikola Besic, Nicolas Picard, Cédric Vega, Jean-Pierre Renaud, Martin Schwartz, Milena Planells, and Philippe Ciais

The development of high resolution mapping models of forest attributes based on machine or deep learning techniques has accelerated in the last couple of years. The consequence of this is the widespread availability of multiple sources of information, which can either lead to confusion or offer the possibility of gaining an “extended” insight into the state of our forests by interpreting these sources jointly. This contribution aims at addressing the latter by relying on the Bayesian model averaging (BMA) approach.

BMA is a method that can be used to build a consensus from an ensemble of different model predictions. It can be seen as a weighted mean of the different predictions, with weights reflecting the predictive performance of the different models, or as a finite mixture model which estimates the probability that each observation from the independent validation dataset has been generated by one of the models belonging to the ensemble. BMA can thus be used to diagnose and understand the differences between the predictions and possibly to interpret them.

The predictions in our case are the forest canopy height estimations for metropolitan France coming from 5 different AI models [1-5], while the independent validation dataset comes from the French National Forest Inventory (NFI), comprising some 6000 plots per year distributed across the territory of interest. For every plot we have several measurements/estimations of the forest canopy height, of which the following two are considered in this study: h_m, the maximum total height (from the tree's base level to the terminal bud of the tree's main stem) measured within the plot, and h_dom, the average height of the seven largest dominant trees per hectare.

In this contribution we present, for every considered plot, the dominant model with respect to both references, i.e. the model having the highest probability of being the one generating the measurements/estimations at the NFI plot (h_m and h_dom). We also present the respective inter-model and intra-model variance estimations, allowing us to propose a series of hypotheses concerning the established differences between the predictions of individual models as a function of their specificities.
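The finite-mixture view of BMA can be sketched as below: a Gaussian mixture over the ensemble members is fitted by expectation-maximisation, yielding per-model weights and error spreads, and the E-step responsibilities play the role of the per-plot probability that a given model generated the observation. This is a simplified toy in the spirit of Raftery-style BMA, with synthetic data and parameters, not the authors' implementation.

```python
# Hedged toy sketch of BMA weight estimation via EM (Gaussian finite mixture).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, K = 500, 3
obs = rng.uniform(5, 35, n)                                    # reference heights (m)
bias = np.array([0.0, 2.0, -1.0]); noise = np.array([2.0, 4.0, 3.0])
preds = obs[:, None] + bias + rng.normal(size=(n, K)) * noise  # K model predictions

w = np.full(K, 1.0 / K)          # mixture weights (one per model)
sigma = np.full(K, 3.0)          # per-model error standard deviations
for _ in range(100):
    # E-step: probability that each observation was "generated" by each model
    dens = w * norm.pdf(obs[:, None], loc=preds, scale=sigma)
    z = dens / dens.sum(axis=1, keepdims=True)
    # M-step: update weights and error spreads
    w = z.mean(axis=0)
    sigma = np.sqrt((z * (obs[:, None] - preds) ** 2).sum(axis=0) / z.sum(axis=0))

print("BMA weights:", np.round(w, 3))
bma_mean = preds @ w             # consensus (weighted-mean) prediction
```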

[1] Schwartz, M., et al.: FORMS: Forest Multiple Source height, wood volume, and biomass maps in France at 10 to 30 m resolution based on Sentinel-1, Sentinel-2, and Global Ecosystem Dynamics Investigation (GEDI) data with a deep learning approach, Earth Syst. Sci. Data, 15, 4927–4945, 2023, https://doi.org/10.5194/essd-15-4927-2023

[2] Lang, N., et al.: A high-resolution canopy height model of the Earth, Nat Ecol Evol 7, 1778–1789, 2023. https://doi.org/10.1038/s41559-023-02206-6

[3] Morin, D. et al.: Improving Heterogeneous Forest Height Maps by Integrating GEDI-Based Forest Height Information in a Multi-Sensor Mapping Process, Remote Sens., 14, 2079. 2022, https://doi.org/10.3390/rs14092079

[4] Potapov, P., et al.: Mapping global forest canopy height through integration of GEDI and Landsat data, Remote Sensing of Environment, 253, 2021, https://doi.org/10.1016/j.rse.2020.112165.

[5] Liu, S. et al.: The overlooked contribution of trees outside forests to tree cover and woody biomass across Europe, Sci. Adv. 9, eadh4097, 2023, 10.1126/sciadv.adh4097.

How to cite: Besic, N., Picard, N., Vega, C., Renaud, J.-P., Schwartz, M., Planells, M., and Ciais, P.: Bayesian model averaging of AI models for the high resolution mapping of the forest canopy height, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-8420, https://doi.org/10.5194/egusphere-egu24-8420, 2024.

EGU24-9729 | ECS | Orals | ESSI1.4

Urban 3D Change Detection with Deep Learning: Custom Data Augmentation Techniques 

Riccardo Contu, Valerio Marsocci, Virginia Coletta, Roberta Ravanelli, and Simone Scardapane

The ability to detect changes occurring on the Earth's surface is essential for comprehensively monitoring and understanding evolving landscapes and environments.

To achieve a comprehensive understanding, it is imperative to employ methodologies capable of efficiently capturing and analyzing both two-dimensional (2D) and three-dimensional (3D) changes across various periods.

Artificial Intelligence (AI) stands out as a primary resource for investigating these alterations, and when combined with remote sensing (RS) data, it has demonstrated superior performance compared to conventional Change Detection (CD) algorithms.

The recent introduction of the MultiTask Bitemporal Images Transformer [1] (MTBIT) network has made it possible to simultaneously solve 2D and 3D CD tasks leveraging bi-temporal optical images.

However, this network presents certain limitations that need to be considered. These constraints encompass a tendency to overfit the training distribution and challenges in inferring extreme values [1]. To address these shortcomings, this work introduces a series of custom augmentations, including strategies like Random Crop, Crop or Resize, Mixup, Gaussian Noise on the 3D CD maps, and Radiometric Transformation. Applied individually or in specific combinations, these augmentations aim to bolster MTBIT's ability to discern intricate geometries and subtle structures that are otherwise difficult to detect.

Furthermore, the evaluation metrics used to assess MTBIT, such as the Root Mean Squared Error (RMSE) and the change RMSE (cRMSE), have their limitations. In response, we introduce the true positive RMSE (tpRMSE), which offers a more comprehensive evaluation, specifically focusing on MTBIT's efficacy in the 3D CD task by considering only the pixels affected by actual elevation changes.
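As we read it from the description above, tpRMSE restricts the RMSE to pixels where the reference 3D change map records an actual elevation change, so correct "no change" pixels do not dilute the error. A hedged sketch with toy arrays:

```python
# Hedged sketch of tpRMSE versus plain RMSE (toy reference and prediction).
import numpy as np

def rmse(pred, ref, mask=None):
    diff = (pred - ref) if mask is None else (pred - ref)[mask]
    return float(np.sqrt(np.mean(diff ** 2)))

rng = np.random.default_rng(0)
ref = np.zeros((256, 256)); ref[60:120, 80:160] = 12.0        # toy reference 3D change (m)
pred = ref + rng.normal(scale=2.0, size=ref.shape)            # toy prediction

overall_rmse = rmse(pred, ref)                # RMSE over every pixel
tp_rmse = rmse(pred, ref, mask=(ref != 0))    # tpRMSE: only truly changed pixels
print(round(overall_rmse, 2), round(tp_rmse, 2))
```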

The implementation of custom augmentations, particularly when applied in synergy (e.g., Crop or Resize combined with Gaussian Noise on the 3D map), yielded substantial improvements. Through the best augmentation configuration, these interventions reduced the cRMSE to 5.88 meters and the tpRMSE to 5.34 meters, compared to the baseline (standard MTBIT) values of 6.33 meters and 5.60 meters, respectively.

The proposed augmentations significantly bolster the practical usability and reliability of MTBIT in real-world applications, effectively addressing critical challenges within the realm of Remote Sensing CD. 

REFERENCES:

 

  • [1] Marsocci, V., Coletta, V., Ravanelli, R., Scardapane, S., Crespi, M., 2023. Inferring 3D change detection from bitemporal optical images. ISPRS Journal of Photogrammetry and Remote Sensing, 196, 325-339

How to cite: Contu, R., Marsocci, V., Coletta, V., Ravanelli, R., and Scardapane, S.: Urban 3D Change Detection with Deep Learning: Custom Data Augmentation Techniques, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-9729, https://doi.org/10.5194/egusphere-egu24-9729, 2024.

EGU24-10034 | Orals | ESSI1.4

Operationalize large-scale point cloud classification: potentials and challenges 

Onur Can Bayrak, Ma Zhenyu, Elisa Mariarosaria Farella, and Fabio Remondino

Urban and natural landscapes are distinguished by different built and vegetated elements with unique features, and their proper identification is crucial for many applications, from urban planning to forestry inventory or natural resources management. With the rapid evolution and deployment of high-resolution airborne and Unmanned Aerial Vehicle (UAV) technologies, large areas can be easily surveyed to create high-density point clouds. Photogrammetric cameras and LiDAR sensors can offer unprecedented high-quality 3D data (a few cm on the ground) that allows for discriminating and mapping even small objects. However, the semantic enrichment of these 3D data is still far from being a fully reliable, accurate, unsupervised, explainable and generalizable process deployable at large scale, on data acquired with any sensor, and at any possible spatial resolution.

This work reports the state of the art and recent developments in urban and natural point cloud classification, with a particular focus on:

  • Standardization in defining the semantic classes through a multi-resolution and multi-scale approach: a multi-level concept is introduced to improve and optimize the learning process by means of a hierarchical class structure that accommodates a large number of classes. 
  • Instance segmentation in very dense areas: closely located and overlapping individual objects require precise segmentation to be accurately identified and classified. We are developing a hierarchical segmentation method specifically designed for urban furniture classes with few samples, to enhance the completeness of mapping in dense urban areas.
  • Generalization of the procedures and transferability of developed models from a fully-labelled domain to an unseen scenario.
  • Handling of under-represented objects (e.g., pole-like objects, pedestrians, and other urban furniture): classifying under-represented objects presents a unique set of challenges due to their sparse occurrence and similar geometric characteristics. We introduce a new method that specifically targets the effective identification and extraction of these objects in combination with knowledge-based methods and deep learning.
  • Available datasets and benchmarks to evaluate and compare learning-based methods and algorithms in 3D semantic segmentation: urban-level aerial 3D point cloud datasets can be classified according to the presence of color information, the number of classes, or the type of sensor used for data gathering. The ISPRS - Vaihingen, DublinCity, DALES, LASDU and CENAGIS-ALS datasets, although extensive in size, do not provide color-related information. Conversely, Campus3D, Swiss3DCities, and Hessigheim3D include color data but feature limited coverage and few class labels. SensatUrban, STPLS3D, and HRHD-HK were collected across extensive urban regions, but they also present a reduced number of classes. YTU3D surpasses other datasets in terms of class diversity, but it encompasses less extensive areas than SensatUrban, STPLS3D, and HRHD-HK. Despite these differences, the common deficiency across all datasets is the presence of under-represented object classes, limited generalization, and low accuracy in classifying unbalanced categories, making models trained on them difficult to use in real-life scenarios.

The presentation will highlight the importance of semantic enrichment processes in the geospatial and mapping domain and for providing more understandable data to end-users and policy-makers. Available learning-based methods, open issues in point cloud classification and recent progress will be explored over urban and forestry scenarios.

How to cite: Bayrak, O. C., Zhenyu, M., Farella, E. M., and Remondino, F.: Operationalize large-scale point cloud classification: potentials and challenges, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-10034, https://doi.org/10.5194/egusphere-egu24-10034, 2024.

EGU24-11571 | Posters on site | ESSI1.4

Simple temporal domain adaptation techniques for mapping the inter-annual dynamics of smallholder-dominated croplands over large extents 

Lyndon Estes, Sam Khallaghi, Rahebe Abedi, Mary Asipunu, Nguyen Ha, Boka Luo, Cat Mai, Amos Wussah, Sitian Xiong, and Yao-Ting Yao

Tracking how agricultural systems are changing is critical to answering important questions related to socioeconomic (e.g. food security) and environmental sustainability (e.g. carbon emissions), particularly in rapidly changing regions such as Africa. Monitoring agricultural dynamics requires satellite-based approaches that can accurately map individual fields at frequent (e.g. annual) intervals over national to regional extents, yet mapping Africa's smallholder-dominated agricultural systems is difficult, as the small and indistinct nature of fields promotes mapping error, while frequent cloud cover leads to coverage gaps. Fortunately, the increasing availability of high spatio-temporal resolution imagery and the growing capabilities of deep learning models now make it possible to accurately map crop fields over large extents. However, the ability to make consistently reliable maps for more than one time point remains difficult, given the substantial domain shift between images collected in different seasons or years, which arises from variations in atmospheric and land surface conditions, and results in less accurate maps for times beyond those for which the model was trained. To cope with this domain shift, a model's parameters can be adjusted through fine-tuning on training data from the target time period, but collecting such data typically requires manual annotation of images, which is expensive and often impractical. Alternatively, the approach used to develop the model can be adjusted to improve its overall generalizability. Here we show how combining several fairly standard architectural and input techniques, including careful selection of the image normalization method, increasing the model's width, adding regularization techniques, using modern optimizers, and choosing an appropriate loss function, can significantly enhance the ability of a convolutional neural network to generalize across time, while eliminating the need to collect additional labels. A key component of this approach is the use of Monte Carlo dropout, a regularization technique applied during inference that provides a measure of model uncertainty while producing more robust predictions. We demonstrate this procedure by training an adapted U-Net, a widely used encoder-decoder architecture, with a relatively small number of labels (~5,000 224x224 image chips) collected from 3 countries on 3.7 m PlanetScope composite imagery collected primarily in 2018, and use the model, without fine-tuning, to make reliable maps of Ghana's annual croplands (240,000 km2) for the years 2018-2023 on 4.8 m Planet basemap mosaics. We further show how this approach helps to track agricultural dynamics by providing a country-wide overview of cropping frequency, while highlighting hotspots of cropland expansion and intensification during the 6-year time period (2018-2023).
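Monte Carlo dropout at inference time can be sketched as below: dropout layers are kept active while predicting, the forward pass is repeated several times, and the mean and standard deviation over passes give a robust prediction and a per-pixel uncertainty estimate. The toy model, chip size and number of passes are illustrative assumptions, not the study's adapted U-Net.

```python
# Hedged sketch of Monte Carlo dropout inference (PyTorch, toy segmentation head).
import torch
import torch.nn as nn

model = nn.Sequential(                       # stand-in for a cropland segmentation model
    nn.Conv2d(4, 16, 3, padding=1), nn.ReLU(),
    nn.Dropout2d(p=0.5),
    nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid(),
)

def mc_dropout_predict(model, x, passes=20):
    model.train()                            # keep dropout stochastic during inference
    with torch.no_grad():
        samples = torch.stack([model(x) for _ in range(passes)])
    return samples.mean(dim=0), samples.std(dim=0)   # prediction, uncertainty

x = torch.randn(1, 4, 224, 224)              # one 4-band PlanetScope-like image chip
prob, uncertainty = mc_dropout_predict(model, x)
cropland_mask = prob > 0.5
```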

How to cite: Estes, L., Khallaghi, S., Abedi, R., Asipunu, M., Ha, N., Luo, B., Mai, C., Wussah, A., Xiong, S., and Yao, Y.-T.: Simple temporal domain adaptation techniques for mapping the inter-annual dynamics of smallholder-dominated croplands over large extents, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-11571, https://doi.org/10.5194/egusphere-egu24-11571, 2024.

EGU24-12472 | ECS | Posters on site | ESSI1.4

Building hierarchical use classification based on multiple data sources with a multi-label multimodal transformer network 

Wen Zhou, Claudio Persello, and Alfred Stein

Effective urban planning, city digital twins, and informed policy formulation rely heavily on precise building use information. While existing research often focuses on broad categories of building use, there is a noticeable gap in the classification of buildings’ detailed use. This study addresses this gap by concurrently extracting both broad and detailed hierarchical information regarding building use. Our approach involves leveraging multiple data sources, including high spatial resolution remote sensing images (RS), digital surface models (DSM), street view images (SVI), and textual information from point of interest (POI) data. Given the complexity of mixed-use buildings, where different functions coexist, we treat building hierarchical use classification as a multi-label task, determining the presence of specific categories within a building. To maximize the utility of features across diverse modalities and their interrelationships, we introduce a novel multi-label multimodal Transformer-based feature fusion network. This network can simultaneously predict four broad categories and thirteen detailed categories, representing the first instance of utilizing these four modalities for building use classification. Experimental results demonstrate the effectiveness of our model, achieving a weighted average F1 score (WAF) of 91% for broad categories, 77% for detailed categories, and 84% for hierarchical categories. The macro average F1 scores (MAF) are 81%, 48%, and 56%, respectively. Ablation experiments highlight RS data as the cornerstone for hierarchical building use classification. DSM and POI provide slight supplementary information, while SVI data may introduce more noise than effective information. Our analysis of hierarchy consistency, supplementarity, and exclusiveness between broad and detailed categories shows that our model can effectively learn these relations. We compared two ways to obtain broad categories: classifying them directly, and scaling up detailed categories by associating them with their broad counterparts. Experiments show that the WAF and MAF of the former are 3.8% and 6% higher than those of the latter. Notably, our research visualizes the model's attention for the different modalities, revealing the synergy among them. Despite the model’s emphasis on SVI and POI data, the critical role of RS and DSM in building hierarchical use classification is underscored. By considering hierarchical use categories and accommodating mixed-use scenarios, our method provides more accurate and comprehensive insights into land use patterns.
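The multi-label formulation and the two F1 aggregations quoted above can be illustrated as follows: each building gets an independent score per use category, presence is decided by a threshold, and F1 is averaged either weighted by class frequency or macro (unweighted). The labels, scores, threshold and class count below are toy assumptions, not the study's data.

```python
# Hedged sketch of multi-label presence prediction and weighted vs. macro F1.
import numpy as np
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
n_buildings, n_detailed = 200, 13
y_true = (rng.random((n_buildings, n_detailed)) < 0.2).astype(int)   # multi-label ground truth
scores = np.clip(y_true * 0.7 + rng.random((n_buildings, n_detailed)) * 0.5, 0, 1)
y_pred = (scores >= 0.5).astype(int)                                  # per-category presence

waf = f1_score(y_true, y_pred, average="weighted", zero_division=0)
maf = f1_score(y_true, y_pred, average="macro", zero_division=0)
print(f"weighted-average F1 = {waf:.2f}, macro-average F1 = {maf:.2f}")
```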

How to cite: Zhou, W., Persello, C., and Stein, A.: Building hierarchical use classification based on multiple data sources with a multi-label multimodal transformer network, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-12472, https://doi.org/10.5194/egusphere-egu24-12472, 2024.

EGU24-12745 | ECS | Posters on site | ESSI1.4

Mapping the extent and land use intensity of shifting cultivation with Planet Scope imagery and deep learning in the Democratic Republic of Congo  

Wanting Yang, Xiaoye Tong, Sizhuo Li, Daniel Ortiz Gonzalo, and Rasmus Fensholt

Shifting cultivation, in which primary or secondary forest plots are converted into agriculture for one to two years and then left fallow, is often deemed responsible for tropical deforestation. However, the general attribution of deforestation to areas under shifting cultivation is debatable if one also considers forest regrowth during the fallow phase, which is an essential part of a mature shifting cultivation system. Yet, little is known about the extent of small-size cropped fields and fallow stages, information that is needed to characterize the temporal development between cropping and fallow in shifting cultivation landscapes.

The primary objective of our study is to develop a deep learning-based framework to quantify land use intensity in tropical forest nations such as the Democratic Republic of Congo (DRC), using 4.7-m multi-temporal Planet Basemaps from 2015 to 2023. First, by employing a convolutional neural network image classification model, we identified the shifting cultivation landscapes. Second, utilizing two-phase imagery, we examined the temporal development of shifting cultivation, determining whether the landscape continues to be characterized by this practice. Third, the shifting cultivation landscapes were segmented into cropped fields, young fallow, old fallow and old-growth forest/primary forest. Lastly, we used a deep learning regression model to quantify the intensity of shifting cultivation within the identified areas. This last step adds depth to our analysis by offering nuanced insights into the varying practices associated with shifting cultivation. Our study in the DRC offers a detailed spatio-temporal dataset of the dynamics of shifting cultivation, serving as a stepping stone to better understanding its impacts on forest loss.

How to cite: Yang, W., Tong, X., Li, S., Ortiz Gonzalo, D., and Fensholt, R.: Mapping the extent and land use intensity of shifting cultivation with Planet Scope imagery and deep learning in the Democratic Republic of Congo , EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-12745, https://doi.org/10.5194/egusphere-egu24-12745, 2024.

EGU24-13166 | ECS | Orals | ESSI1.4

Remote sensing techniques for habitat condition mapping: deadwood monitoring using airborne laser scanning data 

Agata Walicka, Jesper Bladt, and Jesper Erenskjold Moeslund

Deadwood is a vital part of the habitat for many threatened species of animals, plants and fungi. Thus, the presence of deadwood is an important indicator of the probability that a given site harbors threatened species. Nowadays, field work is the most common method for monitoring dead trees. However, it is time-consuming, costly and labor-intensive. Therefore, there is a need for an automatic method for mapping and monitoring deadwood. The combination of fine-resolution remote sensing and deep learning techniques has the potential to provide exactly this. Unfortunately, because lying deadwood is typically located under the canopy, this is a challenging task: the visibility of lying trees is limited, notably with optical remote sensing techniques. Therefore, laser scanning data seems to be the most appropriate for this purpose, as it can penetrate the canopy to some extent and hence gather data from the forest floor.

In this work we aim to develop methods enabling the detection of lying deadwood at the national scale in protected forests, focusing on the presence of deadwood in 15-meter-radius circular plots. To achieve this goal, we use Airborne Laser Scanning (ALS) data that is publicly available for the whole of Denmark and, as a reference, almost 6000 forestry plots acquired as part of the Danish national habitats monitoring program. The binary classification into plots that contain deadwood and those that do not is performed using a SparseCNN deep neural network. In this study we show that it is possible to detect plots containing deadwood with an overall accuracy of around 61%. However, the accuracy of the classifier depends on the volume of deadwood present in a plot.

How to cite: Walicka, A., Bladt, J., and Moeslund, J. E.: Remote sensing techniques for habitat condition mapping: deadwood monitoring using airborne laser scanning data, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-13166, https://doi.org/10.5194/egusphere-egu24-13166, 2024.

EGU24-15347 | ECS | Posters on site | ESSI1.4

Estimating Crop Phenology from Satellite Data using Machine Learning 

Shahabaldin Shojaeezadeh, Abdelrazek Elnashar, and Tobias Karl David Weber

Monitoring crop growth and development is important for agricultural management and policy interventions enhancing food security worldwide. Traditional methods of examining crop phenology (the timing of growth stages in plants) at large scales are often not sufficiently accurate to make informed decisions about crops. In this study, we propose an approach that uses a satellite data fusion and Machine Learning (ML) modeling framework to predict crop phenology for eight major crops at field scale (30 m) across all of Germany. The observed phenology used in this study is based on a citizen-science dataset of phenological observations covering all of Germany. By fusing satellite data from Landsat and Sentinel-2 images with radar data from Sentinel-1, our method effectively captures information from each publicly available remote sensing data source, resulting in precise estimations of phenology timing. The fusion analysis indicated that combining optical and radar images improves the ML model's ability to predict phenology, with high accuracies (R2 > 0.95) and a mean absolute error of less than 2 days for all crops. Further analysis of uncertainties confirmed that adding radar data to optical images improves the reliability of satellite-based predictions of crop phenology. These improvements are expected to be useful for crop model calibration, facilitate informed agricultural decisions, and contribute to sustainable food production to address the increasing global food demand.
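
As a hedged illustration of the general idea (not the authors' pipeline), the sketch below fits a tree-ensemble regressor to fused optical and radar features and reports R2 and the mean absolute error in days; all data are synthetic.

```python
# Illustrative sketch: regressing a phenology date (day of year) from fused
# optical (Landsat/Sentinel-2) and radar (Sentinel-1) features with a
# tree-ensemble model; feature choices and data are synthetic assumptions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, r2_score

rng = np.random.default_rng(0)
n = 2000
optical = rng.normal(size=(n, 10))   # e.g. NDVI/EVI time-series metrics
radar = rng.normal(size=(n, 6))      # e.g. VV/VH backscatter metrics
X = np.hstack([optical, radar])
y = 120 + 15 * optical[:, 0] - 5 * radar[:, 1] + rng.normal(0, 2, n)  # DOY

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_tr, y_tr)
pred = model.predict(X_te)
print(f"R2 = {r2_score(y_te, pred):.3f}, "
      f"MAE = {mean_absolute_error(y_te, pred):.2f} days")
```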

How to cite: Shojaeezadeh, S., Elnashar, A., and Weber, T. K. D.: Estimating Crop Phenology from Satellite Data using Machine Learning, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-15347, https://doi.org/10.5194/egusphere-egu24-15347, 2024.

EGU24-15565 | Orals | ESSI1.4

Identification of asbestos roofing from hyperspectral images 

Elena Viero, Donatella Gubiani, Massimiliano Basso, Marco Marin, and Giovanni Sgrazzutti

Regulations phasing out the use of asbestos were introduced in Italy with Law no. 257 of 1992, and their implementation took place over time. The Regional Asbestos Plan was put in place in 1996 and is updated periodically.

Modern remote sensing techniques constitute an essential tool for studies at environmental and territorial scales. These systems can record, for each pixel of the acquired image, from tens to hundreds of bands of the electromagnetic spectrum. This is useful because every material has its own characteristic spectral signature that can be exploited for different types of investigation.

The work involved experimentation with a neural network for the classification of airborne remotely sensed hyperspectral images to identify and map the asbestos-cement roofing existing in some Municipalities of the Autonomous Region of Friuli Venezia Giulia.

The Region covers an area of approximately 8,000 square kilometres. To survey the entire area, it was necessary to carry out flights in different directions, on different days and under different solar exposure conditions, so the radiometric quality of the images is not uniform. Moreover, the images have high geometric resolution (1-metre pixels) and radiometric resolution (over 180 bands), which required particular attention in their management: more than 4,000 images, for a total size of 25-30 TB.

Starting from these hyperspectral images and using the information already available relating to the mapping of the asbestos roofs of 25 Municipalities of the Region, we generated an adequate ground truth to train, test and validate a neural network implemented using the Keras library.

Given the differences in the territories of the various Municipalities, in the first processing step we computed 3 different models, generated on different training datasets, for each considered Municipality: a total one and a partial one that are independent of the considered Municipality, and a last one adapted to the specific Municipality. The combination of these predictions allowed us to obtain a raster result which is expected to better adapt to the characteristics of the considered Municipality.

Once these results were obtained, it was necessary to convert the raster outputs into vector data using a zonal analysis on the buildings available in the Regional Numerical Map. An initial automatic classification, determined through the definition of adequate thresholds, was then manually refined using additional tools, such as Google StreetView and the 10 cm regional orthophoto, to obtain a final refined classification.
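
The raster-to-vector step described above can be sketched as follows (illustrative arrays and threshold, not the project's actual code): per-building zonal statistics are computed over a pixel-wise asbestos probability map and thresholded into an initial automatic classification.

```python
# Hedged sketch of a zonal analysis step: per-building mean of a pixel-wise
# asbestos probability map, followed by a simple threshold. The arrays,
# footprints and threshold value are illustrative stand-ins.
import numpy as np

prob_map = np.random.rand(100, 100)          # per-pixel asbestos probability
building_ids = np.zeros((100, 100), dtype=int)
building_ids[10:30, 10:40] = 1               # footprint of building 1
building_ids[50:70, 60:90] = 2               # footprint of building 2

threshold = 0.6
for bid in np.unique(building_ids):
    if bid == 0:
        continue                             # 0 = background, no building
    mean_prob = prob_map[building_ids == bid].mean()
    label = "asbestos suspected" if mean_prob > threshold else "no asbestos"
    print(f"building {bid}: mean probability {mean_prob:.2f} -> {label}")
```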

The results obtained for the 5 pilot Municipalities represent a clear indication of the presence of asbestos material on some building roofs. This work demonstrates an operational workflow using data at a regional scale and could easily be extended to other territorial entities. It has the great advantage of allowing the government authority to save at least an order of magnitude in terms of costs with respect to traditional investigations. Finally, the automated neural network workflow represents a useful tool for programming, planning and management of the territory, also in terms of human health.

How to cite: Viero, E., Gubiani, D., Basso, M., Marin, M., and Sgrazzutti, G.: Identification of asbestos roofing from hyperspectral images, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-15565, https://doi.org/10.5194/egusphere-egu24-15565, 2024.

EGU24-18546 | Posters on site | ESSI1.4

A novel index for forest structure complexity mapping from single multispectral images 

Xin Xu, Xiaowei Tong, Martin Brandt, Yuemin Yue, Maurice Mugabowindekwe, Sizhuo Li, and Rasmus Fensholt

Due to the provisioning of essential ecosystem goods and services by forests, the monitoring of forests has attracted considerable attention within the academic community. However, the majority of remote sensing studies covering large areas primarily focus on tree cover due to resolution limitations. It is necessary to integrate innovative spatial methods and tools in the monitoring of forest ecosystems. Forest structure complexity, representing the spatial heterogeneity within forest structures, plays a pivotal role in influencing ecosystem processes and functions. In this study, we use multispectral remote sensing imagery to extract individual tree crown information through deep learning; subsequently, we analyze the relationship between each tree and its neighboring trees and explore structural characteristics at the tree level. Finally, we developed a canopy structural complexity index and applied it to Nordic forests, urban areas, savanna, rainforest, and the most complex tree plantations and natural forests of the China Karst region. This study aims to gain a deeper understanding of forest structure complexity in diverse ecosystems and to provide valuable information for sustainable forestry management and ecosystem conservation. The method developed in this study eliminates the need for additional field measurements and radar data, offering robust support for extensive and efficient monitoring of forest structure complexity, with broad application prospects.
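
A minimal sketch of one possible neighbourhood-based complexity measure is shown below (an assumed formulation, not the authors' index): for each detected crown, the variability of crown size and spacing among its nearest neighbours is combined into a per-tree score.

```python
# Illustrative sketch (assumed formulation, not the authors' index): a simple
# per-tree structural-complexity measure from the variability of crown size
# and spacing among each tree's nearest neighbours; data are synthetic.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(1)
xy = rng.uniform(0, 500, size=(1000, 2))         # detected crown centres (m)
crown_area = rng.lognormal(3.0, 0.5, size=1000)  # crown areas (m^2)

tree = cKDTree(xy)
dist, idx = tree.query(xy, k=6)                  # each tree + 5 nearest neighbours
neigh_area = crown_area[idx[:, 1:]]
# complexity: coefficient of variation of neighbour crown areas and spacing
complexity = (neigh_area.std(axis=1) / neigh_area.mean(axis=1)
              + dist[:, 1:].std(axis=1) / dist[:, 1:].mean(axis=1))
print("mean complexity index:", complexity.mean())
```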

How to cite: Xu, X., Tong, X., Brandt, M., Yue, Y., Mugabowindekwe, M., Li, S., and Fensholt, R.: A novel index for forest structure complexity mapping from single multispectral images, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-18546, https://doi.org/10.5194/egusphere-egu24-18546, 2024.

EGU24-18679 | ECS | Posters on site | ESSI1.4

Large-scale satellite mapping unveils uneven wetland restoration needs across Europe 

Gyula Mate Kovács, Xiaoye Tong, Dimitri Gominski, Stefan Oehmcke, Stéphanie Horion, and Rasmus Fensholt

Wetlands are crucial carbon sinks for climate change mitigation, yet historical land use changes have resulted in carbon losses and increased CO2 emissions. To combat this, the European Union aims to restore 30% of degraded wetlands in Europe by 2030. However, comprehensive continental-scale inventories are essential for prioritizing restoration and assessing high-carbon-stock wetlands, and existing datasets are inadequate for this purpose. Leveraging 10-meter satellite data and machine learning, our study achieved 94±0.5% accuracy in mapping six wetland types across Europe in 2018. Our analysis shows that over 40% of European wetlands experience anthropogenic disturbances, with 32.7% classified as highly disturbed due to urban and agricultural activities. Country-level assessments highlight an uneven distribution of restoration needs, emphasizing the urgent importance of data-informed approaches for meaningful restoration. This study underscores the critical need to address land use impacts to preserve and enhance wetland carbon storage capabilities.

How to cite: Kovács, G. M., Tong, X., Gominski, D., Oehmcke, S., Horion, S., and Fensholt, R.: Large-scale satellite mapping unveils uneven wetland restoration needs across Europe, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-18679, https://doi.org/10.5194/egusphere-egu24-18679, 2024.

EGU24-19198 | ECS | Posters on site | ESSI1.4

Learning-Based Hyperspectral Image Compression Using A Spatio-Spectral Approach 

Niklas Sprengel, Martin Hermann Paul Fuchs, and Prof. Begüm Demir

Advances in hyperspectral imaging have led to a significant increase in the volume of hyperspectral image archives. Therefore, the development of efficient and effective hyperspectral image compression methods is an important research topic in remote sensing. Recent studies show that learning-based compression methods are able to preserve the reconstruction quality of images at lower bitrates compared to traditional methods [1]. Existing learning-based image compression methods usually employ spatial compression per image band or for all bands jointly. However, hyperspectral images contain a high amount of spectral correlations, which necessitates more complex compression architectures that can reduce both spatial and spectral correlations for a more efficient compression. To address this problem, we propose a novel Spatio-Spectral Compression Network (S2C-Net).

S2C-Net is a flexible architecture to perform hyperspectral image compression, exploiting both spatial and spectral dependencies of hyperspectral images. It combines different spectral and spatial autoencoders into a joint model. To this end, a learning-based pixel-wise spectral autoencoder is initially pre-trained. Then, a spatial autoencoder network is added into the bottleneck of the spectral autoencoder for further compression of the spatial correlations. This is done by applying the spatial autoencoder to the output of the spectral encoder and then applying the spectral decoder to the output of the spatial autoencoder. The model is then trained using a novel mixed loss function that combines the loss of the spectral and the spatial model. Since the spatial model is applied on the output of the spectral encoder, spatial compression methods that are optimised for 2D image compression can be used in S2C-Net in the context of hyperspectral image compression.

In the experiments, we have evaluated our S2C-Net on HySpecNet-11k, a large-scale hyperspectral image dataset [2]. Experimental results show that S2C-Net outperforms both spectral and spatial state-of-the-art compression methods for bitrates lower than 1 bit per pixel per channel (bpppc). Specifically, it can achieve lower distortion for similar compression rates and offers the possibility to reach much higher compression rates with only slightly reduced reconstruction quality.

References

[1] F. Zhang, C. Chen, and Y. Wan, “A survey on hyperspectral remote sensing image compression,” in IEEE IGARSS, 2023, pp. 7400–7403.

[2] M. H. P. Fuchs and B. Demir, “HySpecNet-11k: A large-scale hyperspectral dataset for benchmarking learning-based hyperspectral image compression methods,” in IEEE IGARSS, 2023, pp. 1779–1782.
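
The spatio-spectral nesting described for S2C-Net can be illustrated with a minimal sketch (PyTorch; layer sizes, band count and loss weighting are assumptions, not the published architecture): a pixel-wise spectral autoencoder is wrapped around a 2D spatial autoencoder acting on the spectral bottleneck, and a mixed loss combines the two reconstruction terms.

```python
# A minimal sketch, under assumed layer sizes, of nesting a spatial autoencoder
# inside the bottleneck of a pixel-wise spectral autoencoder (PyTorch); this is
# an illustration of the idea, not the S2C-Net implementation.
import torch
import torch.nn as nn

n_bands, spec_dim = 200, 16          # hyperspectral bands, spectral code size

spec_enc = nn.Sequential(nn.Linear(n_bands, 64), nn.ReLU(), nn.Linear(64, spec_dim))
spec_dec = nn.Sequential(nn.Linear(spec_dim, 64), nn.ReLU(), nn.Linear(64, n_bands))
# 2D autoencoder that compresses the spatial arrangement of spectral codes
spat_ae = nn.Sequential(
    nn.Conv2d(spec_dim, 8, 3, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(8, spec_dim, 3, stride=2, padding=1, output_padding=1),
)

x = torch.randn(2, n_bands, 32, 32)                  # B x bands x H x W
z = spec_enc(x.permute(0, 2, 3, 1))                  # spectral code per pixel
z = z.permute(0, 3, 1, 2)                            # B x spec_dim x H x W
z_hat = spat_ae(z)                                   # spatial compression stage
x_hat = spec_dec(z_hat.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
# mixed loss combining spectral and spatial reconstruction terms
loss = nn.functional.mse_loss(x_hat, x) + nn.functional.mse_loss(z_hat, z)
```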

How to cite: Sprengel, N., Fuchs, M. H. P., and Demir, P. B.: Learning-Based Hyperspectral Image Compression Using A Spatio-Spectral Approach, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-19198, https://doi.org/10.5194/egusphere-egu24-19198, 2024.

EGU24-20985 | Orals | ESSI1.4

GeoAI advances in specific landform mapping 

Samantha Arundel, Michael Von Pohle, Ata Akbari Asanjan, Nikunj Oza, and Aaron Lott

Landform mapping (also referred to as geomorphology or geomorphometry) can be divided into two domains: general and specific (Evans 2012). Whereas general landform mapping categorizes all elements of the study area into landform classes, such as ridges, valleys, peaks, and depressions, the mapping of specific landforms requires the delineation (even if fuzzy) of individual landforms. The former is mainly driven by physical properties such as elevation, slope, and curvature.  The latter, however, must consider the cognitive (human) reasoning that discriminates individual landforms in addition to these physical properties (Arundel and Sinha 2018).

Both mapping forms are important. General geomorphometry is needed to understand geological and ecological processes and as boundary layer input to climate and environmental models. Specific geomorphometry supports such activities as disaster management and recovery, emergency response, transportation, and navigation.

In the United States, individual landforms of interest are named in the U.S. Geological Survey (USGS) Geographic Names Information System, a point dataset captured specifically to digitize geographic names from the USGS Historical Topographic Map Collection (HTMC). Named landform extent is represented only by the name placement in the HTMC.

Recent work has investigated CNN-based deep learning methods to capture these extents in machine-readable form. These studies first relied on physical properties (Arundel et al. 2020) and then included the HTMC as a band in RGB images in limited testing (Arundel et al. 2023). Results from the HTMC dataset surpassed those using just physical properties. The HTMC alone performed best due to the hillshading and elevation (contour) data incorporated into the topographic maps. However, results fell short of an operational capacity to map all named landforms in the United States. Thus, our current work expands upon past research by focusing on the HTMC and physical information as inputs, with the named landform label extents as the prediction targets.

Specifically, we propose to leverage pre-trained foundation models for segmentation and optical character recognition (OCR) models to jointly map landforms in the United States. Our approach aims to bridge the disparities among independent information sources to facilitate informed decision-making. The modeling pipeline performs (1) segmentation using the physical information and (2) information extraction using OCR in parallel. Then, a computer vision approach merges the two branches into a labeled segmentation. 
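
A hedged sketch of the merging step is given below (synthetic stand-ins for the segmentation and OCR outputs): OCR-extracted name placements are assigned to the segmented landform regions that contain them.

```python
# Hedged sketch of the merging step: assigning OCR-extracted name placements
# (text plus map coordinates) to segmented landform regions; the segmentation
# mask and OCR results here are synthetic stand-ins, not USGS data.
import numpy as np
from scipy import ndimage

segmentation = np.zeros((200, 200), dtype=int)
segmentation[30:80, 40:120] = 1            # one segmented landform
segmentation[120:180, 20:90] = 1           # another, disjoint landform
regions, n_regions = ndimage.label(segmentation)

# hypothetical OCR output: (name, row, col) of the name placement on the map
ocr_results = [("Bear Ridge", 60, 55), ("Eagle Peak", 150, 40)]

labeled = {}
for name, r, c in ocr_results:
    region_id = regions[r, c]
    if region_id > 0:
        labeled[region_id] = name          # attach the name to its region
print(labeled)                             # e.g. {1: 'Bear Ridge', 2: 'Eagle Peak'}
```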

References

Arundel, Samantha T., Wenwen Li, and Sizhe Wang. 2020. “GeoNat v1.0: A Dataset for Natural Feature Mapping with Artificial Intelligence and Supervised Learning.” Transactions in GIS 24 (3): 556–72. https://doi.org/10.1111/tgis.12633.

Arundel, Samantha T., and Gaurav Sinha. 2018. “Validating GEOBIA Based Terrain Segmentation and Classification for Automated Delineation of Cognitively Salient Landforms.” In Proceedings of Workshops and Posters at the 13th International Conference on Spatial Information Theory (COSIT 2017), Lecture Notes in Geoinformation and Cartography, edited by Paolo Fogliaroni, Andrea Ballatore, and Eliseo Clementini, 9–14. Cham: Springer International Publishing.

Arundel, Samantha T., Gaurav Sinha, Wenwen Li, David P. Martin, Kevin G. McKeehan, and Philip T. Thiem. 2023. “Historical Maps Inform Landform Cognition in Machine Learning.” Abstracts of the ICA 6 (August): 1–2. https://doi.org/10.5194/ica-abs-6-10-2023.

Evans, Ian S. 2012. “Geomorphometry and Landform Mapping: What Is a Landform?” Geomorphology 137 (1): 94–106. https://doi.org/10.1016/j.geomorph.2010.09.029.

How to cite: Arundel, S., Von Pohle, M., Akbari Asanjan, A., Oza, N., and Lott, A.: GeoAI advances in specific landform mapping, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-20985, https://doi.org/10.5194/egusphere-egu24-20985, 2024.

EGU24-420 | ECS | Orals | ESSI1.5

Neural Networks for Surrogate Models of the Corona and Solar Wind 

Filipa Barros, João José Graça Lima, Rui F. Pinto, and André Restivo

In previous work, an Artificial Neural Network (ANN) was developed to automate the estimation of solar wind profiles used as initial conditions in MULTI-VP simulations. This approach, coupled with profile clustering, reduced the time previously required for estimation by MULTI-VP, enhancing the efficiency of the simulation process. It was observed that generating initial estimates closer to the final simulation led to reduced computation time, with a mean speedup of 1.13. Additionally, this adjustment yielded a twofold advantage: it minimized the amplitude of spurious transients, reinforcing the numerical stability of calculations and enabling the code to maintain a more moderate integration time step.

However, upon further analysis, it became evident that the physical model inherently required a relaxation time for the final solution to stabilize. Therefore, while refining initial conditions offered improvements, there was a limit to how much it could accelerate the process. Consequently, attention turned towards the development of a surrogate model focused on the upper corona (from 3 solar radii to 30 solar radii). This range was chosen because the model can avoid learning the initial phases of wind acceleration, which are hard to accurately predict. Moreover, for coupling to heliospheric models and for space weather applications, starting above 3 solar radii is sufficient and guarantees that the physics remains consistent within the reproducible domain.

This surrogate model aims at delivering faster forecasts, with MULTI-VP running in parallel (eventually refining the solutions). The surrogate model for MULTI-VP was tested using a heliospheric model and data from spacecraft at L1, validating its efficacy beyond Mean Squared Error (MSE) evaluations and ensuring physical conservation principles were upheld.

This work aims at simplifying and accelerating the process of establishing boundary conditions for heliospheric models without dismissing the physical models for both extreme events and for more physically accurate results. 

How to cite: Barros, F., Lima, J. J. G., F. Pinto, R., and Restivo, A.: Neural Networks for Surrogate Models of the Corona and Solar Wind, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-420, https://doi.org/10.5194/egusphere-egu24-420, 2024.

We show the evolution of separated strands within apparently single coronal loops observed in Atmospheric Imaging Assembly (AIA) images. The loop strands are detected on the upsampled AIA 193 Å images, which are generated using a super-resolution convolutional neural network. The architecture of the network is designed to map the AIA images to unprecedentedly high spatial resolution coronal images taken by the High-resolution Coronal Imager (Hi-C) during its brief flight. At times, pairs of individual strands appeared to braid with each other and subsequently evolved to become pairs of almost parallel ones with their segments having exchanged totally. These evolutions provide morphological evidence supporting occurrences of magnetic reconnection between the braiding strands, which are further confirmed by the occurrence of transient hot emission (>5 MK) located at the footpoints of the braiding structures.

How to cite: Bi, Y.: The coronal braiding structures detected in the machine-learning upscaled SDO/AIA images, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-1494, https://doi.org/10.5194/egusphere-egu24-1494, 2024.

EGU24-1604 | ECS | Orals | ESSI1.5

Machine Learning Synthesis and inversion method for Stokes Parameters in the solar context 

Juan Esteban Agudelo Ortiz, Germain Nicolás Morales Suarez, Santiago Vargas Domínguez, and Sergiy Shelyag

The arrival of new and more powerful spectropolarimetric instruments such as DKIST, the development of better magnetohydrodynamic (MHD) simulation codes and the creation of new inversion methods come with demands for increasing amounts of computational time and power. Together with the growing volume of data, this can translate into years of processing that stall the advance of scientific investigations at mid-to-late stages. Machine learning models able to replicate patterns in data can adapt to different types of datasets, such as those for classification or for sequence generation like seq2seq models; once trained, they produce results consistent with previous methods while being orders of magnitude faster. Some work has been done in this field to create machine learning inversion methods using data obtained from actual inversion codes applied to observational data, and using data from radiative transfer codes for synthesis, reducing both computational demands and processing time. This work follows in these steps, using datasets obtained from simulation codes like MURaM and their corresponding Stokes parameters obtained from non-LTE radiative transfer codes like NICOLE, and training neural network models in the forward (synthesis) and backward (inversion) directions to test whether they can learn the underlying physical behaviour and at what accuracy, so that they can be used in the future to process data obtained from new simulation codes and from real solar observations, as another step towards a new paradigm for inverting and synthesizing physical quantities in general.

How to cite: Agudelo Ortiz, J. E., Morales Suarez, G. N., Vargas Domínguez, S., and Shelyag, S.: Machine Learning Synthesis and inversion method for Stokes Parameters in the solar context, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-1604, https://doi.org/10.5194/egusphere-egu24-1604, 2024.

EGU24-2046 | ECS | Posters on site | ESSI1.5

Comparative Analysis of Random Forest and XGBoost in Classifying Ionospheric Signal Disturbances During Solar Flares 

Filip Arnaut, Aleksandra Kolarski, and Vladimir Srećković

In our previous publication (Arnaut et al. 2023), we demonstrated the application of the Random Forest (RF) algorithm for classifying disturbances associated with solar flares (SF), erroneous signals, and measurement errors in VLF amplitude data i.e., anomaly detection in VLF amplitude data. The RF algorithm is widely regarded as a preferred option for conducting research in novel domains. Its advantages, such as its ability to avoid overfitting data and its simplicity, make it particularly valuable in these situations. Nevertheless, it is imperative to conduct thorough testing and evaluation of alternative algorithms and methods to ascertain their potential advantages and enhance the overall efficiency of the method. This brief communication demonstrates the application of the XGBoost (XGB) method on the exact dataset previously used for the RF algorithm, along with a comparative analysis between the two algorithms. Given that the problem is framed as a machine learning (ML) problem with a focus on the minority class, the comparative analysis is exclusively conducted using the minority (anomalous) data class. The data pre-processing methodology can be found in Arnaut et al. (2023). The XGB tuning process involved using a grid search method to optimize the hyperparameters of the model. The number of estimators (trees) was varied from 25 to 500 in increments of 25, and the learning rate was varied from 0.02 to 0.4 in increments of 0.02. The F1-Score for the anomalous data class is similar for both models, with a value of 0.508 for the RF model and 0.51 for the XGB model. These scores were calculated using the entire test dataset, which consists of 19 transmitter-receiver pairs. Upon closer examination, it becomes evident that the RF model exhibits a higher precision metric (0.488) than the XGB model (0.37), while the XGB model demonstrates a higher recall metric (0.84) compared to the RF model (0.53). Upon examining each individual transmitter-receiver pair, it was found that XGB outperformed RF in terms of F1-Scores in 10 out of 19 cases. The most significant disparities are observed in cases where the XGB model outperformed by a margin of 0.15 in terms of F1-Score, but conversely performed worse by approximately -0.16 in another instance for the anomalous data class. The XGB models outperformed the RF model by approximately 6.72% in terms of the F1-score for the anomalous data class when averaging all the 19 transmitter-receiver pairs. When utilizing a point-based evaluation metric that assigns rewards or penalties for each entry in the confusion matrix, the RF model demonstrates an overall improvement of approximately 5% compared to the XGB model. Overall, the comparison between the RF and XGB models is ambiguous. Both models have instances where one is superior to the other. Further research is necessary to fully optimize the method, which has benefits in automatically classifying VLF amplitude anomalous signals caused by SF effects, erroneous measurements, and other factors.

References:

Arnaut, F., Kolarski, A. and Srećković, V.A., 2023. Random Forest Classification and Ionospheric Response to Solar Flares: Analysis and Validation. Universe, 9(10), p.436.
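
For illustration only, the sketch below reproduces the spirit of this comparison on synthetic imbalanced data (not the VLF dataset): a Random Forest baseline against an XGBoost model tuned by grid search, both scored with F1 on the minority class. The grid here is coarser than the 25-500 trees and 0.02-0.4 learning-rate scan used in the study.

```python
# Hedged sketch on synthetic imbalanced data (not the VLF dataset): Random
# Forest vs. XGBoost, the latter tuned by a coarse grid search, scored with
# F1 on the minority (anomalous) class; assumes xgboost is installed.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import f1_score
from xgboost import XGBClassifier

X, y = make_classification(n_samples=5000, weights=[0.95], flip_y=0.02,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)

grid = {"n_estimators": [100, 200, 300], "learning_rate": [0.05, 0.1, 0.2]}
xgb = GridSearchCV(XGBClassifier(eval_metric="logloss"), grid,
                   scoring="f1", cv=3).fit(X_tr, y_tr)

for name, model in (("RF", rf), ("XGB", xgb)):
    print(name, "minority-class F1:",
          round(f1_score(y_te, model.predict(X_te)), 3))
```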

How to cite: Arnaut, F., Kolarski, A., and Srećković, V.: Comparative Analysis of Random Forest and XGBoost in Classifying Ionospheric Signal Disturbances During Solar Flares, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-2046, https://doi.org/10.5194/egusphere-egu24-2046, 2024.

EGU24-4181 | Posters on site | ESSI1.5

Prediction of sunspot number using Gaussian processes 

Everton Frigo and Italo Gonçalves

Solar activity has various direct and indirect impacts on human activities. During periods of high solar activity, the harmful effects triggered by solar variability are maximized. On a decadal to multidecadal time scale, solar variability exhibits a main cycle of around 11 years known as the Schwabe solar cycle, leading to a solar maximum approximately every 11 years. The most commonly used variable for measuring solar activity is the sunspot number. Over the last few decades, numerous techniques have been employed to predict the time evolution of the solar cycle for subsequent years. Recently, there has been a growing number of studies utilizing machine learning methods to predict solar cycles. One such method is the Gaussian process, which is well-suited for working with small amounts of data and can also provide an uncertainty measure for predictions. In this study, the Gaussian process technique is employed to predict the sunspot number between 2024 and 2050. The dataset used to train and validate the model comprises monthly averages of the sunspot number for the period 1700-2023. According to the results, the current solar cycle, now at its maximum, is anticipated to last until 2030. The subsequent solar maximum is projected to occur around the end of 2033, with an estimated maximum sunspot number of approximately 150. If this prediction holds true, the next solar cycle's maximum will resemble that observed in the current one.
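
A minimal sketch of the approach (an assumed kernel and synthetic data, not the authors' model) is shown below: a quasi-periodic Gaussian process is fitted to a monthly sunspot-like series and extrapolated with an uncertainty band.

```python
# Minimal illustrative sketch (not the authors' model): fitting a Gaussian
# process to a synthetic monthly sunspot-like series and extrapolating with
# an uncertainty estimate; the kernel choice is an assumption.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ExpSineSquared, RBF, WhiteKernel

t = np.arange(0, 600)                         # months
y = (80 + 70 * np.sin(2 * np.pi * t / 132)
     + np.random.default_rng(0).normal(0, 10, t.size))

# quasi-periodic kernel: ~11-year (132-month) cycle with slow amplitude drift
kernel = (ExpSineSquared(length_scale=50.0, periodicity=132.0) * RBF(500.0)
          + WhiteKernel(noise_level=25.0))
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(t[:, None], y)

t_future = np.arange(600, 900)[:, None]       # the next 25 years of months
mean, std = gp.predict(t_future, return_std=True)
print("predicted cycle maximum:", mean.max(), "+/-", std[mean.argmax()])
```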

How to cite: Frigo, E. and Gonçalves, I.: Prediction of sunspot number using Gaussian processes, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-4181, https://doi.org/10.5194/egusphere-egu24-4181, 2024.

EGU24-4471 | ECS | Orals | ESSI1.5

Solar Wind Speed Estimation via Symbolic Knowledge Extraction from Opaque Models 

Federico Sabbatini and Catia Grimani

The unprecedented predictive capabilities of machine learning models make them inestimable tools to perform data forecasting and other complex tasks. Benefits of these predictors are even more precious when unavailable data must be surrogated due to the lack of dedicated instrumentation on board space missions. For instance, the future ESA space interferometer LISA for low-frequency gravitational wave detection will host, as part of its diagnostics subsystem, particle detectors to measure the galactic cosmic-ray flux and magnetometers to monitor the magnetic field intensity in the region of the interferometer mirrors. No instrumentation dedicated to interplanetary medium parameter monitoring will be placed on the three spacecraft constituting the LISA constellation. However, important lessons about the correlation between galactic cosmic-ray flux short-term variations and the solar wind speed profile have been learned with the ESA LISA precursor mission, LISA Pathfinder, orbiting around the L1 Lagrange point. In a previous work, we have demonstrated that for LISA Pathfinder it was possible to reconstruct, with an uncertainty of 2 nT, the interplanetary magnetic field intensity for interplanetary structure transit monitoring. Machine learning models are proposed here to infer the solar wind speed, which is not measured on the three LISA spacecraft, from galactic cosmic-ray measurements. This work is precious and necessary since LISA, scheduled to launch in 2035, will trail Earth on the ecliptic at 50 million km distance, too far from the orbits of other space missions dedicated to interplanetary medium monitoring to benefit from their observations.

We built an interpretable machine learning predictor based on galactic cosmic-ray and interplanetary magnetic field observations to obtain a solar wind speed reconstruction within ±65 km s-1 of uncertainty. Interpretability is achieved by applying the CReEPy symbolic knowledge extractor to the outcomes of a k-NN regressor. The extracted knowledge consists of linear equations aimed at describing the solar wind speed in terms of four statistical indices calculated for the input variables.
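
For illustration, a hedged sketch of the underlying opaque regressor is given below (synthetic data and an assumed feature set): a k-NN model maps four statistical indices of the inputs to solar wind speed, the kind of model from which symbolic rules are then extracted.

```python
# Hedged sketch (synthetic data, assumed feature set): a k-NN regressor mapping
# statistical indices of galactic cosmic-ray and IMF measurements to solar wind
# speed; this stands in for the opaque model, not the CReEPy extraction itself.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
n = 3000
X = rng.normal(size=(n, 4))      # four statistical indices of GCR flux and |B|
v = 400 + 60 * X[:, 0] - 30 * X[:, 1] + rng.normal(0, 20, n)   # km/s

X_tr, X_te, v_tr, v_te = train_test_split(X, v, test_size=0.2, random_state=0)
knn = KNeighborsRegressor(n_neighbors=10).fit(X_tr, v_tr)
print("MAE [km/s]:", round(mean_absolute_error(v_te, knn.predict(X_te)), 1))
```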

Details about the model workflow, performance and validation will be presented at the conference, together with the advantages, drawbacks and possible future enhancements, to demonstrate that our model may provide the LISA mission with an effective and human-interpretable tool to carry out reliable solar wind speed estimates and recognise the transit of interplanetary structures nearby the LISA spacecraft, as a support to the data analysis activity for the monitoring of the external forces acting on the spectrometer mirrors.

How to cite: Sabbatini, F. and Grimani, C.: Solar Wind Speed Estimation via Symbolic Knowledge Extraction from Opaque Models, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-4471, https://doi.org/10.5194/egusphere-egu24-4471, 2024.

EGU24-6558 | ECS | Orals | ESSI1.5

A New Machine Learning Approach for Predicting Extreme Space Weather 

Andong Hu and Enrico Camporeale

We present an innovative method, ProBoost (Probabilistic Boosting), for forecasting extreme space weather events using ensemble machine learning (ML). Ensembles enhance prediction accuracy, but applying them to ML faces challenges as ML models often lack well-calibrated uncertainty estimates. Moreover, space weather problems are typically affected by very imbalanced datasets (i.e., extreme and rare events). To overcome these difficulties, we developed a method that incorporates uncertainty quantification (UQ) in neural networks, enabling simultaneous forecasting of prediction uncertainty.
Our study applies ProBoost to the following space weather applications:
• One-to-Six-Hour Lead-Time Model: Predicting Disturbance Storm Time (Dst) values using solar wind data.
• Two-Day Lead-Time Model: Forecasting Dst probability using solar images.
• Geoelectric Field Model: Multi-hour lead time, incorporating solar wind and SuperMag data.
• Ambient Solar Wind Velocity Forecast: Up to 5 days ahead.
ProBoost is model-agnostic, making it adaptable to various forecasting applications beyond space weather.

How to cite: Hu, A. and Camporeale, E.: A New Machine Learning Approach for Predicting Extreme Space Weather, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-6558, https://doi.org/10.5194/egusphere-egu24-6558, 2024.

We use the framework of Physics-Informed Neural Network (PINN) to solve the inverse problem associated with the Fokker-Planck equation for radiation belts' electron transport, using 4 years of Van Allen Probes data. Traditionally, reduced models have employed a diffusion equation based on the quasilinear approximation. We show that the dynamics of “killer electrons” is described more accurately by a drift-diffusion equation, and that drift is as important as diffusion for nearly-equatorially trapped ∼1 MeV electrons in the inner part of the belt. Moreover, we present a recipe for gleaning physical insight from solving the ill-posed inverse problem of inferring model coefficients from data using PINNs. Furthermore, we derive a parameterization for the diffusion and drift coefficients as a function of L only, which is both simpler and more accurate than earlier models. Finally, we use the PINN technique to develop an automatic event identification method that allows identifying times at which the radial transport assumption is inadequate to describe all the physics of interest.
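
A compact, generic PINN sketch of this kind of inverse problem is given below (PyTorch; a 1-D drift-diffusion equation with constant, trainable coefficients and synthetic observations, not the radiation-belt model itself): the network fits the data while a physics residual, evaluated with automatic differentiation, constrains the drift and diffusion coefficients.

```python
# A compact PINN sketch (assumptions throughout: 1-D drift-diffusion in L with
# constant, trainable coefficients; synthetic "data"), illustrating how model
# coefficients can be inferred jointly with the solution via autograd.
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 64), nn.Tanh(),
                    nn.Linear(64, 1))                    # f(L, t)
log_D = nn.Parameter(torch.tensor(0.0))                  # diffusion coefficient
C = nn.Parameter(torch.tensor(0.0))                      # drift coefficient
opt = torch.optim.Adam(list(net.parameters()) + [log_D, C], lr=1e-3)

# synthetic observations standing in for phase-space density data
L_obs = torch.rand(256, 1) * 5 + 1
t_obs = torch.rand(256, 1)
f_obs = torch.exp(-(L_obs - 4) ** 2) * (1 + 0.5 * t_obs)

for step in range(2000):
    opt.zero_grad()
    # data misfit term
    f_pred = net(torch.cat([L_obs, t_obs], dim=1))
    loss_data = ((f_pred - f_obs) ** 2).mean()
    # physics residual of  df/dt = D d2f/dL2 - C df/dL  at collocation points
    Lc = (torch.rand(512, 1) * 5 + 1).requires_grad_(True)
    tc = torch.rand(512, 1).requires_grad_(True)
    f = net(torch.cat([Lc, tc], dim=1))
    f_t = torch.autograd.grad(f.sum(), tc, create_graph=True)[0]
    f_L = torch.autograd.grad(f.sum(), Lc, create_graph=True)[0]
    f_LL = torch.autograd.grad(f_L.sum(), Lc, create_graph=True)[0]
    residual = f_t - torch.exp(log_D) * f_LL + C * f_L
    loss = loss_data + (residual ** 2).mean()
    loss.backward()
    opt.step()

print("inferred D =", torch.exp(log_D).item(), ", drift C =", C.item())
```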

How to cite: Camporeale, E.: Data-Driven Discovery of Fokker-Planck Equation for the Earth's Radiation Belts Electrons Using Physics-Informed Neural Networks, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-6899, https://doi.org/10.5194/egusphere-egu24-6899, 2024.

The detection of asteroids involves the processing of sequences of astronomical images. The main challenges arise from the huge volume of data that must be processed in a reasonable amount of time. To address this, we developed the NEARBY platform [1], [2] for the efficient, automatic detection of asteroids in sequences of astronomical images. This platform encompasses multidimensional data processing capabilities, human-verified visual analysis, and cloud-based adaptability. This paper outlines the enhancements we have made to this automated asteroid detection system by integrating a machine learning-based classifier known as the CERES module. The integration of the CERES module [3] into the NEARBY platform substantially enhances its performance by automatically reducing the number of false positive detections. Consequently, this leads to a more reliable and efficient system for asteroid identification, while also reducing the time and effort required by human experts to validate detected candidates (asteroids). The experiments highlight these improvements and their significance in advancing the field of asteroid tracking. Additionally, we explore the applicability of the asteroid classification model, initially trained using images from a specific telescope, across different telescopes.

Acknowledgment:

  • This work was supported by a grant of the Romanian Ministry of Education and Research, CCCDI - UEFISCDI, project number PN-III-P2-2.1-PED-2019-0796, within PNCDI III. (the development of the dataset and CNN models)
  • This research was partially supported by the project 38 PFE in the frame of the programme PDI-PFE-CDI 2021.

References:

  • Bacu, V., Sabou, A., Stefanut, T., Gorgan, D., Vaduvescu, O., NEARBY platform for detecting asteroids in astronomical images using cloud-based containerized applications, 2018 IEEE 14th International Conference on Intelligent Computer Communication and Processing (ICCP), pp. 371-376
  • Stefanut, T., Bacu, V., Nandra, C., Balasz, D., Gorgan, D., Vaduvescu, O., NEARBY Platform: Algorithm for automated asteroids detection in astronomical images, 2018 IEEE 14th International Conference on Intelligent Computer Communication and Processing (ICCP), pp. 365-369
  • Bacu, V.; Nandra, C.; Sabou, A.; Stefanut, T.; Gorgan, D. Assessment of Asteroid Classification Using Deep Convolutional Neural Networks. Aerospace 2023, 10, 752. https://doi.org/10.3390/aerospace10090752

 

How to cite: Bacu, V.: Enhancement of the NEARBY automated asteroid detection platform with a machine learning-based classifier, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-8018, https://doi.org/10.5194/egusphere-egu24-8018, 2024.

EGU24-9174 | ECS | Orals | ESSI1.5

Enhancing Space Mission Return through On-Board Data Reduction using Unsupervised Machine Learning 

Salome Gruchola, Peter Keresztes Schmidt, Marek Tulej, Andreas Riedo, Klaus Mezger, and Peter Wurz

The efficient use of the provided downlink capacity for scientific data is a fundamental aspect of space exploration. The use thereof can be optimised through sophisticated data reduction techniques and automation of processes on board that otherwise require interaction with the operations centres on Earth. Machine learning-based autonomous methods serve both purposes; yet space-based ML applications remain relatively rare compared to the application of ML on Earth to data acquired in space.

In this contribution, we present a potential application of unsupervised machine learning to cluster mass spectrometric data on-board a spacecraft. Data were acquired from a phoscorite rock [1] using a prototype of a laser ablation ionisation mass spectrometer (LIMS) for space research [2]. Two unsupervised dimensionality reduction algorithms, UMAP and densMAP [3,4], were employed to construct low-dimensional representations of the data. Clusters corresponding to different mineral phases within these embeddings were found using HDBSCAN [5]. The impact of data pre-processing and model parameter selection on the classification outcome was investigated through varying levels of pre-processing and extensive grid searches.

Both UMAP and densMAP effectively isolated major mineral phases present within the rock sample, but densMAP additionally found minor inclusions present only in a small number of mass spectra. However, densMAP exhibited higher sensitivity to data pre-processing, yielding lower scores for minimally treated data compared to UMAP. For highly processed data, both UMAP and densMAP exhibited high stability across a broad model parameter space.

Given that the data were recorded using a miniature mass spectrometric instrument designed for space flight, these methods demonstrate effective strategies for substantially reducing data volumes similar to those anticipated on future space missions. Autonomous clustering of data into groups of different chemical composition, followed by the downlink of a representative mass spectrum of each cluster, aids in identifying relevant data. Mission return can therefore be enhanced through the selective downlink of data of interest. As both UMAP and densMAP, coupled with HDBSCAN, are relatively complex algorithms compared to more traditional techniques, such as k-means, it is important to evaluate the benefits and drawbacks of using simpler methods on-board spacecraft.

 

[1] Tulej, M. et al., 2022, https://doi.org/10.3390/universe8080410.

[2] Riedo, A. et al., 2012, https://doi.org/10.1002/jms.3104.

[3] McInnes, L. et al., 2018, https://doi.org/10.48550/arXiv.1802.03426.

[4] Narayan, A., et al., 2021, https://doi.org/10.1038/s41587-020-00801-7.

[5] McInnes, L., et al., 2017, https://doi.org/10.21105/JOSS.00205.
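
The clustering strategy can be illustrated with a short sketch (synthetic spectra and assumed preprocessing; requires the umap-learn and hdbscan packages): a densMAP embedding is clustered with HDBSCAN and one representative spectrum per cluster is selected for downlink.

```python
# Illustrative sketch with synthetic "spectra" (assumed preprocessing, not the
# LIMS data): a densMAP embedding followed by HDBSCAN clustering and selection
# of one representative spectrum per cluster for downlink.
import numpy as np
import umap
import hdbscan

rng = np.random.default_rng(0)
# three synthetic "mineral phases", each a cluster of 200 mass spectra
spectra = np.vstack([rng.normal(mu, 0.05, size=(200, 50))
                     for mu in (0.2, 0.5, 0.8)])

embedding = umap.UMAP(n_components=2, densmap=True,
                      random_state=0).fit_transform(spectra)
labels = hdbscan.HDBSCAN(min_cluster_size=20).fit_predict(embedding)

# downlink one representative (median) spectrum per cluster
for k in sorted(set(labels) - {-1}):
    representative = np.median(spectra[labels == k], axis=0)
    print(f"cluster {k}: {np.sum(labels == k)} spectra, "
          f"representative mean intensity {representative.mean():.3f}")
```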

How to cite: Gruchola, S., Keresztes Schmidt, P., Tulej, M., Riedo, A., Mezger, K., and Wurz, P.: Enhancing Space Mission Return through On-Board Data Reduction using Unsupervised Machine Learning, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-9174, https://doi.org/10.5194/egusphere-egu24-9174, 2024.

EGU24-10715 | ECS | Posters on site | ESSI1.5

Physics-driven feature combination for an explainable AI approach to flare forecasting 

Margherita Lampani, Sabrina Guastavino, Michele Piana, Federico Benvenuto, and Anna Maria Massone

Typical supervised feature-based machine learning approaches to flare forecasting rely on descriptors extracted from magnetograms, such as Helioseismic and Magnetic Imager (HMI) images, and standardized before being used in the training phase of the machine learning pipeline. However, such an artificial intelligence (AI) model does not take into account the physical nature of the features and their role in the plasma physics equations. This talk proposes to generate novel features according to simple physics-driven combinations of the original descriptors, and to show whether this original, physically explainable AI model leads to more predictive solar flare forecasting.

How to cite: Lampani, M., Guastavino, S., Piana, M., Benvenuto, F., and Massone, A. M.: Physics-driven feature combination for an explainable AI approach to flare forecasting, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-10715, https://doi.org/10.5194/egusphere-egu24-10715, 2024.

EGU24-12885 | ECS | Posters on site | ESSI1.5

Finding Hidden Conjunctions in the Solar Wind 

Zoe Faes, Laura Hayes, Daniel Müller, and Andrew Walsh

This study aims to use ensemble machine learning methods to identify sets of in-situ measurements of the solar wind which sample the same volume of plasma at different times and locations as it travels through the heliosphere. Multiple observations of a single volume of plasma by different spacecraft - referred to here as conjunctions - are becoming more frequent in the current “golden age of heliophysics research” and are key to characterizing the expansion of the solar wind. Specifically, identifying these related observations will enable us to test the current understanding of solar wind acceleration from the corona to the inner heliosphere with a more comprehensive set of measurements than has been used in previous analyses.

Using in-situ measurements of the background solar wind from Solar Orbiter, Parker Solar Probe, STEREO-A, Wind and BepiColombo, we identify a set of criteria based on features of magnetic field, velocity, density and temperature timeseries of known conjunctions and search for other instances for which the criteria are satisfied, to find previously unknown conjunctions. We use an ensemble of models, including random forests and recurrent neural networks with long short-term memory trained on synthetic observations obtained from magnetohydrodynamic simulations, to identify candidate conjunctions solely from kinetic properties of the solar wind. Initial results show a previously unidentified set of conjunctions between the spacecraft considered in this study. While this analysis has thus far only been performed on observations obtained since 2021 (start of Solar Orbiter science operations), the methods used here can be applied to other datasets to increase the potential for scientific return of existing and future heliophysics missions.

The modular scientific software built over the course of this research includes methods for the retrieval, processing, visualisation, and analysis of observational and synthetic timeseries of solar wind properties. It also includes methods for feature engineering and integration with widely used machine learning libraries. The software is available as an open-source Python package to ensure results can be easily reproduced and to facilitate further investigation of coordinated in-situ data in heliophysics.

How to cite: Faes, Z., Hayes, L., Müller, D., and Walsh, A.: Finding Hidden Conjunctions in the Solar Wind, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-12885, https://doi.org/10.5194/egusphere-egu24-12885, 2024.

EGU24-12961 | ECS | Posters on site | ESSI1.5

Physics-informed neural networks for advanced solar magnetic field extrapolations 

Robert Jarolim, Benoit Tremblay, Matthias Rempel, Julia Thalmann, Astrid Veronig, Momchil Molnar, and Tatiana Podladchikova

Physics-informed neural networks (PINNs) provide a novel approach for data-driven numerical simulations, tackling challenges of discretization and enabling seamless integration of noisy data and physical models (e.g., partial differential equations). In this presentation, we discuss the results of our recent studies where we apply PINNs for coronal magnetic field extrapolations of the solar atmosphere, which are essential to understand the genesis and initiation of solar eruptions and to predict the occurrence of high-energy events from our Sun.
We utilize our PINN to estimate the 3D coronal magnetic fields based on photospheric vector magnetograms and the force-free physical model. This approach provides state-of-the-art coronal magnetic field extrapolations in quasi real-time. We simulate the evolution of Active Region NOAA 11158 over 5 continuous days, where the derived time profile of the free magnetic energy unambiguously relates to the observed flare activity.
We extend this approach by utilizing multi-height magnetic field measurements and combine them in a single magnetic field model. Our evaluation shows that the additional chromospheric field information leads to a more realistic approximation of the solar coronal magnetic field. In addition, our method intrinsically provides an estimate of the height corrugation of the observed magnetograms.
We provide an outlook on our ongoing work where we use PINNs for global force-free magnetic field extrapolations. This approach enables a novel understanding of the global magnetic topology with a realistic treatment of current carrying fields.
In summary, PINNs have the potential to greatly advance the field of numerical simulations, accelerate scientific research, and enable advanced space weather monitoring.

How to cite: Jarolim, R., Tremblay, B., Rempel, M., Thalmann, J., Veronig, A., Molnar, M., and Podladchikova, T.: Physics-informed neural networks for advanced solar magnetic field extrapolations, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-12961, https://doi.org/10.5194/egusphere-egu24-12961, 2024.

EGU24-14186 | Posters on site | ESSI1.5

Near real-time construction of Solar Coronal Parameters based on MAS simulation by Deep Learning  

Sumiaya Rahman, Hyun-Jin Jeong, Ashraf Siddique, and Yong-Jae Moon

Magnetohydrodynamic (MHD) models provide a quantitative 3D distribution of the solar corona parameters (density, radial velocity, and temperature). However, this process is expensive and time-consuming. For this, we apply deep learning models to reproduce the 3D distribution of solar coronal parameters from 2D synoptic photospheric magnetic fields. We consider synoptic photospheric magnetic fields as an input to obtain 3D solar coronal parameters simulated by the MHD Algorithm outside a Sphere (MAS) from June 2010 to January 2023. Each parameter is individually trained using 150 deep learning models, corresponding to 150 solar radial distances ranging from 1 to 30 solar radii. Our study yields significant findings. Firstly, our model accurately reproduces 3D coronal parameter structures across the 1 to 30 solar radii range, demonstrating an average correlation coefficient value of approximately 0.96. Secondly, the 150 deep-learning models exhibit a remarkably shorter runtime (about 16 seconds for each parameter), with an NVIDIA Titan XP GPU, in comparison to the conventional MAS simulation time. As the MAS simulation is a regularization model, we may significantly reduce the simulation time by using our results as an initial magnetic configuration to obtain an equilibrium condition. In the future, we hope that the generated solar coronal parameters can be used for near real-time forecasting of heliospheric propagation of solar eruptions.

How to cite: Rahman, S., Jeong, H.-J., Siddique, A., and Moon, Y.-J.: Near real-time construction of Solar Coronal Parameters based on MAS simulation by Deep Learning , EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-14186, https://doi.org/10.5194/egusphere-egu24-14186, 2024.

EGU24-15813 | ECS | Orals | ESSI1.5

Instrument-to-Instrument translation: An AI tool to intercalibrate, enhance and super-resolve solar observations 

Christoph Schirninger, Astrid Veronig, Robert Jarolim, J. Emmanuel Johnson, Anna Jungbluth, Richard Galvez, Lilli Freischem, and Anne Spalding

Various instruments are used to study the Sun, including ground-based observatories and space telescopes. These data products are constantly changing due to technological improvements, different instrumentation, or atmospheric effects. However, for certain applications such as ground-based solar image reconstruction or solar cycle studies, enhanced and combined data products are necessary.

We present a general AI tool called Instrument-to-Instrument (ITI; Jarolim et al. 2023) translation, which is capable of translating datasets between two different image domains. This approach enables instrument intercalibration, image enhancement, mitigation of quality degradations, and super-resolution across multiple wavelength bands. The tool is built on unpaired image-to-image translation, which enables a wide range of applications, where no spatial or temporal overlap is required between the considered datasets.

In this presentation, we highlight ITI as a general tool for Heliospheric applications and demonstrate its capabilities by applying it to data from Solar Orbiter/EUI, PROBA2/SWAP, and the Solar Dynamics Observatory/AIA in order to achieve a homogenous, machine-learning ready dataset that combines three different EUV imagers. 

The direct comparison of aligned observations shows the close relation of ITI-enhanced and real high-quality observations. The evaluation of light-curves demonstrates an improved inter-calibration.

ITI is provided open-source to the community  and can be easily applied to novel datasets and various research applications. 

This research is funded through a NASA 22-MDRAIT22-0018 award (No 80NSSC23K1045) and managed by Trillium Technologies, Inc (trillium.tech)

How to cite: Schirninger, C., Veronig, A., Jarolim, R., Johnson, J. E., Jungbluth, A., Galvez, R., Freischem, L., and Spalding, A.: Instrument-to-Instrument translation: An AI tool to intercalibrate, enhance and super-resolve solar observations, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-15813, https://doi.org/10.5194/egusphere-egu24-15813, 2024.

EGU24-15981 | ECS | Posters on site | ESSI1.5

Addressing the closure problem using supervised Machine Learning 

Sophia Köhne, Brecht Laperre, Jorge Amaya, Sara Jamal, Simon Lautenbach, Rainer Grauer, Giovanni Lapenta, and Maria Elena Innocenti

When deriving fluid equations from the Vlasov equation for collisionless plasmas, one runs into the so-called closure problem: each equation for the temporal evolution of one particle moment (density, current, pressure, heat flux, …) includes terms depending on the next order moment. Therefore, when choosing to truncate the description at the nth order, one must approximate the terms related to the (n+1)th order moment included in the evolution equation for the nth order moment. The order at which the hierarchy is closed and the assumptions behind the approximations used determine how accurately a fluid description can reproduce kinetic processes.

In this work, we aim at reconstructing specific particle moments from kinetic simulations, using as input the electric and magnetic field and the lower moments. We use fully kinetic Particle In Cell simulations, where all physical information is available, as the ground truth. The approach we present here uses supervised machine learning to enable a neural network to learn how to reconstruct higher moments from lower moments and fields.

Starting from the work of Laperre et al. (2022), we built a framework which makes it possible to train feedforward multilayer perceptrons on kinetic simulations to learn to predict the higher moments of the Vlasov equation from the lower moments, which would also be available in fluid simulations. We train on simulations of magnetic reconnection in a double Harris current sheet with varying background guide field obtained with the semi-implicit Particle-in-Cell code iPiC3D (Markidis et al., 2010). We test the influence of data preprocessing techniques, of (hyper-)parameter variations and of different architectures of the neural networks on the quality of the predictions that are produced. Furthermore, we investigate which metrics are most useful to evaluate the quality of the outcome.
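
A minimal sketch of such a closure surrogate is given below (scikit-learn; synthetic data and assumed input/output choices, not the framework described above): a small feedforward network maps lower moments and fields to higher-moment components.

```python
# Minimal sketch under assumed inputs/outputs: a feedforward network mapping
# lower moments and fields (density, bulk velocity, E, B) to higher-moment
# components; the data here are synthetic, not PIC simulation output.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 10000
X = rng.normal(size=(n, 10))   # e.g. n, ux, uy, uz, Ex, Ey, Ez, Bx, By, Bz
# synthetic "higher moment": six pressure-tensor-like components
y = np.stack([X[:, 0] * X[:, i + 1] ** 2 + 0.1 * rng.normal(size=n)
              for i in range(6)], axis=1)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
scaler = StandardScaler().fit(X_tr)
mlp = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0)
mlp.fit(scaler.transform(X_tr), y_tr)
print("held-out R^2:", mlp.score(scaler.transform(X_te), y_te))
```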

How to cite: Köhne, S., Laperre, B., Amaya, J., Jamal, S., Lautenbach, S., Grauer, R., Lapenta, G., and Innocenti, M. E.: Addressing the closure problem using supervised Machine Learning, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-15981, https://doi.org/10.5194/egusphere-egu24-15981, 2024.

EGU24-18534 | ECS | Posters on site | ESSI1.5

Visualizing three years of STIX X-ray flare observations using self-supervised learning 

Mariia Drozdova, Vitaliy Kinakh, Francesco Ramunno, Erica Lastufka, and Slava Voloshynovskiy

Operating continuously for over three years, Solar Orbiter's STIX has observed more than 43 thousand X-ray flares. This study presents a compelling visualization of this publicly available database, using self-supervised learning to organize reconstructed flare images by their visual properties. Networks designed for self-supervised learning, such as Masked Siamese Networks or Autoencoders, are able to learn latent space embeddings which encode core characteristics of the data. We investigate the effectiveness of various pre-trained vision models, fine-tuning strategies, and image preparation. This visual representation offers a valuable starting point for identifying interesting events and grouping flares based on shared morphological characteristics, useful for conducting statistical studies or finding unique flares in this rich set of observations.

How to cite: Drozdova, M., Kinakh, V., Ramunno, F., Lastufka, E., and Voloshynovskiy, S.: Visualizing three years of STIX X-ray flare observations using self-supervised learning, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-18534, https://doi.org/10.5194/egusphere-egu24-18534, 2024.

EGU24-19248 | Posters on site | ESSI1.5

Segmentation and Tracking of Solar Eruptive Phenomena with Convolutional Neural Networks (CNN) 

Oleg Stepanyuk and Kamen Kozarev

Solar eruptive events are complex phenomena, which most often include coronal mass ejections (CME), flares, compressive/shock waves, and filament eruptions. CMEs are large eruptions of magnetized plasma from the Sun’s outer atmosphere, or corona, that propagate outward into interplanetary space. Solar Energetic Particles (SEP) are produced through particle acceleration in flares or CME-driven shocks. The exact mechanisms behind SEP production are not yet fully understood, but it is thought that most of their acceleration occurs in shocks starting in the low corona. Over the last several decades, a large volume of remote observations of solar eruptions has become available from ground-based and space-borne instruments. This has required the development of software approaches for automated characterization of eruptive features. Most solar feature detection and tracking algorithms currently in use have restricted applicability and complicated processing chains, while the complexity of engineering machine learning (ML) training sets limits the use of data-driven approaches for tracking solar eruptive phenomena. Recently, we introduced a hybrid algorithmic–data-driven approach for characterization and tracking of solar eruptive features with the improved wavelet-based, multi-instrument Wavetrack package (Stepanyuk et al., J. Space Weather Space Clim., 2024), which was used to produce training datasets for data-driven image segmentation with convolutional neural networks (CNN). Its performance was demonstrated on a limited set of SDO AIA 193 Å data, performing segmentation of EUV waves and shock waves. Here we extend this approach and present an ensemble of more general CNN models for data-driven segmentation of various eruptive phenomena for a set of ground-based and space-borne instrument data. We discuss our approach to engineering training sets and data augmentation, CNN topology and training techniques.
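As a schematic of the data-driven segmentation step (a simplified stand-in, not the ensemble of CNN models described above; layer sizes are illustrative), a small encoder-decoder network trained on algorithmically produced masks might look like this:

```python
import torch
import torch.nn as nn

class TinySegNet(nn.Module):
    """Illustrative encoder-decoder for binary segmentation of solar EUV images."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(32, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 1),          # per-pixel logit for the feature mask
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Training would use masks produced by an algorithmic tracker (e.g., Wavetrack)
# as labels, with a pixel-wise binary cross-entropy loss:
loss_fn = nn.BCEWithLogitsLoss()
```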

How to cite: Stepanyuk, O. and Kozarev, K.: Segmentation and Tracking of Solar Eruptive Phenomena with Convolutional Neural Networks (CNN), EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-19248, https://doi.org/10.5194/egusphere-egu24-19248, 2024.

EGU24-19558 | Orals | ESSI1.5

Comparative Analysis of Data Preprocessing Methods for Precise Orbit Determination 

Tom Andert, Benedikt Aigner, Fabian Dallinger, Benjamin Haser, Martin Pätzold, and Matthias Hahn

In Precise Orbit Determination (POD), employing proper methods for pre-processing tracking data is crucial not only to mitigate data noise but also to identify potential unmodeled effects that may elude the prediction model of the POD algorithm. Unaccounted effects can skew parameter estimation, causing certain parameters to assimilate the unmodeled effects and deviate from their true values. Therefore, enhancing the pre-processing of tracking data ultimately contributes to refining the prediction model.

The Rosetta spacecraft, during its two-year mission alongside comet 67P/Churyumov-Gerasimenko, collected a substantial dataset of tracking data. In addition to this dataset, tracking data from the Mars Express spacecraft, which has orbited Mars since 2004, will serve as a use case to assess and compare diverse data pre-processing methods. Both traditional and AI-based techniques are explored to examine the impact of various strategies on the accuracy of orbit determination. This aims to enhance POD, thereby yielding a more robust scientific outcome.

How to cite: Andert, T., Aigner, B., Dallinger, F., Haser, B., Pätzold, M., and Hahn, M.: Comparative Analysis of Data Preprocessing Methods for Precise Orbit Determination, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-19558, https://doi.org/10.5194/egusphere-egu24-19558, 2024.

EGU24-21463 | ECS | Posters on site | ESSI1.5

A machine learning approach to meteor light curve analysis 

Lucas Mandl, Apostolous Christou, and Andreas Windisch

In this work we conduct a thorough examination of machine learning and computer vision techniques for classifying meteors based on their characteristics. The focus of the research is the analysis of light curves emitted by meteors as they pass through the Earth’s atmosphere, including aspects such as luminosity, duration, and shape. By extracting features from these light curves and comparing them to established meteor orbits, we seek to obtain valuable information about a meteor’s origin and chemical composition. A significant contribution of this work is the development of methods for classifying meteors by extracting features from the light-curve shape using unsupervised classification algorithms. This approach allows for the automatic classification of meteors into various groups based on their properties. Data for the research are collected by a three-camera setup at the Armagh Observatory, comprising one medium-angle camera and two wide-angle cameras. This setup enables the capture of detailed images of meteor light curves, as well as various other observations such as coordinate and angular data. The research also involves the use of machine learning algorithms for data reduction and classification tasks. By applying these techniques to the data collected from the camera setup, we facilitate the identification of parent objects based on chemical composition and meteor path, along with the acquisition of other valuable information about the meteors.
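As a rough sketch of the unsupervised classification idea (not the authors' pipeline; the features and number of clusters are hypothetical, and the light curves below are synthetic placeholders):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

def light_curve_features(t, m):
    """Simple shape descriptors of a light curve: duration, peak brightness,
    relative time of peak, and integrated brightness (feature choice is hypothetical)."""
    peak_idx = int(np.argmax(m))
    duration = t[-1] - t[0]
    rise_frac = (t[peak_idx] - t[0]) / duration
    area = np.trapz(m, t)
    return [duration, m[peak_idx], rise_frac, area]

# Synthetic placeholder light curves standing in for the camera-derived data
rng = np.random.default_rng(0)
curves = []
for _ in range(50):
    t = np.linspace(0.0, 1.0, 100)
    peak = rng.uniform(0.2, 0.8)
    m = rng.uniform(1.0, 5.0) * np.exp(-((t - peak) ** 2) / 0.01)
    curves.append((t, m))

X = StandardScaler().fit_transform([light_curve_features(t, m) for t, m in curves])
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)  # meteor groups
```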

How to cite: Mandl, L., Christou, A., and Windisch, A.: A machine learning approach to meteor light curve analysis, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-21463, https://doi.org/10.5194/egusphere-egu24-21463, 2024.

EGU24-566 | ECS | Orals | HS3.4

A Transformer-Based Data-Driven Model for Real-Time Spatio-Temporal Flood Prediction 

Matteo Pianforini, Susanna Dazzi, Andrea Pilzer, and Renato Vacondio

Among the non-structural strategies for mitigating the huge economic losses and casualties caused by floods, the implementation of early-warning systems based on real-time forecasting of flood maps is one of the most effective. The high computational cost associated with two-dimensional (2D) hydrodynamic models, however, prevents their practical application in this context. To overcome this drawback, “data-driven” models are gaining considerable popularity due to their high computational efficiency for predictions. In this work, we introduce a novel surrogate model based on the Transformer architecture, named FloodSformer (FS), that efficiently predicts the temporal evolution of inundation maps, with the aim of providing real-time flood forecasts. The FS model combines an encoder-decoder (2D Convolutional Neural Network) with a Transformer block that handles temporal information. This architecture extracts the spatiotemporal information from a sequence of consecutive water depth maps and predicts the water depth map at one subsequent instant. An autoregressive procedure, based on the trained surrogate model, is employed to forecast tens of future maps.
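The autoregressive use of such a surrogate can be sketched as follows (schematic only; `model` is a hypothetical trained network that maps a sequence of recent depth maps to the next map):

```python
import torch

@torch.no_grad()
def rollout(model, context_maps, n_steps=90):
    """context_maps: tensor (T, H, W) of the most recent water depth maps.
    Repeatedly predicts the next map and feeds it back as input (autoregression)."""
    maps = list(context_maps)
    for _ in range(n_steps):
        x = torch.stack(maps[-len(context_maps):]).unsqueeze(0)  # (1, T, H, W)
        next_map = model(x).squeeze(0)                           # (H, W) predicted depth map
        maps.append(next_map)
    return torch.stack(maps[len(context_maps):])                 # (n_steps, H, W)
```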

As a case study, we investigated the hypothetical inundation due to the collapse of the flood-control dam on the Parma River (Italy). Due to the absence of real inundation maps, the training/testing dataset for the FS model was generated from numerical simulations performed through a 2D shallow‐water code (PARFLOOD). Results show that the FS model is able to recursively forecast the next 90 water depth maps (corresponding to 3 hours for this case study, in which maps are sampled at 2-minute intervals) in less than 1 minute. This is achieved while maintaining an accuracy deemed entirely acceptable for real-time applications: the average Root Mean Square Error (RMSE) is about 10 cm, and the differences between ground-truth and predicted maps are generally lower than 25 cm in the floodable area for the first 60 predicted frames. In conclusion, the short computational time and the good accuracy ensured by the autoregressive procedure make the FS model suitable for early-warning systems.

How to cite: Pianforini, M., Dazzi, S., Pilzer, A., and Vacondio, R.: A Transformer-Based Data-Driven Model for Real-Time Spatio-Temporal Flood Prediction, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-566, https://doi.org/10.5194/egusphere-egu24-566, 2024.

EGU24-571 | ECS | Posters on site | HS3.4

Hydrological Significance of Input Sequence Lengths in LSTM-Based Streamflow Prediction 

Farzad Hosseini Hossein Abadi, Cristina Prieto Sierra, Grey Nearing, Cesar Alvarez Diaz, and Martin Gauch

Abstract

Hydrological modeling of flashy catchments, susceptible to floods, represents a significant practical challenge. Recent applications of deep learning, specifically Long Short-Term Memory networks (LSTMs), have demonstrated notable capability in delivering accurate hydrological predictions at daily and hourly time intervals (Gauch et al., 2021; Kratzert et al., 2018).

In this study, we leverage a multi-timescale LSTM (MTS-LSTM (Gauch et al., 2021)) model to predict hydrographs in flashy catchments at hourly time scales. Our primary focus is to investigate the influence of model hyperparameters on the performance of regional streamflow models. We present methodological advancements using a practical application to predict streamflow in 40 catchments within the Basque Country (North of Spain).

Our findings show that 1) hourly and daily streamflow predictions exhibit high accuracy, with Nash-Sutcliffe Efficiency (NSE) reaching values as high as 0.941 and 0.966 for daily and hourly data, respectively; and 2) hyperparameters associated with the length of the input sequence exert a substantial influence on the performance of a regional model. Consistently optimal regional values, following a systematic hyperparameter tuning, were identified as 3 years for daily data and 12 weeks for hourly data. Principal component analysis (PCA) shows that the first principal component explains 12.36% of the variance among the 12 hyperparameters. Within this set of hyperparameters, the input sequence lengths for hourly data exhibit the highest load in PC1, with a value of -0.523; the load of the input sequence length for daily data is also very high (-0.36). This suggests that these hyperparameters strongly contribute to the model performance.
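For reference, the PCA step on a table of tuned hyperparameter values can be reproduced along these lines (a sketch with placeholder data, not the study's actual hyperparameter table):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# rows = tuning runs, columns = 12 hyperparameters (incl. daily/hourly sequence lengths)
H = np.random.default_rng(0).normal(size=(200, 12))   # placeholder for the real table
pca = PCA(n_components=3).fit(StandardScaler().fit_transform(H))

print(pca.explained_variance_ratio_[0])   # share of variance captured by PC1
print(pca.components_[0])                 # PC1 loadings; large |value| = strong contribution
```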

Furthermore, when utilizing a catchment-scale magnifier to determine optimal hyperparameter settings for each catchment, distinctive sequence lengths emerge for individual basins. This underscores the necessity of customizing input sequence lengths based on the “uniqueness of the place” (Beven, 2020), suggesting that each catchment may demand specific hydrologically meaningful daily and hourly input sequence lengths tailored to its unique characteristics. In essence, the true input sequence length of a catchment may encapsulate hydrological information pertaining to water transit over short and long-term periods within the basin. Notably, the regional daily sequence length aligns with the highest local daily sequence values across all catchments.

In summary, our investigation stresses the critical role of the input sequence length as a hyperparameter in LSTM networks. More broadly, this work is a step towards a better understanding and achieving accurate hourly predictions using deep learning models.

 

Keywords

Hydrological modeling; Streamflow Prediction; LSTM networks; Hyperparameters configurations; Input sequence lengths

 

References:

Beven, K. (2020). Deep learning, hydrological processes and the uniqueness of place. Hydrological Processes, 34(16), 3608–3613. doi:10.1002/hyp.13805

Gauch, M., Kratzert, F., Klotz, D., Nearing, G., Lin, J., and Hochreiter, S. (2021). Rainfall–runoff prediction at multiple timescales with a single Long Short-Term Memory network. Hydrology and Earth System Sciences, 25, 2045–2062. doi:10.5194/hess-25-2045-2021.

Kratzert, F., Klotz, D., Brenner, C., Schulz, K., and Herrnegger, M. (2018). Rainfall–runoff modelling using Long Short-Term Memory (LSTM) networks. Hydrology and Earth System Sciences, 22(11), 6005–6022. doi:10.5194/hess-22-6005-2018.

 

How to cite: Hosseini Hossein Abadi, F., Prieto Sierra, C., Nearing, G., Alvarez Diaz, C., and Gauch, M.: Hydrological Significance of Input Sequence Lengths in LSTM-Based Streamflow Prediction, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-571, https://doi.org/10.5194/egusphere-egu24-571, 2024.

EGU24-811 | ECS | Orals | HS3.4

A deep learning approach for spatio-temporal prediction of stable water isotopes in soil moisture 

Hyekyeng Jung, Chris Soulsby, and Dörthe Tetzlaff

Water flows and related mixing dynamics in the unsaturated zone are difficult to measure directly, so stable water isotope tracers have been used successfully to quantify flux and storage dynamics and to constrain process-based hydrological models as proxy data. In this study, a data-driven model based on deep learning was adapted to interpolate and extrapolate spatio-temporal isotope signals of δ18O and δ2H in soil water in three dimensions. Further, this was also used to help quantify evapotranspiration and groundwater recharge processes in the unsaturated zone. To consider both spatial and temporal dependencies of water isotope signals in the model design, the output space was decomposed into temporal basis functions and spatial coefficients using singular value decomposition. The temporal basis functions and spatial coefficients were then predicted separately by deep learning models suited to the interdependencies in the target data, such as an LSTM model and a convolutional neural network. Finally, the predictions by the models were integrated and analyzed post hoc using XAI tools.
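A minimal sketch of the decomposition idea (not the authors' code; the number of retained modes and the data here are placeholders):

```python
import numpy as np

# Y: isotope observations with shape (n_times, n_locations); synthetic placeholder here
rng = np.random.default_rng(0)
Y = rng.normal(size=(365, 120))

U, s, Vt = np.linalg.svd(Y, full_matrices=False)
k = 5                                      # number of retained modes (hypothetical)
temporal_basis = U[:, :k] * s[:k]          # (n_times, k): modelled by an LSTM-type network
spatial_coeffs = Vt[:k, :]                 # (k, n_locations): modelled by a CNN-type network

Y_hat = temporal_basis @ spatial_coeffs    # recombine predictions into the full field
```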

Such an integrated framework has the potential to improve understanding of model behavior based on features (e.g., climate, hydrological component) connected to either temporal or spatial information. Furthermore, the model can serve as a surrogate model for process-based hydrological models, improving the use of process-based models as learning tools.

How to cite: Jung, H., Soulsby, C., and Tetzlaff, D.: A deep learning approach for spatio-temporal prediction of stable water isotopes in soil moisture, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-811, https://doi.org/10.5194/egusphere-egu24-811, 2024.

EGU24-2872 | ECS | Posters on site | HS3.4

Runoff coefficient modelling using Long Short-Term Memory (LSTM) in the Rur catchment, Germany 

Arash Rahi, Mehdi Rahmati, Jacopo Dari, and Renato Morbidelli

This research examines the effectiveness of Long Short-Term Memory (LSTM) models in predicting the runoff coefficient (Rc) within the Rur basin at the Stah outlet (Germany) during the period from 1961 to 2021; monthly data of temperature (T), precipitation (P), soil water storage (SWS), and total evaporation (ETA) are used as inputs. Because of the complexity in predicting the undecomposed Rc time series due to noise, a novel approach incorporating the discrete wavelet transform (DWT) to decompose the original Rc at five levels is proposed.
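The decomposition step can be sketched with PyWavelets as follows (the wavelet choice is an assumption, and the series here is a synthetic placeholder):

```python
import numpy as np
import pywt

rc = np.random.default_rng(0).random(732)      # placeholder monthly Rc series (1961-2021)
coeffs = pywt.wavedec(rc, "db4", level=5)      # [cA5, cD5, cD4, cD3, cD2, cD1]

# Each coefficient series can be modelled separately (e.g., with an LSTM) and the
# modelled coefficients recombined into Rc with the inverse transform:
rc_reconstructed = pywt.waverec(coeffs, "db4")[:len(rc)]
```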

The investigation identifies overfitting challenges at level 1, which are gradually mitigated at subsequent decomposition levels, particularly at level 2, while the remaining levels stay well tuned. Reconstructing Rc using the modelled decomposition coefficients yielded Nash-Sutcliffe efficiency (NSE) values of 0.88, 0.79, and 0.74 for the training, validation, and test sets, respectively. Comparative analysis highlights that modelling the undecomposed Rc with LSTM yields lower accuracy, emphasizing the pivotal role of decomposition techniques in tandem with LSTM for enhanced model performance.

This study provides novel insights to address challenges related to noise effects and temporal dependencies in Rc modelling; through a comprehensive analysis of the interplay between atmospheric conditions and observed data, the research contributes to advancing predictive modelling in hydrology.

How to cite: Rahi, A., Rahmati, M., Dari, J., and Morbidelli, R.: Runoff coefficient modelling using Long Short-Term Memory (LSTM) in the Rur catchment, Germany, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-2872, https://doi.org/10.5194/egusphere-egu24-2872, 2024.

EGU24-2939 | ECS | Orals | HS3.4

Probabilistic streamflow forecasting using generative deep learning models 

Mohammad Sina Jahangir and John Quilty

The significance of probabilistic hydrological forecasting has grown in recent years, offering crucial insights for risk-based decision-making and effective flood management. This study explores generative deep learning models, specifically the conditional variational autoencoder (CVAE), for probabilistic streamflow forecasting. This innovative approach is applied to forecast streamflow one to seven days ahead in 75 Canadian basins included in the open-source Canadian model parameter experiment (CANOPEX) database. The CVAE is compared against two benchmark quantile-based deep learning models: the quantile-based encoder-decoder (ED) and the quantile-based CVAE (QCVAE).

Over 9000 deep learning models are developed based on different input variables, basin characteristics, and model structures and evaluated regarding point forecast accuracy and forecast reliability. Results highlight CVAE's superior reliability, showing a median reliability of 92.49% compared to 87.35% for ED and 84.59% for QCVAE (considering a desired 90% confidence level). However, quantile-based forecast models exhibit marginally better point forecasts, as evidenced by Kling-Gupta efficiency (KGE), with a median KGE of 0.90 for ED and QCVAE (compared to 0.88 for CVAE). Notably, the CVAE model provides reliable probabilistic forecasts in basins with low point forecast accuracy.
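Read as empirical coverage of a central 90% prediction interval, the reliability score can be sketched like this (the exact definition used in the study is assumed, not quoted):

```python
import numpy as np

def coverage(obs, lower, upper):
    """Percentage of observations falling inside the predicted [lower, upper] interval,
    e.g. with lower/upper taken as the 5th/95th percentiles of the CVAE ensemble."""
    obs, lower, upper = map(np.asarray, (obs, lower, upper))
    return float(np.mean((obs >= lower) & (obs <= upper)) * 100.0)
```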

The developed generative deep learning models can be used as a benchmark for probabilistic streamflow forecasting due to the use of the open-source CANOPEX dataset. Overall, the results of this study contribute to the expanding field of generative deep learning models in hydrological forecasting, offering a general framework that also applies to forecasting other hydrological variables (e.g., precipitation and soil moisture).

How to cite: Jahangir, M. S. and Quilty, J.: Probabilistic streamflow forecasting using generative deep learning models, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-2939, https://doi.org/10.5194/egusphere-egu24-2939, 2024.

EGU24-6432 | ECS | Orals | HS3.4

A Step Towards Global Hydrologic Modelling: Accurate Streamflow Predictions In Pseudo-Ungauged Basins of Japan 

Hemant Servia, Frauke Albrecht, Samuel Saxe, Nicolas Bierti, Masatoshi Kawasaki, and Shun Kurihara

In addressing the challenge of streamflow prediction in ungauged basins, this study leveraged deep learning (DL) models, especially long short-term memory (LSTM) networks, to predict streamflow for pseudo-ungauged basins in Japan. The motivation stems from the recognized limitations of traditional hydrological models in transferring their performance beyond the calibrated basins. Recent research suggests that DL models, especially those trained on multiple catchments, demonstrate improved predictive capabilities utilizing the concept of streamflow regionalization. However, the majority of these studies were confined to geographic regions within the United States.

For this study, a total of 211 catchments were delineated and investigated, distributed across all four primary islands of Japan (Kyushu - 32, Shikoku - 13, Honshu - 127, and Hokkaido - 39) and encompassing a comprehensive sample of hydrological systems within the region. The catchments were delineated at the streamflow observation points, and their combined area represents more than 43% of Japan's total land area after accounting for overlaps. After cleaning and refining the streamflow dataset, the remaining 198 catchments were divided into training (~70%), validation (~20%), and holdout test (~10%) sets. A combination of dynamic (time-varying) and static (constant) variables was obtained on a daily basis, corresponding to the daily streamflow data, and provided to the models as input features. However, the final model accorded higher significance to dynamic features in comparison to the static ones. Although the models were trained on daily time steps, the results were aggregated to a monthly timescale. The main evaluation metrics included the Nash-Sutcliffe Efficiency (NSE) and Pearson's correlation coefficient (r). The final model achieved a median NSE of 0.96, 0.83, and 0.78, and a median correlation of 0.98, 0.92, and 0.91 for the training, validation, and test catchments, respectively. For the validation catchments, 90% exhibited NSE values greater than 0.50, and 97% demonstrated a correlation surpassing 0.70. The corresponding proportions were 77% and 91% for the test catchments.
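For reference, the NSE metric used throughout can be written as a short function (a standard definition, not code from the study):

```python
import numpy as np

def nse(obs, sim):
    """NSE = 1 - sum((sim - obs)^2) / sum((obs - mean(obs))^2); 1.0 is a perfect fit."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)
```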

The results presented in this study demonstrate the feasibility and efficacy of developing a data-driven model for streamflow prediction in ungauged basins utilizing streamflow regionalization. The final model exhibits commendable performance, as evidenced by high NSE and correlation coefficients across the majority of the catchments. Importantly, the model's ability to generalize to unseen data is highlighted by its remarkable performance on the holdout test set, with only a few instances of lower NSE values (< 0.50) and correlation coefficients (< 0.70).

How to cite: Servia, H., Albrecht, F., Saxe, S., Bierti, N., Kawasaki, M., and Kurihara, S.: A Step Towards Global Hydrologic Modelling: Accurate Streamflow Predictions In Pseudo-Ungauged Basins of Japan, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-6432, https://doi.org/10.5194/egusphere-egu24-6432, 2024.

EGU24-6846 | ECS | Orals | HS3.4

Towards Fully Distributed Rainfall-Runoff Modelling with Graph Neural Networks 

Peter Nelemans, Roberto Bentivoglio, Joost Buitink, Ali Meshgi, Markus Hrachowitz, Ruben Dahm, and Riccardo Taormina

Fully distributed hydrological models take into account the spatial variability of a catchment, allowing for a more accurate representation of its heterogeneity, and assessing its hydrological response at multiple locations. However, physics-based fully distributed models can be time-consuming when it comes to model runtime and calibration, especially for large-scale catchments. On the other hand, deep learning models have shown great potential in the field of hydrological modelling, outperforming lumped rainfall-runoff conceptual models, and improving prediction in ungauged basins via catchment transferability. Despite these advances, the field still lacks a multivariable, fully distributed hydrological deep learning model capable of generalizing to unseen catchments. To address the aforementioned challenges associated with physics-based distributed models and deep learning models, we explore the possibility of developing a fully distributed deep learning model by using Graph Neural Networks (GNN), an extension of deep learning methods to non-Euclidean topologies including graphs and meshes.

We develop a surrogate model of wflow_sbm, a fully distributed, physics-based hydrological model, by exploiting the similarities between its underlying functioning and GNNs. The GNN uses the same input as wflow_sbm: distributed static parameters based on physical characteristics of the catchment and gridded dynamic forcings. The GNN is trained to produce the same output as wflow_sbm, predicting multiple gridded variables related to rainfall-runoff, such as streamflow, actual evapotranspiration, subsurface flow, saturated and unsaturated groundwater storage, snow storage, and runoff. We show that our GNN model achieves high performance in unseen catchments, indicating that GNNs are a viable option for fully distributed multivariable hydrological models capable of generalizing to unseen regions. Furthermore, the GNN model achieves a significant computational speedup compared to wflow_sbm. We will continue this research, using the GNN-based surrogate models as pre-trained backbones to be fine-tuned with measured data, ensuring accurate model adaptation, and enhancing their practical applicability in diverse hydrological scenarios.
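A minimal sketch of a grid-as-graph GNN in this spirit (not the authors' architecture; PyTorch Geometric is assumed, and the layer sizes and variable counts are illustrative):

```python
import torch
import torch.nn as nn
from torch_geometric.nn import GCNConv

class GridGNN(nn.Module):
    """Each grid cell is a graph node; edges connect neighbouring/downstream cells,
    mirroring how a distributed model routes water across the grid."""
    def __init__(self, n_static, n_forcing, n_outputs, hidden=64):
        super().__init__()
        self.conv1 = GCNConv(n_static + n_forcing, hidden)
        self.conv2 = GCNConv(hidden, hidden)
        self.head = nn.Linear(hidden, n_outputs)   # e.g., runoff, ET, storages per cell

    def forward(self, static, forcing, edge_index):
        x = torch.cat([static, forcing], dim=-1)    # node features per grid cell
        x = torch.relu(self.conv1(x, edge_index))
        x = torch.relu(self.conv2(x, edge_index))
        return self.head(x)
```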

How to cite: Nelemans, P., Bentivoglio, R., Buitink, J., Meshgi, A., Hrachowitz, M., Dahm, R., and Taormina, R.: Towards Fully Distributed Rainfall-Runoff Modelling with Graph Neural Networks, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-6846, https://doi.org/10.5194/egusphere-egu24-6846, 2024.

This research created a deep neural network (DNN)-based hydrologic model for an urban watershed in South Korea using multiple LSTM (long short-term memory) units and a fully connected layer. The model utilized 10-minute intervals of radar-gauge composite precipitation and temperature data across 239 grid cells, each 1 km in resolution, to simulate watershed flow discharge every 10 minutes. It showed high accuracy during both the calibration (2013–2016) and validation (2017–2019) periods, with Nash–Sutcliffe efficiency coefficient values of 0.99 and 0.67, respectively. Key findings include: 1) the DNN model's runoff–precipitation ratio map closely matched the imperviousness ratio map from land cover data, demonstrating the model's ability to learn precipitation partitioning without prior hydrological information; 2) it effectively mimicked soil moisture-dependent runoff processes, crucial for continuous hydrologic models; and 3) the LSTM units displayed varying temporal responses to precipitation, with units near the watershed outlet responding faster, indicating the model's capability to differentiate between hydrological components like direct runoff and groundwater-driven baseflow.

How to cite: Kim, D.: Exploring How Machines Model Water Flow: Predicting Small-Scale Watershed Behavior in a Distributed Setting, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-7186, https://doi.org/10.5194/egusphere-egu24-7186, 2024.

EGU24-8102 | ECS | Orals | HS3.4

Improving the Generalizability of Urban Pluvial Flood Emulators by Contextualizing High-Resolution Patches 

Tabea Cache, Milton S. Gomez, Jovan Blagojević, Tom Beucler, João P. Leitão, and Nadav Peleg

Predicting future flood hazards in a changing climate requires adopting a stochastic framework due to the multiple sources of uncertainty (e.g., from climate change scenarios, climate models, or natural variability). This requires performing multiple flood inundation simulations, which are computationally costly. Data-driven models can help overcome this issue as they can emulate urban flood maps considerably faster than traditional flood simulation models. However, their lack of generalizability to both terrain and rainfall events still limits their application. Additionally, these models face the challenge of not having sufficient training data. This led state-of-the-art models to adopt a patch-based framework, where the study area is first divided into local patches (i.e., broken into smaller terrain images) that are subsequently merged to reconstruct the prediction for the whole study area. The main drawback of this method is that the model is blind to the surroundings of the local patch. To overcome this bottleneck, we developed a new deep learning model that includes patches' contextual information while keeping high-resolution information of the local patch. We trained and tested the model in the city of Zurich, at a spatial resolution of 1 m. The evaluation focused on 1-hour rainfall events at 5-min temporal resolution, encompassing extreme precipitation return periods from 2 to 100 years. The results show that the proposed CNN-attention model outperforms the state-of-the-art patch-based urban flood emulator. First, our model can faithfully represent flood depths for a wide range of extreme rainfall events (peak rainfall intensities ranging from 42.5 mm h⁻¹ to 161.4 mm h⁻¹). Second, the model's terrain generalizability was assessed in distinct urban settings, namely Luzern and Singapore. Our model accurately identifies water accumulation locations, which constitutes an improvement compared to current models. Using transfer learning, the model was successfully retrained in the new cities, requiring only a single rainfall event to adapt the model to new terrains while preserving adaptability across diverse rainfall conditions. Our results suggest that by integrating contextual terrain information with local terrain patches, our proposed model effectively generates high-resolution urban pluvial flood maps, demonstrating applicability across varied terrains and rainfall events.

How to cite: Cache, T., Gomez, M. S., Blagojević, J., Beucler, T., Leitão, J. P., and Peleg, N.: Improving the Generalizability of Urban Pluvial Flood Emulators by Contextualizing High-Resolution Patches, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-8102, https://doi.org/10.5194/egusphere-egu24-8102, 2024.

EGU24-9190 | ECS | Orals | HS3.4

Learning Catchment Features with Autoencoders 

Alberto Bassi, Antonietta Mira, Marvin Höge, Fabrizio Fenicia, and Carlo Albert

By employing Machine Learning techniques on the US-CAMELS dataset, we discern a minimal number of streamflow features. Together with meteorological forcing, these features enable an approximate reconstruction of the entire streamflow time-series. This task is achieved through the application of an explicit noise conditional autoencoder, wherein the meteorological forcing is inputted to the decoder to encourage the encoder to learn streamflow features exclusively related to landscape properties. The optimal number of encoded features is determined with an intrinsic dimension estimator. The accuracy of reconstruction is then compared with models that take a subset of static catchment attributes (both climate and landscape attributes) in addition to meteorological forcing variables. Our findings suggest that attributes gathered by experts encompass nearly all pertinent information regarding the input/output relationship. This information can be succinctly summarized with merely three independent streamflow features. These features exhibit a strong correlation with the baseflow index and aridity indicators, aligning with the observation that predicting streamflow in dry catchments or with a high baseflow index is more challenging. Furthermore, correlation analysis underscores the significance of soil-related and vegetation attributes. These learned features can also be associated with parameters in conceptual hydrological models such as the GR model family.
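The conditioning idea can be sketched as follows (a simplified stand-in for the explicit noise conditional autoencoder described above; here the forcing is flattened and all sizes are hypothetical):

```python
import torch
import torch.nn as nn

class ConditionalAE(nn.Module):
    """Encoder sees only the streamflow series; the decoder receives the learned features
    plus the meteorological forcing, so the features need only carry catchment information."""
    def __init__(self, series_len, forcing_dim, n_features=3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(series_len, 256), nn.ReLU(), nn.Linear(256, n_features)
        )
        self.decoder = nn.Sequential(
            nn.Linear(n_features + forcing_dim, 256), nn.ReLU(), nn.Linear(256, series_len)
        )

    def forward(self, streamflow, forcing):
        z = self.encoder(streamflow)                         # streamflow "signature" features
        x_hat = self.decoder(torch.cat([z, forcing], dim=-1))
        return x_hat, z                                      # reconstruction and features
```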

How to cite: Bassi, A., Mira, A., Höge, M., Fenicia, F., and Albert, C.: Learning Catchment Features with Autoencoders, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-9190, https://doi.org/10.5194/egusphere-egu24-9190, 2024.

EGU24-9446 | ECS | Orals | HS3.4

Skilful prediction of mid-term sea surface temperature using 3D self-attention-based neural network 

Longhao Wang, Yongqiang Zhang, and Xuanze Zhang

Sea surface temperature (SST) is a critical parameter in the global ocean-atmospheric system, exerting a substantial impact on climate change and extreme weather events like droughts and floods. The precise forecasting of future SSTs is thus vital for identifying such weather anomalies. Here we present a novel three-dimensional (3D) neural network model based on self-attention mechanisms and Swin-Transformer for mid-term SST predictions. This model, integrating both climatic and temporal features, employs self-attention to proficiently capture the temporal dynamics and global patterns in SST. This approach significantly enhances the model's capability to detect and analyze spatiotemporal changes, offering a more nuanced understanding of SST variations. Trained on 59 years of global monthly ERA5-Land reanalysis data, our model demonstrates strong deterministic forecast capabilities in the test period. It employs a convolution strategy and global attention mechanism, resulting in faster and more accurate training compared to traditional methods, such as Convolutional Neural Network with Long short-term memory (CNN-LSTM). The effectiveness of this SST prediction model highlights its potential for extensive multidimensional modelling applications in geosciences.

How to cite: Wang, L., Zhang, Y., and Zhang, X.: Skilful prediction of mid-term sea surface temperature using 3D self-attention-based neural network, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-9446, https://doi.org/10.5194/egusphere-egu24-9446, 2024.

Traditional hydrological models have long served as the standard for predicting streamflow across temporal and spatial domains. However, a persistent challenge in modelling lies in mitigating the bias inherent in streamflow estimation due to both random and systematic errors in the employed model. Removal of this bias is pivotal for effective water resources management and resilience against extreme events, especially amidst evolving climate conditions. An innovative solution to address this challenge involves the integration of hydrological models with deep learning methods, known as hybridisation. Long Short-Term Memory (LSTM) networks have emerged as a promising and efficient approach to enhancing streamflow estimation. This study focuses on coupling an LSTM with a physically distributed model, Wflow_sbm, to serve as a post-processor aimed at reducing modelling errors. The coupled Wflow_sbm-LSTM model was applied to the Boyne catchment in Ireland, utilising a dataset spanning two decades, divided into training, validation, and testing sets to ensure robust model evaluation. Predictive performance was rigorously assessed using metrics like the Modified Kling-Gupta Efficiency (MKGE) and Nash-Sutcliffe Efficiency (NSE), with observed streamflow discharges as the target variable. Results demonstrated that the coupled model outperformed the best-calibrated Wflow_sbm model in the study catchment based on the performance measures. The enhanced prediction of extreme events by the coupled Wflow_sbm-LSTM model strengthens the case for its integration into an operational river flow forecasting framework. Significantly, Wflow is endorsed by the National Flood Forecast Warning Service (NFFWS) in Ireland as a recommended model for streamflow simulations, specifically designed for fluvial flood forecasting. Consequently, our proposed Wflow_sbm-LSTM coupled model presents a compelling opportunity for integration into the NFFWS. With demonstrated potential to achieve precise streamflow estimations, this integration holds promise for significantly enhancing the accuracy and effectiveness of flood predictions in Ireland.

How to cite: Mohammed, S. and Nasr, A.: Advancing Streamflow Modelling: Bias Removal in Physically-Based Models with the Long Short-Term Memory Networks (LSTM) Algorithm, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-9573, https://doi.org/10.5194/egusphere-egu24-9573, 2024.

EGU24-10506 | Posters on site | HS3.4

Enhancing Hydrological Predictions: Feature-Driven Streamflow Forecasting with Sparse Autoencoder-based Long Short-Term Memory Networks 

Neha Vinod, Arathy Nair Geetha Raveendran, Adarsh Sankaran, and Anandu Kochukattil Ajith

In response to the critical demand for accurate streamflow predictions in hydrology, this study introduces a Sparse Autoencoder-based Long Short-Term Memory (SA-LSTM) framework applied to daily streamflow data from three gauge stations within the Greater Pamba River Basin of Kerala, India, which was the region worst affected by the devastating floods of 2018. The SA-LSTM model addresses the challenge of feature selection from an extensive set of 1- to 7-day lagged climatic variables, such as precipitation and maximum and minimum temperatures, by incorporating a sparsity constraint. This constraint strategically guides the autoencoder to focus on the most influential features for the prediction analysis. The prediction process involves training the SA-LSTM model on historical streamflow data and climatic variables, allowing the model to learn intricate patterns and relationships. Furthermore, this study includes a comparative analysis featuring the Random Forest (RF)-LSTM model, where the RF model is employed for feature extraction, and a separate LSTM model is used for streamflow prediction. While the RF-LSTM combination demonstrates competitive performance, it is noteworthy that the SA-LSTM model consistently outperforms it in terms of predictive accuracy. Rigorous evaluation metrics, including the Correlation Coefficient (R²), Root Mean Square Error (RMSE), Mean Square Error (MSE), and Mean Absolute Error (MAE), highlight the SA-LSTM's forecasting accuracy across the three stations. Notably, the R² values surpass 0.85, RMSE values remain under 12 cubic meters per second (m³/s), MSE values are below 70 (m³/s)², and MAE values approach 8 m³/s. The detailed comparison between the above models underscores the superior capabilities of the SA-LSTM framework in capturing complex temporal patterns, emphasizing its potential for advancing hydrological modeling and flood risk management in flood-prone regions.
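As a rough sketch of the sparsity idea (not the authors' SA-LSTM code), an autoencoder can be trained with an L1 penalty on its latent activations so that only the most informative lagged climate features remain active; variable names and sizes here are hypothetical:

```python
import torch
import torch.nn as nn

n_inputs = 21          # hypothetical: 1-7 day lags of precipitation, Tmax and Tmin
encoder = nn.Sequential(nn.Linear(n_inputs, 16), nn.ReLU(), nn.Linear(16, 8))
decoder = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, n_inputs))

def sparse_ae_loss(x, sparsity_weight=1e-3):
    """Reconstruction loss plus an L1 sparsity penalty on the latent code."""
    z = encoder(x)
    x_hat = decoder(z)
    return nn.functional.mse_loss(x_hat, x) + sparsity_weight * z.abs().mean()

# The selected (non-negligible) latent features would then feed an LSTM that
# predicts daily streamflow.
```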

 

Keywords: Streamflow, LSTM, Sparse Autoencoder, Flood, Greater Pamba

How to cite: Vinod, N., Geetha Raveendran, A. N., Sankaran, A., and Kochukattil Ajith, A.: Enhancing Hydrological Predictions: Feature-Driven Streamflow Forecasting with Sparse Autoencoder-based Long Short-Term Memory Networks, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-10506, https://doi.org/10.5194/egusphere-egu24-10506, 2024.

EGU24-11506 | ECS | Posters on site | HS3.4

Forecasting reservoir inflows with Long Short-Term Memory models 

Laura Soncin, Claudia Bertini, Schalk Jan van Andel, Elena Ridolfi, Francesco Napolitano, Fabio Russo, and Celia Ramos Sánchez

The increased variability of water resources and escalating water consumption contribute to the risk of stress and water scarcity in reservoirs, which are typically designed based on historical conditions. Therefore, it is relevant to provide accurate forecasts of reservoir inflow to optimize sustainable water management as conditions change, especially during extreme events such as floods and droughts. However, accurately forecasting the inflow is not straightforward, due to the uncertainty of the hydrological inputs and the strong non-linearity of the system. Numerous recent studies have employed approaches based on Machine Learning (ML) techniques, such as Artificial Neural Networks (ANN), Long Short-Term Memory (LSTM) networks, and Random Forests (RF), with successful examples of providing skilful site-specific predictions. In particular, LSTMs have emerged from the pool of ML models for their performance in simulating rainfall-runoff processes, thanks to their ability to learn long-term dependencies from time series.
Here we propose an LSTM-based approach for inflow prediction in the Barrios de Luna reservoir, located in the Spanish part of the Douro River Basin. The reservoir has a dual role, as its water is used for irrigation during dry summer periods, and its storage volume is used to mitigate floods. Therefore, in order to operate the reservoir in the short term, Barrios de Luna reservoir operators need accurate forecasts to support water management decisions on daily and weekly time horizons. In our work, we explore the potential of an LSTM model to predict inflow to the reservoir at varying lead times, ranging from 1 day up to 4 weeks. Initially, we use past inflow, precipitation and temperature observations as inputs, and then we include meteorological forecasts of precipitation and temperature from the ECMWF Extended Range. For the latter experiments, different configurations of the LSTM are tested, i.e. training the model with observations and forecasts together, or training the model with observations only and fine-tuning it with forecasts.
Our preliminary results show that precipitation, temperature and inflow observations are all crucial inputs to the LSTM for predicting inflow, and meteorological forecast inputs seem to improve performance for the longer lead-times of one week up to a month.
Predictions developed will contribute to the Douro case study of the CLImate INTelligence (CLINT) H2020 project.

How to cite: Soncin, L., Bertini, C., van Andel, S. J., Ridolfi, E., Napolitano, F., Russo, F., and Ramos Sánchez, C.: Forecasting reservoir inflows with Long Short-Term Memory models, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-11506, https://doi.org/10.5194/egusphere-egu24-11506, 2024.

EGU24-11768 | ECS | Posters on site | HS3.4

High-Efficiency Rainfall Data Compression Using Binarized Convolutional Autoencoder 

Manuel Traub, Fedor Scholz, Thomas Scholten, Christiane Zarfl, and Martin V. Butz

In the era of big data, managing and storing large-scale meteorological datasets is a critical challenge. We focus on high-resolution rainfall data, which is crucial to atmospheric sciences, climate research, and real-time weather forecasting. This study introduces a deep learning-based approach to compress the German Radar-Online-Aneichung (RADOLAN) rainfall dataset. We achieve a compression ratio of 200:1 while maintaining a minimal mean squared reconstruction error (MSE). Our method combines a convolutional autoencoder with a novel binarization mechanism to compress data from a resolution of 900x900 pixels at 32-bit depth to 180x180 pixels at 4-bit depth. Leveraging the ConvNeXt architecture (Liu et al., 'A ConvNet for the 2020s'), our method learns a convolutional autoencoder for enhanced meteorological data compression. ConvNeXt introduces key architectural modifications, such as revised layer normalization and expanded receptive fields, taking inspiration from Vision Transformers to form a modern ConvNet. Our novel binarization mechanism, pivotal for achieving the high compression ratio, operates by dynamically quantizing the latent-space representations using a novel magnitude-specific noise-injection technique. This quantization not only reduces the data size but also preserves crucial meteorological information, as our low reconstruction MSE demonstrates. Beyond rainfall data, our approach shows promise for other types of high-resolution meteorological datasets, such as temperature and humidity. Adapting our method to these modalities could further streamline data management processes in meteorological deep learning scenarios and thus facilitate efficient storage and processing of diverse meteorological datasets.
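As a generic sketch of latent binarization with a straight-through gradient (a common stand-in; the magnitude-specific noise injection and 4-bit codes described above are not reproduced here):

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """Hard sign quantization in the forward pass; identity gradient in the backward pass."""
    @staticmethod
    def forward(ctx, x):
        return torch.sign(x)          # {-1, +1} codes (1-bit here; the paper uses 4-bit depth)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output            # straight-through estimator keeps training differentiable

def binarize_latent(z):
    """z: latent tensor from the convolutional encoder; returns quantized codes."""
    return BinarizeSTE.apply(z)
```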

How to cite: Traub, M., Scholz, F., Scholten, T., Zarfl, C., and Butz, M. V.: High-Efficiency Rainfall Data Compression Using Binarized Convolutional Autoencoder, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-11768, https://doi.org/10.5194/egusphere-egu24-11768, 2024.

Machine learning has been applied extensively to flow forecasting in gauged basins. Increasingly, models generating forecasts in some basin(s) of interest are trained using data from beyond the study region. With increasingly large hydrological datasets, a new challenge emerges: given some region of interest, how do you select which basins to include in the training dataset?

There is currently little guidance on selecting data from outside the basin(s) under study. An intuitive approach might be to select data from neighbouring basins, or basins with similar hydrological characteristics. However, a growing body of research suggests that including hydrologically dissimilar basins can in fact produce greater improvements to model generalisation. In this study, we use clustering as a simple yet effective method for identifying temporal and spatial hydrological diversity within a large hydrological dataset. The clustering results are used to generate information-rich subsets of data, that are used for model training. We compare the effects that basin subsets, that represent various hydrological characteristics, have on model generalisation.
Our study shows that data within individual basins, and between hydrologically similar basins, contain high degrees of redundancy. In such cases, training data can be heavily undersampled with no adverse effects – or even moderate improvements to model performance. We also show that spatial hydrological diversity can hugely benefit model training, providing improved generalisation and a regularisation effect.
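A sketch of cluster-based subset selection in this spirit (not the authors' exact recipe; the basin descriptors and cluster count below are placeholders):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
attributes = rng.normal(size=(500, 8))        # placeholder basin descriptors (rows = basins)

X = StandardScaler().fit_transform(attributes)
km = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X)

# Pick the basin closest to each cluster centre to form a small, hydrologically
# diverse training subset, rather than sampling only nearby or similar basins.
subset = [int(np.argmin(np.linalg.norm(X - c, axis=1))) for c in km.cluster_centers_]
```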

How to cite: Snieder, E. and Khan, U.: Towards improved spatio-temporal selection of training data for LSTM-based flow forecasting models in Canadian basins, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-12293, https://doi.org/10.5194/egusphere-egu24-12293, 2024.

EGU24-13353 | Posters on site | HS3.4

Flood Prediction Using Deep Neural Networks Across a Large and Complex River System  

Mostafa Saberian and Vidya Samadi

Accurately predicting streamflow poses a considerable challenge, particularly for intense storm events occurring across complex river systems. To tackle this issue, we developed multiple deep neural network models, including a Long Short-Term Memory (LSTM) network and Neural Hierarchical Interpolation for Time Series Forecasting (N-HiTS), to predict short-duration (1-hour) flood hydrographs. LSTM excels in preserving prolonged dependencies in structured time series data, while N-HiTS introduces an innovative deep neural architecture characterized by backward and forward residual links and a deep stack of fully connected layers. In addition, N-HiTS employs a combination of multi-rate sampling and multi-scale synthesis of predictions, resulting in a hierarchical forecasting structure that reduces computational requirements and enhances accuracy. Our goal was to evaluate the robustness and effectiveness of these advanced algorithms by comparing them with the National Water Model (NWM) forecast across a large and complex river system, i.e., the Wateree River Basin in South Carolina, USA. The models were trained and tested using precipitation, temperature, humidity, and solar radiation data during the periods 01/01/2009 to 09/30/2022 and 10/01/2022 to 01/01/2024, respectively. Analysis suggests that N-HiTS showcased state-of-the-art performance and enhanced hourly flood forecasting accuracy by approximately 10% compared to LSTM and NWM, with a negligible difference in computational costs. N-HiTS was able to forecast the time to peak and peak rate values of hourly flood hydrographs more accurately than the LSTM and NWM. Our extensive experiments revealed the importance of the multi-rate input sampling and hierarchical interpolation approaches designed within the N-HiTS model, which drastically improved the flood forecasting and the interpretability of the predictions.

How to cite: Saberian, M. and Samadi, V.: Flood Prediction Using Deep Neural Networks Across a Large and Complex River System , EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-13353, https://doi.org/10.5194/egusphere-egu24-13353, 2024.

We propose a hybrid deep learning model that uses long short-term memory networks (LSTMs) to capture both spatial and temporal dependencies in the river system. The LSTM component processes spatial information derived from topographical data and river network characteristics, allowing the model to understand the physical layout of the river basin. Simultaneously, the LSTM component exploits temporal patterns in historical dam release and rainfall data, enabling the model to discern the dynamics of flood propagation. In contrast to previous studies, which relied only on hydrological models such as HECRAS, FLDWAV and FLUMEN, this study combines HECRAS with a deep learning algorithm, LSTM. The goal of this study is to predict the highest river level and the flood travel time caused by dam releases 3 to 6 hours in advance throughout the Seomjin river basin. To achieve this, we conducted hydrological modelling with HECRAS and developed a deep learning algorithm (LSTM). Afterwards, the developed model combining HECRAS and LSTM was verified at six flood alert stations. Finally, the models will provide highest-river-level and travel-time information up to 6 hours in advance at the six flood alert stations. To train and validate the model, we compile a comprehensive dataset of historical dam release events and corresponding flood travel times from a range of river basins. The dataset includes various hydrological and meteorological features to ensure the model's robustness in handling diverse scenarios. The deep learning model is then trained using a subset of the data and validated against unseen events to assess its generalization capabilities. Preliminary results indicate that the hybrid HECRAS-LSTM model outperforms traditional hydrological models in predicting flood travel times. The model exhibits improved accuracy, particularly in cases of complex river geometries and extreme weather events. Additionally, the model demonstrates its potential for real-time forecasting, as it can efficiently process and assimilate incoming data. In conclusion, our study showcases the effectiveness of using a hybrid HECRAS-LSTM model for forecasting flood travel time caused by dam release. By leveraging the power of deep learning, we pave the way for more precise and reliable flood predictions, contributing to the overall resilience and safety of communities located downstream of dam-controlled river systems.

How to cite: Kang, J., Lee, G., Park, S., Jung, C., and Yu, J.: The Development of Forecasting System Flood Travel Time by Dam Release for Supplying Flood Information Using Deep Learning at Flood Alert Stations in the Seomjin River Basin, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-13848, https://doi.org/10.5194/egusphere-egu24-13848, 2024.

EGU24-14765 | ECS | Posters virtual | HS3.4

Optimizing Groundwater Forecasting: Comparative Analysis of MLP Models Using Global and Regional Precipitation Data 

Akanksha Soni, Surajit Deb Barma, and Amai Mahesha

This study investigates the efficacy of Multi-Layer Perceptron (MLP) models in groundwater level modeling, specifically emphasizing the pivotal role of input data quality, particularly precipitation data. Unlike prior research that primarily focused on regional datasets like those from the India Meteorological Department (IMD), our research explores the integration of global precipitation data, specifically leveraging the Integrated Multi-satellitE Retrievals for Global Precipitation Measurement (IMERG) dataset for MLP-based modeling. The assessment was conducted using two wells in Dakshina Kannada, evaluating four MLP models (GA-MLP, EFO-MLP, PSO-MLP, AAEO-MLP) with IMERG and IMD precipitation data. Performance metrics were employed, including mean absolute error, root mean square error, normalized Nash-Sutcliffe efficiency, and Pearson's correlation coefficient. The study also includes convergence analysis and stability assessments, revealing the significant impact of the precipitation dataset on model performance. Noteworthy findings include the superior performance of the AAEO-MLP model in training with IMD data and the stronger performance of the GA-MLP model in testing at the Bajpe well with both datasets. The stability of the GA-MLP model, indicated by the lowest standard deviation values in the convergence analysis, underscores its reliability. Moreover, transitioning to the IMERG dataset improved model performance and reduced variability, providing valuable insights into the strengths and limitations of MLP models in groundwater-level modeling. These results advance the precision and dependability of groundwater level forecasts, thereby supporting more effective strategies for international groundwater resource management.

How to cite: Soni, A., Barma, S. D., and Mahesha, A.: Optimizing Groundwater Forecasting: Comparative Analysis of MLP Models Using Global and Regional Precipitation Data, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-14765, https://doi.org/10.5194/egusphere-egu24-14765, 2024.

EGU24-15248 | ECS | Orals | HS3.4

Estimation of Small Stream Water Surface Elevation Using UAV Photogrammetry and Deep Learning 

Radosław Szostak, Mirosław Zimnoch, Przemysław Wachniew, Marcin Pietroń, and Paweł Ćwiąkała

Unmanned aerial vehicle (UAV) photogrammetry allows the generation of orthophoto and digital surface model (DSM) rasters of a terrain. However, DSMs of water bodies mapped using this technique often reveal distortions in the water surface, thereby impeding the accurate sampling of water surface elevation (WSE) from DSMs. This study investigates the capability of deep neural networks to accommodate the aforementioned perturbations and effectively estimate WSE from photogrammetric rasters. Convolutional neural networks (CNNs) were employed for this purpose. Three regression approaches utilizing CNNs were explored: i) direct regression employing an encoder, ii) prediction of the weight mask using an encoder-decoder architecture, subsequently used to sample values from the photogrammetric DSM, and iii) a solution based on the fusion of the two approaches. The dataset employed in this study comprises data collected from five case studies of small lowland streams in Poland and Denmark, consisting of 322 DSM and orthophoto raster samples. Each sample corresponds to a 10 by 10 meter area of the stream channel and adjacent land. A grid search was employed to identify the optimal combination of encoder, mask generation architecture, and batch size among multiple candidates. Solutions were evaluated using two cross-validation methods: stratified k-fold cross-validation, where validation subsets maintained the same proportion of samples from all case studies, and leave-one-case-out cross-validation, where the validation dataset originates entirely from a single case study, and the training set consists of samples from other case studies. The proposed solution was compared with existing methods for measuring water levels in small streams using a drone. The results indicate that the solution outperforms previous photogrammetry-based methods and is second only to the radar-based method, which is considered the most accurate method available.
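For approach ii), the predicted weight mask can be used to sample the WSE from the DSM as a weighted average, roughly as follows (the exact aggregation used in the study is assumed, not quoted):

```python
import numpy as np

def wse_from_mask(dsm, mask, eps=1e-8):
    """dsm: (H, W) photogrammetric elevations; mask: (H, W) non-negative weights
    predicted by the encoder-decoder network; returns the estimated WSE."""
    return float(np.sum(mask * dsm) / (np.sum(mask) + eps))
```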

This research was funded by National Science Centre, Poland, project WATERLINE (2020/02/Y/ST10/00065), under the CHISTERA IV programme of the EU Horizon 2020 (Grant no 857925).

How to cite: Szostak, R., Zimnoch, M., Wachniew, P., Pietroń, M., and Ćwiąkała, P.: Estimation of Small Stream Water Surface Elevation Using UAV Photogrammetry and Deep Learning, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-15248, https://doi.org/10.5194/egusphere-egu24-15248, 2024.

EGU24-16234 | ECS | Posters on site | HS3.4

A bottom-up approach to identify important hydrological processes by evaluating a national scale EA-LSTM model for Denmark 

Grith Martinsen, Niels Agertoft, and Phillip Aarestrup

The utilization of data-driven models in hydrology has witnessed a significant increase in recent years. The open-source philosophy underpinning much of the code developed and research being conducted has given the hydrological community widespread access to sophisticated machine learning models and technology (Reichstein et al., 2019). These data-driven approaches to hydrological modelling have attracted growing interest after multiple studies showed that machine-learning models were able to outperform nationwide traditional physics-based hydrological models (Kratzert et al., 2019). The latter often demand substantial man-hours for development, calibration and fine-tuning to accurately represent relevant hydrological processes.

In this national-scale explorative study, we undertake an in-depth examination of Danish catchment hydrology. Our objective is to understand which processes and dynamics are well captured by a purely data-driven model without physical constraints, namely the Entity-Aware Long Short-Term Memory model (EA-LSTM). The model code was developed by Kratzert et al. (2019), and the analysis builds on a newly published national CAMELS dataset covering 301 catchments in Denmark (Koch and Schneider, 2022), with an average resolution of 130 km².

Denmark, spanning an area of around 43 000 km², has relatively high data coverage. Presently, more than 400 stations record water level measurements in the Danish stream network, while a network of 243 stations has collected meteorological data since 2011. These datasets are maintained by the Danish Environmental Protection Agency and the Danish Meteorological Institute, respectively, and are publicly available.

Despite Denmark’s data abundance, Koch and Schneider (2022) demonstrated that the data-driven EA-LSTM model, trained with the CAMELS dataset for Denmark (hereafter referred to as the DK-LSTM), was not able to outperform the traditional physics-based hydrological model against which it was benchmarked. However, the performance of the DK-LSTM could be increased by pre-training it with simulations from a national physics-based model, indicating that dominant hydrological processes are not described by the readily available input data in the CAMELS dataset.

This study conducts a comprehensive analysis of Danish catchment hydrology aiming to explore three aspects: 1) the common characteristics of the catchments where the DK-LSTM performs well or encounters challenges, 2) the identification of hydrological characteristics that exhibit improvement when informing the data-driven model with physics-based model simulations, and 3) an exploration of whether the aforementioned findings can guide us in determining necessary physical constraints and/or input variables that explain the hydrological processes for the data-driven model approach at a national scale, using the example of the DK-LSTM.

 

Koch, J., and Schneider, R. Long short-term memory networks enhance rainfall-runoff modelling at the national scale of Denmark. GEUS Bulletin 49. https://doi.org/10.34194/geusb.v49.8292, 2022.

Kratzert, F., Klotz, D., Shalev, G., Klambauer, G., Hochreiter, S., and Nearing, G.: Towards learning universal, regional, and local hydrological behaviors via machine learning applied to large-sample datasets, Hydrol. Earth Syst. Sci., 23, 5089-5110, https://doi.org/10.5194/hess-23-5089-2019, 2019.

Reichstein, M., Camps-Valls, G., Stevens, B. et al. Deep learning and process understanding for data-driven Earth system science. Nature 566, 195–204. https://doi.org/10.1038/s41586-019-0912-1, 2019.

How to cite: Martinsen, G., Agertoft, N., and Aarestrup, P.: A bottom-up approach to identify important hydrological processes by evaluating a national scale EA-LSTM model for Denmark, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-16234, https://doi.org/10.5194/egusphere-egu24-16234, 2024.

EGU24-16474 | ECS | Orals | HS3.4

Short- and mid-term discharge forecasts combining machine learning and data assimilation for operational purpose 

Bob E Saint Fleur, Eric Gaume, Michaël Savary, Nicolas Akil, and Dominique Theriez

In recent years, machine learning models, particularly Long Short-Term Memory (LSTM) networks, have proven to be effective alternatives for rainfall-runoff modeling, surpassing traditional hydrological modeling approaches [1]. These models have predominantly been implemented and evaluated for rainfall-runoff simulations. However, operational hydrology often requires short- and mid-term forecasts. To be effective, such forecasts must consider past observed values of the predicted variables, requiring a data assimilation procedure [2,3,4]. This presentation will evaluate several approaches based on the combination of open-source machine learning tools and data assimilation strategies for short- and mid-term discharge forecasting of flood and/or drought events. The evaluation is based on the rich and well-documented CAMELS dataset [5,6,7]. The tested approaches include: (1) coupling LSTMs pre-trained on the CAMELS database with a Multilayer Perceptron (MLP) for prediction error correction, (2) direct MLP discharge forecasting models specific to each lead time, including past observed discharges as input variables, and (3) option 2 with the LSTM-predicted discharges as additional input variables. In the absence of historical archives of weather forecasts (rainfall, temperatures, etc.), the different forecasting approaches will be tested in two configurations: (1) weather forecasts assumed to be perfect (using observed meteorological variables over the forecast horizon in place of predicted variables or ensembles) and (2) use of ensembles reflecting climatological variability over the forecast horizons for the meteorological variables, i.e. ensembles made up of time series randomly selected from the past. The forecast horizons considered range from 1 to 10 days, and the results are analyzed in light of the time of concentration of the watersheds.
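A minimal sketch of option (3), assuming synthetic discharge data and scikit-learn's MLPRegressor as a stand-in for the authors' setup: one MLP is fitted per lead time, with past observed discharges and an LSTM-simulated discharge for that lead time as inputs.

```python
# Hedged sketch: one error-correcting/forecasting MLP per lead time (stand-in data only).
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
n_days, n_past, horizons = 2000, 7, range(1, 11)
q_obs = np.abs(rng.normal(5, 2, size=n_days))        # observed discharge (synthetic)
q_lstm = q_obs + rng.normal(0, 0.5, size=n_days)     # pre-trained LSTM simulation (stand-in)

models = {}
for h in horizons:
    rows = range(n_past, n_days - h)
    X = np.array([np.r_[q_obs[t - n_past:t], q_lstm[t + h]] for t in rows])
    y = np.array([q_obs[t + h] for t in rows])
    models[h] = MLPRegressor(hidden_layer_sizes=(32,), max_iter=500).fit(X, y)

# forecast for a lead time of 3 days from the latest available observations
t = n_days - 11
x_new = np.r_[q_obs[t - n_past:t], q_lstm[t + 3]].reshape(1, -1)
print(models[3].predict(x_new))
```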

 

References

1. Kratzert F, Klotz D, Brenner C, Schulz K, Herrnegger M. Rainfall–runoff modelling using Long Short-Term Memory (LSTM) networks. Hydrol Earth Syst Sci. 2018;22(11):6005-6022. doi:10.5194/hess-22-6005-2018

2. Bourgin F, Ramos MH, Thirel G, Andréassian V. Investigating the interactions between data assimilation and post-processing in hydrological ensemble forecasting. J Hydrol (Amst). 2014;519:2775-2784. doi:10.1016/j.jhydrol.2014.07.054

3. Boucher M.-A., Quilty J, Adamowski J. Data Assimilation for Streamflow Forecasting Using Extreme Learning Machines and Multilayer Perceptrons. Water Resour Res. 2020;56(6). doi:10.1029/2019WR026226

4. Piazzi G, Thirel G, Perrin C, Delaigue O. Sequential Data Assimilation for Streamflow Forecasting: Assessing the Sensitivity to Uncertainties and Updated Variables of a Conceptual Hydrological Model at Basin Scale. Water Resour Res. 2021;57(4). doi:10.1029/2020WR028390

5. Newman AJ, Clark MP, Sampson K, et al. Development of a large-sample watershed-scale hydrometeorological data set for the contiguous USA: data set characteristics and assessment of regional variability in hydrologic model performance. Hydrol Earth Syst Sci. 2015;19(1):209-223. doi:10.5194/hess-19-209-2015

6. Kratzert, F. (2019). Pretrained models + simulations for our HESSD submission "Towards learning universal, regional, and local hydrological behaviors via machine learning applied to large-sample datasets", HydroShare, https://doi.org/10.4211/hs.83ea5312635e44dc824eeb99eda12f06

7. Kratzert, F. (2019). CAMELS Extended Maurer Forcing Data, HydroShare, https://doi.org/10.4211/hs.17c896843cf940339c3c3496d0c1c077

How to cite: Saint Fleur, B. E., Gaume, E., Savary, M., Akil, N., and Theriez, D.: Short- and mid-term discharge forecasts combining machine learning and data assimilation for operational purpose, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-16474, https://doi.org/10.5194/egusphere-egu24-16474, 2024.

EGU24-17502 | ECS | Posters on site | HS3.4

Towards improved Water Quality Modelling using Neural ODE models 

Marvin Höge, Florian Wenk, Andreas Scheidegger, Carlo Albert, and Andreas Frömelt

Neural Ordinary Differential Equations (ODEs) fuse neural networks with a mechanistic equation framework. This hybrid structure offers both traceability of model states and processes, as is typical of physics-based models, and the ability of machine learning to encode new functional relations. Neural ODE models have demonstrated high potential in hydrologic predictions and in the scientific investigation of the related processes in the hydrologic cycle, i.e. tasks of water quantity estimation (Höge et al., 2022).

This explicit representation of state variables is key to water quality modelling. There, we typically have several interrelated state variables like nitrate, nitrite, phosphorus, organic matter, and others. Traditionally, these states are modelled based on mechanistic kinetic rate expressions that are often only rough approximations of the underlying dynamics. At the same time, this domain of water research suffers from data scarcity, and therefore purely data-driven methods struggle to provide accurate predictions reliably. We show how to improve predictions of state dynamics and to foster knowledge gain about the processes in such interrelated systems with multiple states using Neural ODEs.
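The hybrid structure can be illustrated with a minimal numpy sketch (not the authors' code): a toy two-state system in which a first-order kinetic term is augmented by a small, untrained neural network. In a real Neural ODE the network weights would be trained by differentiating through the ODE solver.

```python
# Conceptual sketch: mechanistic kinetics plus a neural correction term,
# integrated with explicit Euler steps (all values are illustrative).
import numpy as np

rng = np.random.default_rng(2)
W1, b1 = rng.normal(0, 0.1, (8, 2)), np.zeros(8)   # toy, untrained network weights
W2, b2 = rng.normal(0, 0.1, (2, 8)), np.zeros(2)

def neural_term(states):
    hidden = np.tanh(W1 @ states + b1)
    return W2 @ hidden + b2

def rhs(states, k=np.array([0.3, 0.1])):
    mechanistic = -k * states                      # rough first-order kinetics (e.g. nitrate, nitrite)
    return mechanistic + neural_term(states)

states, dt = np.array([1.0, 0.2]), 0.1
trajectory = [states.copy()]
for _ in range(100):
    states = states + dt * rhs(states)
    trajectory.append(states.copy())
print(np.array(trajectory)[-1])
```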

Höge, M., Scheidegger, A., Baity-Jesi, M., Albert, C., & Fenicia, F.: Improving hydrologic models for predictions and process understanding using Neural ODEs. Hydrol. Earth Syst. Sci., 26, 5085-5102, https://hess.copernicus.org/articles/26/5085/2022/, 2022.

How to cite: Höge, M., Wenk, F., Scheidegger, A., Albert, C., and Frömelt, A.: Towards improved Water Quality Modelling using Neural ODE models, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-17502, https://doi.org/10.5194/egusphere-egu24-17502, 2024.

EGU24-17543 | Orals | HS3.4

Deep-learning-based prediction of damages related to surface water floods for impact-based warning 

Pascal Horton, Markus Mosimann, Severin Kaderli, Olivia Martius, Andreas Paul Zischg, and Daniel Steinfeld

Surface water floods are responsible for a substantial amount of damage to buildings, yet they have received less attention than fluvial floods. Nowadays, both research and insurance companies are increasingly focusing on these phenomena to enhance knowledge and prevention efforts. This study builds upon pluvial-related damage data provided by the Swiss Mobiliar Insurance Company and the Building Insurance of Canton Zurich (GVZ) with the goal of developing a data-driven model for predicting potential damages in future precipitation events.

This work is a continuation of a previous method applied to Swiss data, which relied on thresholds based on the quantiles of precipitation intensity and event volume but resulted in an excessive number of false alarms. First, a logistic regression was assessed using different characteristics of the precipitation event. Subsequently, a random forest was established, incorporating terrain attributes to better characterize local conditions. Finally, a deep learning model was developed to account for the spatio-temporal properties of the precipitation fields on a domain larger than the targeted 1 km cell. The deep learning model comprises a convolutional neural network (CNN) for 4D precipitation data and subsequent dense layers incorporating static attributes. The model has been applied to predict the probability of damage occurrence, as well as the damage degree quantified by the number of claims relative to the number of insured buildings.
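A hedged PyTorch sketch of this kind of hybrid architecture (layer sizes, pooling choices and variable names are illustrative assumptions, not the authors' configuration): a small CNN summarises the spatio-temporal precipitation field around the target cell, its output is concatenated with static terrain attributes and passed through dense layers that predict the probability of damage occurrence.

```python
import torch
import torch.nn as nn

class DamageNet(nn.Module):
    def __init__(self, n_static):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv3d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(8, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
        )
        self.head = nn.Sequential(
            nn.Linear(16 + n_static, 32), nn.ReLU(),
            nn.Linear(32, 1),                      # logit of damage occurrence
        )

    def forward(self, precip, static):
        # precip: (batch, 1, time, y, x); static: (batch, n_static)
        features = self.cnn(precip)
        return torch.sigmoid(self.head(torch.cat([features, static], dim=1)))

model = DamageNet(n_static=5)
p = model(torch.rand(4, 1, 12, 21, 21), torch.rand(4, 5))
print(p.shape)  # (4, 1) damage probabilities
```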

How to cite: Horton, P., Mosimann, M., Kaderli, S., Martius, O., Zischg, A. P., and Steinfeld, D.: Deep-learning-based prediction of damages related to surface water floods for impact-based warning, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-17543, https://doi.org/10.5194/egusphere-egu24-17543, 2024.

EGU24-18073 | ECS | Orals | HS3.4

Operational stream water temperature forecasting with a temporal fusion transformer model 

Ryan S. Padrón, Massimiliano Zappa, and Konrad Bogner

Stream water temperatures influence aquatic biodiversity, agriculture, tourism, electricity production, and water quality. Therefore, stakeholders would benefit from an operational forecasting service that would support timely action. Deep Learning methods are well-suited for this task as they can provide probabilistic forecasts at individual stations of a monitoring network. Here we train and evaluate several state-of-the-art models using 10 years of data from 55 stations across Switzerland. Static features (e.g. station coordinates, catchment mean elevation, area, and glacierized fraction), time indices, meteorological and/or hydrological observations from the past 64 days, and their ensemble forecasts for the following 32 days are included as predictors in the models to estimate daily maximum water temperature for the next 32 days. We find that the Temporal Fusion Transformer (TFT) model performs best for all lead times with a continuous ranked probability score (CRPS) of 0.73 ºC averaged over all stations, lead times and 90 forecasts distributed over one full year. The TFT is followed by the Recurrent Neural Network (CRPS = 0.77 ºC), Neural Hierarchical Interpolation for Time Series (CRPS = 0.80 ºC), and Multi-layer Perceptron (CRPS = 0.85 ºC). All models outperform the benchmark ARX model. When factoring out the uncertainty stemming from the meteorological ensemble forecasts by using observations instead, the TFT improves to a CRPS of 0.43 ºC, and it remains the best of all models. In addition, the TFT model identifies air temperature and time of the year as the most relevant predictors. Furthermore, its attention feature suggests a dominant response to more recent information in the summer, and to information from the previous month during spring and autumn. Currently, daily maximum water temperature probabilistic forecasts are produced twice per week and made available at https://drought.ch/de/allgemeine-lage/wassertemperatur/fliessgewaesser-1-1.html.
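For reference, the CRPS of an ensemble forecast can be estimated with the standard ensemble estimator CRPS = E|X − y| − 0.5·E|X − X′|. The following numpy sketch (with made-up forecast values, not the study's data) illustrates the calculation:

```python
# Illustrative CRPS computation for a single ensemble forecast and observation.
import numpy as np

def crps_ensemble(ensemble, obs):
    ensemble = np.asarray(ensemble, dtype=float)
    term1 = np.mean(np.abs(ensemble - obs))
    term2 = 0.5 * np.mean(np.abs(ensemble[:, None] - ensemble[None, :]))
    return term1 - term2

rng = np.random.default_rng(3)
forecast = rng.normal(18.0, 0.8, size=50)   # 50-member ensemble, degrees C
print(crps_ensemble(forecast, obs=18.4))    # lower is better, same unit as the forecast
```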

How to cite: Padrón, R. S., Zappa, M., and Bogner, K.: Operational stream water temperature forecasting with a temporal fusion transformer model, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-18073, https://doi.org/10.5194/egusphere-egu24-18073, 2024.

EGU24-18154 | ECS | Orals | HS3.4

Can Blended Model Improve Streamflow Simulation In Diverse Catchments ? 

Daneti Arun Sourya and Maheswaran Rathinasamy

Streamflow simulation or rainfall-runoff modelling has been a topic of research for the past few decades, resulting in a plethora of modelling approaches ranging from physics-based models to empirical or data-driven approaches. Many physics-based (PB) models are available to estimate streamflow, but uncertainty remains in model outputs due to incomplete representations of physical processes. Further, with advancements in machine learning (ML), there have been several attempts to model streamflow, but with little or no physical consistency. As a result, models based on ML algorithms may be unreliable if applied to provide future hydroclimate projections where climates and land use patterns are outside the range of training data.

Here we test blended models built by combining PB model state variables (specifically soil moisture) with ML algorithms on their ability to simulate streamflow in 671 catchments representing diverse conditions across the conterminous United States.

For this purpose, we develop a suite of blended hydrological models by pairing different PB models (Catchment Wetness Index, Catchment Moisture Deficit, GR4J, Australian Water Balance, Single-bucket Soil Moisture Accounting, and Sacramento Soil Moisture Accounting models) with different ML methods such as the Long Short-Term Memory network (LSTM) and eXtreme Gradient Boosting (XGB).
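The blending idea can be sketched as follows (synthetic data; scikit-learn's gradient boosting is used here as a stand-in for XGB): a PB-model state such as soil moisture is simply appended to the meteorological forcings as an additional ML input feature.

```python
# Hedged sketch of a blended model: PB-model soil moisture as an extra ML feature.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
n = 3000
precip = np.abs(rng.gamma(2.0, 2.0, n))
pet = np.abs(rng.normal(3.0, 1.0, n))
soil_moisture = np.clip(np.cumsum(precip - pet) * 0.01 + 50, 0, 100)   # toy PB state variable
streamflow = 0.05 * precip * soil_moisture / 100 + rng.normal(0, 0.05, n)

X = np.column_stack([precip, pet, soil_moisture])
X_tr, X_te, y_tr, y_te = train_test_split(X, streamflow, test_size=0.3, random_state=0)
blended = GradientBoostingRegressor().fit(X_tr, y_tr)
print("R2 with PB soil moisture as a feature:", round(blended.score(X_te, y_te), 3))
```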

The results indicate that the blended models provide significant improvement in catchments where PB models are underperforming. Furthermore, the accuracy of streamflow estimation is improved in catchments where the ML models failed to estimate streamflow accurately.

How to cite: Sourya, D. A. and Rathinasamy, M.: Can Blended Model Improve Streamflow Simulation In Diverse Catchments ?, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-18154, https://doi.org/10.5194/egusphere-egu24-18154, 2024.

EGU24-18762 | Orals | HS3.4

Benchmarking hydrological models for national scale climate impact assessment 

Elizabeth Lewis, Ben Smith, Stephen Birkinshaw, Helen He, and David Pritchard

National scale hydrological models are required for many types of water sector applications, for example water resources planning. Existing UK national-scale model frameworks are based on conceptual numerical schemes, with an emerging trend towards incorporating deep learning models. Existing literature has shown that groundwater/surface water interactions are key for accurately representing future flows, and these processes are most accurately represented with physically-based hydrological models.

In response to this, our study undertakes a comparative analysis of three national model frameworks (Neural Hydrology, HBV, SHETRAN) to investigate the necessity for physically-based hydrological modelling. The models were run with the full ensemble of bias-corrected UKCP18 12km RCM data which enabled a direct comparison of future flow projections. We show that whilst many national frameworks perform well for the historical period, physically-based models can give substantially different projections of future flows, particularly low flows. Moreover, our study illustrates that the physically-based model exhibits a consistent trajectory in Budyko space between the baseline and future simulations, a characteristic not shared by conceptual and deep learning models. To provide context for these results, we incorporate insights from other national model frameworks, including the eFlag project.

How to cite: Lewis, E., Smith, B., Birkinshaw, S., He, H., and Pritchard, D.: Benchmarking hydrological models for national scale climate impact assessment, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-18762, https://doi.org/10.5194/egusphere-egu24-18762, 2024.

EGU24-20636 | ECS | Orals | HS3.4

Can Attention Models Surpass LSTM in Hydrology? 

Jiangtao Liu, Chaopeng Shen, and Tadd Bindas

Accurate modeling of various hydrological variables is important for water resource management, flood forecasting, and pest control. Deep learning models, especially Long Short-Term Memory (LSTM) models based on Recurrent Neural Network (RNN) structures, have shown significant success in simulating streamflow and soil moisture and in model parameter assessment. With the development of large language models (LLMs) based on attention mechanisms, such as ChatGPT and Bard, we have observed significant advancements in fields like natural language processing (NLP), computer vision (CV), and time series prediction. Despite these advancements across various domains, the application of attention-based models in hydrology remains relatively limited, with LSTM models maintaining a dominant position in this field. This study evaluates the performance of 18 state-of-the-art attention-based models and their variants in hydrology. We focus on their performance on streamflow, soil moisture, snowmelt, and dissolved oxygen (DO) datasets, comparing them to LSTM models in both long-term and short-term regression and forecasting. We also examine these models' performance in spatial cross-validation. Our findings indicate that while LSTM models remain strongly competitive across various hydrological datasets, attention-based models offer potential advantages in specific metrics and time lengths, providing valuable insights into applying attention-based models in hydrology. Finally, we discuss the potential applications of foundation models and how these methods can contribute to the sustainable use of water resources and to addressing the challenges of climate change.
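The two model families can be contrasted with a minimal PyTorch sketch (toy dimensions, not the benchmarked architectures): an LSTM encoder and a single-head self-attention encoder, each mapping a forcing sequence to a one-step discharge estimate.

```python
import torch
import torch.nn as nn

class LSTMHead(nn.Module):
    def __init__(self, n_feat, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_feat, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, x):                       # x: (batch, time, n_feat)
        h, _ = self.lstm(x)
        return self.out(h[:, -1])               # last hidden state -> discharge

class AttentionHead(nn.Module):
    def __init__(self, n_feat, d_model=32):
        super().__init__()
        self.proj = nn.Linear(n_feat, d_model)
        self.attn = nn.MultiheadAttention(d_model, num_heads=1, batch_first=True)
        self.out = nn.Linear(d_model, 1)

    def forward(self, x):
        z = self.proj(x)
        z, _ = self.attn(z, z, z)               # self-attention over the time axis
        return self.out(z.mean(dim=1))          # pooled representation -> discharge

x = torch.rand(8, 365, 5)                       # 8 basins, 365 days, 5 forcings
print(LSTMHead(5)(x).shape, AttentionHead(5)(x).shape)
```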

How to cite: Liu, J., Shen, C., and Bindas, T.: Can Attention Models Surpass LSTM in Hydrology?, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-20636, https://doi.org/10.5194/egusphere-egu24-20636, 2024.

EGU24-20907 | Orals | HS3.4

Revolutionizing Flood Forecasting with a Generalized Deep Learning Model 

Julian Hofmann and Adrian Holt

The domain of spatial flood prediction is dominated by hydrodynamic models, which, while robust and adaptable, are often constrained by computational requirements and slow processing times. To address these limitations, the integration of Deep Learning (DL) models has emerged as a promising solution, offering the potential for rapid prediction capabilities, while maintaining a high output quality. However, a critical challenge with DL models lies in their requirement for retraining for each new domain area, based on the outputs of hydrodynamic simulations generated for that specific region. This need for domain-specific retraining hampers the scalability and quick deployment of DL models in diverse settings. Our research focuses on bridging this gap by developing a fully generalized DL model for flood prediction.

FloodWaive's approach pivots on creating a DL model that can predict flood events rapidly and accurately across various regions without requiring retraining for each new domain area. The model is trained on a rich dataset derived from numerous hydrodynamic simulations, encompassing a wide spectrum of topographical conditions. This training is designed to enable the model to generalize its predictive capabilities across different domains and weather patterns, thus overcoming the traditional limitation of DL models in this field.

Initial findings from the development phase are promising, showcasing the model's capability to process complex data and provide quick, accurate flood predictions. The success of this fully generalized DL modeling approach could revolutionize applications of flood prediction such as flood forecasting and risk analysis. Regarding the latter, real-time evaluation of flood protection measures could become a reality. This would empower urban planners, emergency response teams, and environmental agencies with the ability to make informed decisions quickly, potentially saving lives and reducing economic losses.

While this project is still in its developmental stages, the preliminary results point towards a significant leap in flood forecasting technology. The ultimate goal is to offer a universally deployable, real-time flood prediction tool, significantly enhancing our ability to mitigate the impact of floods worldwide.

  

How to cite: Hofmann, J. and Holt, A.: Revolutionizing Flood Forecasting with a Generalized Deep Learning Model, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-20907, https://doi.org/10.5194/egusphere-egu24-20907, 2024.

One of the latent difficulties in the fields of climatology, meteorology, and hydrology is the scarce rainfall information available due to the limited or nonexistent instrumentation of river basins, especially in developing countries where the establishment and maintenance of equipment entail high costs relative to the available budget. Hence, the importance of generating alternatives that seek to improve spatial precipitation estimation has been increasing, given the advances in the implementation of computational algorithms that involve Machine Learning techniques. In this study, a multitask convolutional neural network was implemented, composed of an encoder-decoder architecture (U-Net), which simultaneously estimates the probability of rain through a classification model and the precipitation rate through a regression model at a spatial resolution of 2 km2 and a temporal resolution of 10 minutes. The input modalities included data from rain gauge stations, weather radar, and satellite information (GOES-16). For model training, validation, and testing, a dataset was consolidated with 3 months of information (February to April 2021) split 70/15/15 percent, covering the effective coverage range of the Munchique weather radar located in the Andean region of Colombia. The obtained results show a Probability of Detection (POD) of 0.59 and a False Alarm Rate (FAR) of 0.39. The precipitation rate estimation is assessed with a Root Mean Square Error (RMSE) of 1.13 mm/10min. This research highlights the significant capability of deep learning algorithms in reconstructing and reproducing the spatial pattern of rainfall in tropical regions with limited instrumentation. However, there is a need to continue strengthening climatological monitoring networks to achieve significant spatial representativeness, thereby reducing potential biases in model estimations.
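A simplified PyTorch sketch of the multitask design (without the skip connections of a full U-Net; channel counts are illustrative assumptions): a shared encoder-decoder feeds two 1x1-convolution heads, one for rain probability and one for rain rate.

```python
import torch
import torch.nn as nn

class MultitaskRainNet(nn.Module):
    def __init__(self, in_channels=3):          # e.g. gauge, radar, satellite channels
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="nearest"),
            nn.Conv2d(32, 16, 3, padding=1), nn.ReLU(),
        )
        self.cls_head = nn.Conv2d(16, 1, 1)     # logit of rain occurrence
        self.reg_head = nn.Conv2d(16, 1, 1)     # rain rate (mm / 10 min)

    def forward(self, x):
        features = self.decoder(self.encoder(x))
        return torch.sigmoid(self.cls_head(features)), torch.relu(self.reg_head(features))

prob, rate = MultitaskRainNet()(torch.rand(2, 3, 64, 64))
print(prob.shape, rate.shape)                   # both (2, 1, 64, 64)
```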

How to cite: Barrios, M., Rubiano, H., and Guevara-Ochoa, C.: Implementation of deep learning algorithms in the sub-hourly rainfall fields estimation from remote sensors and rainfall gauge information in the tropical Andes, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-21431, https://doi.org/10.5194/egusphere-egu24-21431, 2024.

EGU24-539 | ECS | Orals | HS3.9

Revisiting the common approaches for hydrological model calibration with high-dimensional parameters and objectives  

Songjun Wu, Doerthe Tetzlaff, Keith Beven, and Chris Soulsby

Successful calibration of distributed hydrological models is often hindered by complex model structures, incommensurability between observed and modeled variables, and the complex nature of many hydrological processes. Many approaches have been proposed and compared for calibration, but the comparisons were generally based on parsimonious models with limited objectives. The conclusions could change when more parameters are to be calibrated against multiple objectives with increasing data availability. In this study, four different approaches (random sampling, DREAM, NSGA-II, GLUE Limits of Acceptability) were tested for a complex application: calibrating 58 parameters of a hydrological model against 24 objectives (soil moisture and isotopes at 3 depths under different vegetation covers). By comparing the simulation performance of parameter sets selected by the different approaches, we concluded that random sampling is still usable in high-dimensional parameter space, providing performance comparable to the other approaches despite poor parameter identifiability. DREAM provided better simulation performance and parameter convergence with informal likelihood functions; however, the difficulty in describing the model residual distribution could result in inappropriate formal likelihood functions and thus poor simulations. Multi-criteria calibration, taking NSGA-II as an example, gave ideal model performance and parameter identifiability and explicitly unravelled the trade-offs between objectives after aggregating them (into 2 or 4); but calibrating against all 24 objectives was hindered by the “curse of dimensionality”, as the increasing dimension exponentially expanded the Pareto front and increased the difficulty of differentiating parameter sets. Finally, Limits of Acceptability also provided comparable simulations; moreover, it can be regarded as a learning tool because detailed information about model failures is available for each objective at each timestep. However, its limitation is the insufficient exploration of the high-dimensional parameter space due to the use of Latin-Hypercube sampling.
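As an illustration of the Limits of Acceptability idea combined with Latin-hypercube sampling, the following sketch (with a toy exponential model standing in for the hydrological model, and arbitrary bounds) retains only parameter sets whose simulations stay within the acceptability limits at every timestep:

```python
import numpy as np
from scipy.stats import qmc

def toy_model(params, t):
    a, b = params
    return a * np.exp(-b * t)                   # stand-in for the hydrological model

t = np.linspace(0, 10, 50)
obs = 2.0 * np.exp(-0.3 * t)
lower, upper = obs - 0.2, obs + 0.2             # limits of acceptability per timestep

sampler = qmc.LatinHypercube(d=2, seed=0)
samples = qmc.scale(sampler.random(n=5000), [0.5, 0.05], [4.0, 1.0])

behavioural = []
for p in samples:
    sim = toy_model(p, t)
    if np.all((sim >= lower) & (sim <= upper)): # accepted only if within limits everywhere
        behavioural.append(p)
print(f"{len(behavioural)} of {len(samples)} parameter sets accepted")
```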

Overall, all approaches showed benefits and limitations, and a general approach that can easily be used for such complex calibration cases without trial-and-error is still lacking. By comparing these common approaches, we realised the difficulty of defining a proper objective function for many-objective optimisation, either as an aggregated scalar function (due to the difficulty of assigning weights or assuming a form for the residual distribution) or as a vector function (due to the expansion of the Pareto front). In this context, the Limits of Acceptability approach provided a more flexible way to define the “objective function” for each timestep, though it introduces extra demands in understanding data uncertainties and deciding on what should be considered acceptable. Moreover, in such many-objective optimisation, it is possible that no single parameter set can capture all the objectives satisfactorily (none in 8 million runs in this study). The non-existence of a global optimum in the sample suggests that the concept of equifinality should be embraced by using an ensemble of comparable parameter sets to represent such complex systems.

How to cite: Wu, S., Tetzlaff, D., Beven, K., and Soulsby, C.: Revisiting the common approaches for hydrological model calibration with high-dimensional parameters and objectives , EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-539, https://doi.org/10.5194/egusphere-egu24-539, 2024.

EGU24-1745 | Posters on site | HS3.9

Predictive uncertainty analysis using null-space Monte Carlo  

Husam Baalousha, Marwan Fahs, and Anis Younes

The inverse problem in hydrogeology poses a significant challenge for modelers due to its ill-posed nature and the non-uniqueness of solutions. This challenge is compounded by the substantial computational efforts required for calibrating highly parameterized aquifers, particularly those with significant heterogeneity, such as karst limestone aquifers. While stochastic methods like Monte Carlo simulations are commonly used to assess uncertainty, their extensive computational requirements often limit their practicality.

The Null Space Monte Carlo (NSMC) method provides a parameter-constrained approach to address these challenges in inverse problems, allowing for the quantification of uncertainty in calibrated parameters. This method was applied to the northern aquifer of Qatar, which is characterized by high heterogeneity. The calibration of the model utilized the pilot point approach, and the calibrated results were spatially interpolated across the aquifer area using kriging.

NSMC was then employed to generate 100 sets of parameter-constrained random variables representing hydraulic conductivities. The null space vectors of these random solutions were incorporated into the parameter space derived from the calibrated model. Statistical analysis of the resulting calibrated hydraulic conductivities revealed a wide range, varying from 0.1 to 350 m/d, illustrating the significant variability inherent in the karstic nature of the aquifer.
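The null-space projection step can be illustrated with a small numpy sketch (toy Jacobian and parameter dimensions, not the Qatar model): random perturbations are projected onto the null space of the sensitivity matrix so that the calibrated fit is essentially preserved while parameter variability is explored.

```python
import numpy as np

rng = np.random.default_rng(5)
n_obs, n_par = 20, 60                       # fewer observations than parameters
J = rng.normal(size=(n_obs, n_par))         # stand-in sensitivity (Jacobian) matrix
p_cal = rng.normal(size=n_par)              # calibrated parameter vector (toy)

# null-space basis from the SVD: right singular vectors beyond the rank of J
_, s, Vt = np.linalg.svd(J)
rank = np.sum(s > 1e-10)
V_null = Vt[rank:].T                        # (n_par, n_par - rank)

realizations = []
for _ in range(100):
    delta = rng.normal(size=n_par)
    delta_null = V_null @ (V_null.T @ delta)   # keep only the null-space component
    realizations.append(p_cal + delta_null)

# the simulated observations barely change for the new realizations
print(np.max(np.abs(J @ (realizations[0] - p_cal))))
```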

Areas with high hydraulic conductivity were identified in the middle and eastern parts of the aquifer. These regions of elevated hydraulic conductivity also exhibited high standard deviations, further emphasizing the heterogeneity and complex nature of the aquifer's hydraulic properties.

How to cite: Baalousha, H., Fahs, M., and Younes, A.: Predictive uncertainty analysis using null-space Monte Carlo , EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-1745, https://doi.org/10.5194/egusphere-egu24-1745, 2024.

Remote sensing observations hold useful prior information about the terrestrial water cycle. However, combining remote sensing products for each hydrological variable does not close the water balance due to the associated uncertainties. Therefore, there is a need to quantify bias and random errors in the data. This study presents an extended version of the data-driven probabilistic data fusion for closing the water balance at a basin scale. In this version, we implement a monthly 250-m grid-based Bayesian hierarchical model leveraging multiple open-source data of precipitation, evaporation, and storage in an ensemble approach that fully exploits and maximizes the prior information content of the data. The model relates each variable in the water balance to its “true” value using bias and random error parameters with physical nonnegativity constraints. The water balance variables and error parameters are treated as unknown random variables with specified prior distributions. Given an independent set of ground-truth data on water imports and river discharge along with all monthly gridded water balance data, the model is solved using a combination of Markov Chain Monte Carlo sampling and iterative smoothing to compute posterior distributions of all unknowns. The approach is applied to the Hindon Basin, a tributary of the Ganges River, that suffers from groundwater overexploitation and depends on surface water imports. Results provide spatially distributed (i) hydrologically consistent water balance estimates and (ii) statistically consistent error estimates of the water balance data. 

How to cite: Mourad, R., Schoups, G., and Bastiaanssen, W.: A grid-based data-driven ensemble probabilistic data fusion: a water balance closure approach applied to the irrigated Hindon River Basin, India , EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-2267, https://doi.org/10.5194/egusphere-egu24-2267, 2024.

EGU24-2300 | ECS | Posters on site | HS3.9

Representing systematic and random errors of eddy covariance measurements in suitable likelihood models for robust model selection  

Tobias Karl David Weber, Alexander Schade, Robert Rauch, Sebastian Gayler, Joachim Ingwersen, Wolfgang Nowak, Efstathios Diamantopoulos, and Thilo Streck

The importance of evapotranspiration (ET) fluxes for the terrestrial water cycle is demonstrated by an overwhelming body of literature. Unfortunately, errors in their measurement contribute significantly to (model) uncertainties in quantifying and understanding ecohydrological systems. For measuring surface-atmosphere fluxes of water at the ecosystem scale, the eddy covariance method is a powerful technique and an important tool to validate ET models. Spatially averaged fluxes over footprints of several hundred square meters may be obtained. While the eddy-covariance technique has become a routine method to estimate the turbulent energy fluxes at the soil-atmosphere boundary, it is not error free. Some of the inherent errors are quantifiable and may be partitioned into systematic and stochastic errors. For model-data comparison, the nature of the measurement error needs to be known to derive knowledge about model adequacy. To this end, in this study we compare several assumptions found in the literature to describe the statistical properties of the error with newly derived descriptions. We show how sensitive the model selection process is to the assumptions made about the error. We demonstrate this by comparing daily agro-ecosystem ET fluxes simulated with the detailed agro-hydrological model Expert-N to data gathered using the eddy-covariance technique.

How to cite: Weber, T. K. D., Schade, A., Rauch, R., Gayler, S., Ingwersen, J., Nowak, W., Diamantopoulos, E., and Streck, T.: Representing systematic and random errors of eddy covariance measurements in suitable likelihood models for robust model selection , EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-2300, https://doi.org/10.5194/egusphere-egu24-2300, 2024.

EGU24-4140 | ECS | Orals | HS3.9

Integrating Deterministic and Probabilistic Approaches for Improved Hydrological Predictions: Insights from Multi-model Assessment in the Great Lakes Watersheds 

Jonathan Romero-Cuellar, Rezgar Arabzadeh, James Craig, Bryan Tolson, and Juliane Mai

The utilization of probabilistic streamflow predictions holds considerable value in the domains of predictive uncertainty estimation, hydrologic risk management, and decision support in water resources. Typically, the quantification of predictive uncertainty is formulated and evaluated using a solitary hydrological model, posing challenges in extrapolating findings to diverse model configurations. To address this limitation, this study examines variations in the performance ranking of various streamflow models through the application of a residual error model post-processing approach across multiple basins and models. The assessment encompasses 141 basins within the Great Lakes watershed, spanning the USA and Canada, and involves the evaluation of 13 diverse streamflow models using deterministic and probabilistic performance metrics. This investigation scrutinizes the interdependence between the quality of probabilistic streamflow estimation and the underlying model quality. The results underscore that the selection of a streamflow model significantly influences the robustness of probabilistic predictions. Notably, transitioning from deterministic to probabilistic predictions, facilitated by a post-processing approach, maintains the performance ranking consistency for the best and worst deterministic models. However, models of intermediate rank in deterministic evaluation exhibit inconsistent rankings when evaluated in probabilistic mode. Furthermore, the study reveals that post-processing residual errors of long short-term memory (LSTM) network models consistently outperform other models in both deterministic and probabilistic metrics. This research emphasizes the importance of integrating deterministic streamflow model predictions with residual error models to enhance the quality and utility of hydrological predictions. It elucidates the extent to which the efficacy of probabilistic predictions is contingent upon the sound performance of the underlying model and its potential to compensate for deficiencies in model performance. Ultimately, these findings underscore the significance of combining deterministic and probabilistic approaches for improving hydrological predictions, quantifying uncertainty, and supporting decision-making in operational water management.

How to cite: Romero-Cuellar, J., Arabzadeh, R., Craig, J., Tolson, B., and Mai, J.: Integrating Deterministic and Probabilistic Approaches for Improved Hydrological Predictions: Insights from Multi-model Assessment in the Great Lakes Watersheds, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-4140, https://doi.org/10.5194/egusphere-egu24-4140, 2024.

EGU24-5219 | ECS | Posters on site | HS3.9

Quantifying Uncertainty in Surrogate-based Bayesian Inference 

Anneli Guthke, Philipp Reiser, and Paul-Christian Bürkner

Proper sensitivity and uncertainty analysis for complex Earth and environmental systems models may become computationally prohibitive. Surrogate models can be an alternative to enable such analyses: they are cheap-to-run statistical approximations to the simulation results of the original expensive model. Several approaches to surrogate modelling exist, all with their own challenges and uncertainties. It is crucial to correctly propagate the uncertainties related to surrogate modelling to predictions, inference and derived quantities in order to draw the right conclusions from using the surrogate model.

While the uncertainty in surrogate model parameters due to limited training data (expensive simulation runs) is often accounted for, what is typically ignored is the approximation error due to the surrogate’s structure (bias in reproducing the original model predictions). Reasons are that such a full uncertainty analysis is computationally costly even for surrogates (or limited to oversimplified analytic cases), and that a comprehensive framework for uncertainty propagation with surrogate models was missing.

With this contribution, we propose a fully Bayesian approach to surrogate modelling, uncertainty propagation, parameter inference, and uncertainty validation. We illustrate the utility of our approach with two synthetic case studies of parameter inference and validate our inferred posterior distributions by simulation-based calibration. For Bayesian inference, the correct propagation of surrogate uncertainty is especially relevant, because failing to account for it may lead to biased and/or overconfident parameter estimates and will spoil further interpretation in the physics’ context or application of the expensive simulation model.
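A conjugate-normal toy example (not the surrogate setting of the study) illustrates the simulation-based calibration check used for validating inferred posteriors: if the inference is correct, the rank of the true parameter among posterior draws is uniformly distributed.

```python
import numpy as np

rng = np.random.default_rng(7)
n_sbc, n_post, n_data, sigma = 1000, 99, 20, 1.0
mu0, tau0 = 0.0, 1.0                           # prior N(mu0, tau0^2)

ranks = []
for _ in range(n_sbc):
    mu_true = rng.normal(mu0, tau0)            # draw parameter from the prior
    y = rng.normal(mu_true, sigma, n_data)     # simulate data
    # conjugate posterior N(mu_n, tau_n^2) for a normal mean with known variance
    tau_n2 = 1 / (1 / tau0**2 + n_data / sigma**2)
    mu_n = tau_n2 * (mu0 / tau0**2 + y.sum() / sigma**2)
    post = rng.normal(mu_n, np.sqrt(tau_n2), n_post)
    ranks.append(np.sum(post < mu_true))       # rank statistic of the true value

hist, _ = np.histogram(ranks, bins=10)
print(hist)   # a roughly flat histogram indicates calibrated inference
```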

Consistent and comprehensive uncertainty propagation in surrogate models enables more reliable approximation of expensive simulations and will therefore be useful in various fields of applications, such as surface or subsurface hydrology, fluid dynamics, or soil hydraulics.

How to cite: Guthke, A., Reiser, P., and Bürkner, P.-C.: Quantifying Uncertainty in Surrogate-based Bayesian Inference, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-5219, https://doi.org/10.5194/egusphere-egu24-5219, 2024.

EGU24-6157 | ECS | Orals | HS3.9

Analyzing Groundwater Hazards with Sequential Monte Carlo  

Lea Friedli and Niklas Linde

Analyzing groundwater hazards frequently involves utilizing Bayesian inversions and estimating probabilities associated with rare events. A concrete example concerns the potential contamination of an aquifer, a process influenced by the unknown hydraulic properties of the subsurface. In this context, the emphasis shifts from the posterior distribution of model parameters to the distribution of a particular quantity of interest dependent on these parameters. To tackle the methodological hurdles at hand, we propose a Sequential Monte Carlo approach in two stages. The initial phase involves generating particles to approximate the posterior distribution, while the subsequent phase utilizes subset sampling techniques to evaluate the probability of the specific rare event of interest. Exploring a two-dimensional flow and transport example, we demonstrate the efficiency and accuracy of the developed PostRisk-SMC method in estimating rare event probabilities associated with groundwater hazards.

How to cite: Friedli, L. and Linde, N.: Analyzing Groundwater Hazards with Sequential Monte Carlo , EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-6157, https://doi.org/10.5194/egusphere-egu24-6157, 2024.

EGU24-7610 | Posters on site | HS3.9

Parameter estimation of heterogeneous field in basin scale based on signal analysis and river stage tomography 

Bo-Tsen Wang, Chia-Hao Chang, and Jui-Pin Tsai

Understanding the spatial distribution of the aquifer parameters is crucial to evaluating the groundwater resources on a basin scale. River stage tomography (RST) is one of the potential methods to estimate the aquifer parameter fields. Utilizing the head variations caused by the river stage to conduct RST is essential to delineate the regional aquifer's spatial features successfully. However, the two external stimuli of the aquifer system, rainfall and river stage, are usually highly correlated, resulting in mixed features in the head observations, which may cause unreasonable estimates of parameter fields. Thus, separating the head variations sourced from rainfall and river stage is essential to developing the reference heads for RST. To solve this issue, we propose a systematic approach to extracting and reconstructing the head variations of river features from the original head observations during the flood periods and conducting RST. We utilized a real case study to examine the developed method. This study used the groundwater level data, rainfall data, and river stage data in the Zhuoshui River alluvial fan in 2006. The hydraulic diffusivity (D) values of five observation wells were used as the reference for parameter estimation. The results show that the RMSE of the D value is 0.027 (m2/s). The other three observation wells were selected for validation purposes, and the derived RMSE is 0.85 (m2/s). The low RMSE reveals that the estimated D field can capture the characteristics of the regional aquifer. The results also indicate that the estimated D values derived from the developed method are consistent with the sampled D values from the pumping tests in the calibration and validation processes in the real case study. The results demonstrate that the proposed method can successfully extract and reconstruct the head variations of river features from the original head observations and can delineate the features of the regional parameter field. The proposed method can benefit RST studies and provide an alternative mixed-feature signal decomposition and reconstruction method.

How to cite: Wang, B.-T., Chang, C.-H., and Tsai, J.-P.: Parameter estimation of heterogeneous field in basin scale based on signal analysis and river stage tomography, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-7610, https://doi.org/10.5194/egusphere-egu24-7610, 2024.

EGU24-7820 | Orals | HS3.9

Data-driven surrogate-based Bayesian model calibration for predicting vadose zone temperatures in drinking water supply pipes 

Ilja Kröker, Elisabeth Nißler, Sergey Oladyshkin, Wolfgang Nowak, and Claus Haslauer

Soil temperature and soil moisture in the unsaturated zone depend on each other and are influenced by non-stationary hydro-meteorological forcing factors that are subject to climate change. 

The transport of both heat and moisture is crucial for predicting temperatures in the shallow subsurface and, as a consequence, around and in drinking water supply pipes. Elevated temperatures in water supply pipes (even up to 25°C and above) pose a risk to human health due to the increased likelihood of microbial contamination.

To model variably saturated flow and heat transport, a partial differential equation (PDE)-based integrated hydrogeological model has been developed and implemented in the DuMuX simulation framework.  This model integrates the hydrometeorological forcing functions via a novel interface condition at the atmosphere-subsurface boundary. Relevant soil properties and their dependency on temperatures have been measured as time series at a pilot site at the University of Stuttgart in detail since 2020. 

Despite these efforts on measurements and model enhancement, some uncertainties remain. These include capillary-saturation relationships in materials where they are difficult to measure, especially in the gravel-type materials that are commonly used above drinking water pipes. 

To enhance our understanding of the underlying physical processes, we employ Bayesian inference, which is a well-established approach to estimate uncertain or unknown model parameters. Computationally cheap surrogate models make it possible to overcome the limitations of Bayesian methods for computationally intensive models when such surrogate models are used in lieu of the physical (PDE)-based model. Here, we use the arbitrary polynomial chaos expansion equipped with Bayesian regularization (BaPC). The BaPC allows us to exploit the latest (Bayesian) active learning strategies to reduce the number of model runs that are necessary for constructing the surrogate model.

In the present work, we demonstrate the calibration of a PDE-based integrated hydrogeological model using Bayesian inference on a BaPC-based surrogate.  The accuracy of the calibrated and predicted temperatures in the shallow subsurface is then assessed against real-world measurement data. 

How to cite: Kröker, I., Nißler, E., Oladyshkin, S., Nowak, W., and Haslauer, C.: Data-driven surrogate-based Bayesian model calibration for predicting vadose zone temperatures in drinking water supply pipes, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-7820, https://doi.org/10.5194/egusphere-egu24-7820, 2024.

EGU24-8007 | ECS | Orals | HS3.9

Investigating the divide and measure nonconformity  

Daniel Klotz, Martin Gauch, Frederik Kratzert, Grey Nearing, and Jakob Zscheischler

This contribution presents a diagnostic approach to investigate unexpected side effects that can occur during the evaluation of rainfall--runoff models.

The diagnostic technique that we use is based on the idea that one can use gradient descent to modify the runoff observations/simulations to obtain warranted observations/simulations. Specifically, we show how to use this concept to manipulate any hydrograph (e.g., a copy of the observations) so that it approximates specific NSE values for individual parts of the data. In short, we follow this recipe to generate the synthetic simulations: (1) copy the observations, (2) add noise, (3) clip the modified discharge to zero, and (4) optimise the obtained simulation values using gradient descent until a desired NSE value is reached.
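A hedged PyTorch sketch of this recipe (with a synthetic hydrograph; the target NSE value and learning rate are arbitrary choices) is:

```python
import torch

torch.manual_seed(0)
obs = torch.abs(torch.randn(365)) * 5                         # stand-in observed hydrograph
sim = torch.clamp(obs + torch.randn(365), min=0.0).requires_grad_(True)   # steps (1)-(3)

def nse(simulated, observed):
    return 1 - torch.sum((observed - simulated) ** 2) / torch.sum((observed - observed.mean()) ** 2)

target_nse = 0.70
optimizer = torch.optim.Adam([sim], lr=0.01)
for _ in range(2000):                                         # step (4): gradient descent
    optimizer.zero_grad()
    loss = (nse(sim, obs) - target_nse) ** 2                  # drive the NSE toward the target
    loss.backward()
    optimizer.step()

print(float(nse(sim, obs)))                                   # close to 0.70
```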

To show how this diagnostic technique can be used we demonstrate a behaviour of Nash--Sutcliffe Efficiency (NSE) that appears when evaluating a model over subsets of the data: If models perform poorly for certain situations, this lack of performance is not necessarily reflected in the NSE (of the overall data). This behaviour follows from the definition of NSE and is therefore 100% explainable. However, from our experience it can be unexpected for many modellers. Our results also show that subdividing the data and evaluating over the resulting partitions yields different information regarding model deficiencies than an overall evaluation. We call this phenomenon the Divide And Measure Nonconformity or DAMN.



How to cite: Klotz, D., Gauch, M., Kratzert, F., Nearing, G., and Zscheischler, J.: Investigating the divide and measure nonconformity , EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-8007, https://doi.org/10.5194/egusphere-egu24-8007, 2024.

Groundwater heads are commonly used to monitor storage of aquifers and as decision variables for groundwater management. Alluvial gravel aquifers are often characterized by high transmissivities and a corresponding strong seasonal and inter-annual variability of storage. The sustainable management of such aquifers is challenging, particularly for already tightly allocated aquifers and in increasingly extreme and potentially drier climates, and might require the restriction of groundwater abstraction for periods of time. Stakeholders require lead-in time to prepare for potential restrictions of their consented takes.

Groundwater models have been used in the past to support groundwater decision making and to provide the corresponding predictions of groundwater levels for operational forecasting and management. In this study, we benchmark and compare different model classes to perform this task: (i) a spatially explicit 3D groundwater flow model (MODFLOW), (ii) a conceptual, bucket-type Eigenmodel, (iii) a transfer-function model (TFN), and (iv) three machine learning (ML) techniques, namely Multi-Layer Perceptron models (MLP), Long Short-Term Memory models (LSTM), and Random Forest (RF) models. The model classes differ widely in their complexity, input requirements, calibration effort, and run-times. The different model classes are tested on four groundwater head time series taken from the Wairau Aquifer in New Zealand (Wöhling et al., 2020). Posterior parameter ensembles of MODFLOW (Wöhling et al., 2018) and the EIGENMODEL (Wöhling & Burbery, 2020) were combined with TFN and ML variants with different input features to form a (prior) multi-model ensemble. Model classes are ranked with posterior model weights derived from Bayesian model selection (BMS) and averaging (BMA) techniques.

Our results demonstrate that no “model that fits all” exists in our model set. The more physics-based MODFLOW model does not necessarily provide the most accurate predictions, but it can provide physical meaning and interpretation for the entire model region and outputs at locations where no data are available. ML techniques generally have much lower input requirements and short run-times. They prove to be competitive candidates for groundwater head predictions where observations are available, even for system states that lie outside the calibration data range.

Because the performance of model types is site-specific, we advocate the use of multi-model ensemble forecasting wherever feasible. The benefit is illustrated by our case study, with BMA uncertainty bounds providing a better coverage of the data and the BMA mean performing well for all tested sites. Redundant ensemble members (with BMA weights of zero) are easily filtered out to obtain efficient ensembles for operational forecasting.

 

References

Wöhling T, Burbery L (2020). Eigenmodels to forecast groundwater levels in unconfined river-fed aquifers during flow recession. Science of the Total Environment, 747, 141220, doi: 10.1016/j.scitotenv.2020.141220.

Wöhling, T., Gosses, M., Wilson, S., Wadsworth, V., Davidson, P. (2018). Quantifying river-groundwater interactions of New Zealand's gravel-bed rivers: The Wairau Plain. Groundwater, doi:10.1111/gwat.12625

Wöhling T, Wilson SR, Wadsworth V, Davidson P. (2020). Detecting the cause of change using uncertain data: Natural and anthropogenic factors contributing to declining groundwater levels and flows of the Wairau Plain Aquifer, New Zealand. Journal of Hydrology: Regional Studies, 31, 100715, doi: 10.1016/j.ejrh.2020.100715.

 

How to cite: Wöhling, T. and Crespo Delgadillo, O.: Predicting groundwater heads in alluvial aquifers: Benchmarking different model classes and machine-learning techniques with BMA/S, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-8818, https://doi.org/10.5194/egusphere-egu24-8818, 2024.

EGU24-8872 | Orals | HS3.9

Characterization and modeling of large-scale aquifer systems under uncertainty: methodology and application to the Po River aquifer system 

Monica Riva, Andrea Manzoni, Rafael Leonardo Sandoval, Giovanni Michele Porta, and Alberto Guadagnini

Large-scale groundwater flow models are key to enhance our understanding of the potential impacts of climate and anthropogenic factors on water systems. Through these, we can identify significant patterns and processes that most affect water security. In this context, we have developed a comprehensive and robust theoretical framework and operational workflow that can effectively manage complex heterogeneous large-scale groundwater systems. We rely on machine learning techniques to map the spatial distribution of geomaterials within three-dimensional subsurface systems. The groundwater modeling approach encompasses (a) estimation of groundwater recharge and abstractions, as well as (b) appraisal of interactions among subsurface and surface water bodies. We ground our analysis on a unique dataset that encompasses lithostratigraphic data as well as piezometric and water extraction data across the largest aquifer system in Italy (the Po River basin). The quality of our results is assessed against pointwise information and hydrogeological cross-sections which are available within the reconstructed domain. These can be considered as soft information based on expert assessment. As uncertainty quantification is critical for subsurface characterization and assessment of future states of the groundwater system, the proposed methodology is designed to provide a quantitative evaluation of prediction uncertainty at any location of the reconstructed domain. Furthermore, we quantify the relative importance of uncertain model parameters on target model outputs through the implementation of a rigorous Global Sensitivity Analysis. By evaluating the spatial distribution of global sensitivity metrics associated with model parameters, we gain valuable insights into areas where the acquisition of future information could enhance the quality of groundwater flow model parameterization and improve hydraulic head estimates. The comprehensive dataset provided in this study, combined with the reconstruction of the subsurface system properties and piezometric head distribution and with the quantification of the associated uncertainty, can be readily employed in the context of groundwater availability and quality studies associated with the region of interest. The approach and operational workflow are flexible and readily transferable to assist identification of the main dynamics and patterns of large-scale aquifer systems of the kind here analyzed.

How to cite: Riva, M., Manzoni, A., Sandoval, R. L., Porta, G. M., and Guadagnini, A.: Characterization and modeling of large-scale aquifer systems under uncertainty: methodology and application to the Po River aquifer system, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-8872, https://doi.org/10.5194/egusphere-egu24-8872, 2024.

EGU24-10517 | Orals | HS3.9

Lock-ins and path dependency in evaluation metrics used for hydrological models 

Lieke Melsen, Arnald Puy, and Andrea Saltelli

Science, being conducted by humans, is inherently a social activity. This is evident in the development and acceptance of scientific methods. Science is not only socially shaped, but also driven by (and in turn influencing) technological development: technology can open up new research avenues. At the same time, it has been shown that technology can cause lock-ins and path dependency. A scientific activity driven both by social behavior and technological development is modelling. As such, studying modelling as a socio-technical activity can provide insights both into enculturation processes and into lock-ins and path dependencies. Even more, enculturation can lead to lock-ins. We will demonstrate this for the Nash-Sutcliffe Efficiency (NSE), a popular evaluation metric in hydrological research. Through a bibliometric analysis we show that the NSE is part of hydrological research culture and does not appear in adjacent research fields. Through a historical analysis we demonstrate the path dependency that has developed with the popularity of the NSE. Finally, through exploring the fate of alternative measures, we show the lock-in effect of the use of the NSE. As such, we confirm that the evaluation of models needs to take into account cultural embeddedness. This is relevant because peers' acceptance is a powerful legitimization argument to trust the model and/or model results, including for policy-relevant applications. Culturally determined bias needs to be assessed for its potential consequences in the discipline.

How to cite: Melsen, L., Puy, A., and Saltelli, A.: Lock-ins and path dependency in evaluation metrics used for hydrological models, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-10517, https://doi.org/10.5194/egusphere-egu24-10517, 2024.

EGU24-10770 | Orals | HS3.9 | Highlight

Uncertainty and sensitivity analysis: new purposes, new users, new challenges 

Francesca Pianosi, Hannah Bloomfield, Gemma Coxon, Robert Reinecke, Saskia Salwey, Georgios Sarailidis, Thorsten Wagener, and Doris Wendt

Uncertainty and sensitivity analysis are becoming an integral part of mathematical modelling of earth and environmental systems. Uncertainty analysis aims at quantifying uncertainty in model outputs, which helps to avoid spurious precision and increase the trustworthiness of model-informed decisions. Sensitivity analysis aims at identifying the key sources of output uncertainty, which helps to set priorities for uncertainty reduction and model improvement.

In this presentation, we draw on a range of recent studies and projects to discuss the status of uncertainty and sensitivity analysis, focusing in particular on ‘global’ approaches, whereby uncertainties and sensitivities are quantified across the entire space of plausible variability of model inputs.

We highlight some of the challenges and untapped potential of these methodologies, including: (1) innovative ways to use global sensitivity analysis to test the ‘internal consistency’ of models and therefore support their diagnostic evaluation; (2) challenges and opportunities to promote the uptake of these methodologies to increasingly complex models, chains of models, and models used in industry; (3) the limits of uncertainty and sensitivity analysis when dealing with epistemic, poorly bounded or unquantifiable sources of uncertainties.
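
As a concrete illustration of the 'global' approaches referred to here, the sketch below estimates first-order and total-order Sobol indices for a toy model using the SALib Python package (assumed to be installed); the model, variable names, and bounds are hypothetical.

```python
import numpy as np
from SALib.sample import saltelli
from SALib.analyze import sobol

# Hypothetical model: the output depends strongly on x1 and x2, weakly on x3.
def model(x):
    return np.sin(x[:, 0]) + 0.7 * x[:, 1] ** 2 + 0.1 * x[:, 2]

problem = {
    "num_vars": 3,
    "names": ["x1", "x2", "x3"],
    "bounds": [[-np.pi, np.pi]] * 3,
}

# Saltelli sampling explores the entire space of plausible input variability.
X = saltelli.sample(problem, 1024)
Y = model(X)

# First-order (S1) and total-order (ST) Sobol sensitivity indices.
Si = sobol.analyze(problem, Y)
print(dict(zip(problem["names"], np.round(Si["S1"], 2))))
print(dict(zip(problem["names"], np.round(Si["ST"], 2))))
```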

How to cite: Pianosi, F., Bloomfield, H., Coxon, G., Reinecke, R., Salwey, S., Sarailidis, G., Wagener, T., and Wendt, D.: Uncertainty and sensitivity analysis: new purposes, new users, new challenges, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-10770, https://doi.org/10.5194/egusphere-egu24-10770, 2024.

EGU24-11414 | ECS | Posters on site | HS3.9

Single vs. multi-objective optimization approaches to calibrate an event-based conceptual hydrological model using model output uncertainty framework. 

Muhammad Nabeel Usman, Jorge Leandro, Karl Broich, and Markus Disse

Flash floods have become one of the major natural hazards in central Europe, and climate change projections indicate that their frequency and severity will increase in many areas across the world, including central Europe. The complexity involved in flash flood generation makes it difficult to calibrate a hydrological model for the prediction of such peak hydrological events. This study investigates the best approach to calibrate an event-based conceptual HBV model, comparing different trials of single-objective, single-event multi-objective (SEMO), and multi-event multi-objective (MEMO) model calibrations. Initially, three trials of single-objective calibration are performed with respect to RMSE, NSE, and BIAS separately; then three different trials of multi-objective optimization, i.e., SEMO-3D (single event, three objectives), MEMO-3D (mean of three objectives from two events), and MEMO-6D (two events, six objectives), are formulated. Model performance was validated for several peak events via 90% confidence interval (CI)-based output uncertainty quantification. The uncertainties associated with the model predictions are estimated stochastically using the relative errors (REs) between the simulated (Qsim) and measured (Qobs) discharges as a likelihood measure. Single-objective model calibration demonstrated that significant trade-offs exist between different objective functions, and no unique parameter set can optimize all objectives simultaneously. Compared to the solutions of single-objective calibration, all the multi-objective calibration formulations produced relatively accurate and robust results during both the model calibration and validation phases. The uncertainty intervals associated with all the trials of single-objective calibration and the SEMO-3D calibration failed to capture the observed peaks of the validation events. The uncertainty bands associated with the ensembles of Pareto solutions from the MEMO-3D and MEMO-6D calibrations performed better in reproducing and capturing the more significant peak validation events. However, to bracket the peaks of large flash flood events within the prediction uncertainty intervals, the MEMO-6D optimization outperformed all the single-objective, SEMO-3D, and MEMO-3D calibration methods. This study suggests that MEMO-6D is the best approach for predicting large flood events with lower model output uncertainties when the calibration is performed with a well-chosen combination of peak events.
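
Multi-objective calibration of the kind described above rests on the notion of Pareto (non-dominated) solutions; a minimal Python sketch, with hypothetical objective values, showing how a trade-off front can be extracted:

```python
import numpy as np

def is_dominated(f, others):
    """f is dominated if some other point is no worse in every objective
    and strictly better in at least one (all objectives minimised)."""
    return np.any(np.all(others <= f, axis=1) & np.any(others < f, axis=1))

# Hypothetical calibration results: each row is one parameter set,
# columns are objectives to be minimised (RMSE, 1 - NSE, |BIAS|).
objectives = np.array([
    [0.30, 0.20, 0.05],
    [0.25, 0.35, 0.10],
    [0.40, 0.15, 0.02],
    [0.45, 0.40, 0.20],   # dominated by the first row
])

pareto_mask = np.array([
    not is_dominated(f, np.delete(objectives, i, axis=0))
    for i, f in enumerate(objectives)
])
print(objectives[pareto_mask])  # the Pareto (trade-off) front
```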

How to cite: Usman, M. N., Leandro, J., Broich, K., and Disse, M.: Single vs. multi-objective optimization approaches to calibrate an event-based conceptual hydrological model using model output uncertainty framework., EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-11414, https://doi.org/10.5194/egusphere-egu24-11414, 2024.

EGU24-12676 | ECS | Posters on site | HS3.9

Physics-Informed Ensemble Surrogate Modeling of Advective-Dispersive Transport Coupled with Film Intraparticle Pore Diffusion Model for Column Leaching Test 

Amirhossein Ershadi, Michael Finkel, Binlong Liu, Olaf Cirpka, and Peter Grathwohl

Column leaching tests are a common approach for evaluating the leaching behavior of contaminated soil and waste materials, which are often reused for various construction purposes. The observed breakthrough curves of the contaminants are affected by the intricate dynamics of solute transport, inter-phase mass transfer, and dispersion. Disentangling these interactions requires numerical models. However, inverse modeling and parameter sensitivity analysis are often time-consuming, especially when sorption/desorption kinetics are explicitly described by intra-particle diffusion, requiring the discretization along the column axis and inside the grains. To replace such computationally expensive models, we developed a machine-learning based surrogate model employing two disparate ensemble methods (stacking and weighted distance average) within the defined parameter range based on the German standard for column leaching tests. To optimize the surrogate model, adaptive sampling methods based on three distinct infill criteria are employed. These criteria include maximizing expected improvement, the Mahalanobis distance (exploitation), and maximizing standard deviation (exploration).
The stacking surrogate model uses extremely randomized trees and a random forest as base and meta-models, respectively. The model shows very good performance in emulating the behavior of the original numerical model (relative root mean squared error = 0.09).
Our proposed surrogate model has been applied to estimate the complete posterior parameter distribution using Markov Chain Monte Carlo simulation. The impact of individual input parameters on the predictions generated by the surrogate model was analyzed using SHapley Additive exPlanations methods.
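
A rough sketch of a stacking surrogate in the spirit described above, built with scikit-learn using extremely randomized trees as base learner and a random forest as meta-learner; the emulated "model", the data, and the relative-RMSE definition are hypothetical stand-ins, not the authors' setup.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor, StackingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)

# Hypothetical stand-in for the expensive column-leaching model:
# inputs are transport/sorption parameters, the output a breakthrough metric.
X = rng.uniform(0.0, 1.0, size=(400, 4))
y = np.exp(-3 * X[:, 0]) * X[:, 1] + 0.5 * X[:, 2] ** 2 + 0.05 * rng.normal(size=400)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Stacking: extremely randomized trees as base learner, random forest as meta-learner.
surrogate = StackingRegressor(
    estimators=[("extra_trees", ExtraTreesRegressor(n_estimators=200, random_state=0))],
    final_estimator=RandomForestRegressor(n_estimators=200, random_state=0),
)
surrogate.fit(X_train, y_train)

rmse = np.sqrt(mean_squared_error(y_test, surrogate.predict(X_test)))
rrmse = rmse / (y_test.max() - y_test.min())  # one possible relative RMSE definition
print(round(rrmse, 3))
```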

How to cite: Ershadi, A., Finkel, M., Liu, B., Cirpka, O., and Grathwohl, P.: Physics-Informed Ensemble Surrogate Modeling of Advective-Dispersive Transport Coupled with Film Intraparticle Pore Diffusion Model for Column Leaching Test, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-12676, https://doi.org/10.5194/egusphere-egu24-12676, 2024.

EGU24-13393 | ECS | Posters on site | HS3.9

Datasets and tools for local and global meteorological ensemble estimation 

Guoqiang Tang, Andrew Wood, Andrew Newman, Martyn Clark, and Simon Papalexiou

Ensemble gridded meteorological datasets are critical for driving hydrology and land models, enabling uncertainty analysis, and supporting a variety of hydroclimate research and applications. The Gridded Meteorological Ensemble Tool (GMET) has been a significant contributor in this domain, offering an accessible platform for generating ensemble precipitation and temperature datasets. The GMET methodology has continually evolved since its initial development in 2006, primarily in the form of a FORTRAN code base, and has since been utilized to generate historical and real-time ensemble meteorological (model forcing) datasets in the U.S. and part of Canada. A recent adaptation of GMET was used to produce multi-decadal forcing datasets for North America and the globe (EMDNA and EM-Earth, respectively). Those datasets have been used to support diverse hydrometeorological applications such as streamflow forecasting and hydroclimate studies across various scales. GMET has now evolved into a Python package called the Geospatial Probabilistic Estimation Package (GPEP), which offers methodological and technical enhancements relative to GMET. These include greater variable selection flexibility, intrinsic parallelization, and especially a broader suite of estimation methods, including the use of techniques from the scikit-learn machine learning library. GPEP enables a wider variety of strategies for local and global estimation of geophysical variables beyond traditional hydrological forcings.  This presentation summarizes GPEP and introduces major open-access ensemble datasets that have been generated with GMET and GPEP, including a new effort to create high-resolution (2 km) surface meteorological analyses for the US. These resources are useful in advancing hydrometeorological uncertainty analysis and geospatial estimation.

How to cite: Tang, G., Wood, A., Newman, A., Clark, M., and Papalexiou, S.: Datasets and tools for local and global meteorological ensemble estimation, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-13393, https://doi.org/10.5194/egusphere-egu24-13393, 2024.

We consider the optimal inference of spatially heterogeneous hydraulic conductivity and head fields based on three kinds of point measurements that may be available at monitoring wells: of head, permeability, and groundwater speed. We have developed a general, zonation-free technique for Monte Carlo (MC) study of field recovery problems, based on Karhunen-Loève (K-L) expansions of the unknown fields, whose coefficients are recovered by an analytical adjoint-state technique. This allows unbiased sampling from the space of all possible fields with a given correlation structure and efficient, automated gradient-descent calibration. The K-L basis functions have a straightforward notion of period, revealing the relationship between feature scale and reconstruction fidelity, and they have an a priori known spectrum, allowing for a non-subjective regularization term to be defined. We have performed automated MC calibration on over 1100 conductivity-head field pairs, employing a variety of point measurement geometries and quantified the mean-squared field reconstruction accuracy, both globally and as a function of feature scale.

We present heuristics for feature scale identification, examine global reconstruction error, and explore the value added by both groundwater speed measurements and by two different types of regularization. We show that significant feature identification becomes possible as feature scale exceeds four times measurement spacing and identification reliability subsequently improves in a power law fashion with increasing feature scale.
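
To illustrate the Karhunen-Loève construction underlying this approach, the following Python sketch generates a one-dimensional random field with a prescribed exponential covariance from a truncated K-L expansion; the grid, correlation length, and truncation level are arbitrary choices for illustration.

```python
import numpy as np

# 1-D grid and an exponential covariance model (hypothetical correlation length).
x = np.linspace(0.0, 100.0, 200)
corr_len = 20.0
C = np.exp(-np.abs(x[:, None] - x[None, :]) / corr_len)

# Karhunen-Loeve expansion: eigen-decomposition of the covariance matrix.
# Eigenvalues give an a priori known spectrum; eigenvectors are the K-L basis functions.
eigval, eigvec = np.linalg.eigh(C)
order = np.argsort(eigval)[::-1]
eigval, eigvec = eigval[order], eigvec[:, order]

# Truncated expansion: one random log-conductivity field from the first k modes.
k = 30
rng = np.random.default_rng(0)
coeff = rng.standard_normal(k)                       # the coefficients calibration recovers
field = eigvec[:, :k] @ (np.sqrt(eigval[:k]) * coeff)
print(field.shape)  # one unbiased sample with the prescribed correlation structure
```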

How to cite: Hansen, S. K., O'Malley, D., and Hambleton, J.: Feature scale and identifiability: quantifying the information that point hydraulic measurements provide about heterogeneous head and conductivity fields, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-14219, https://doi.org/10.5194/egusphere-egu24-14219, 2024.

EGU24-14805 | Orals | HS3.9

Sensitivity analysis of input variables of a SWAT hydrological model using the machine learning technique of random forest 

Ali Abousaeidi, Seyed Mohammad Mahdi Moezzi, Farkhondeh Khorashadi Zadeh, Seyed Razi Sheikholeslami, Albert Nkwasa, and Ann van Griensven

Sensitivity analysis of complex models with a large number of input variables and parameters is time-consuming and inefficient when traditional approaches are used. Given its capability to compute importance indices, the Random Forest (RF) machine learning technique is introduced as an alternative to conventional sensitivity analysis methods. One of the advantages of using the RF model is the reduction of the computational cost of sensitivity analysis.

The objective of this research is to analyze the importance of the input variables of a semi-distributed, physically-based hydrological model, namely SWAT (Soil and Water Assessment Tool), using the RF model. To this end, an RF-based model is first trained using SWAT input variables (such as precipitation and temperature) and SWAT output variables (such as streamflow and sediment load). Then, using the importance index of the RF model, the input variables are ranked in terms of their impact on the accuracy of the model results. Additionally, the results of the sensitivity analysis are examined graphically. To validate the RF-based ranking, the parameter rankings for the Sobol G function obtained with the RF-based approach and with the Sobol' sensitivity analysis method are compared. Ranking the model input variables plays a significant role in model development and in prioritizing efforts to reduce model errors.
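
A minimal sketch of the core idea, ranking input variables by random forest importance with scikit-learn; the forcings, the synthetic target, and the sample size are hypothetical and are not SWAT outputs.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Hypothetical daily forcings (stand-ins for model inputs) and a simulated streamflow target.
n = 1000
X = pd.DataFrame({
    "precipitation": rng.gamma(2.0, 3.0, n),
    "temperature": rng.normal(15.0, 8.0, n),
    "solar_radiation": rng.uniform(5.0, 30.0, n),
})
y = 0.8 * X["precipitation"] + 0.1 * X["temperature"] + rng.normal(0.0, 1.0, n)

rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)

# Impurity-based importances provide a ranking of the input variables.
ranking = pd.Series(rf.feature_importances_, index=X.columns).sort_values(ascending=False)
print(ranking.round(3))
```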

Keywords: sensitivity analysis, model input variables, machine learning, random forest, SWAT model.

How to cite: Abousaeidi, A., Moezzi, S. M. M., Khorashadi Zadeh, F., Sheikholeslami, S. R., Nkwasa, A., and van Griensven, A.: Sensitivity analysis of input variables of a SWAT hydrological model using the machine learning technique of random forest, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-14805, https://doi.org/10.5194/egusphere-egu24-14805, 2024.

EGU24-16086 | ECS | Posters on site | HS3.9

Disentangling the role of different sources of uncertainty and model structural error on predictions of water and carbon fluxes with CLM5 for European observation sites 

Fernand Baguket Eloundou, Lukas Strebel, Bibi S. Naz, Christian Poppe Terán, Harry Vereecken, and Harrie-Jan Hendricks Franssen

The Community Land Model version 5 (CLM5) integrates processes encompassing the water, energy, carbon, and nitrogen cycles, and ecosystem dynamics, including managed ecosystems like agriculture. Nevertheless, the intricacy of CLM5 introduces predictive uncertainties attributed to factors such as input data, process parameterizations, and parameter values. This study conducts a comparative analysis between CLM5 ensemble simulations and eddy covariance and in-situ measurements, focusing on the effects of uncertain model parameters and atmospheric forcings on the water, carbon, and energy cycles.
Ensemble simulations for 14 European experimental sites were performed with the CLM5-BGC model, which integrates the biogeochemistry component. In four perturbation experiments, we explore uncertainties arising from atmospheric forcing data, soil parameters, vegetation parameters, and the combined effects of these factors. The contribution of different uncertainty sources to the total simulation uncertainty was analyzed by comparing the 99% confidence intervals from the ensemble simulations with measured terrestrial states and fluxes, using a three-way analysis of variance.
The study identifies that soil parameters primarily influence the uncertainty in estimating surface soil moisture, while uncertain vegetation parameters control the uncertainty in estimating evapotranspiration and carbon fluxes. A combination of uncertainty in atmospheric forcings and vegetation parameters mostly explains the uncertainty in sensible heat flux estimation. On average, the 99% confidence intervals envelop >40% of the observed fluxes, but this varies greatly between sites, exceeding 95% in some cases. For some sites, we could identify model structural errors related to model spin-up assumptions or erroneous plant phenology. The study guides the identification of factors causing underestimation or overestimation of the variability of fluxes, such as crop parameterization or spin-up, and of potential structural errors in point-scale simulations with CLM5.
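
A simplified sketch of the kind of variance partitioning described here (main effects only, not a full three-way ANOVA with interaction terms), using a hypothetical full-factorial ensemble:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical full-factorial ensemble: a flux simulated for every combination of
# perturbed forcing (a), soil-parameter (s) and vegetation-parameter (v) members.
na, ns, nv = 5, 5, 5
a_eff = rng.normal(0.0, 0.2, na)          # forcing contribution
s_eff = rng.normal(0.0, 0.1, ns)          # soil-parameter contribution
v_eff = rng.normal(0.0, 0.4, nv)          # vegetation-parameter contribution
flux = (2.0 + a_eff[:, None, None] + s_eff[None, :, None] + v_eff[None, None, :]
        + rng.normal(0.0, 0.05, (na, ns, nv)))   # residual interactions / noise

total_var = flux.var()

# Main-effect (ANOVA-style) variance shares of each uncertainty source.
share_forcing = flux.mean(axis=(1, 2)).var() / total_var
share_soil = flux.mean(axis=(0, 2)).var() / total_var
share_vegetation = flux.mean(axis=(0, 1)).var() / total_var
print(round(share_forcing, 2), round(share_soil, 2), round(share_vegetation, 2))
```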

How to cite: Eloundou, F. B., Strebel, L., Naz, B. S., Terán, C. P., Vereecken, H., and Hendricks Franssen, H.-J.: Disentangling the role of different sources of uncertainty and model structural error on predictions of water and carbon fluxes with CLM5 for European observation sites, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-16086, https://doi.org/10.5194/egusphere-egu24-16086, 2024.

EGU24-16361 | ECS | Orals | HS3.9

Estimating prior distributions of TCE transformation rate constants from literature data 

Anna Störiko, Albert J. Valocchi, Charles Werth, and Charles E. Schaefer

Stochastic modeling of contaminant reactions requires the definition of prior distributions for the respective rate constants. We use data from several experiments reported in the literature to better understand the distribution of pseudo-first-order rate constants of abiotic TCE reduction in different sediments. These distributions can be used to choose informed priors for these parameters in reactive-transport models.

Groundwater contamination with trichloroethylene (TCE) persists at many hazardous waste sites due to back diffusion from low-permeability zones such as clay lenses. In recent years, the abiotic reduction of TCE by reduced iron minerals has gained attention as a natural attenuation process, but there is uncertainty as to whether the process is fast enough to be effective. Pseudo-first-order rate constants have been determined in laboratory experiments and are reported in the literature for various sediments and rocks, as well as for individual reactive minerals. However, rate constants can vary between sites and aquifer materials. Reported values range over several orders of magnitude.

To assess the uncertainty and variability of pseudo-first-order rate constants, we compiled data reported in several studies. We built a statistical model based on a hierarchical Bayesian approach to predict probability distributions of rate constants at new sites based on this data set. We then investigated whether additional information about the sediment composition at a site could reduce the uncertainty. We tested two sets of predictors: reactive mineral content or the extractable Fe(II) content. Knowing the reactive mineral content reduced the uncertainty only slightly. In contrast, knowing the Fe(II) content greatly reduced the uncertainty because the relationship between Fe(II) content and rate constants is approximately log-log-linear. Using a simple example of diffusion-controlled transport in a contaminated aquitard, we show how the uncertainty in the predicted rate constants affects the predicted remediation times.
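
The study uses a hierarchical Bayesian model; as a much simpler illustration of the reported log-log-linear relationship, the sketch below fits log rate constants against log Fe(II) content and propagates the residual spread into a predictive distribution for a new site (all numbers are synthetic).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical literature compilation: extractable Fe(II) content (mmol/kg)
# and pseudo-first-order TCE reduction rate constants (1/d), spanning decades.
fe2 = 10 ** rng.uniform(-1, 2, 40)
log_k = -3.0 + 1.2 * np.log10(fe2) + rng.normal(0.0, 0.4, 40)

# A log-log-linear fit, mirroring the approximately linear relationship described.
slope, intercept = np.polyfit(np.log10(fe2), log_k, 1)

# Predictive distribution of the rate constant at a new site with known Fe(II).
resid_sd = np.std(log_k - (intercept + slope * np.log10(fe2)), ddof=2)
fe2_new = 5.0
k_samples = 10 ** rng.normal(intercept + slope * np.log10(fe2_new), resid_sd, 10_000)
print(np.percentile(k_samples, [5, 50, 95]))  # could serve as an informed prior
```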

How to cite: Störiko, A., Valocchi, A. J., Werth, C., and Schaefer, C. E.: Estimating prior distributions of TCE transformation rate constants from literature data, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-16361, https://doi.org/10.5194/egusphere-egu24-16361, 2024.

Deeper insights into internal model behavior are essential as hydrological models become more and more complex. Our study provides a framework that combines time-varying global sensitivity analysis (GSA) with data mining techniques to unravel the process-level behavior of high-complexity models and tease out the main information. The extracted information is further used to assist parameter identification. The physically-based Distributed Hydrology-Soil-Vegetation Model (DHSVM), set up in a mountainous watershed, is used as a case study. Specifically, a two-step GSA including time-aggregated and time-variant approaches is conducted to address the problem of high parameter dimensionality and to characterize time-varying parameter importance. Because the long-term dynamics are difficult to interpret directly, a clustering operation is performed to partition the entire period into several clusters and extract the corresponding temporal patterns of parameter importance. Finally, the resulting time clusters are used in parameterization, where each parameter is identified in the time cluster(s) it dominates. Results are summarized as follows: (1) the importance of selected soil and vegetation parameters varies greatly throughout the period; (2) typical patterns of parameter importance corresponding to flood, very short dry-to-wet, fast recession, and continuous dry periods are successfully distinguished. We argue that a temporal resolution somewhere between “total period” and “continuous discrete time” can be more useful for understanding and interpretation; (3) parameters that are dominant for short times are much more identifiable when they are identified in their dominant time cluster(s); (4) the enhanced parameter identifiability improves the overall model performance according to the NSE, LNSE, and RMSE metrics, suggesting that the use of GSA information has the potential to provide a better search for optimal parameter sets.
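
A minimal sketch of the clustering step described above, grouping hypothetical time-varying sensitivity indices into a few characteristic importance patterns with k-means:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical time-varying sensitivity indices: rows are days,
# columns are parameters (e.g. soil and vegetation parameters).
n_days, n_params = 365, 6
sensitivity = rng.dirichlet(np.ones(n_params), size=n_days)

# Partition the simulation period into a few characteristic patterns
# (e.g. flood, dry-to-wet, recession, and dry periods).
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(sensitivity)
labels = kmeans.labels_

# Mean parameter importance within each cluster: the "typical patterns".
for c in range(4):
    print(c, sensitivity[labels == c].mean(axis=0).round(2))
```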

How to cite: Wang, L., Xu, Y., Gu, H., and Liang, X.: Investigating dynamic parameter importance of a high-complexity hydrological model and implications for parameterization, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-18569, https://doi.org/10.5194/egusphere-egu24-18569, 2024.

EGU24-18804 | ECS | Orals | HS3.9

Accelerating Hydrological Model Inversion: A Multilevel Approach to GLUE 

Max Rudolph, Thomas Wöhling, Thorsten Wagener, and Andreas Hartmann

Inverse problems play a pivotal role in hydrological modelling, particularly for parameter estimation and system understanding, which are essential for managing water resources. The application of statistical inversion methodologies such as Generalized Likelihood Uncertainty Estimation (GLUE) is often obstructed, however, by the high computational cost of the model, given that Monte Carlo sampling strategies often return a very small fraction of behavioural model runs. This must nevertheless be balanced against the demand for broadly sampling the parameter space. Especially relevant for spatially distributed or (partial) differential equation-based models, this aspect calls for computationally efficient methods of statistical inference that approximate the “true” posterior parameter distribution well. Our study introduces multilevel GLUE (MLGLUE), which effectively mitigates these computational challenges by exploiting a hierarchy of models with different computational grid resolutions (i.e., spatial or temporal discretisation), inspired by multilevel Monte Carlo strategies. Starting with low-resolution models, MLGLUE only passes parameter samples to higher-resolution models for evaluation if they are associated with a high likelihood, which offers substantial potential for computational savings. We demonstrate the applicability of the approach using a groundwater flow model with a hierarchy of different spatial resolutions. With MLGLUE, the computation time of parameter inference could be reduced by more than 60% compared to GLUE, while the resulting posterior distributions are virtually identical. Correspondingly, the uncertainty estimates of MLGLUE and GLUE are also very similar. Considering the simplicity of the implementation as well as its efficiency, MLGLUE promises to be an attractive alternative for statistical inversion of computationally costly hydrological models.
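
The essence of the multilevel idea can be sketched in a few lines of Python: cheap low-resolution evaluations screen the Monte Carlo sample, and only promising parameter sets reach the expensive model. The models, the informal likelihood, and the threshold below are hypothetical placeholders, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def coarse_model(theta):
    # Cheap low-resolution model (hypothetical stand-in) returning a misfit.
    return np.sum((theta - 0.3) ** 2)

def fine_model(theta):
    # Expensive high-resolution model; here the coarse model plus a small correction.
    return coarse_model(theta) + 0.05 * np.sin(10 * theta).sum()

def likelihood(misfit):
    return np.exp(-misfit / 0.1)  # informal GLUE-type likelihood

n_samples, threshold = 5000, 0.5
samples = rng.uniform(0.0, 1.0, size=(n_samples, 3))

# Level 0: evaluate every sample with the cheap model.
coarse_like = np.array([likelihood(coarse_model(t)) for t in samples])

# Level 1: only promising samples are passed to the expensive model.
promising = samples[coarse_like > threshold]
fine_like = np.array([likelihood(fine_model(t)) for t in promising])
behavioural = promising[fine_like > threshold]

print(len(promising), "fine-model runs instead of", n_samples)
print(len(behavioural), "behavioural parameter sets")
```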

How to cite: Rudolph, M., Wöhling, T., Wagener, T., and Hartmann, A.: Accelerating Hydrological Model Inversion: A Multilevel Approach to GLUE, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-18804, https://doi.org/10.5194/egusphere-egu24-18804, 2024.

EGU24-19966 | Orals | HS3.9

Operational Sensitivity Analysis for Flooding in Urban Systems under Uncertainty 

Aronne Dell'Oca, Monica Riva, Alberto Guadagnini, and Leonardo Sandoval

The runoff process in environmental systems is influenced by various variables that are typically affected by uncertainty. These include, for example, climate and hydrogeological quantities (hereafter denoted as environmental variables). Additionally, the runoff process is influenced by quantities that are amenable to intervention/design (hereafter denoted as operational variables) and can therefore be set to desired values on the basis of specific management choices. A key question in this context is: how can we discriminate the impact on system outputs of operational variables, whose values can be decided in the system design or management phase, while also accounting for the uncertainty associated with environmental variables? We tackle this issue by introducing a novel approach, which we term Operational Sensitivity Analysis (OSA), set within a Global Sensitivity Analysis (GSA) framework. OSA enables us to assess the sensitivity of a given model output specifically to operational factors, while recognizing uncertainty in the environmental variables. This approach is developed as a complement to a traditional GSA, which does not differentiate at the methodological level between the variability associated with operational and environmental variables.

We showcase our OSA approach through an exemplary scenario associated with an urban catchment where flooding results from sewer system failure. In this context, we distinguish between operational variables, such as sewer system pipe properties and urban area infiltration capacity, and environmental variables, such as urban catchment drainage properties and rain event characteristics. Our results suggest that the diameter of a set of pipes in the sewer network is the most influential operational variable. As such, it provides a rigorous basis upon which one could plan appropriate actions to effectively manage the system response.

How to cite: Dell'Oca, A., Riva, M., Guadagnini, A., and Sandoval, L.: Operational Sensitivity Analysis for Flooding in Urban Systems under Uncertainty, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-19966, https://doi.org/10.5194/egusphere-egu24-19966, 2024.

EGU24-20013 | ECS | Orals | HS3.9

Field-scale soil moisture predictions using in situ sensor measurements in an inverse modelling framework: SWIM² 

Marit Hendrickx, Jan Diels, Jan Vanderborght, and Pieter Janssens

With the rise of affordable, autonomous sensors and IoT (Internet of Things) technology, it is possible to monitor soil moisture in a field online and in real time. This offers opportunities for real-time model calibration for irrigation scheduling. A framework is presented in which real-time sensor data are coupled with a soil water balance model to predict soil moisture content and irrigation requirements at field scale. SWIM², Sensor Wielded Inverse Modelling of a Soil Water Irrigation Model, is a framework based on the DREAM inverse modelling approach to estimate 12 model parameters (soil and crop growth parameters) and their uncertainty distributions. These parameter distributions result in soil moisture predictions with a prediction uncertainty estimate, which enables a farmer to anticipate droughts and estimate irrigation requirements.

The SWIM² framework was validated using three growing seasons (2021-2023) in about 30 fields of vegetable growers in Flanders. The Kullback–Leibler divergence (KLD) was used as a metric to quantify the information gain of the model parameters starting from non-informative priors. Performance was validated in two steps, i.e. a calibration period and a prediction period, in correspondence with the real-world implementation of the framework. The RMSE, correlation (R), NSE, and Kling-Gupta efficiency (KGE) of soil moisture were analyzed as a function of time, i.e. of the amount of sensor data available for calibration.

Soil moisture can be predicted accurately once 10 to 20 days of sensor data are available for calibration. The RMSE during the calibration period is generally around 0.02 m³/m³, while the RMSE during the prediction period decreases from 0.04 to 0.02 m³/m³ as more calibration data become available. The information gain (KLD) of some parameters (e.g. field capacity and curve number) largely depends on the presence of dynamic events (e.g. precipitation events) during the calibration period. After 40 days of sensor data, the KGE and Pearson correlation of the calibration period become stable, with median values of 0.8 and 0.9, respectively. For the validation period, the KGE and Pearson correlation increase over time, with median values rising from 0.3 to 0.7 (KGE) and from 0.7 to 0.95 (R). These good results show that, with this framework, we can simulate and predict soil moisture accurately. These predictions can in turn be used to estimate irrigation requirements.

In the main framework, precipitation radar data were considered an input without uncertainty. As an extension, precipitation forcing error can be treated in DREAM by applying rainfall multipliers as additional parameters that are estimated in the inverse modelling framework. The multiplicative error of the radar data was quantified by comparing the radar data to rain gauge measurements. The prior uncertainty of the logarithmic multipliers was described by a Laplace distribution and implemented in DREAM. The extended framework with rainfall multipliers shows better convergence and acceptance rates than the main framework. The calibration period shows better performance, with higher correlations and lower RMSE values, but a decrease in performance was found for the validation period. These results suggest that the implementation of rainfall multipliers leads to overfitting, resulting in lower predictive power.
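
A schematic sketch of how logarithmic rainfall multipliers with a Laplace prior can enter a log-posterior of the kind sampled by DREAM; the scale parameter, the number of events, and the likelihood below are placeholders, not values from the study.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical storm-event rainfall multipliers (one log-multiplier per event),
# estimated alongside the soil/crop parameters in the inverse-modelling framework.
n_events = 8
log_multipliers = rng.normal(0.0, 0.2, n_events)   # current proposal in the sampler

# Laplace prior on the logarithmic multipliers, as in the extended framework;
# the scale would be derived from comparing radar data with rain gauges.
prior = stats.laplace(loc=0.0, scale=0.15)
log_prior = prior.logpdf(log_multipliers).sum()

def log_likelihood(log_mult):
    # Placeholder for the soil-moisture misfit of the water-balance model
    # run with precipitation scaled by exp(log_mult).
    return -5.0 * np.sum(log_mult ** 2)

log_posterior = log_likelihood(log_multipliers) + log_prior
print(round(log_posterior, 2))
```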

How to cite: Hendrickx, M., Diels, J., Vanderborght, J., and Janssens, P.: Field-scale soil moisture predictions using in situ sensor measurements in an inverse modelling framework: SWIM², EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-20013, https://doi.org/10.5194/egusphere-egu24-20013, 2024.

In recent years, Machine Learning (ML) models have led to a substantial improvement in hydrological predictions. It appears these models can distill information from catchment properties that is relevant for the relationship between meteorological drivers and streamflow, which has so far eluded hydrologists.
In the first part of this talk, I shall demonstrate some of our attempts towards understanding these improvements. Utilising Autoencoders and intrinsic dimension estimators, we have shown that the wealth of available catchment properties can effectively be summarised into merely three features, insofar as they are relevant for streamflow prediction. Hybrid models, which combine the flexibility of ML models with mechanistic mass-balance models, are equally adept at predicting as pure ML models but come with only a few interpretable interior states. Combining these findings will, hopefully, bring us closer to understanding what these ML models seem to have 'grasped'.
In the second part of the talk, I will address the issue of uncertainty quantification. I contend that error modelling should not be attempted on the residuals. Rather, we should model the errors where they originate, i.e., on the inputs, model states, and/or parameters. Such stochastic models are more adept at expressing the intricate distributions exhibited by real data. However, they come at the cost of a very large number of unobserved latent variables and thus pose a high-dimensional inference problem. This is particularly pertinent when our models include ML components. Fortunately, advances in inference algorithms and parallel computing infrastructure continue to extend the limits on the number of variables that can be inferred within a reasonable timeframe. I will present a straightforward example of a stochastic hydrological model with input uncertainty, where Hamiltonian Monte Carlo enables a comprehensive Bayesian inference of model parameters and the actual rain time-series simultaneously.

How to cite: Albert, C.: Advances and prospects in hydrological (error) modelling, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-20170, https://doi.org/10.5194/egusphere-egu24-20170, 2024.

EGU24-262 | Orals | HS3.5

Differentiable modeling for global water resources under global change 

Chaopeng Shen, Yalan Song, Farshid Rahmani, Tadd Bindas, Doaa Aboelyazeed, Kamlesh Sawadekar, Martyn Clark, and Wouter Knoben

Process-based modeling offers interpretability and physical consistency in many domains of geosciences but struggles to leverage large datasets efficiently. Machine-learning methods, especially deep networks, have strong predictive skills yet are unable to answer specific scientific questions. A recently proposed genre of physics-informed machine learning, called “differentiable” modeling (DM, https://t.co/qyuAzYPA6Y), trains neural networks (NNs) together with process-based equations (priors) in one stage (so-called “end-to-end”) to benefit from the best of both the NN and process-based paradigms. The NNs do not need target variables for training but can be indirectly supervised by observations matching the outputs of the combined model, and differentiability critically supports learning from big data. We propose that differentiable models are especially suitable as global hydrologic models because they can harvest information from big earth observations to produce state-of-the-art predictions (https://mhpi.github.io/benchmarks/), enable physical interpretation naturally, extrapolate well (due to physical constraints) in space and time, enforce known physical laws and sensitivities, and leverage progress in modern AI computing architecture and infrastructure. Differentiable models can also synergize with existing global hydrologic models (GHMs) and learn from the lessons of the community. Differentiable GHMs can answer pressing societal questions on water resources availability, climate change impact assessment, water management, and disaster risk mitigation, among others. We demonstrate the power of differentiable modeling using computational examples in rainfall-runoff modeling, river routing, and forcing fusion, as well as applications in water-related domains such as ecosystem modeling and water quality modeling. We discuss how to address potential challenges such as implementing gradient tracking for implicit numerical schemes and addressing process tradeoffs. Furthermore, we show how differentiable modeling can enable us to ask fundamental questions in hydrologic sciences and obtain robust answers from big global data.
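
A toy illustration of the end-to-end principle (not the authors' models): a small neural network predicts the recession constant of a linear-reservoir bucket from catchment attributes and is trained only through the discharge simulated by that differentiable process model, using PyTorch; all data are synthetic.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical data: static attributes and daily precipitation for a few basins.
n_basins, n_days = 16, 100
attributes = torch.rand(n_basins, 5)
precip = torch.rand(n_basins, n_days) * 10.0
q_obs = torch.rand(n_basins, n_days)          # placeholder observations

# A neural network learns a physical parameter (recession constant k) from attributes.
param_net = nn.Sequential(nn.Linear(5, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
optimizer = torch.optim.Adam(param_net.parameters(), lr=1e-2)

def linear_reservoir(precip, k):
    """Differentiable bucket model: S_{t+1} = S_t + P_t - Q_t, with Q_t = k * S_t."""
    storage = torch.zeros(precip.shape[0])
    flows = []
    for t in range(precip.shape[1]):
        storage = storage + precip[:, t]
        q = k.squeeze(-1) * storage
        storage = storage - q
        flows.append(q)
    return torch.stack(flows, dim=1)

for step in range(50):
    optimizer.zero_grad()
    k = param_net(attributes)                 # learned parameter, one per basin
    q_sim = linear_reservoir(precip, k)
    loss = ((q_sim - q_obs) ** 2).mean()      # supervision only through model output
    loss.backward()                           # gradients flow through the process model
    optimizer.step()
print(float(loss))
```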

How to cite: Shen, C., Song, Y., Rahmani, F., Bindas, T., Aboelyazeed, D., Sawadekar, K., Clark, M., and Knoben, W.: Differentiable modeling for global water resources under global change, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-262, https://doi.org/10.5194/egusphere-egu24-262, 2024.

Streamflow can be affected by numerous factors, such as solar radiation, underlying surface conditions, and atmospheric circulation, which result in nonlinearity, uncertainty, and randomness in streamflow time series. Diverse conventional and Deep Learning (DL) models have been applied to recognize complex patterns and discover nonlinear relationships in hydrological time series. Incorporating multiple variables in deep learning can match or improve streamflow forecasts and is hoped to improve extreme value predictions. Multivariate approaches surpass univariate ones by including additional time series as explanatory variables. Deep neural networks (DNNs) excel in multi-horizon time series forecasting, outperforming classical models. However, determining the relative contribution of each variable to streamflow remains challenging due to the black-box nature of DL models.

We propose utilizing the advanced Temporal Fusion Transformer (TFT) deep-learning technique to model streamflow values across various temporal scales, incorporating multiple variables. TFT's attention-based architecture enables high-performance multi-horizon forecasting with interpretable insights into temporal dynamics. Additionally, the model identifies the significance of each input variable, recognizes persistent temporal patterns, and highlights extreme events. Despite its application in a few studies across different domains, the full potential of this model remains largely unexplored. The study focused on Sundargarh, an upper catchment of the Mahanadi basin in India, aiming to capture pristine flow conditions. QGIS was employed to delineate the catchment, and daily streamflow data from 1982 to 2020 were obtained from the Central Water Commission. Input variables included precipitation, potential evaporation, temperature, and soil water volume at different depths. Precipitation and temperature datasets were obtained from India Meteorological Department (IMD) datasets, while other variables were sourced from the ECMWF fifth-generation reanalysis (ERA-5). Hyperparameter tuning was conducted using the Optuna optimization framework, known for its efficiency and easy parallelization. The model, trained using a quantile loss function with different combinations of quantiles, demonstrated superior performance with upper quantiles. Evaluations using R2 and NSE indicated good performance in monthly streamflow predictions for the testing sets, particularly in confidently predicting low and medium flows. While peak flows were well predicted at certain timesteps, there were instances of underperformance. Unlike other ML algorithms, TFT can learn seasonality and lag patterns directly from raw training data, including the identification of crucial variables. The model was trained over different time periods to check whether performance improves with an increased length of data. To gain a better understanding of how distinct sub-processes affect streamflow patterns at various time scales, the model was also applied at pentad and daily scales. Evaluation at extreme values prompted an investigation into improving predictions through quantile loss function adjustments. Given the computational expense of daily streamflow forecasting using TFT with multiple variables, parallel computing is employed. Results demonstrated considerable accuracy, but validating TFT's interpretive abilities requires testing alternative ML models.
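
The quantile (pinball) loss mentioned above is simple to state; a minimal Python sketch with hypothetical flows shows why training on upper quantiles emphasizes under-prediction of peak flows.

```python
import numpy as np

def pinball_loss(y_true, y_pred, q):
    """Quantile (pinball) loss: under-prediction is penalised more heavily
    when q is high, which is why upper quantiles can help with peak flows."""
    error = y_true - y_pred
    return np.mean(np.maximum(q * error, (q - 1.0) * error))

# Hypothetical observed and predicted monthly streamflow (m3/s).
obs = np.array([120.0, 340.0, 80.0, 560.0, 95.0])
pred = np.array([110.0, 300.0, 90.0, 480.0, 100.0])

for q in (0.1, 0.5, 0.9):
    print(q, round(pinball_loss(obs, pred, q), 1))
```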


How to cite: Mohan, M. and Kumar D, N.: Multivariate multi-horizon streamflow forecasting for extremes and their interpretation using an explainable deep learning architecture, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-451, https://doi.org/10.5194/egusphere-egu24-451, 2024.

EGU24-2211 | ECS | Posters on site | HS3.5

Staged Learning in Physics-Informed Neural Networks to Model Contaminant Transport under Parametric Uncertainty 

Milad Panahi, Giovanni Porta, Monica Riva, and Alberto Guadagnini

Addressing the complexities of groundwater modeling, especially under the veil of uncertain physical parameters and limited observational data, poses significant challenges. This study introduces an approach that uses a Physics-Informed Neural Network (PINN) framework to unravel these uncertainties. Termed PINN under uncertainty (PINN-UU), it adeptly integrates uncertain parameters within spatio-temporal domains, focusing on hydrological systems. This approach, built exclusively on the underlying physical equations, leverages a staged training methodology, effectively navigating high-dimensional solution spaces. We demonstrate our approach through an application to reactive transport modeling in porous media, a problem setting relevant to contaminant transport in soil and groundwater. PINN-UU shows promising capabilities in enhancing model reliability and efficiency and in conducting sensitivity analysis. Our approach is designed to be accessible and engaging, offering insightful contributions to environmental engineering and hydrological modeling. It represents a step toward deciphering complex geohydrological systems, with broad implications for resource management and environmental science.

How to cite: Panahi, M., Porta, G., Riva, M., and Guadagnini, A.: Staged Learning in Physics-Informed Neural Networks to Model Contaminant Transport under Parametric Uncertainty, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-2211, https://doi.org/10.5194/egusphere-egu24-2211, 2024.

EGU24-2850 | ECS | Orals | HS3.5

Development of a Distributed Physics-informed Deep Learning Hydrological Model for Data-scarce Regions 

Liangjin Zhong, Huimin Lei, and Jingjing Yang

Climate change has exacerbated water stress and water-related disasters, necessitating more precise runoff simulations. However, in the majority of global regions, a deficiency of runoff data constitutes a significant constraint on modeling endeavors. Traditional distributed hydrological models and regionalization approaches have shown suboptimal performance. While current data-driven models trained on large datasets excel in spatial extrapolation, the direct applicability of these models in certain regions with unique hydrological processes may be challenging due to the limited representativeness of the training dataset. Furthermore, transfer-learning deep learning models pre-trained on large datasets still necessitate local data for retraining, thereby constraining their applicability. To address these challenges, we present a physics-informed deep learning model based on a distributed framework. It involves spatial discretization and the establishment of differentiable hydrological models for discrete sub-basins, coupled with a differentiable Muskingum method for channel routing. By introducing upstream-downstream relationships, model errors in sub-basins propagate through the river network to the watershed outlet, enabling optimization using limited downstream runoff data and thereby achieving spatial simulation of ungauged internal sub-basins. The model, when trained solely on the most downstream station, outperforms the distributed hydrological model in runoff simulation at both the training station and upstream stations, as well as in the spatial patterns of evapotranspiration. Compared to transfer learning, our model requires less training data, yet achieves higher precision in simulating runoff at spatially held-out stations and provides more accurate estimates of spatial evapotranspiration. Consequently, this model offers a novel approach to hydrological simulation in data-scarce regions with unique processes.
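
As background for the routing component mentioned above, the classic Muskingum recursion is shown below in plain NumPy with illustrative coefficients; in a framework such as the one described it would be written with an automatic-differentiation library so that errors at the basin outlet can propagate to upstream sub-basins.

```python
import numpy as np

def muskingum_route(inflow, K=12.0, X=0.2, dt=6.0):
    """Classic Muskingum channel routing:
    O_{t+1} = C0 * I_{t+1} + C1 * I_t + C2 * O_t,
    with coefficients derived from the storage constant K (h),
    the weighting factor X (-), and the time step dt (h)."""
    denom = 2 * K * (1 - X) + dt
    c0 = (dt - 2 * K * X) / denom
    c1 = (dt + 2 * K * X) / denom
    c2 = (2 * K * (1 - X) - dt) / denom   # c0 + c1 + c2 = 1
    outflow = np.zeros_like(inflow)
    outflow[0] = inflow[0]
    for t in range(len(inflow) - 1):
        outflow[t + 1] = c0 * inflow[t + 1] + c1 * inflow[t] + c2 * outflow[t]
    return outflow

# Hypothetical upstream hydrograph (m3/s) routed to the downstream sub-basin.
inflow = np.array([10, 15, 40, 80, 60, 35, 20, 14, 11, 10], dtype=float)
print(muskingum_route(inflow).round(1))
```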

How to cite: Zhong, L., Lei, H., and Yang, J.: Development of a Distributed Physics-informed Deep Learning Hydrological Model for Data-scarce Regions, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-2850, https://doi.org/10.5194/egusphere-egu24-2850, 2024.

EGU24-3028 | Orals | HS3.5 | Highlight

Spatial sensitivity of river flooding to changes in climate and land cover through explainable AI 

Louise Slater, Gemma Coxon, Manuela Brunner, Hilary McMillan, Le Yu, Yanchen Zheng, Abdou Khouakhi, Simon Moulds, and Wouter Berghuijs

Explaining the spatially variable impacts of flood-generating mechanisms is a longstanding challenge in hydrology, with increasing and decreasing temporal flood trends often found in close regional proximity. Here, we develop a machine learning-informed approach to unravel the drivers of seasonal flood magnitude and explain the spatial variability of their effects in a temperate climate. We employ 11 observed meteorological and land cover time series variables alongside 8 static catchment attributes to model flood magnitude in 1268 catchments across Great Britain over four decades. We then perform a sensitivity analysis to understand how +10% precipitation, +1°C air temperature, or +10 percentage points of urbanisation or afforestation affect flood magnitude in catchments with varying characteristics. Our simulations show that increasing precipitation and urbanisation both tend to amplify flood magnitude significantly more in catchments with high baseflow contribution and low runoff ratio, which tend to have lower values of specific discharge on average. In contrast, rising air temperature (in the absence of changing precipitation) decreases flood magnitudes, with the largest effects in dry catchments with low baseflow index. Afforestation also tends to decrease floods more in catchments with low groundwater contribution, and in dry catchments in the summer. These reported associations are significant at p<0.001. Our approach may be used to further disentangle the joint effects of multiple flood drivers in individual catchments.

How to cite: Slater, L., Coxon, G., Brunner, M., McMillan, H., Yu, L., Zheng, Y., Khouakhi, A., Moulds, S., and Berghuijs, W.: Spatial sensitivity of river flooding to changes in climate and land cover through explainable AI, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-3028, https://doi.org/10.5194/egusphere-egu24-3028, 2024.

EGU24-4105 | ECS | Orals | HS3.5

Global flood projection and socioeconomic implications under a physics-constrained deep learning framework 

Shengyu Kang, Jiabo Yin, Louise Slater, Pan Liu, and Dedi Liu

As the planet warms, the frequency and severity of weather-related hazards such as floods are intensifying, posing substantial threats to communities around the globe. Rising flood peaks and volumes can claim lives, damage infrastructure, and compromise access to essential services. However, the physical mechanisms behind global flood evolution are still uncertain, and their implications for socioeconomic systems remain unclear. In this study, we leverage a supervised machine learning technique to identify the dominant factors influencing daily streamflow. We then propose a physics-constrained cascade model chain which assimilates water and heat transport processes to project bivariate risk (i.e. flood peak and volume together), along with its socioeconomic consequences. To achieve this, we drive a hybrid deep learning-hydrological model with bias-corrected outputs from twenty global climate models (GCMs) under four shared socioeconomic pathways (SSPs). Our results project considerable increases in flood risk under the medium to high-end emission scenario (SSP3-7.0) over most catchments of the globe. The median future joint return period decreases from 50 years to around 27.6 years, with 186 trillion dollars and 4 billion people exposed. Downwelling shortwave radiation is identified as the dominant factor driving changes in daily streamflow, accelerating both terrestrial evapotranspiration and snowmelt. As future scenarios project enhanced radiation levels along with an increase in precipitation extremes, a heightened risk of widespread flooding is foreseen. This study aims to provide valuable insights for policymakers developing strategies to mitigate the risks associated with river flooding under climate change.

How to cite: Kang, S., Yin, J., Slater, L., Liu, P., and Liu, D.: Global flood projection and socioeconomic implications under a physics-constrained deep learning framework, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-4105, https://doi.org/10.5194/egusphere-egu24-4105, 2024.

EGU24-4238 | ECS | Posters on site | HS3.5

Letting neural networks talk: exploring two probabilistic neural network models for input variable selection 

John Quilty and Mohammad Sina Jahangir

Input variable selection (IVS) is an integral part of building data-driven models for hydrological applications. Carefully chosen input variables enable data-driven models to discern relevant patterns and relationships within data, improving their predictive accuracy. Moreover, the optimal choice of input variables can enhance the computational efficiency of data-driven models, reduce overfitting, and contribute to a more interpretable and parsimonious model. Meanwhile, including irrelevant and/or redundant input variables can introduce noise to the model and hinder its generalization ability.

Three probabilistic IVS methods, namely Edgeworth approximation-based conditional mutual information (EA), double-layer extreme learning machine (DLELM), and gradient mapping (GM), were used for IVS and then coupled with a long short-term memory (LSTM)-based probabilistic deep learning model for daily streamflow prediction. While the EA method is an effective IVS method, DLELM and GM are examples of probabilistic neural network-based IVS methods that have not yet been explored for hydrological prediction. DLELM selects input variables through sparse Bayesian learning, pruning both input and output layer weights of a committee of neural networks. GM is based on saliency mapping, an explainable AI technique commonly used in computer vision that can be coupled with probabilistic neural networks. Both DLELM and GM involve randomization during parameter initialization and/or training thereby introducing stochasticity into the IVS procedure, which has been shown to improve the predictive performance of data-driven models.

The IVS methods were coupled with a LSTM-based probabilistic deep learning model and applied to a streamflow prediction case study using 420 basins spread across the continental United States. The dataset includes 37 candidate input variables derived from the daily-averaged ERA-5 reanalysis data.

Comparing the input variables most frequently selected by EA, DLELM, and GM across the 420 basins revealed that the three methods select broadly similar sets of input variables. For example, nine of the top 15 input variables were common to all three methods.

The input variables selected by EA, DLELM, and GM were then used in the LSTM-based probabilistic deep learning models for streamflow prediction across the 420 basins. The probabilistic deep learning models were developed and optimized using the top 10 variables selected by each IVS method. The results were compared to a benchmark scenario that used all 37 ERA-5 variables in the prediction model. Overall, the findings show that the GM method results in higher prediction accuracy (Kling-Gupta efficiency; KGE) compared to the other two IVS methods. A median KGE of 0.63 was obtained for GM, whereas for the EA, DLELM, and all input variables’ scenario, KGE scores of 0.61, 0.60, and 0.62 were obtained, respectively.

DLELM and GM are two AI-based techniques that introduce elements of interpretability and stochasticity to the IVS process. The results of the current study are expected to contribute to the evolving landscape of data-driven hydrological modeling by introducing hitherto unexplored neural network-based IVS to pursue more parsimonious, efficient, and interpretable probabilistic deep learning models.

How to cite: Quilty, J. and Jahangir, M. S.: Letting neural networks talk: exploring two probabilistic neural network models for input variable selection, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-4238, https://doi.org/10.5194/egusphere-egu24-4238, 2024.

EGU24-4325 | ECS | Posters on site | HS3.5

Towards learning human influences in a highly regulated basin using a hybrid DL-process based framework 

Liangkun Deng, Xiang Zhang, and Louise Slater

Hybrid models have shown impressive performance for streamflow simulation, offering better accuracy than process-based hydrological models (PBMs) and superior interpretability than deep learning models (DLMs). A recent paradigm for streamflow modeling, integrating DLMs and PBMs within a differentiable framework, presents considerable potential to match the performance of DLMs while simultaneously generating untrained variables that describe the entire water cycle. However, the potential of this framework has mostly been verified in small and unregulated headwater basins and has not been explored in large and highly regulated basins. Human activities, such as reservoir operations and water transfer projects, have greatly changed natural hydrological regimes. Given the limited access to operational water management records, PBMs generally fail to achieve satisfactory performance and DLMs are challenging to train directly. This study proposes a coupled hybrid framework to address these problems. This framework is based on a distributed PBM, the Xin'anjiang (XAJ) model, and adopts embedded deep learning neural networks to learn the physical parameters and replace the modules of the XAJ model reflecting human influences through a differentiable structure. Streamflow observations alone are used as training targets, eliminating the need for operational records to supervise the training process. The Hanjiang River basin (HRB), one of the largest subbasins of the Yangtze River basin, disturbed by large reservoirs and national water transfer projects, is selected to test the effectiveness of the framework. The results show that the hybrid framework can learn the best parameter sets of the XAJ model depicting natural and human influences to improve streamflow simulation. It performs better than a standalone XAJ model and achieves similar performance to a standalone LSTM model. This framework sheds new light on assimilating human influences to improve simulation performance in disturbed river basins with limited operational records.

How to cite: Deng, L., Zhang, X., and Slater, L.: Towards learning human influences in a highly regulated basin using a hybrid DL-process based framework, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-4325, https://doi.org/10.5194/egusphere-egu24-4325, 2024.

EGU24-4768 | ECS | Orals | HS3.5

HydroPML: Towards Unified Scientific Paradigms for Machine Learning and Process-based Hydrology 

Qingsong Xu, Yilei Shi, Jonathan Bamber, Ye Tuo, Ralf Ludwig, and Xiao Xiang Zhu

Accurate hydrological understanding and water cycle prediction are crucial for addressing scientific and societal challenges associated with the management of water resources, particularly under the dynamic influence of anthropogenic climate change. Existing work predominantly concentrates on the development of machine learning (ML) in this field, yet there is a clear distinction between hydrology and ML as separate paradigms. Here, we introduce physics-aware ML as a transformative approach to overcome the perceived barrier and revolutionize both fields. Specifically, we present a comprehensive review of the physics-aware ML methods, building a structured community (PaML) of existing methodologies that integrate prior physical knowledge or physics-based modeling into ML. We systematically analyze these PaML methodologies with respect to four aspects: physical data-guided ML, physics-informed ML, physics-embedded ML, and physics-aware hybrid learning. PaML facilitates ML-aided hypotheses, accelerating insights from big data and fostering scientific discoveries. We initiate a systematic exploration of hydrology in PaML, including rainfall-runoff and hydrodynamic processes, and highlight the most promising and challenging directions for different objectives and PaML methods. Finally, a new PaML-based hydrology platform, termed HydroPML, is released as a foundation for applications based on hydrological processes [1]. HydroPML presents a range of hydrology applications, including but not limited to rainfall-runoff-inundation modeling, real-time flood forecasting (FloodCast), rainfall-induced landslide forecasting (LandslideCast), and cutting-edge PaML methods, to enhance the explainability and causality of ML and lay the groundwork for the digital water cycle's realization. The HydroPML platform is publicly available at https://hydropml.github.io/.

[1] Xu, Qingsong, et al. "Physics-aware Machine Learning Revolutionizes Scientific Paradigm for Machine Learning and Process-based Hydrology." arXiv preprint arXiv:2310.05227 (2023).

How to cite: Xu, Q., Shi, Y., Bamber, J., Tuo, Y., Ludwig, R., and Zhu, X. X.: HydroPML: Towards Unified Scientific Paradigms for Machine Learning and Process-based Hydrology, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-4768, https://doi.org/10.5194/egusphere-egu24-4768, 2024.

EGU24-6378 | ECS | Posters on site | HS3.5

Seasonal forecasts of hydrological droughts over the Alps: advancing hybrid modelling applications 

Iacopo F. Ferrario, Mariapina Castelli, Alasawedah M. Hussein, Usman M. Liaqat, Albrecht Weerts, and Alexander Jacob

The Alpine region is often called the Water Tower of Europe, alluding to its water richness and its function of supplying water through several important European rivers flowing well beyond its geographical boundaries. Climate change projections show that the region will likely experience rising temperatures and changes in precipitation type, frequency, and intensity, with consequences on the spatiotemporal pattern of water availability. Seasonal forecasts could supply timely information for planning water allocation a few months in advance, reducing potential conflicts under conditions of scarce water resources. The overall goal of this study is to improve the seasonal forecasts of hydrological droughts over the entire Alpine region at a spatial resolution (~1 km) that matches the information need by local water agencies, e.g., resolving headwaters and small valleys. In this study we present the progress on the following key objectives:

  • Improving the estimation of distributed model (Wflow_sbm) parameters by finding the optimal transfer function from geophysical attributes to model parameters and upscaling the information to model resolution.
  • Combining physical-hydrological knowledge with data-driven (ML/DL) techniques to improve accuracy and computational performance without compromising interpretability.
  • Integrating EO-based hydrological fluxes, like streamflow, surface soil moisture, actual evapotranspiration, and snow water equivalent, with the aim of regularizing the calibration/training and tackling the problem of model parameter equifinality.

Our work is part of the InterTwin project that aims at developing a multi-domain Digital Twin blueprint architecture and implementation platform. We build on the technological solutions developed in InterTwin (e.g. openEO, CWL and STAC) and fully embrace its inspiring principles of open science, reproducibility, and interoperability of data and methods.

How to cite: Ferrario, I. F., Castelli, M., Hussein, A. M., Liaqat, U. M., Weerts, A., and Jacob, A.: Seasonal forecasts of hydrological droughts over the Alps: advancing hybrid modelling applications, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-6378, https://doi.org/10.5194/egusphere-egu24-6378, 2024.

EGU24-6656 | ECS | Orals | HS3.5

Exploring Catchment Regionalization through the Eyes of HydroLSTM 

Luis De La Fuente, Hoshin Gupta, and Laura Condon

Regionalization is an issue that hydrologists have been working on for decades. It is used, for example, when we transfer parameters from one calibrated model to another, or when we identify similarities between gauged and ungauged catchments. However, there is still no unified method that can successfully transfer parameters and identify similarities between different regions while accounting for differences in meteorological forcing, catchment attributes, and hydrological responses.

Machine learning (ML) has shown a promising ability to generalize across temporal and spatial scales in streamflow prediction. This suggests that ML models have learned useful regionalization relationships that we could extract. This study explores how the HydroLSTM representation, a modification of the traditional Long Short-Term Memory architecture, can learn meaningful relationships between meteorological forcing and catchment attributes. One promising feature of the HydroLSTM representation is that the learned patterns can generate different hydrological responses across the US. These findings indicate that we can learn more about regionalization by studying ML models.

How to cite: De La Fuente, L., Gupta, H., and Condon, L.: Exploring Catchment Regionalization through the Eyes of HydroLSTM, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-6656, https://doi.org/10.5194/egusphere-egu24-6656, 2024.

EGU24-6965 | ECS | Posters on site | HS3.5

A Machine Learning Based Snow Cover Parameterization  for Common Land Model (CoLM)  

Han Zhang, Lu Li, and Yongjiu Dai

Accurate representation of snow cover fraction (SCF) is vital for terrestrial simulation, as it significantly affects surface albedo and land surface radiation. In land models, SCF is parameterized using snow water equivalent and snow depth. This study introduces a novel machine learning-based parameterization, which incorporates the LightGBM regression algorithm and additional input features: surface air temperature, humidity, leaf area index, and the standard deviation of topography. The regression model is trained with input features from the Common Land Model (CoLM) simulations and labels from the Moderate Resolution Imaging Spectroradiometer (MODIS) observations on a daily scale. Offline verification indicates significant improvements for the new scheme over multiple traditional parameterizations.
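
For illustration, a minimal Python sketch of the regression step described above is given below. The feature set and the random training data are assumptions for the example; in practice the features would come from CoLM output and the labels from MODIS SCF.

```python
# Sketch of a LightGBM snow-cover-fraction regression (assumed feature names).
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
X = rng.random((10_000, 6))          # swe, snow_depth, tair, humidity, lai, topo_std
y = np.clip(X[:, 0] * 1.5 - X[:, 2] * 0.3 + rng.normal(0, 0.05, 10_000), 0, 1)

model = lgb.LGBMRegressor(n_estimators=500, learning_rate=0.05, num_leaves=63)
model.fit(X, y)
scf_pred = np.clip(model.predict(X), 0.0, 1.0)   # SCF bounded to [0, 1]
```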

Moreover, this machine learning-based parameterization has been coupled online with the CoLM using the Message Passing Interface (MPI). In online simulations, it substantially outperforms the widely used Niu and Yang (2007) scheme, improving the root mean square errors and temporal correlations of SCF on 80% of global grids. The associated land surface temperature and hydrological processes also benefit from the enhanced estimation of SCF. The new scheme additionally shows good portability, demonstrating similar enhancements when used directly in a global 1° simulation, even though it was trained at 0.1° resolution.

How to cite: Zhang, H., Li, L., and Dai, Y.: A Machine Learning Based Snow Cover Parameterization  for Common Land Model (CoLM) , EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-6965, https://doi.org/10.5194/egusphere-egu24-6965, 2024.

Land-atmosphere coupling (LAC) involves a variety of interactions between the land surface and the atmospheric boundary layer that are critical to understanding hydrological partitioning and cycling. As climate change continues to affect these interactions, identifying the specific drivers of LAC variability has become increasingly important. However, due to the complexity of the coupling mechanism, a quantitative understanding of the potential drivers is still lacking. Recently, deep learning has been considered an effective approach to capture nonlinear relationships within data, providing a useful window into complex climatic processes. In this study, we explore LAC variability under climate change and its potential drivers by using Convolutional Long Short-Term Memory (ConvLSTM) together with explainable AI techniques for attribution analysis. Specifically, the variability of the LAC, defined here as a two-legged index, is used as the modeling target, and variables representing meteorological forcing, land use, irrigation, soil properties, gross primary production, ecosystem respiration, and net ecosystem exchange are the inputs. Our analysis covers the global land surface at a spatial resolution of 0.1° × 0.1° and a daily time step over the period 1979–2019. Overall, the study demonstrates how interpretable machine learning can help us understand the complex dynamics of LAC under changing climatic conditions. We expect the results to facilitate the understanding of terrestrial hydroclimate interactions and to provide multiple lines of evidence to support future water management.

How to cite: Huang, F., Shangguan, W., and Jiang, S.: Identifying potential drivers of land-atmosphere coupling variation under climate change by explainable artificial intelligence, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-7202, https://doi.org/10.5194/egusphere-egu24-7202, 2024.

EGU24-7950 | ECS | Posters on site | HS3.5

Improving streamflow prediction across China by hydrological modelling together with machine learning 

Wang Jiao and Zhang Yongqiang

Predicting streamflow is key for water resource planning, flood and drought risk assessment, and pollution mitigation at regional, national, and global scales. There is a long-standing history of developing physically based or conceptual catchment rainfall-runoff models that have been continuously refined over time to include more physical processes and enhance their spatial resolution. On the other hand, machine learning methods, particularly neural networks, have demonstrated exceptional accuracy and extrapolation capabilities in time-series prediction. Both approaches exhibit their strengths and limitations. This leads to a research question: how to effectively balance model complexity and physical interpretability while maintaining a certain level of predictive accuracy. This study aims to effectively combine a conceptual hydrological model, HBV, with machine learning (Transformer, Long Short-Term Memory (LSTM)) using a differentiable modeling framework, tailored to predicting streamflow under the diverse climatic and geographical conditions across China. Using the Transformer to optimize and replace certain parameterization processes in the HBV model, a deep integration of neural networks and the HBV model is achieved. This integration not only captures the non-linear relationships that traditional hydrological models struggle to express, but also maintains the physical interpretability of the model. Preliminary results show that the proposed framework outperforms the traditional HBV model and a pure LSTM model in streamflow prediction across 68 catchments in China. Based on the test results from different catchments, we have adjusted and optimized the model structure and parameters to better adapt to the unique hydrological processes of each catchment. The application of self-attention mechanisms and a differentiable programming framework significantly enhances the model's ability to capture spatiotemporal dynamics. The proposed framework can likely be applied to streamflow prediction elsewhere.
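
For illustration of the general differentiable-modeling idea (a neural network producing a time-varying parameter of a conceptual model, with gradients flowing through the water-balance loop), the following Python sketch uses a single linear-reservoir bucket as a stand-in for HBV. It is not the authors' HBV-Transformer implementation; all names and sizes are assumptions.

```python
# Minimal "NN parameterizes a bucket model" sketch (PyTorch). A small network
# maps forcings to a dynamic recession parameter k; autograd carries the
# gradient of the streamflow loss back through the storage loop.
import torch
import torch.nn as nn

class HybridBucket(nn.Module):
    def __init__(self):
        super().__init__()
        self.param_net = nn.Sequential(nn.Linear(2, 16), nn.Tanh(), nn.Linear(16, 1))

    def forward(self, precip, pet):
        storage = torch.zeros(precip.shape[0])
        flows = []
        for t in range(precip.shape[1]):
            forcing = torch.stack([precip[:, t], pet[:, t]], dim=1)
            k = torch.sigmoid(self.param_net(forcing)).squeeze(1)  # dynamic parameter in (0, 1)
            storage = torch.clamp(storage + precip[:, t] - 0.5 * pet[:, t], min=0.0)
            q = k * storage
            storage = storage - q
            flows.append(q)
        return torch.stack(flows, dim=1)

model = HybridBucket()
p, e, q_obs = torch.rand(8, 100), torch.rand(8, 100) * 0.3, torch.rand(8, 100)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss = torch.mean((model(p, e) - q_obs) ** 2)   # one illustrative training step
loss.backward()
opt.step()
```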

How to cite: jiao, W. and yongqiang, Z.: Improving streamflow prediction across China by hydrological modelling together with machine learning, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-7950, https://doi.org/10.5194/egusphere-egu24-7950, 2024.

EGU24-9319 | ECS | Posters on site | HS3.5

Developing hybrid distributed models for hydrological simulation and climate change assessment in large alpine basins 

Bu Li, Ting Sun, Fuqiang Tian, and Guangheng Ni

Large alpine basins on the Tibetan Plateau (TP) provide abundant water resources crucial for hydropower generation, irrigation, and daily life. In recent decades, the TP has been significantly affected by climate change, making it essential to understand the runoff response to climate change for water resources management. However, limited knowledge of specific alpine hydrological processes has constrained the accuracy of hydrological models and heightened uncertainties in climate change assessments. Recently, hybrid hydrological models have come to the forefront, synergizing the exceptional learning capacity of deep learning with the rigorous adherence to hydrological knowledge of process-based models. These models show considerable promise for precise hydrological simulation and climate change assessment. However, a notable limitation of existing hybrid models lies in their failure to incorporate spatial information and to describe alpine hydrological processes, which restricts their applicability in hydrological modeling and climate change assessment in large alpine basins. To address this issue, we develop a set of hybrid distributed hydrological models by employing a distributed process-based model as the backbone and utilizing embedded neural networks (ENNs) to parameterize and replace different internal modules. The proposed models are tested on three large alpine basins on the Tibetan Plateau. Results are compared to those obtained from hybrid lumped models, a state-of-the-art distributed hydrological model, and DL models. A climate perturbation method is further used to evaluate the basins' runoff response to climate change. Results indicate that the proposed hybrid hydrological models perform well in predicting runoff in large alpine basins. The optimal hybrid model, with Nash-Sutcliffe efficiency coefficients (NSEs) higher than 0.87, shows performance comparable to state-of-the-art DL models. The hybrid distributed model also exhibits remarkable capability in simulating hydrological processes at ungauged sites within the basins, markedly surpassing traditional distributed models. Besides, runoff exhibits an amplification effect in response to precipitation changes, with a 10% precipitation change resulting in a 15–20% runoff change in large alpine basins. An increase in temperature enhances evaporation capacity, changes the partitioning of rainfall and snowfall and the timing of snowmelt, and thus decreases total runoff and reduces its intra-annual variability. Overall, this study provides a high-performance tool enriched with explicit hydrological knowledge for hydrological prediction and improves our understanding of the runoff response to climate change in large alpine basins on the TP.

How to cite: Li, B., Sun, T., Tian, F., and Ni, G.: Developing hybrid distributed models for hydrological simulation and climate change assessment in large alpine basins, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-9319, https://doi.org/10.5194/egusphere-egu24-9319, 2024.

In facing the challenges of limited observational streamflow data and climate change, accurate streamflow prediction and flood management in large-scale catchments become essential. This study introduces a time-lag informed deep learning framework to enhance streamflow simulation and flood forecasting. Using the Dulong-Irrawaddy River Basin (DIRB), a less-explored transboundary basin shared by Myanmar, China, and India, as a case study, we identified peak flow lag days and the relative flow scale. Integrating these with historical flow data, we developed an optimal model. The framework, informed by data from the upstream Hkamti sub-basin, significantly outperformed a standard LSTM, achieving a Kling-Gupta Efficiency (KGE) of 0.891 and a Nash-Sutcliffe efficiency coefficient (NSE) of 0.904. Notably, the H_PFL model provides a valuable 15-day lead time for flood forecasting, enhancing emergency response preparations. The transfer learning model, incorporating meteorological inputs and catchment features, achieved an average NSE of 0.872 for streamflow prediction, surpassing the 0.655 of the process-based model MIKE SHE. We further analyzed the sensitivities of the deep learning model and the process-based model to changes in meteorological inputs using different methods. The deep learning models exhibit complex sensitivities to these inputs, capturing non-linear relationships among multiple variables more accurately than the process-based model. Integrated Gradients (IG) analysis further demonstrates the deep learning model's ability to discern spatial heterogeneity between upstream and downstream sub-basins and its adeptness in characterizing different flow regimes. This study underscores the potential of deep learning for enhancing the understanding of hydrological processes in large-scale catchments and highlights its value for water resource management in transboundary basins under data scarcity.
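
The KGE and NSE scores reported above follow standard definitions; as a quick reference, the Python sketch below implements them (numpy arrays of simulated and observed flow are assumed).

```python
# Standard definitions of the NSE and KGE skill scores (numpy sketch).
import numpy as np

def nse(sim, obs):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of the observations."""
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

def kge(sim, obs):
    """Kling-Gupta efficiency from correlation r, variability ratio alpha, bias ratio beta."""
    r = np.corrcoef(sim, obs)[0, 1]
    alpha = sim.std() / obs.std()
    beta = sim.mean() / obs.mean()
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
```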

How to cite: Ma, K. and He, D.: Streamflow Prediction and Flood Forecasting with Time-Lag Informed Deep Learning framework in Large Transboundary Catchments, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-9980, https://doi.org/10.5194/egusphere-egu24-9980, 2024.

EGU24-11159 | ECS | Orals | HS3.5

Uncovering the impact of hydrological connectivity on nitrate transport at the catchment scale using explainable AI 

Felipe Saavedra, Noemi Vergopolan, Andreas Musolff, Ralf Merz, Carolin Winter, and Larisa Tarasova

Nitrate contamination of water bodies is a major concern worldwide, as it poses a risk of eutrophication and biodiversity loss. Nitrate travels from agricultural land to streams through different hydrological pathways, which are activated under different hydrological conditions. Indeed, hydrological conditions can alter the connection between different parts of the catchment and streams, in many cases independently of discharge levels, leading to modifications in transport dynamics, retention, and nitrate removal rates in the catchment. While enhanced nitrate transport can be linked to high levels of hydrological connectivity, little is known about the effects of the spatial patterns of hydrological connectivity on the transport of nutrients at the catchment scale.

In this study, we combined daily stream nitrate concentration and discharge data at the outlet of 15 predominantly agricultural catchments in the United States (191–16,000 km² area, 3,500 km² median area, and 77% median agricultural coverage) with soil moisture data from SMAP-HydroBlocks (Vergopolan et al., 2021). SMAP-HydroBlocks is a hyperresolution soil moisture dataset for the top 5 cm of the soil column at 30-m spatial resolution and 2–3 day revisit time (2015–2019), derived through a combination of satellite data, land-surface and radiative transfer modeling, machine learning, and in-situ observations.

We configured a deep learning model for each catchment, driven by 2D soil moisture fields and 1D discharge time series, to evaluate the impact of streamflow magnitude and spatial patterns of soil moisture on streamflow nitrate concentration. The model setup comprises two parallel branches. The first branch incorporates a Long Short-Term Memory (LSTM) model, the current state of the art for time-series modeling, using daily discharge as input. The second branch contains a Convolutional LSTM network (ConvLSTM) that incorporates the daily soil moisture series, the agricultural fraction of each pixel, and the height above the nearest drainage as a measure of structural hydrological connectivity. Finally, a fully connected neural network combines the outputs of the two branches to predict the time series of nitrate concentration at the catchment outlet.
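
For illustration, a simplified Python sketch of such a two-branch fusion is given below. It is not the authors' implementation: PyTorch has no built-in ConvLSTM, so the spatial branch is approximated here by a per-time-step convolutional encoder followed by an LSTM, and all shapes and layer sizes are assumptions.

```python
# Simplified two-branch sketch (PyTorch): a discharge LSTM plus a spatial
# branch (CNN encoder per time step, then an LSTM), fused by a dense head.
import torch
import torch.nn as nn

class TwoBranchNitrate(nn.Module):
    def __init__(self):
        super().__init__()
        self.q_lstm = nn.LSTM(input_size=1, hidden_size=32, batch_first=True)
        self.encoder = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                                     nn.AdaptiveAvgPool2d(4), nn.Flatten())
        self.sm_lstm = nn.LSTM(input_size=8 * 16, hidden_size=32, batch_first=True)
        self.head = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, q, sm):          # q: (B, T, 1); sm: (B, T, 3, H, W)
        hq, _ = self.q_lstm(q)
        b, t = sm.shape[:2]
        enc = self.encoder(sm.flatten(0, 1)).view(b, t, -1)
        hs, _ = self.sm_lstm(enc)
        return self.head(torch.cat([hq[:, -1], hs[:, -1]], dim=1)).squeeze(1)

model = TwoBranchNitrate()
# The three spatial channels stand in for soil moisture, agricultural fraction,
# and height above nearest drainage.
nitrate = model(torch.rand(4, 30, 1), torch.rand(4, 30, 3, 32, 32))
```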

Preliminary results indicate that the model performs satisfactorily in one-third of the catchments, with Nash-Sutcliffe Efficiency (NSE) values above 0.3 for the test period, which covers the final 25% of the time series, and this is achieved without tuning the hyperparameters. The model failed to simulate nitrate concentrations (resulting in negative NSE values) typically in the larger catchments. Using these simulations and explainable AI, we will quantify the importance of the different inputs; in particular, we tested the relative importance of soil moisture for simulating nitrate concentrations. While the literature shows that most of the predictive power for nitrate comes from streamflow rates, we show how soil moisture fields add value to the prediction and to the understanding of hydrologic connectivity. Finally, we will fine-tune the model for each catchment and include more predictors to enhance the reliability of the model simulations.

How to cite: Saavedra, F., Vergopolan, N., Musolff, A., Merz, R., Winter, C., and Tarasova, L.: Uncovering the impact of hydrological connectivity on nitrate transport at the catchment scale using explainable AI, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-11159, https://doi.org/10.5194/egusphere-egu24-11159, 2024.

EGU24-11778 | ECS | Orals | HS3.5

How much data is needed for hydrological modeling?  

Bjarte Beil-Myhre, Bernt Viggo Matheussen, and Rajeev Shrestha

Hydrological modeling has undergone a transformative decade, primarily catalyzed by the groundbreaking data-driven approach introduced by F. Kratzert et al. (2018) utilizing LSTM networks (Hochreiter & Schmidhuber, 1997). These networks leverage extensive datasets and intricate model structures, outshining traditional hydrological models, albeit with the caveat of being computationally intensive during training. This prompts a critical inquiry into the requisite volume and complexity of data for constructing a dependable and resilient hydrological model.


In this study, we employ a hybrid model that amalgamates the strengths of classical hydrological models with the data-driven approach. These modified models are derived from the LSTM models developed by F. Kratzert and team, in conjunction with classical hydrological models such as the Statkraft Hydrology Forecasting Toolbox (SHyFT) from Statkraft and the Distributed Regression Hydrological Model (DRM) by Matheussen at Å Energi. The models were applied to sixty-five catchments in southern Norway, each characterized by diverse features and data records. Our analysis assesses the performance of these models under various scenarios of data availability, considering factors such as:


- Varying numbers of catchments selected based on size or location.
- The duration of the data records utilized for model calibration.
- Specific catchment characteristics and outputs from classical models employed as inputs (e.g., area, latitude, longitude, or additional variables).


Preliminary findings indicate that model inputs can be significantly stripped down without compromising model performance. With a limited set of catchment characteristics, the performance approaches that of the model with all characteristics, mitigating added uncertainty and model complexity. Additionally, increasing the length of data records enhances model performance, albeit with diminishing returns. Furthermore, our study reveals that augmenting catchments in the model does not necessarily yield a commensurate improvement in overall model performance. These insights contribute to refining our understanding of the interplay between data, model complexity, and performance in hydrological modeling.


The novelty of this research is that the hybrid models can be applied in a relatively small area, with few catchments and a limited number of climate stations and catchment characteristics compared to the CAMELS setup used by Kratzert, and still achieve improved results.

How to cite: Beil-Myhre, B., Matheussen, B. V., and Shrestha, R.: How much data is needed for hydrological modeling? , EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-11778, https://doi.org/10.5194/egusphere-egu24-11778, 2024.

EGU24-12068 | ECS | Orals | HS3.5

Hybrid Neural Hydrology: Integrating Physical and Machine Learning Models for Enhanced Predictions in Ungauged Basins 

Rajeev Shrestha, Bjarte Beil-Myhre, and Bernt Viggo Matheussen

Accurate prediction of streamflow in ungauged basins is a fundamental challenge in hydrology. The lack of hydrological observations and the inherent complexities of ungauged regions hinder accurate predictions, posing significant hurdles for water resource management and forecasting. Over time, efforts have been made to tackle this predicament, primarily utilizing physical hydrological models. However, these models are limited by their reliance on site-specific data and their struggle to capture complex nonlinear relationships. Recent work by Kratzert et al. (2018) suggests that nonlinear regression models such as LSTM neural networks (Hochreiter & Schmidhuber, 1997) may outperform traditional physically based models. The authors demonstrate the application of LSTM models to ungauged prediction problems, noting that information about physical processes might not have been fully utilized in the modeling setup.

In response to these challenges, this research introduces a Hybrid Neural Hydrology (HNH) approach that fuses the strengths of physical hydrological models, namely the Statkraft Hydrology Forecasting Toolbox (SHyFT) developed at Statkraft and the Distributed Regression Hydrological Model (DRM) developed by Matheussen at Å Energi, with a machine learning model, specifically NeuralHydrology developed by F. Kratzert and team. By combining the information and structural insights of physically based models with the flexibility and adaptability of machine learning models, HNH seeks to leverage the complementary attributes of these methodologies. The combination is achieved by fusing the uncalibrated physical model with an LSTM-based model. This hybridization seeks to enhance the model's adaptability and learning capabilities, leveraging available information from various sources to improve predictions in ungauged areas. Furthermore, this research investigates the impact of clustering catchments based on area to improve model performance.

The data used in this research includes dynamic variables such as precipitation, air temperature, wind speed, relative humidity, and observed streamflow obtained from sources such as the internal database at Å Energi, The Norwegian Water Resources and Energy Directorate (NVE), The Norwegian Meteorological Institute (MET), ECMWF (ERA5) and static attributes such as catchment size, mean elevation, forest fraction, lake fraction and reservoir fraction obtained from CORINE Land Cover and Høydedata (www.hoydedata.no).

This study presents HNH as a novel approach that synergistically integrates the structural insights of physical models with the adaptability of machine learning. Preliminary findings indicate promising outcomes from testing in 65 catchments in southern Norway. This suggests that information about physical processes and clustering catchments based on their similarities significantly improves the prediction quality in ungauged regions. This discovery underscores the potential of using hybrid models and clustering techniques to enhance the performance of predictive models in ungauged basins.

How to cite: Shrestha, R., Beil-Myhre, B., and Matheussen, B. V.: Hybrid Neural Hydrology: Integrating Physical and Machine Learning Models for Enhanced Predictions in Ungauged Basins, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-12068, https://doi.org/10.5194/egusphere-egu24-12068, 2024.

EGU24-12574 | ECS | Orals | HS3.5 | Highlight

Analyzing the performance and interpretability of hybrid hydrological models 

Eduardo Acuna, Ralf Loritz, Manuel Alvarez, Frederik Kratzert, Daniel Klotz, Martin Gauch, Nicole Bauerle, and Uwe Ehret

Hydrological hybrid models have been proposed as an option to combine the enhanced performance of deep learning methods with the interpretability of process-based models. Among the various hybrid methods available, the dynamic parameterization of conceptual models using LSTM networks has shown high potential. 

In this contribution, we extend our previous related work (Acuna Espinoza et al., 2023) by asking two questions: How well can hybrid models predict untrained variables, and how well do they generalize? We address the first question by comparing the internal states of the model against external data, specifically against soil moisture data obtained from ERA5-Land for 60 basins in Great Britain. We show that the process-based layer can reproduce the soil moisture dynamics with a correlation of 0.83, which indicates a good ability of this type of model to predict untrained variables. Moreover, we compare this method against existing alternatives used to extract non-target variables from purely data-driven methods (Lees et al., 2022), and discuss the differences in philosophy, performance, and implementation. Then, we address the second question by evaluating the capacity of such models to predict extreme events. Following the procedure proposed by Frame et al. (2022), we train the hybrid models in low-flow regimes and test them in high-flow situations to evaluate their generalization capacity and compare them against results from purely data-driven methods. Both experiments are done using large-sample data from the CAMELS-US and CAMELS-GB datasets.

With these new experiments, we contribute to answering the question of whether hybrid models give an actual advantage over purely data-driven techniques or not.

References

Acuna Espinoza, E., Loritz, R., Alvarez Chaves, M., Bäuerle, N., & Ehret, U.: To bucket or not to bucket? Analyzing the performance and interpretability of hybrid hydrological models with dynamic parameterization. EGUsphere, 1–22. https://doi.org/10.5194/egusphere-2023-1980, 2023.

Frame, J. M., Kratzert, F., Klotz, D., Gauch, M., Shalev, G., Gilon, O., Qualls, L. M., Gupta, H. V., and Nearing, G. S.: Deep learning rainfall–runoff predictions of extreme events, Hydrology and Earth System Sciences, 26, 3377–3392, https://doi.org/10.5194/hess-26-3377-2022, 2022.

Lees, T., Reece, S., Kratzert, F., Klotz, D., Gauch, M., De Bruijn, J., Kumar Sahu, R., Greve, P., Slater, L., and Dadson, S. J.: Hydrological concept formation inside long short-term memory (LSTM) networks, Hydrology and Earth System Sciences, 26, 3079–3101, https://doi.org/10.5194/hess-26-3079-2022,  2022.

How to cite: Acuna, E., Loritz, R., Alvarez, M., Kratzert, F., Klotz, D., Gauch, M., Bauerle, N., and Ehret, U.: Analyzing the performance and interpretability of hybrid hydrological models, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-12574, https://doi.org/10.5194/egusphere-egu24-12574, 2024.

EGU24-12981 | ECS | Orals | HS3.5

Using Temporal Fusion Transformer (TFT) to enhance sub-seasonal drought predictions in the European Alps 

Annie Yuan-Yuan Chang, Konrad Bogner, Maria-Helena Ramos, Shaun Harrigan, Daniela I.V. Domeisen, and Massimiliano Zappa

In recent years, the European Alpine space has witnessed unprecedented low-flow conditions and drought events, affecting various economic sectors reliant on sufficient water availability, including hydropower production, navigation and transportation, agriculture, and tourism. As a result, there is an increasing need for decision-makers to have early warnings tailored to local low-flow conditions.

The EU Copernicus Emergency Management Service (CEMS) European Flood Awareness System (EFAS) has been instrumental in providing flood risk assessments across Europe with up to 15 days of lead time since 2012. Expanding its capabilities, the EFAS also generates long-range hydrological outlooks from sub-seasonal to seasonal horizons. Despite its original flood-centric design, previous investigations have revealed EFAS’s potential for simulating low-flow events. Building upon this finding, this study aims to leverage EFAS's anticipation capability to enhance the predictability of drought events in Alpine catchments, while providing support to trans-national operational services.

In this study, we integrate the 46-day extended-range EFAS forecasts into a hybrid setup for 106 catchments in the European Alps. Many studies have demonstrated the capacity of Long Short-Term Memory (LSTM) networks to produce skillful hydrological forecasts at various time scales. Here we employ the deep learning algorithm Temporal Fusion Transformer (TFT), which combines aspects of LSTM networks with the Transformer architecture. The Transformer's attention mechanisms can focus on relevant time steps across longer sequences, enabling TFT to capture both local temporal patterns and global dependencies. The role of the TFT is to improve the accuracy of low-flow predictions and to understand their spatio-temporal evolution. In addition to EFAS data, we incorporate features such as European weather regime data, streamflow climatology, and hydropower proxies. We also consider catchment characteristics including glacier coverage and lake proximity. Its various attention mechanisms make TFT a more explainable algorithm than LSTMs, which helps us understand the driving factors behind the forecast skill. Our evaluation uses EFAS re-forecast data as the benchmark and measures the reliability of ensemble forecasts using metrics such as the Continuous Ranked Probability Skill Score (CRPSS).
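
As a reference for the CRPSS used above, the Python sketch below computes the standard ensemble CRPS and the skill score relative to a benchmark; the ensemble sizes and values are illustrative assumptions, not the study's data.

```python
# Ensemble CRPS and CRPSS against a benchmark (numpy sketch).
import numpy as np

def crps_ensemble(ens, obs):
    """CRPS for an ensemble 'ens' and a scalar observation 'obs'."""
    term1 = np.mean(np.abs(ens - obs))
    term2 = 0.5 * np.mean(np.abs(ens[:, None] - ens[None, :]))
    return term1 - term2

def crpss(crps_forecast, crps_benchmark):
    """Skill score: 1 is perfect, 0 means no better than the benchmark."""
    return 1.0 - crps_forecast / crps_benchmark

rng = np.random.default_rng(1)
obs = 2.0
crps_hybrid = crps_ensemble(rng.normal(2.1, 0.3, 51), obs)
crps_benchmark = crps_ensemble(rng.normal(2.6, 0.6, 51), obs)
print(crpss(crps_hybrid, crps_benchmark))
```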

Preliminary results show that a hybrid approach using the TFT algorithm can reduce the flashiness of EFAS during drought periods in some catchments, thereby improving drought predictability. Our findings will contribute to evaluating the potential of these forecasts for providing valuable information for skillful early warnings and assist in informing regional and local water resource management efforts in their decision-making.

How to cite: Chang, A. Y.-Y., Bogner, K., Ramos, M.-H., Harrigan, S., Domeisen, D. I. V., and Zappa, M.: Using Temporal Fusion Transformer (TFT) to enhance sub-seasonal drought predictions in the European Alps, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-12981, https://doi.org/10.5194/egusphere-egu24-12981, 2024.

EGU24-13417 | ECS | Orals | HS3.5

Evaluating physics-based representations of hydrological systems through hybrid models and information theory 

Manuel Álvarez Chaves, Eduardo Acuña Espinoza, Uwe Ehret, and Anneli Guthke

Hydrological models play a crucial role in understanding and predicting streamflow. Recently, hybrid models, combining both physical principles and data-driven approaches, have emerged as promising tools that can extract insights into system functioning and achieve predictive skill beyond that of traditional models.

However, the study by Acuña Espinoza et al. (2023) has raised the question of whether the flexible data-driven component in a hybrid model might "overwrite" the interpretability of its physics-based counterpart. Using the example of conceptual hydrological models with dynamic parameters tuned by LSTM networks, they showed that even when the physics-based representation of the hydrological system is chosen to be nonsensical on purpose, the addition of the flexible data-driven component can lead to a well-performing hybrid model. This compensatory behavior highlights the need for a thorough evaluation of physics-based representations in hybrid hydrological models, i.e., hybrid models should be inspected carefully to understand why and how they predict (so well).

In this work, we provide a method to support this inspection: we objectively assess and quantify the contribution of the data-driven component to the overall hybrid model performance. Using information theory and the UNITE toolbox (https://github.com/manuel-alvarez-chaves/unite_toolbox), we measure the entropy of the (hidden) state-space in which the data-driven component of the hybrid model moves. High entropy in this setting means that the LSTM is doing a lot of "compensatory work", and hence alludes to an inadequate representation of the hydrological system in the physics-based component of the hybrid model. By comparing this measure among a set of alternative hybrid models with different physics-based representations, an order in the degree of realism of the considered representations can be established. This is very helpful for model evaluation and improvement as well as system understanding.
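
To make the "entropy of the hidden state-space" idea concrete, the Python sketch below estimates the differential entropy of a single hidden-state trajectory with a histogram estimator. The UNITE toolbox provides proper multivariate estimators; this toy example and its synthetic trajectories are assumptions for illustration only.

```python
# Toy histogram estimate of the entropy of one hidden-state trajectory (numpy).
import numpy as np

def histogram_entropy(x, bins=50):
    counts, edges = np.histogram(x, bins=bins)
    p = counts / counts.sum()
    width = edges[1] - edges[0]
    p = p[p > 0]
    return -np.sum(p * np.log(p)) + np.log(width)   # differential entropy in nats

rng = np.random.default_rng(0)
quiet_state = rng.normal(0.0, 0.05, 5000)   # LSTM doing little compensatory work
busy_state = rng.normal(0.0, 1.00, 5000)    # LSTM compensating for a poor physics core
print(histogram_entropy(quiet_state), histogram_entropy(busy_state))
```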

To illustrate our findings, we present examples from a synthetic case study where a true model does exist. Subsequently, we validate our approach in the context of regional predictions using CAMELS-GB data. This analysis highlights the importance of using diverse representations within hybrid models to ensure the pursuit of "the right answers for the right reasons". Ultimately, our work seeks to contribute to the advancement of hybrid modeling strategies that yield reliable and physically reasonable insights into hydrological systems.

References

  • Acuña Espinoza, E., Loritz, R., Álvarez Chaves, M., Bäuerle, N., & Ehret, U. (2023). To bucket or not to bucket? analyzing the performance and interpretability of hybrid hydrological models with dynamic parameterization. EGUsphere, 1–22. https://doi.org/10.5194/egusphere-2023-1980

How to cite: Álvarez Chaves, M., Acuña Espinoza, E., Ehret, U., and Guthke, A.: Evaluating physics-based representations of hydrological systems through hybrid models and information theory, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-13417, https://doi.org/10.5194/egusphere-egu24-13417, 2024.

EGU24-14280 | Orals | HS3.5

Quantifying Evapotranspiration and Gross Primary Productivity Across Europe Using Radiative Transfer Process-Guided Machine Learning 

Sheng Wang, Rui Zhou, Egor Prikaziuk, Kaiyu Guan, René Gislum, Christiaan van der Tol, Rasmus Fensholt, Klaus Butterbach-Bahl, Andreas Ibrom, and Jørgen Eivind Olesen

Accurately quantifying water and carbon fluxes between terrestrial ecosystems and the atmosphere is highly valuable for understanding ecosystem biogeochemical processes for climate change mitigation and ecosystem management. Remote sensing can provide reflectance data of terrestrial ecosystems at high spatial and temporal resolution to support quantifying evapotranspiration (ET) and gross primary productivity (GPP). Conventional remote sensing-based ET and GPP algorithms are either empirical data-driven approaches or process-based models. Empirical data-driven approaches often have high accuracy for cases within the source data domain, but lack the links to a mechanistic understanding of ecosystem processes. Meanwhile, process-based models have high generalizability because they incorporate physically based soil-vegetation radiative transfer processes, but usually have lower accuracy. To integrate the strengths of data-driven and process-based approaches, this study developed a radiative transfer process-guided machine learning approach (PGML) to quantify ET and GPP across Europe. Specifically, we used the Soil Canopy Observation, Photochemistry, and Energy fluxes (SCOPE, van der Tol et al. 2009) radiative transfer model to generate synthetic datasets and developed a pre-trained neural network model to quantify ET and GPP. Furthermore, we utilized field measurements from 63 eddy covariance tower sites from 2016 to 2020 across Europe to fine-tune the neural networks, incorporating physical laws into the cost function. Results show that PGML can significantly improve the SCOPE simulations of net radiation (R2 from 0.91 to 0.96), sensible heat fluxes (R2 from 0.43 to 0.77), ET (R2 from 0.61 to 0.78), and GPP (R2 from 0.72 to 0.78) compared to eddy covariance observations. This study highlights the potential of PGML to integrate machine learning and radiative transfer models to improve the accuracy of land surface flux estimates for terrestrial ecosystems.

How to cite: Wang, S., Zhou, R., Prikaziuk, E., Guan, K., Gislum, R., van der Tol, C., Fensholt, R., Butterbach-Bahl, K., Ibrom, A., and Olesen, J. E.: Quantifying Evapotranspiration and Gross Primary Productivity Across Europe Using Radiative Transfer Process-Guided Machine Learning, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-14280, https://doi.org/10.5194/egusphere-egu24-14280, 2024.

Deep learning models for streamflow prediction have been widely used but are often considered "black boxes" due to their lack of interpretability. To address this issue, the field has recently focused on Explainable Artificial Intelligence (XAI) methods to improve the transparency of these models. In this study, we investigate the influence of precipitation uncertainty on data-driven modeling and elucidate the hydrological significance of deep learning streamflow modeling in both the temporal and spatial dimensions using Explainable Artificial Intelligence techniques. To achieve this, an LSTM model for time-series prediction and a CNN-LSTM model for fusing spatial-temporal information are proposed. These models are driven by five reanalysis datasets. The contribution of precipitation before peak flow to runoff simulation is quantified in order to identify the most important processes in runoff generation for each river basin. In addition, visualization techniques are employed to analyze the relationship between the weights of the convolutional layers in our models and the distribution of precipitation features. By doing so, we aim to gain insights into the underlying mechanisms of the models' predictions.

The results of our study revealed several key findings. In the high-altitude areas of the Yangtze River's upper reaches, we found that snowmelt runoff, historical precipitation, and recent precipitation were the combined causes for floods. In the middle reach of the Yangtze River, floods were induced by the combined effect of historical and recent precipitation, except for the Ganjiang River, where historical precipitation events played a major role in controlling flood events. Through the visualization of convolutional layers, we discovered that areas with high convolutional layer weights had a greater impact on the model's predictions. We also observed a high similarity between the weight distribution of the convolutional layers and the spatial distribution of multi-year average precipitation in the upper reach river basins. In the middle reach, the weight distribution of the model's convolutional layers showed a strong correlation with the monthly maximum precipitation in the basin. Overall, this study provides valuable insights into the potential of deep learning models for streamflow prediction and enhances our understanding of the impacts of precipitation in the Yangtze River Basin.

How to cite: Tian, Y., Tan, W., and Yuan, X.: Revealing the key factors and uncertainties in data-driven hydrological prediction using Explainable Artificial Intelligence techniques, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-14666, https://doi.org/10.5194/egusphere-egu24-14666, 2024.

EGU24-16235 | ECS | Orals | HS3.5

Flow estimation from observed water levels using differentiable modeling for low-lying rivers affected by vegetation and backwater 

Phillip Aarestrup, Jonas Wied Pedersen, Michael Brian Butts, Peter Bauer-Gottwein, and Roland Löwe

Simulations of river flows and water levels are crucial for flood predictions and water resources management. Water levels are easy to observe using sensors, while the mapping between water levels and flows in rivers is usually derived from rating curves. However, rating curves frequently do not include geometry, backwater effects, and/or seasonal variations, which can limit their applicability – especially in stream systems that are affected by seasonal vegetation and backwater effects. To address this, we propose a differentiable model that merges a neural network with a physically based, steady-state implementation of the Saint-Venant equations. 

In the setup, the neural network is trained to predict seasonal variations in Manning's roughness caused by vegetation growth, based on inputs of meteorological forcing and time, while the physical model is responsible for converting flow estimates into water levels along the river channel. The framework efficiently estimates model parameters by tracking gradients through both the physical model and the neural network via backpropagation. This allows us to calibrate parameters for both runoff and Manning's roughness from measured water levels, thus overcoming rating curve limitations while accounting for backwater, river geometry, and seasonal variations in roughness.
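
For illustration of the end-to-end gradient idea, the Python sketch below replaces the steady-state Saint-Venant solver of the actual framework with a normal-depth Manning relation for a wide rectangular channel, so that a roughness network and an unknown flow series can be calibrated from stage observations in a few lines. Channel geometry, network size, and forcings are assumptions, not the authors' configuration.

```python
# Sketch: learn seasonal Manning's n (and flow) from observed water levels
# (PyTorch). The hydraulic model here is only a normal-depth approximation,
# standing in for the steady-state Saint-Venant solver of the real framework.
import torch
import torch.nn as nn

width, slope = 20.0, 1e-4                      # illustrative channel geometry

def stage_from_flow(q, n):
    """Normal depth h from Manning: q = (1/n) * width * h^(5/3) * sqrt(slope)."""
    return (n * q / (width * slope ** 0.5)) ** 0.6

roughness_net = nn.Sequential(nn.Linear(2, 16), nn.Tanh(), nn.Linear(16, 1))
log_q = nn.Parameter(torch.zeros(365))         # unknown daily flow, also calibrated

forcing = torch.rand(365, 2)                   # e.g. seasonal signal, air temperature
h_obs = torch.rand(365) + 0.5                  # stand-in water level observations
opt = torch.optim.Adam(list(roughness_net.parameters()) + [log_q], lr=1e-2)

for _ in range(100):
    opt.zero_grad()
    n = 0.02 + 0.1 * torch.sigmoid(roughness_net(forcing)).squeeze(1)  # seasonal roughness
    h_sim = stage_from_flow(torch.exp(log_q), n)
    loss = torch.mean(torch.abs(h_sim - h_obs))
    loss.backward()                            # gradients reach both n and flow
    opt.step()
```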

We tested the model on a 20 km stretch of the Vejle River, Denmark, which is both heavily vegetated and affected by backwater from the coast. The model was trained across five water level sensors using two years of data (2020-2022). When evaluated against 10 years of observed flow measurements (2007-2017), the model demonstrated a Mean Absolute Relative Error (MARE) of 10% compared to manually gauged discharge observations. This is comparable to the estimated uncertainty of 10% in the discharge measurements.  

The framework enables a calibration of dynamic Manning roughness within a few hours, and therefore offers a scalable solution for estimating river flows from water levels when cross-section information is available. Potential applications span across many disciplines in water resource management. 

How to cite: Aarestrup, P., Pedersen, J. W., Butts, M. B., Bauer-Gottwein, P., and Löwe, R.: Flow estimation from observed water levels using differentiable modeling for low-lying rivers affected by vegetation and backwater, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-16235, https://doi.org/10.5194/egusphere-egu24-16235, 2024.

EGU24-17842 | ECS | Orals | HS3.5 | Highlight

Deep learning based differentiable/hybrid modelling of the global hydrological cycle 

Zavud Baghirov, Basil Kraft, Martin Jung, Marco Körner, and Markus Reichstein

The integration of machine learning (ML) and process based modeling (PB) in so-called hybrid models, also known as differentiable modelling, has recently gained popularity in the geoscientific community (Reichstein et al. 2019; Shen et al. 2023). The approach aims to address limitations in both ML (data adaptive but difficult to interpret and physically inconsistent) and PB (physically consistent and interpretable but biased). It holds significant potential for studying uncertain processes in the global water cycle (Kraft et al. 2022).

In this work, we developed a differentiable/hybrid model of the global hydrological cycle by fusing deep learning with a custom PB model. The model inputs include air temperature, precipitation, and net radiation as dynamic forcings, plus static features like soil texture, which feed a long short-term memory (LSTM) model. The LSTM represents the uncertain and less understood spatio-temporal parameters that are used directly in a conceptual hydrological model. Simultaneously, we use fully connected neural networks (FCNN) to represent the uncertain spatial parameters. In the hydrological model we represent key water fluxes (e.g. transpiration, evapotranspiration (ET), runoff) and storages (snow, soil moisture and groundwater). The model is constrained against observation-based data, such as terrestrial water storage (TWS) anomalies (GRACE), fAPAR (MODIS) and snow water equivalent (GLOBSNOW).

Building upon previous work (Kraft et al. 2022), we improved the representations of key hydrological processes. We now explicitly estimate a vegetation state that is used directly to partition ET into transpiration, soil evaporation, and interception evaporation. We also estimate rooting-zone water storage capacity, a key hydrological parameter that is still highly uncertain. To assess the robustness of the estimated parameters, we quantify equifinality by training multiple models with random weight initialisation in a 10-fold cross-validation setup.

The model learns reasonable spatial and spatio-temporal patterns of critical, yet uncertain, hydrological parameters as latent variables. For example, we assess and show that the estimations of global spatial patterns on rooting-zone water storage capacity and transpiration versus ET are plausible. Equifinality quantification indicates that the dynamic patterns of the modelled water storages are robust, while there is a large uncertainty in the mean of soil moisture and TWS.

References

Kraft, Basil, Martin Jung, Marco Körner, Sujan Koirala, and Markus Reichstein. 2022. “Towards Hybrid Modeling of the Global Hydrological Cycle.” Hydrology and Earth System Sciences 26 (6): 1579–1614.

Reichstein, Markus, Gustau Camps-Valls, Bjorn Stevens, Martin Jung, Joachim Denzler, Nuno Carvalhais, et al. 2019. “Deep Learning and Process Understanding for Data-Driven Earth System Science.” Nature 566 (7743): 195–204.

Shen, Chaopeng, Alison P Appling, Pierre Gentine, Toshiyuki Bandai, Hoshin Gupta, Alexandre Tartakovsky, Marco Baity-Jesi, et al. 2023. “Differentiable Modelling to Unify Machine Learning and Physical Models for Geosciences.” Nature Reviews Earth & Environment 4 (8): 552–67.

How to cite: Baghirov, Z., Kraft, B., Jung, M., Körner, M., and Reichstein, M.: Deep learning based differentiable/hybrid modelling of the global hydrological cycle, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-17842, https://doi.org/10.5194/egusphere-egu24-17842, 2024.

EGU24-20112 | ECS | Posters on site | HS3.5

Data-driven global projection of future flooding in 18.5 million river reaches 

Boen Zhang, Louise Slater, Simon Moulds, Michel Wortmann, Yinxue Liu, Jiabo Yin, and Xihui Gu

Reliable flood projection is crucial for designing suitable flood protection structures and for enhancing resilience in vulnerable regions. However, projections of future flooding suffer from cascading uncertainties arising from the climate model outputs, emission scenarios, hydrological models, and the shortage of observations in data-sparse regions. To overcome these limitations, we design a new hybrid model, blending machine learning and climate model simulations, for global-scale projection of river flooding. This is achieved by training a random forest model directly on climate simulations from 20 CMIP6 models over the historical period (1985–2014), with extreme discharges observed at approximately 15,000 hydrologic stations as the target variable. The random forest model also includes static geographic predictors including land cover, climate, geomorphology, soil, human impacts, and hydrologic signatures. We make the explicit assumption that the random forest model can 'learn' systematic biases in the relationship between the climate simulations and flood regimes in different regions of the globe. We then apply the well-calibrated random forest model to a new vector-based, global river network in approximately 18.51 million reaches with drainage areas greater than 100 km². Global changes in flood hazard are projected for the 21st century (2015–2100) under SSP2-4.5 and SSP5-8.5. We show that the data-driven method reproduces historical annual maximum discharges better than the physically-based hydrological models driven by bias-corrected climate simulations in the ISIMIP3b experiment. We then use the machine learning model with explainable AI to diagnose spatial biases in the climate simulations and future flood projections in different regions of the globe.
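
For illustration, a minimal scikit-learn sketch of the training step is given below. The feature matrix (climate-simulation indices plus static catchment attributes) and the annual-maximum-discharge target are random stand-ins, not the study's data, and the log transform of the target is an assumption for the example.

```python
# Minimal sketch of the hybrid random-forest training step (scikit-learn).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.random((15_000, 25))          # climate-simulation + static predictors per station-year
y = np.exp(rng.normal(5, 1, 15_000))  # stand-in observed annual maximum discharge (m3/s)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
rf = RandomForestRegressor(n_estimators=300, min_samples_leaf=5, n_jobs=-1)
rf.fit(X_tr, np.log(y_tr))            # log target to tame the heavy-tailed flood maxima
q_max_pred = np.exp(rf.predict(X_te))
```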

How to cite: Zhang, B., Slater, L., Moulds, S., Wortmann, M., Liu, Y., Yin, J., and Gu, X.: Data-driven global projection of future flooding in 18.5 million river reaches, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-20112, https://doi.org/10.5194/egusphere-egu24-20112, 2024.

Machine learning has long been restricted by the opacity of its black box, especially in fields like the geosciences that emphasize clear expression of mechanisms. To deal with that issue, we provide a framework combining two branches of machine learning, clustering and regression (specifically, spectral clustering among unsupervised clustering methods and artificial neural networks among regression models), to resemble the calculations in process-based models. With a case study of evapotranspiration, we demonstrate that our framework is not only able to discern two processes, aerodynamics and energy, similar to the process-based model (the Penman-Monteith formula), but also provides a third space for a potential underrepresented process from the canopy or ecosystem. Meanwhile, with only a few hundred training samples at most sites, the simulation of evapotranspiration achieved higher accuracy (R2 of 0.92 and 0.82; RMSE of 12.41 W/m² and 8.11 W/m² in the training and test sets, respectively) than commonly used machine learning approaches, such as artificial neural networks trained on roughly 100,000 samples (R2 of 0.85 and 0.81; RMSE of 42.33 W/m² and 46.73 W/m²). In summary, our method provides a new direction for hybridizing machine learning approaches and mechanisms in future work: it can identify mechanisms from small amounts of data and could thus be used to validate known knowledge, and even to explore unknown knowledge, by providing a reference before experiments and mathematical derivations.
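
For illustration of the "cluster, then regress" idea, the Python sketch below partitions a synthetic driver space with spectral clustering and fits one small neural network per cluster. Because scikit-learn's SpectralClustering has no predict method, new samples are assigned to the nearest cluster mean, which is a simplification; the driver names and the toy flux are assumptions, not the authors' setup.

```python
# Sketch of spectral clustering followed by per-cluster ANN regression (scikit-learn).
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.random((600, 4))                       # e.g. radiation, VPD, wind, temperature
y = 100 * X[:, 0] + 40 * X[:, 1] * X[:, 2] + rng.normal(0, 5, 600)   # toy latent-heat flux

labels = SpectralClustering(n_clusters=3, affinity="nearest_neighbors",
                            random_state=0).fit_predict(X)
centers = np.stack([X[labels == k].mean(axis=0) for k in range(3)])
models = {k: MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000,
                          random_state=0).fit(X[labels == k], y[labels == k])
          for k in range(3)}

X_new = rng.random((10, 4))
nearest = np.argmin(((X_new[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
y_new = np.array([models[k].predict(x[None])[0] for k, x in zip(nearest, X_new)])
```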

How to cite: Hu, Y. and Jiang, Y.: Interpretably reconstruct physical processes with combined machine learning approaches, a case study of evapotranspiration, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-20579, https://doi.org/10.5194/egusphere-egu24-20579, 2024.

EGU24-20602 | ECS | Orals | HS3.5

Enhanced Continental Runoff Prediction through Differentiable Muskingum-Cunge Routing (δMC-CONUS-hydroDL2) 

Tadd Bindas, Yalan Song, Jeremy Rapp, Kathryn Lawson, and Chaopeng Shen

Recent advancements in flow routing models have enabled learning from big data using differentiable modeling techniques. However, their application remains constrained to smaller basins due to limitations in computational memory and hydrofabric scaling. We propose a novel methodology to scale differentiable river routing from watershed (HUC10) to continental scales using the δMC-CONUS-hydroDL2 model. Mimicking the Muskingum-Cunge routing model, this approach aims to enhance flood wave timing prediction and Manning’s n parameter learning across extensive areas. We employ the δHBV-HydroDL model, trained on the 3000 GAGES-II dataset, for streamflow predictions across CONUS HUC10 basins. These predictions are then integrated with MERIT basin data and processed through our differentiable routing model, which learns reach-scale parameters like Manning’s n and spatial channel coefficient q via an embedded neural network. This approach enhances national-scale flood simulations by leveraging a learned Manning’s n parameterization, directly contributing to the refinement of CONUS-scale flood modeling. Furthermore, this method shows promise for global application, contingent upon the availability of streamflow predictions and MERIT basin data. Our methodology represents a significant step forward in the spatial scaling of differentiable river routing models, paving the way for more accurate and extensive flood simulation capabilities.
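
As a hedged illustration of the basic mechanism being scaled up, the Python sketch below runs a differentiable Muskingum routing update for a single reach and learns its parameters by gradient descent. The actual δMC model parameterizes Manning's n with an embedded neural network over MERIT reaches; here K and X are learned directly (and their physical bounds are not enforced) to keep the example short, and all values are illustrative.

```python
# Minimal differentiable Muskingum routing for one reach (PyTorch).
import torch

def muskingum(inflow, K, X, dt=1.0):
    denom = 2 * K * (1 - X) + dt
    c0 = (dt - 2 * K * X) / denom          # weight on current inflow
    c1 = (dt + 2 * K * X) / denom          # weight on previous inflow
    c2 = (2 * K * (1 - X) - dt) / denom    # weight on previous outflow
    out = [inflow[0]]
    for t in range(1, len(inflow)):
        out.append(c0 * inflow[t] + c1 * inflow[t - 1] + c2 * out[-1])
    return torch.stack(out)

K = torch.tensor(2.0, requires_grad=True)      # storage constant (hours)
X = torch.tensor(0.2, requires_grad=True)      # weighting factor, nominally in [0, 0.5]
inflow = torch.cat([torch.linspace(10, 100, 24), torch.linspace(100, 10, 48)])
observed = torch.roll(inflow, 6) * 0.9         # synthetic downstream hydrograph

opt = torch.optim.Adam([K, X], lr=0.01)
for _ in range(200):
    opt.zero_grad()
    loss = torch.mean((muskingum(inflow, K, X) - observed) ** 2)
    loss.backward()
    opt.step()
```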

How to cite: Bindas, T., Song, Y., Rapp, J., Lawson, K., and Shen, C.: Enhanced Continental Runoff Prediction through Differentiable Muskingum-Cunge Routing (δMC-CONUS-hydroDL2), EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-20602, https://doi.org/10.5194/egusphere-egu24-20602, 2024.

EGU24-21725 | ECS | Posters on site | HS3.5

Machine Learning Insights into Aquifer Recharge: Site suitability analysis in season water availability scenarios 

Valdrich Fernandes, Perry de Louw, Coen Ritsema, and Ruud Bartholomeus

Groundwater models are valuable tools for optimising decisions that influence groundwater flow. Spatially distributed models represent groundwater levels across the entire area, from which essential information can be extracted to aid directly in the decision-making process. In our previous study, we explored different machine learning (ML) models as faster alternatives for predicting the increase in stationary groundwater head due to artificial recharge in unconfined aquifers, while considering a wider spatial extent (832 columns x 1472 rows, totalling 765 km²) than previous ML groundwater models. The trained ML model accurately estimates the increase in groundwater head within 0.24 seconds, achieving a Nash-Sutcliffe efficiency of 0.95. This allows quick analysis of site suitability at potential recharge rates. This study extends the approach to incorporate seasonal variation in water availability, illustrating the concept of storing excess water during winter to meet heightened demands during summer, when water availability is minimal. Additionally, we quantify the impacts of local properties, namely geohydrological and surface water network properties, on storage capacity by training ML models to estimate the summer decay rate of stored water at hypothetical aquifer recharge sites.

Across 720 hypothetical recharge sites, we vary location, recharge rate and size to capture various combinations of local properties in the catchment. Artificial recharge is modeled using a MODFLOW-based groundwater model representing the geohydrological properties and the surface water network of the Baakse Beek catchment in the Netherlands. The recharge is simulated from October 2011 until February 2012, with the remainder of the year simulated without any artificial recharge. Based on the modeled heads, the decay rate of stored water is calculated for the period until October. This calculated decay rate, in combination with the local properties, is used to train and evaluate the ML model. The relative contributions of the properties to the decay rate are quantified using the latest developments in explainable AI techniques. Techniques such as permutation importance and ceteris paribus profiles not only help categorize the suitability of potential recharge sites but also quantify the relative contribution of each property. By leveraging these insights, water managers can make informed decisions regarding site improvement measures.
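
For illustration, the Python sketch below shows how permutation importance typically ranks site properties with scikit-learn. The property names, the surrogate model, and the synthetic decay-rate relationship are assumptions for the example, not the study's data.

```python
# Sketch of the permutation-importance step (scikit-learn) with illustrative
# stand-ins for the geohydrological and surface-water-network properties.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(3)
names = ["transmissivity", "drainage_density", "distance_to_ditch",
         "recharge_rate", "site_area"]
X = rng.random((720, len(names)))
decay_rate = 0.5 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(0, 0.05, 720)

model = GradientBoostingRegressor().fit(X, decay_rate)
result = permutation_importance(model, X, decay_rate, n_repeats=20, random_state=0)
for i in np.argsort(result.importances_mean)[::-1]:
    print(f"{names[i]:>20s}: {result.importances_mean[i]:.3f}")
```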

How to cite: Fernandes, V., de Louw, P., Ritsema, C., and Bartholomeus, R.: Machine Learning Insights into Aquifer Recharge: Site suitability analysis in season water availability scenarios, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-21725, https://doi.org/10.5194/egusphere-egu24-21725, 2024.

The escalating threats of climate change, compounded by seismicity along the Ring of Fire, pose significant challenges to the Pacific Island Countries (PICs), making them particularly susceptible to the impacts of natural hazards. This commentary explores the potential of Artificial Intelligence (AI) and satellite technology in enhancing resilience, focusing on their application in early warning systems and response/recovery for these vulnerable regions. The integration of these digital technologies can revolutionize the way PICs predict, respond to, and recover from climate- and seismic-induced catastrophes, thereby strengthening their resilience. It also discusses the future prospects for AI and satellite technology in PICs and concludes by highlighting the importance of international cooperation to ensure that PICs can benefit from these technologies.

How to cite: Kim, J. and Shm fakhruddin, B.: Empowering Pacific Island Countries against Climate Change: The Role of AI and Satellite Technology, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-16, https://doi.org/10.5194/egusphere-egu24-16, 2024.

EGU24-1408 | ECS | Orals | NH6.7 | Highlight

The Need for Utilizing AI in Locating Trapped Victims Following Earthquakes and Floods 

Ronnen Avny and Menachem Friedman

The present article delves into the necessity of employing artificial intelligence (AI) in locating individuals trapped during natural disasters such as earthquakes and floods. By utilizing unmanned equipment for reconnaissance and support tasks during search and rescue missions, lives can be saved, and the process expedited. Natural disasters have resulted in significant financial losses and loss of human lives, making it imperative to develop efficient and effective methods for rescue operations. The article emphasizes the benefits of using Trapped Victims Location (TVL) systems, including improved response times, increased accuracy, enhanced situational awareness, and improved safety for first responders. Furthermore, the article discusses the current TVL technologies available, such as visual cameras, acoustic sensors, thermal imaging cameras, radar sensors, GPRS, Cellular receivers, and more. The article also highlights the operational gaps within first responders' systems for locating trapped victims and discusses the specific operational needs for various scenarios. This work serves as a basis for further scientific and engineering projects that can overcome existing gaps and enhance the operational process of locating victims during emergencies, significantly improving accuracy and the likelihood of locating live individuals while expediting the entire procedure.

How to cite: Avny, R. and Friedman, M.: The Need for Utilizing AI in Locating Trapped Victims Following Earthquakes and Floods, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-1408, https://doi.org/10.5194/egusphere-egu24-1408, 2024.

Xiaojin County, Sichuan Province, China was selected as the study area of this paper, and twelve conditioning factors were determined according to the literature review. The spatial correlation between landslides and conditioning factors was analyzed using the weights-of-evidence (WoE) model, and the landslide susceptibility in Xiaojin County was predicted. The landslide susceptibility in this region was mainly assessed with a WoE-based random forest (RF) model. The radial basis function network (RBFNetwork) model was also applied to map landslide susceptibility with the identical datasets. Finally, the landslide susceptibility maps were produced, and the comprehensive performance of the three models was quantitatively evaluated and compared using receiver operating characteristic (ROC) curves and area under the curve (AUC) values. The results show that the three models are suitable for landslide susceptibility evaluation in the study area, and that the WoE model performs better than the RF and RBFNetwork models. More concretely, the goodness-of-fit values of the WoE, RF and RBFNetwork models on the training dataset are 0.899, 0.880 and 0.866, respectively. In terms of prediction accuracy, the AUC values are 0.892, 0.874 and 0.863, respectively. Additionally, mean decrease accuracy (MDA) and mean decrease Gini (MDG) are used to quantify the importance of the landslide conditioning factors. Elevation, soil, distance to roads and distance to rivers are considered the most important conditioning factors in landslide susceptibility modeling. Consequently, the findings of this study provide a useful reference for the development and exploitation of land resources in Xiaojin County.
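
As an illustration of the weights-of-evidence idea, a minimal sketch is given below; the pixel counts are invented and the formula follows the standard W+/W-/contrast definition rather than the authors' exact implementation.

# Minimal sketch of weights-of-evidence (WoE) for one conditioning-factor class.
import numpy as np

def woe(npix_class_slide, npix_class_stable, npix_slide, npix_stable):
    """W+, W- and contrast C for a single factor class."""
    p1 = npix_class_slide / npix_slide                       # P(class | landslide)
    p2 = npix_class_stable / npix_stable                     # P(class | no landslide)
    p3 = (npix_slide - npix_class_slide) / npix_slide        # P(not class | landslide)
    p4 = (npix_stable - npix_class_stable) / npix_stable     # P(not class | no landslide)
    w_plus, w_minus = np.log(p1 / p2), np.log(p3 / p4)
    return w_plus, w_minus, w_plus - w_minus

# Example: an elevation class covering 1200 of 9000 landslide pixels and
# 50000 of 800000 stable pixels (hypothetical counts).
print(woe(1200, 50000, 9000, 800000))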

How to cite: Zhao, X. and Chen, W.: Landslide susceptibility modeling using data-driven weights-of-evidence based random forest and radial basis function network, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-7006, https://doi.org/10.5194/egusphere-egu24-7006, 2024.

The world has experienced the profound and devastating consequences of floods on human life, prompting a shift from mere academic examination to a critical socio-political imperative. To initiate effective flood risk management, many nations are working on creating user-friendly tools to identify flood-prone areas across extensive watersheds. Recently, Geomorphic Flood Descriptors (GFDs), which rely on the characteristics of the river drainage and are computationally less demanding, have been used as an efficient alternative to complex hydraulic models. However, validating the flood inundation maps from GFDs remains a major challenge, especially for ungauged watersheds that limit the adoption of data-intensive hydraulic modeling. In addition, as weather patterns and climate variations incur significant heterogeneity in flood patterns over large watersheds, we need to find error-free benchmark maps to validate the GFDs. The present study explores the suitability of ensemble Machine Learning (ML) models to represent flooding at high resolution over large ungauged watersheds, thus bridging the major research gap of validating GFD-derived flood maps against ground truth in ungauged basins. A suite of about 25 flood-influencing factors incorporating geomorphological, climatological, and soil parameters, such as the Geomorphic Flood Index (GFI), Topographic Wetness Index (TWI), Height Above the Nearest Drainage (HAND), slope, Stream Power Index (SPI), rainfall, soil type, and horizontal distance from the stream, was derived from a high-resolution DEM (CartoDEM, resolution ~30 m). The two most prominent tree-based machine learning (ML) techniques, Random Forest (RF) and Extreme Gradient Boosting (XGBoost), were employed to simulate flood inundation at a fine scale of 30 m in the severely flood-prone Mahanadi basin. An ensemble of linear, random forest, and support vector machine models was further tested for geographical extrapolation, quantifying the flood hazard in an ungauged basin, a task in which the tree-based models lagged behind. These ML models were trained using a flood inundation map derived from LISFLOOD-FP using the ERA5 reanalysis dataset. The performance of the GFD-derived flood map is tested against the LISFLOOD-FP flood map through a set of performance statistics. The performance of the developed model was evaluated using the Area Under the receiver operating characteristics curve (AUC), kappa coefficient, precision, recall, and F1 score, while RMSE and KGE were used for the regression models. The opacity of the ML models was also addressed using SHAP values to quantify the degree of influence of each GFD on flood depth. The ongoing research also motivates the development of a global flood inundation atlas using RCMs, which can be used to compare and validate inundation over large regions through geomorphic analysis. Any uncertainty in flood inundation estimates may amplify largely while quantifying flood risk, including vulnerability and exposure dimensions.
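
The following minimal sketch illustrates how a tree-based classifier could be trained on such predictors and scored with the listed classification metrics; file and column names are hypothetical placeholders, not the study's actual data.

# Minimal sketch: classify flooded pixels from GFD-type predictors and score the model.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (roc_auc_score, cohen_kappa_score, precision_score,
                             recall_score, f1_score)
from sklearn.model_selection import train_test_split

pixels = pd.read_parquet("mahanadi_pixels.parquet")    # hypothetical 30 m pixel table
features = ["GFI", "TWI", "HAND", "slope", "SPI", "rainfall", "dist_to_stream"]
X, y = pixels[features], pixels["flooded"]             # label taken from the LISFLOOD-FP map

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = RandomForestClassifier(n_estimators=500, n_jobs=-1, random_state=0).fit(X_tr, y_tr)

prob = clf.predict_proba(X_te)[:, 1]
pred = (prob >= 0.5).astype(int)
print("AUC   :", roc_auc_score(y_te, prob))
print("kappa :", cohen_kappa_score(y_te, pred))
print("prec  :", precision_score(y_te, pred))
print("recall:", recall_score(y_te, pred))
print("F1    :", f1_score(y_te, pred))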

Keywords: Flood hazard, Geomorphic Flood Descriptors, LISFLOOD-FP, Machine Learning, SHAP

How to cite: Mohanty, M. and Tripathi, V.: Can Catchment attributes coupled with an Ensemble of Machine Learning improve Flood Hazard mapping over large data-scarce catchments?, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-9063, https://doi.org/10.5194/egusphere-egu24-9063, 2024.

EGU24-9683 | ECS | Posters virtual | NH6.7

Integrating XBoost and SHAP for Enhanced Interpretability in Landslide Susceptibility Assessment: A Case Study in North-western Peloponnese, Greece. 

Maria Sotiria Frousiou, Ioanna Ilia, Dimitrios Kasmas, and Ioanna Petropoulou

Landslide phenomena, acknowledged as significant geohazards affecting both human infrastructure and the natural environment, have been the subject of intensive research aimed at pinpointing areas at risk of instability. This task involves the complex modelling of variables related to landslides, which requires both knowledge-based and data-driven methodologies. The challenge is heightened by the often intricate and obscure processes that trigger landslides, be they natural or anthropogenic. Over the past two decades, the application of artificial intelligence, specifically machine learning algorithms, has brought a transformative approach to landslide susceptibility evaluations. These advanced methodologies, encompassing fuzzy logic, decision trees, artificial neural networks, ensemble methods, and evolutionary algorithms, have demonstrated notable accuracy and dependability. A significant recent development in this field is the incorporation of eXplainable AI (XAI) techniques into landslide susceptibility models. XAI tools, such as SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME), offer a window into the previously opaque decision-making processes of AI models, thus demystifying the "black box" aspect of conventional AI systems.

The primary aim of this study was to employ the XBoost algorithm and integrate SHAP methods for an in-depth landslide susceptibility assessment. The methodology was methodically divided into five distinct phases: (i) the creation of the inventory map, (ii) the selection, classification, and weighting of landslide-influencing variables, (iii) conducting multicollinearity analysis, (iv) applying and testing the developed model, and (v) evaluating the predictive performance of various models and analyzing the results.

The computational work was performed using the R and Python programming languages, while ArcGIS 10.5 was used to compile data and produce detailed landslide susceptibility maps. The efficiency of this approach was tested in the North-western Peloponnese region of Greece, known for its frequent landslide occurrences. Nine specific variables were considered: elevation, slope angle, aspect, plan and profile curvature, distance to faults, distance to river networks, and lithology and hydrolithology cover, together with landslide locations, all contributing to the generation of training and test datasets. The Frequency Ratio method was applied to discern the correlation among these variables and assign weight values to each class. Multicollinearity analysis further helped in identifying any collinearity among the variables.
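
A minimal sketch of the Frequency Ratio calculation and a variance-inflation-factor check for multicollinearity is shown below; the table and column names are assumed for illustration only.

# Minimal sketch: Frequency Ratio for the classes of one variable, plus a VIF check.
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

cells = pd.read_csv("training_cells.csv")     # hypothetical cell table with a 'landslide' flag

def frequency_ratio(df, variable):
    slide = df[df["landslide"] == 1][variable].value_counts(normalize=True)
    total = df[variable].value_counts(normalize=True)
    return (slide / total).fillna(0.0)         # FR > 1 means the class favours landslides

print(frequency_ratio(cells, "slope_class"))

numeric = cells[["elevation", "slope_angle", "dist_to_fault", "dist_to_river"]]
vif = [variance_inflation_factor(numeric.values, i) for i in range(numeric.shape[1])]
print(dict(zip(numeric.columns, np.round(vif, 2))))    # VIF > 5-10 flags collinearity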

SHAP values were utilized to rank features according to their importance, offering a transparent view of variable contributions. The evaluation phase involved calculating the model's predictive power using metrics such as classification accuracy, sensitivity, specificity, and the area under the success and predictive rate curves (AUC). This comprehensive approach combining XBoost and SHAP methods presents a refined model for understanding and predicting landslide susceptibility, aiming for more accurate and interpretable hazard assessments. The results highlight the high performance of the XBoost algorithm in terms of accuracy, sensitivity, specificity and AUC values. The SHAP method indicates that slope angle was the most important feature in this model for landslide susceptibility. Other features, such as elevation, distance to the river network, and lithology cover, also contribute to the model's predictions, though to a lesser extent and with more mixed effects. Aspect, profile curvature, plan curvature, distance to faults, and hydrolithology cover appear to have a more moderate or minimal impact on the model’s predictions.
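
For illustration, a minimal sketch of a SHAP-based ranking for a gradient-boosted model (assumed here to be XGBoost) is given below; the data file and feature names are placeholders, not the study's dataset.

# Minimal sketch: global SHAP ranking of landslide-influencing variables.
import pandas as pd
import shap
from xgboost import XGBClassifier

cells = pd.read_csv("training_cells.csv")               # hypothetical cell table
features = ["elevation", "slope_angle", "aspect", "plan_curv", "profile_curv",
            "dist_to_fault", "dist_to_river", "lithology_w", "hydrolithology_w"]
X, y = cells[features], cells["landslide"]

model = XGBClassifier(n_estimators=400, max_depth=4, learning_rate=0.05).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
shap.summary_plot(shap_values, X)                        # global ranking of contributions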

How to cite: Frousiou, M. S., Ilia, I., Kasmas, D., and Petropoulou, I.: Integrating XBoost and SHAP for Enhanced Interpretability in Landslide Susceptibility Assessment: A Case Study in North-western Peloponnese, Greece., EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-9683, https://doi.org/10.5194/egusphere-egu24-9683, 2024.

EGU24-11739 | ECS | Posters virtual | NH6.7

Leveraging Near Real-Time Remote Sensing and Explainable AI for Rapid Landslide Detection: A Case Study in Greece 

Aikaterini-Alexandra Chrysafi, Paraskevas Tsangaratos, and Ioanna Ilia

Landslides, triggered by severe rainfall events, pose significant risks to both life and infrastructure. Timely and accurate detection of such landslides is crucial for effective disaster management and mitigation. This study presents an innovative approach combining near real-time remote sensing data with advanced machine learning techniques to rapidly identify landslide occurrences following severe rainfall events, specifically focusing on a recent case in Greece.
Our methodology harnesses the capabilities of pre and post-event satellite imagery to capture the landscape's transformation due to landslides. We compute remote sensing indices, including the Normalized Difference Vegetation Index (NDVI) and the Normalized Difference Water Index (NDWI), among others, to detect changes indicative of potential landslide areas. This approach leverages the temporal resolution and wide-area coverage of satellite data, enabling a swift and comprehensive assessment immediately after a triggering rainfall event.
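
As a minimal sketch of such an index-differencing step (assuming Sentinel-2 bands B04/B08 and an illustrative change threshold; file names are placeholders):

# Minimal sketch: flag candidate landslide pixels from a pre/post-event NDVI drop.
import rasterio

def ndvi(red_path, nir_path):
    with rasterio.open(red_path) as r, rasterio.open(nir_path) as n:
        red = r.read(1).astype("float32")
        nir = n.read(1).astype("float32")
    return (nir - red) / (nir + red + 1e-6)

ndvi_pre = ndvi("pre_B04.tif", "pre_B08.tif")
ndvi_post = ndvi("post_B04.tif", "post_B08.tif")

d_ndvi = ndvi_post - ndvi_pre
candidates = d_ndvi < -0.2        # strong vegetation loss suggests disturbed ground (assumed threshold)
print("candidate pixels:", int(candidates.sum()))
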
To enhance the accuracy of our detection model and reduce false positives, we incorporate a landslide susceptibility map generated via a Weight of Evidence (WoE) model. This map is based on historical landslide occurrences and helps to exclude areas with very low to low susceptibility, thereby refining our detection process.
Central to our study is the implementation of an eXplainable AI (XAI) framework. This aspect is particularly crucial, as it provides insights into the influence of various landslide-related factors on the model's predictions. The factors considered include elevation, slope angle, aspect, plan and profile curvature, distance to faults and river networks, lithology, and hydrolithology cover. By employing XAI techniques, we unravel the complex interactions between these variables and their relative importance in predicting landslide occurrences. This not only enhances the trustworthiness and transparency of our model but also aids in understanding the underlying geophysical processes leading to landslides.
The model's architecture is built upon advanced machine learning algorithms capable of processing large datasets efficiently. This setup is particularly suited to handle the high-dimensional and multi-temporal nature of remote sensing data. Furthermore, the model's ability to rapidly process and analyze data aligns well with the urgency required in disaster response scenarios.
Our case study in Greece demonstrates the model's efficacy in rapidly identifying landslide-prone areas post-severe rainfall events. The results show a significant improvement over traditional methods in terms of speed and accuracy. Moreover, the inclusion of XAI provides valuable insights for local authorities and disaster management teams, enabling them to make informed decisions for emergency response and long-term land-use planning.
This research contributes to the evolving field of rapid landslide detection by integrating cutting-edge remote sensing technologies with the latest advancements in machine learning and AI interpretability. It offers a novel, efficient, and transparent approach to landslide detection, which is vital for enhancing disaster preparedness and resilience in landslide-prone regions.

How to cite: Chrysafi, A.-A., Tsangaratos, P., and Ilia, I.: Leveraging Near Real-Time Remote Sensing and Explainable AI for Rapid Landslide Detection: A Case Study in Greece, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-11739, https://doi.org/10.5194/egusphere-egu24-11739, 2024.

EGU24-12368 | Orals | NH6.7

The deployment of an integrated suite for wildfire prediction, near real-time fire monitoring and post-event mapping over Attica region, Greece. 

Nikolaos S. Bartsotas, Stella Girtsou, Alexis Apostolakis, Themistocles Herekakis, and Charalampos Kontoes

In a changing climate, the growing frequency and intensity of wildfires requires innovative services in order to efficiently remediate against their catastrophic socioeconomic threat. Under the framework of MedEWSa project, we capitalise upon the reliability of the FireHub platform to further enhance its capability and features along the full spectrum of pre-event to post-event time scales, catering: (i) prevention and preparedness, (ii) detection and response, as well as (iii) restoration and inducement of cascading effects.

During the pre-event stage, the fire risk over the Attica Region is denoted on a daily basis in 5 risk levels over a detailed 500m grid spacing through a combination of high-resolution numerical weather predictions, advanced ML models that utilize historic wildfire record analysis, a number of associated atmospheric parameters (temperature, wind speed and direction, precipitation, dew point) and datasets (DEM, land use / land cover) from 2010 onwards. During the event, continuous monitoring is provided through MSG/SEVIRI image acquisitions every 5 minutes from NOA’s in-house antenna, while the spatiotemporal fire-spread information is simulated through dynamic modelling of the evolving fire. This feature is currently being further developed to be capable of performing “hot” starts during the incident and of re-estimating the spread based upon new hotspot retrievals from VIIRS imagery. Finally, the procedure of post-event burnt-scar mapping is currently being automated to provide rapid footprints of the affected areas by utilising MODIS, VIIRS and Sentinel imagery and to examine potential cascading effects through hazard assessment maps on landslides, soil erosion and floods. The whole suite will be hosted on a new, fully responsive user interface that will provide detailed yet straightforward and easy-to-adopt information in order to enhance the decision making of policy makers and public bodies.

How to cite: Bartsotas, N. S., Girtsou, S., Apostolakis, A., Herekakis, T., and Kontoes, C.: The deployment of an integrated suite for wildfire prediction, near real-time fire monitoring and post-event mapping over Attica region, Greece., EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-12368, https://doi.org/10.5194/egusphere-egu24-12368, 2024.

EGU24-15738 | ECS | Orals | NH6.7 | Highlight

Digital Twins for Early Warning Systems: Intricacies and Solutions 

Saman Ghaffarian, Fakhereh Alidoost, Umut Lagap, Pranav Chandramouli, Yifat Dzigan, Meiert Grootes, Fatemeh Jalayer, and Ilan Kelman

In a dynamic landscape of social change and technological advancement, the utilization of innovative solutions for disaster early warning systems (and for other forms of warning) has become paramount. This study explores the incorporation of Digital Twins (DT), dynamic digital replicas of physical entities, into disaster warning. Drawing from insights obtained through a comprehensive literature review and perspectives gleaned from a workshop, we investigate the technical challenges and needs of the research communities engaged in developing DTs for disaster risk management. Additionally, we propose a novel framework for employing DTs in early (and beyond) warning systems.

The implementation of DTs for early warning involves several intricacies and challenges. For instance, achieving seamless data fusion is crucial for enhancing the accuracy and timeliness of early warnings. However, the real-time integration of diverse and large data sources, including geospatial data, environmental sensors, social media feeds, and demographic and census data, is not a straightforward task. Another intricacy involves the need for robust predictive modelling within the DT framework. Overcoming this challenge requires the development of dynamic models that can adapt to evolving disaster scenarios. Machine Learning plays a pivotal role in this context, enabling the DT to continuously learn and improve its predictive capabilities. Privacy concerns and ethical considerations are paramount in the use of DTs for early warning, especially when leveraging data from various sources, and must be addressed to ensure trust and credibility. Solutions include the development of privacy-preserving methods and transparent communication strategies to gain public trust and ensure responsible model development and data usage. Furthermore, user interaction and community involvement are essential aspects of a successful DT-based early warning system. Tailoring communication strategies to diverse audiences and fostering community engagement through user-friendly interfaces contribute to the effectiveness of early warnings.

Accordingly, we propose solutions and strategies for addressing these challenges. For instance, leveraging edge computing capabilities for real-time data processing, integrating explainable artificial intelligence (AI) techniques to enhance model interpretability and transparency, and adopting decentralized data governance frameworks like Blockchain address key challenges in DT implementation for early warning systems.

This study provides valuable insights into the current state of DT integration for disaster early warning, highlighting intricacies and offering examples of solutions. By understanding the challenges and proposing a new integration framework, we pave the way for the realization of the full potential of Digital Twins in advancing disaster resilience, early warning capabilities, and contributing to the United Nations’ initiative ‘Early Warnings for All’.

How to cite: Ghaffarian, S., Alidoost, F., Lagap, U., Chandramouli, P., Dzigan, Y., Grootes, M., Jalayer, F., and Kelman, I.: Digital Twins for Early Warning Systems: Intricacies and Solutions, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-15738, https://doi.org/10.5194/egusphere-egu24-15738, 2024.

EGU24-16038 | ECS | Posters virtual | NH6.7

Automatic regional identification of active and inactive landslides using satellite image analysis 

Ploutarchos Tzampoglou, Dimitrios Loukidis, Paraskevas Tsangaratos, Aristodemos Anastasiades, Elena Valari, and Konstantinos Karalis

Over the past decades, landslides have significantly affected extensive areas worldwide due to changing environmental conditions and human activities, causing major problems in the built environment and infrastructure and resulting in the loss of human lives and significant financial damages. The island of Cyprus and especially its southwestern part (which constitutes the study area) have experienced the severe impact of landslides due to the unfavorable geological/geotechnical conditions and mountainous geomorphology. According to the data obtained from the Geological Survey Department of Cyprus (GSD), 1842 landslides (active and inactive) of various types have been identified in an area covering 40% (546 km2) of the Paphos District (3.4 landslides per km2).

Knowledge of the location and extent of existing landslides plays a crucial role in landslide susceptibility and hazard assessment. The primary aim of this research is to develop an algorithm for the automatic detection of landslides at regional scale. This is achieved through the application of image recognition technology, utilizing the cascade method on the hillshade of a region as produced by ArcGIS. The database of recorded landslides of the GSD was split into an algorithm training dataset and a validation dataset. The study also explores the effect of the resolution of the terrain data, expressed by the size of the grid cells. To comprehensively assess landslides, their morphology is classified into three types: active, dormant, and relict. The use of hillshade instead of a raster image of the elevation map was chosen because the latter usually results in relatively minor color variations between adjacent pixels, thus obscuring the most striking geomorphological features of landslides, namely the main scarp and the enveloping streams.
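
A minimal sketch of how a trained cascade classifier could be applied to a hillshade raster with OpenCV is given below; the cascade file and the detection parameters are assumptions for illustration, not the study's actual configuration.

# Minimal sketch: run a trained cascade classifier over a hillshade image to
# flag candidate landslide scarps. The cascade XML is assumed to have been
# trained beforehand (e.g. with opencv_traincascade) on mapped landslides.
import cv2

hillshade = cv2.imread("paphos_hillshade.png", cv2.IMREAD_GRAYSCALE)
cascade = cv2.CascadeClassifier("landslide_cascade.xml")   # hypothetical trained model

# Sliding-window detection; minSize bounds the expected landslide footprint in pixels.
detections = cascade.detectMultiScale(hillshade, scaleFactor=1.1,
                                      minNeighbors=3, minSize=(30, 30))
for (x, y, w, h) in detections:
    cv2.rectangle(hillshade, (x, y), (x + w, y + h), 255, 2)
cv2.imwrite("detections.png", hillshade)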

The results obtained suggest that a hillshade produced using a high-resolution Digital Elevation Model (DEM), i.e. one based on an elevation contour interval of 1 m and a cell size of 1 x 1 m (obtained from the Department of Land and Surveys of the Republic of Cyprus), yields better results for landslides with gentle geomorphology (relict). Nonetheless, analysis based on such a high-resolution DEM requires substantial computational resources and time. On the contrary, landslides associated with steeper geomorphologies (active) exhibited optimal performance with a cell size of 2 x 2 m, achieving success rates of 80% for DEMs based on contour intervals of 1 m and 5 m. In this case, the computational time is significantly reduced. Depending on the specific landslide types investigated in a particular area, the appropriate processing model can be selected, ultimately leading to significant time savings.

This research was funded by the European Commission (Marie Sklodowska-Curie Actions, Hybland-Society and Enterprise panel, Project No.: 101027880).

How to cite: Tzampoglou, P., Loukidis, D., Tsangaratos, P., Anastasiades, A., Valari, E., and Karalis, K.: Automatic regional identification of active and inactive landslides using satellite image analysis, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-16038, https://doi.org/10.5194/egusphere-egu24-16038, 2024.

With the increasing frequency of global extreme weather events and accelerating urbanization, the probability of urban flooding has significantly increased, posing a grave threat to both property and lives. Creating accurate flood maps is a critical component of effective emergency management for urban floods. However, current research primarily focuses on the extent of urban floods, with little consideration given to their type. Different types of floods often have varying water components and sediments, necessitating the identification of flood types during mapping to provide targeted relief. This paper proposes a method using multiple Convolutional Neural Networks (CNNs) that combines U-Net and ResNet architectures for urban flood extent extraction and type classification. The proposed method achieved 97.1% accuracy in flood extent extraction and 91% accuracy in flood type classification, demonstrating its accuracy in urban flood mapping. Furthermore, the research was validated using a global dataset covering six continents and 20 countries, encompassing samples with diverse dimensions and geographical features, showcasing the robustness and practicality of the model in various regions.
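
A minimal sketch of the dual-task idea (shared encoder, a segmentation head for flood extent and a classification head for flood type) is given below in PyTorch; the architecture is purely illustrative and much smaller than the U-Net/ResNet combination used in the study.

# Minimal sketch: one shared encoder, two heads (extent mask + flood-type class).
import torch
import torch.nn as nn

class FloodNet(nn.Module):
    def __init__(self, in_ch=3, n_types=3):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # Segmentation head: upsample back to input size, 1-channel extent logits.
        self.seg = nn.Sequential(
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(64, 1, 1),
        )
        # Classification head: global pooling + linear layer for the flood type.
        self.cls = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, n_types))

    def forward(self, x):
        f = self.enc(x)
        return self.seg(f), self.cls(f)

x = torch.randn(2, 3, 128, 128)                   # two dummy image tiles
extent_logits, type_logits = FloodNet()(x)
print(extent_logits.shape, type_logits.shape)     # (2, 1, 128, 128), (2, 3)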

Keywords: Urban flood mapping, Flood type, Deep learning, CNN, Classification

How to cite: Wang, Z. and Zhang, C.: Urban Flood Extent Extraction and Type Recognition Based on Multiple Convolutional Neural Networks, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-16528, https://doi.org/10.5194/egusphere-egu24-16528, 2024.

The presented study implements existing deep learning (DL) algorithms, an artificial intelligence approach, to extract geotechnical properties of unconsolidated material from photographs. The ultimate goal of this approach is to facilitate, aid and simplify the collection of often missing data about unconsolidated bedrock cover relevant to regional landslide susceptibility studies. The current research aims to answer whether existing DL algorithms (e.g. Buscombe's (2020) SediNet algorithm), developed for granular, often well-sorted sediments, can also perform well on poorly sorted sediments. It also asks which geotechnical properties, as described in soil classification standards such as ISO 14688-1:2017-12 (EU) and ASTM D2487-17e1 (USA), can be obtained directly or indirectly through DL analysis of photographs, and how well. The study approaches these questions by initially building a DL model based on several thousand photographs of 240 samples of unconsolidated material plus their several hundred laboratory sieve residue samples. In a previous project, the 240 samples of mostly alluvial, colluvial, eolian and glacial sediments had been collected from different geological environments within the state of Styria, Austria. Grain size distribution (GSD) and other soil classification parameters, obtained through field and laboratory testing, exist for these samples and have been provided as a courtesy by the Land Steiermark (State of Styria). In the current study, this knowledge of the geotechnical properties of the samples allows attribution of this information to each of the several thousand photographs, which were taken with three different cameras under controlled conditions. The DL model uses several hundred of these photographs with their associated attributes as training and test data to build a prediction model. The derived model is validated, with regard to its performance, on selected photographs not yet used in training and testing. The results of this approach allow a discussion of applicability, emerging limitations and possible improvements with regard to predicting geotechnical parameters, particularly GSD, of unconsolidated material using existing DL algorithms. Consequently, the results and conclusions also warrant an outlook on how, and to what extent, the method can aid and simplify field mapping and the collection of relevant input data for regional landslide susceptibility studies.

How to cite: Kurka, M.: Performance of deep learning algorithms on obtaining geotechnical properties of unconsolidated material to improve input data for regional landslide susceptibility studies, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-16620, https://doi.org/10.5194/egusphere-egu24-16620, 2024.

EGU24-17169 | Posters virtual | NH6.7

Utilizing Geographic Information Systems to Identify and Map Climate Hazards in Greece: A Regional Analysis 

Kleoniki Valvi, Constantinos Cartalis, Kostas Philippopoulos, Athina-Kyriaki Zazani, and Ilias Agathangelidis

The aim of the present study is the identification of the prevailing climate hazards (e.g., extreme heat, forest fires, drought, floods) and their changes, in terms of frequency, intensity, and trends, during multiple 30-year climate reference periods in Greece. The analysis involves the identification of climate hazards using a plethora of extreme event indices along with the application of extreme value theory (EVT). Changes in extremes over a period are often examined under two different perspectives, one that detects changes in the frequency of the extremes and the other in their intensity. For this purpose, high-resolution reanalysis data (ERA5-Land) are used, with a horizontal resolution of 0.1° x 0.1°. The sensitivity of diverse regions was determined through the analysis of Earth Observation data and products, along with the examination of their geomorphological features. In the final stage of the work, all of the above were incorporated using Geographic Information Systems, and GIS tools were developed for the synthesis of the climate hazards. This analysis focuses on the understanding of how climate change may be impacting Greece and can provide valuable insights for policymakers, researchers, and the general public to adapt to and mitigate the effects of climate hazards on a regional scale.
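
A minimal sketch of the extreme-value step, fitting a GEV distribution to annual maxima and deriving a return level with SciPy, is shown below; the sample values are invented placeholders, not ERA5-Land data.

# Minimal sketch: GEV fit to annual maxima of daily maximum temperature for one grid cell.
import numpy as np
from scipy.stats import genextreme

annual_max_t = np.array([36.2, 38.1, 37.4, 40.3, 39.0, 41.2, 38.8,
                         42.1, 39.5, 40.8, 43.0, 41.7])      # degrees C, hypothetical

shape, loc, scale = genextreme.fit(annual_max_t)
rp = 20                                                       # 20-year return period
level = genextreme.ppf(1 - 1 / rp, shape, loc=loc, scale=scale)
print(f"20-year return level: {level:.1f} degrees C")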

How to cite: Valvi, K., Cartalis, C., Philippopoulos, K., Zazani, A.-K., and Agathangelidis, I.: Utilizing Geographic Information Systems to Identify and Map Climate Hazards in Greece: A Regional Analysis, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-17169, https://doi.org/10.5194/egusphere-egu24-17169, 2024.

EGU24-18310 | ECS | Posters on site | NH6.7 | Highlight

ClarifAI: Interactive XAI Methods for Geosciences 

Yulia Grushetskaya, Mike Sips, Reyko Schachtschneider, and Mohammadmehdi Saberioon

In geosciences, machine learning (ML) has become essential for solving complex problems, such as predicting natural disasters or analysing the impact of extreme temperatures on mortality rates. However, the integration of ML into geoscience scenarios faces significant challenges, especially in explaining the influence of hyperparameters (HP) on model performance and model behaviour in specific scenarios. The Explainable Artificial Intelligence (XAI) system ClarifAI developed at GFZ addresses these challenges by combining XAI concepts with interactive visualisation. 

ClarifAI currently provides users with two interactive XAI methods: HyperParameter Explorer (HPExplorer) and Hypothetical Scenario Explorer (HSExplorer). 

HPExplorer allows interactive exploration of the HP space by computing an interactive tour through stable regions of the HP space. We define a stable region in HP space as a subspace in which ML models show similar performance. We also employ HP importance analysis to deepen the understanding of the impact of individual HPs on model performance. The Hypothetical Scenarios Explorer (HSExplorer) helps users explore model behaviour by allowing them to test how changes in input data affect the model's response.

In our presentation, we will demonstrate how HSExplorer helps users understand the impact of individual HPs on model performance. As ClarifAI is an important research area in our lab, we are interested in discussing relevant XAI challenges with the XAI community in ESSI.

 Our goal is to create a comprehensive set of tools that explain the mechanics of ML models and allow practitioners to apply ML to a wide range of geoscience applications.

How to cite: Grushetskaya, Y., Sips, M., Schachtschneider, R., and Saberioon, M.: ClarifAI: Interactive XAI Methods for Geosciences, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-18310, https://doi.org/10.5194/egusphere-egu24-18310, 2024.

EGU24-18870 | Posters virtual | NH6.7

Spatial correlation between landslides and geotechnical factors using Random Forest and SHAP 

Ploutarchos Tzampoglou, Dimitrios Loukidis, Konstantinos Karalis, Aristodemos Anastasiades, and Paraskevas Tsangaratos

The activation as well as the consequences of landslides are difficult to predict, as they depend on factors characterized by large variability and uncertainties. The aim of this study is to establish a correlation between geological, geotechnical and geomorphological characteristics and the spatial distribution of recorded landslides.

The study area is located in the southwestern (SW) part of the island of Cyprus, covering an area of 552 km2. During the past years, more than 1800 landslides, active and inactive (dormant and relict), have been recorded within this area through detailed mapping based on field observations, rendering the area an ideal test bed. At the beginning of this research study, all recorded landslides were digitized in raster format. Subsequently, the study area was partitioned into 15 x 15 m cells assigned to three classes: no landslides, inactive landslides and active landslides. Additionally, regarding the geological aspect, polygons encompassing 100% rock mass formations within recorded landslides were categorized as rock mass landslides, while the rest were characterized as landslides in argillaceous (soft rock and soil) materials. A series of correlation analyses were conducted using the Random Forest and SHAP (SHapley Additive exPlanations) methods.

Considering the outcomes of the Random Forest method in argillaceous materials, it turns out that the most important factors for both active and inactive landslides are the Plasticity Index (PI) and the clay fraction (CF), followed by the factors associated with the geomorphology and the bedding structure (e.g. slope angle and bedding dip). The ranking results for inactive and active landslides in rock mass show that the most important factor is the Uniaxial Compressive Strength (UCS), followed by the Geological Strength Index (GSI). Furthermore, the orientation (azimuth) difference between slope and bedding dip (dip direction difference) appears to be more important than the slope angle.

Similar ranking results were obtained using the SHAP method for argillaceous materials. Regarding the contribution of each factor in the inactive landslides, it appears that the PI and the slope angle increase proportionally to the possibility of landslide occurrence, while the CF does not exhibit a specific trend. Regarding the dip direction difference, small values contribute more to the occurrence of landslides. The active landslides show a similar picture, but with the CF exhibiting a stronger correlation than in the case of inactive landslides. According to the SHAP analysis for rock mass, the parameters of importance in both inactive and active landslides are UCS and GSI, followed by the slope angle and the dip direction difference.

This research was funded by the European Commission (Marie Sklodowska-Curie Actions, Hybland-Society and Enterprise panel, Project No.: 101027880).

How to cite: Tzampoglou, P., Loukidis, D., Karalis, K., Anastasiades, A., and Tsangaratos, P.: Spatial correlation between landslides and geotechnical factors using Random Forest and SHAP, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-18870, https://doi.org/10.5194/egusphere-egu24-18870, 2024.

EGU24-20051 | Posters virtual | NH6.7

Resilient Data Harvesting: A Low-Cost IoT Paradigm for Robust Measurement Collection in Challenging Environments 

Marios Vlachos, Chrysoula Papathanasiou, Valantis Tsiakos, Georgios Tsimiklis, and Angelos Amditis

Desert ecosystems are particularly vulnerable to global climate change, characterized by increased temperatures, variable intensity and frequency of precipitation and increased atmospheric CO2 levels. Under such conditions, substantial alterations in the structure and functioning of desert ecosystems are expected. This climate shift poses a serious threat to species adapted to deserts, especially endemic plants, which are susceptible to the potential loss of suitable habitats. Further to that, neighboring populated areas are also exposed to adverse conditions characterized by poor air quality, with direct impacts on human health, the economy and the environment overall. To address these challenges, the CiROCCO Project aims to implement a robust yet cost-effective Internet of Things (IoT) system for environmental measurements in harsh desert environments. Such a system not only enhances data accuracy but also enables continuous monitoring, reduces costs, and supports critical research and conservation efforts in the context of climate change and ecosystem challenges. The proposed IoT system primarily relies on a network of distributed low-cost Wireless Sensor Nodes (WSNs) that have the capability to monitor the surrounding environment and measure various crucial meteorological and air quality parameters, including inter alia air and sand/soil temperature, solar radiation, ozone, PM2.5 and PM10, with accuracy comparable to commercial high-end nodes offering similar measurements. Additionally, communication gateways are employed to collect measurements from the distributed WSNs using low-power consumption protocols such as Bluetooth Low Energy (BLE) and LoRaWAN. The collected measurements are then standardized into JSON messages, including the unique identifier of the device, the timestamp, and the parameter values. Subsequently, the data are transmitted wirelessly to the cloud using the most suitable method based on network connectivity. If there is an available Wi-Fi network in the field, the data are prioritized for transmission through this network. Alternatively, the system utilizes the 4G or 5G network in the area. In cases where none of these networks is accessible, the data are transmitted to the cloud through satellite communications. This method involves an additional satellite device connected to the gateway, to which the formatted messages are loaded through serial communications. The satellite device awaits the next pass of the nanosatellite to upload the measurements. The nanosatellite continues its journey until it passes a base station, at which point the data are downloaded, stored in the base station portal, and made available to third-party applications through the portal API. In conclusion, the approach outlined in this work addresses the considerable challenges of collecting valuable in-situ data for monitoring climatic conditions in hard-to-reach, under-sampled environments. The development of low-cost devices, including WSNs and gateways with IoT capabilities, is crucial for advancing research and conservation efforts in the context of climate change and the unique challenges posed to desert ecosystems.
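
A minimal sketch of the message standardisation and transmission fallback described above is given below; the device identifier, field names and link objects are hypothetical, not the project's actual payload schema.

# Minimal sketch: standardise a reading into JSON and pick the cheapest available uplink.
import json
import time

def build_message(device_id, readings):
    return json.dumps({
        "device_id": device_id,
        "timestamp": int(time.time()),
        "parameters": readings,              # e.g. {"air_temp_c": 41.2, "pm10_ugm3": 87.0}
    })

def transmit(payload, wifi, cellular, satellite):
    """Try Wi-Fi first, then 4G/5G, then satellite store-and-forward."""
    for link in (wifi, cellular, satellite):
        if link is not None and link.available():
            return link.send(payload)
    raise RuntimeError("no uplink available")

msg = build_message("wsn-042", {"air_temp_c": 41.2, "ozone_ppb": 63.0, "pm2_5_ugm3": 35.4})
# transmit(msg, wifi_link, lte_link, sat_link)   # the link objects are hypothetical
print(msg)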

ACKNOWLEDGEMENTS

This research work is part of the CiROCCO Project. CiROCCO Project is funded by the European Union. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or REA. Neither the European Union nor the granting authority can be held responsible for them.

How to cite: Vlachos, M., Papathanasiou, C., Tsiakos, V., Tsimiklis, G., and Amditis, A.: Resilient Data Harvesting: A Low-Cost IoT Paradigm for Robust Measurement Collection in Challenging Environments, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-20051, https://doi.org/10.5194/egusphere-egu24-20051, 2024.

EGU24-20549 * | Posters on site | NH6.7 | Highlight

The Use of Digital Twins for the Management of Cultural Heritage Sites 

Kyriacos Themistocleous and Dante Abate

There is a need for the use of digital twins of cultural heritage sites, especially those affected by natural hazards, for documentation, monitoring and management. This study examines, through the EXCELSIOR and TRIQUETRA projects, the use of 3D digital volumetric representation models and Augmented Reality applications, creating a digital twin for monitoring natural hazards in archaeological settings. The EXCELSIOR H2020 Widespread Teaming project (Grant Agreement No. 857510) and the TRIQUETRA Horizon Europe project (Grant Agreement No. 101094818) study the effects of climate change and natural hazards on cultural heritage and their remediation using state-of-the-art techniques. Within the TRIQUETRA project, Choirokoitia, Cyprus serves as one of the pilot studies for these techniques. Choirokoitia is a UNESCO World Heritage Site and is one of the best-preserved Neolithic sites in the Mediterranean. The project will also examine the potential risk of rockfall at the Choirokoitia site, as the topography of the site is vulnerable to movements resulting from extreme climate change as well as from daily/seasonal stressing actions. Rockfall poses a significant danger to visitor safety as well as a risk of damage to cultural heritage sites.

Digital twins provide a dynamic visualization of the site and can also be used to monitor any changes resulting from natural hazards. A digital twin model can also be shared with visitors in order to provide an alternative approach and a visualization experience for viewing the site.

How to cite: Themistocleous, K. and Abate, D.: The Use of Digital Twins for the Management of Cultural Heritage Sites, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-20549, https://doi.org/10.5194/egusphere-egu24-20549, 2024.

EGU24-22176 | Orals | NH6.7

Machine Learning Approach for Next-Day Wildfire Prediction: Challenges, Solutions, andInsights 

Stella Girtsou, Alexis Apostolakis, Konstantinos Alexis, Mariza Kaskara, Giorgos Giannopoulos, and Charalampos Kontoes

Next-day wildfire prediction is a critical research problem with significant implications for the environment, society, and economy. This study addresses the challenges associated with accurately predicting fire occurrences and presents a machine learning methodology designed to achieve high sensitivity and specificity in predicting wildfires at a country-wide scale with high spatial granularity. The unique aspects of the problem, including extreme data imbalance, massive scale, heterogeneity, and absence of fire, are thoroughly examined.

The proposed methodology focuses on three key components:

  • Feature Set Enhancement: An extended set of fire driving factors, encompassing topography, meteorology, Earth Observation data, and historical fire occurrence information, is utilized. This comprehensive feature set provides a holistic view of the factors influencing fire risk.
  • State-of-the-Art Classification Algorithms: A set of well-established classification algorithms, including Random Forest, Extremely Randomized Trees, XGBoost, and shallow Neural Networks, is employed for benchmarking. These algorithms are carefully tuned and optimized to strike a balance between sensitivity and specificity. Furthermore, state-of-the-art Deep Learning methodologies such as Semantic Segmentation and Metric Learning are employed and tuned for this specific task.
  • Effective Cross-Validation and Model Selection: Two alternative cross-validation schemes and custom validation measures are introduced to ensure optimal training of classification models. This allows for the selection of diverse models based on the desired trade-off between sensitivity and specificity.

The paper addresses specific challenges, such as extreme data imbalance, massive scale of data, heterogeneity, and absence of fire. The scale of the dataset, with over 830 million instances covering a 500m grid cell resolution for the entire Greek territory, necessitates careful undersampling for model training. Heterogeneity and concept drifts in different months are acknowledged, and the absence of fire instances is discussed in the context of unpredictable factors.

The study explores pitfalls, best practices, and directions for further investigation, providing valuable insights into the complexities of next-day wildfire prediction. The impact of the class_weights hyperparameter in compensating for data imbalance is highlighted, emphasizing its significance in cost-sensitive learning.
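
A minimal sketch of cost-sensitive learning via class weights, with sensitivity and specificity read from the confusion matrix, is shown below; the data file and the chosen cost ratio are assumptions, not the study's operational settings.

# Minimal sketch: compensate the extreme fire/no-fire imbalance with class weights.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

cells = pd.read_parquet("daily_grid_cells.parquet")      # hypothetical 500 m cell table
X, y = cells.drop(columns="fire"), cells["fire"]         # positives are a tiny fraction

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = RandomForestClassifier(n_estimators=300,
                             class_weight={0: 1, 1: 200},   # assumed misclassification cost ratio
                             n_jobs=-1, random_state=0).fit(X_tr, y_tr)

tn, fp, fn, tp = confusion_matrix(y_te, clf.predict(X_te)).ravel()
print("sensitivity:", tp / (tp + fn))
print("specificity:", tn / (tn + fp))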

In conclusion, the proposed machine learning methodology demonstrates effectiveness and efficiency in next-day fire prediction, aligning with real-world fire prediction system requirements. Further, our proposed methods achieve adequately high effectiveness scores (sensitivity > 90%, specificity > 80%) and are realized within a pre-operational environment that is continuously assessed on real-world conditions and also improved based on the feedback of the Greek Fire Service.  The study contributes insights that can guide future research in addressing the challenges associated with wildfire prediction, paving the way for more accurate and reliable models in the field.

Acknowledgement: "This work has been supported by the national research project PREFERRED, which is co-funded by Greece and the European Union through the Regional Operational Programme of Attiki, under the call "Research and Innovation Synergies in the Region of Attica” (Project code: ΑΤΤΡ4-0340489)"

How to cite: Girtsou, S., Apostolakis, A., Alexis, K., Kaskara, M., Giannopoulos, G., and Kontoes, C.: Machine Learning Approach for Next-Day Wildfire Prediction: Challenges, Solutions, andInsights, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-22176, https://doi.org/10.5194/egusphere-egu24-22176, 2024.

The exponential increase in flood intensity, which causes loss of life and economic and structural damage to the connected environment, calls for strategic rescue and response solutions for risk mitigation. This study focuses on flood mapping using satellite imagery combined with machine learning (ML) and deep learning (DL) techniques. Remote sensing and Geographic Information Systems (GIS) serve as vital tools in this process, enabling the effective utilization of satellite data.

While academics consistently contribute novel flood mapping approaches, there remains a research gap concerning the comparative performance of these ML and DL techniques, which this paper aims to address. This comparison is crucial as it highlights the strengths and limitations of each method, contributing valuable insights to the literature on flood risk management. The study focuses on the Ernakulam District of Kerala, chosen due to its frequent flooding and the availability of diverse datasets.

The methodology involves the use of satellite imagery for flood analysis, employing an array of techniques: a thresholding method recommended by the UN-SPIDER programme of the Office for Outer Space Affairs, and statistical ML methods including Random Forest, Support Vector Classification (SVC) and Real AdaBoost, alongside a deep learning semantic segmentation method, UNet. Implemented in JavaScript and Python, the models and packages are completely reusable. The dataset comprises satellite images acquired before and after the floods: the thresholding method uses Sentinel-1 SAR images, while the ML and DL methods use Sentinel-2 MSI Level-1C imagery together with a digital elevation model from SRTM for feature engineering, processed to identify flood-affected areas. The data are normalized and cleaned to account for cloud cover and missing data before the analysis. Alongside this, we sourced labelled flood data from the Kerala State Disaster Management Authority (KSDMA) and filtered and rasterized it in QGIS.

The results emphasize the varied effectiveness of these methods, with Random Forest outperforming the others with a 96.61% accuracy rate. At the same time, the UNet-Linear model lags at 75% accuracy, indicating the significant impact of hyperparameter tuning and dataset size on model performance. This comparative analysis not only delineates the strengths and weaknesses of traditional and advanced techniques but also sets a precedent for future studies to build upon in the understanding of flood risk management and rapid response strategies.
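
A minimal sketch of the change-detection idea behind the SAR thresholding step is given below; the file names and the 3 dB drop threshold are illustrative assumptions, not the UN-SPIDER recommended-practice values.

# Minimal sketch: flag pixels whose VH backscatter drops sharply after the event.
import rasterio

with rasterio.open("s1_before_vh_db.tif") as pre, rasterio.open("s1_after_vh_db.tif") as post:
    before_db = pre.read(1).astype("float32")
    after_db = post.read(1).astype("float32")

drop_db = after_db - before_db        # flooding lowers VH backscatter (more negative dB)
flooded = drop_db < -3.0              # assumed threshold; tuned per scene in practice
print("flooded fraction:", float(flooded.mean()))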

How to cite: Menon, N., Parastoo, S., and Adriaensen, R.: Flood Inundation Mapping of the 2018 Kerala Floods: A Comparative Study of Traditional Remote Sensing, Machine Learning, and Deep Learning Methods. , EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-22400, https://doi.org/10.5194/egusphere-egu24-22400, 2024.

Detailed information on the exposure of critical infrastructure (CI), such as power assets, is a necessity for establishing accurate risk assessments of natural and human-made hazards. Currently, large-scale risk assessment mostly relies on Volunteered Geographic Information to establish the exposure of CI, which limits reliability due to inherent information gaps. Deep Learning offers the possibility to fill such gaps through the extraction of CI from remote sensing imagery.

Here we present a comprehensive high-resolution geospatial database encompassing key elements of the power grid, namely power towers, electrical substations, and power plants. The dataset is derived from a workflow using Worldview-2 0.4-meter resolution satellite imagery for the most populated urban areas along the European coastlines.

The method extracts infrastructure location from OpenStreetMap to create annotations. Subsequently, the satellite imagery raster and annotations undergo processing to constitute training data. Data augmentation is employed on the raster tiles to enhance the training dataset. The method then trains a Mask R-CNN model to automate the detection of CI. Additionally, saliency maps are generated to validate the proper functioning of the model.

Performance metrics, specifically mean Average Precision and F-scores of the tile classification, are presented to evaluate the model's ability to correctly identify and classify power infrastructure. Furthermore, to assess the completeness of the geospatial database, a comparative analysis is conducted with OpenStreetMap on “unseen” locations. This comparative study sheds light on potential gaps and discrepancies, offering insights into the overall reliability and comprehensiveness of the dataset.

How to cite: De Plaen, J., Koks, E., and Ward, P.: A Coastal European Dataset of Critical Infrastructure:  Leveraging Deep Learning to Enhance Power Infrastructure Exposure Information for Disaster Risk Assessment., EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-22456, https://doi.org/10.5194/egusphere-egu24-22456, 2024.

EGU24-24 | Orals | NP4.1

The fractional Sinusoidal wavefront Model (fSwp) for time series displaying persistent stationary cycles 

Gael Kermarrec, Federico Maddanu, Anna Klos, and Tommaso Proietti

In the analysis of sub-annual climatological or geodetic time series, such as tide gauge, precipitable water vapor, or GNSS vertical displacement records, but also temperature or gas concentration series, seasonal cycles are often found to have a time-varying amplitude and phase.

These time series are usually modelled with a deterministic approach that includes trend, annual, and semi-annual periodic components having constant amplitude and phase-lag. This approach can potentially lead to inadequate interpretations, such as an overestimation of Global Navigation Satellite System (GNSS) station velocity, or even the masking of important geophysical phenomena that are related to the amplitude variability and are essential for deriving trustworthy interpretations for climate change assessment.

We address that challenge by proposing a novel linear additive model called the fractional Sinusoidal Waveform process (fSWp), accounting for possible nonstationary cyclical long memory, a stochastic trend that can evolve over time and an additional serially correlated noise capturing the short-term variability. The model has a state space representation and makes use of the Kalman filter (KF). Suitable enhancements of the basic methodology enable handling data gaps, outliers, and offsets. We demonstrate our method using various climatological and geodetic time series to illustrate its potential to capture the time-varying stochastic seasonal signals.
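
As a hedged sketch of the general additive structure described above (written in LaTeX; this is an illustrative decomposition, not the authors' exact fSwp specification):

\[
  y_t = \mu_t + c_t + \varepsilon_t, \qquad
  \mu_t = \mu_{t-1} + \eta_t, \qquad
  c_t = A_t \cos(\lambda t) + B_t \sin(\lambda t),
\]
% y_t: observed series; \mu_t: stochastic trend driven by the innovation \eta_t;
% c_t: seasonal cycle whose amplitude/phase (A_t, B_t) evolve slowly, possibly with long memory;
% \varepsilon_t: serially correlated short-term noise. The state space form of this
% decomposition is what the Kalman filter estimates.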

How to cite: Kermarrec, G., Maddanu, F., Klos, A., and Proietti, T.: The fractional Sinusoidal wavefront Model (fSwp) for time series displaying persistent stationary cycles, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-24, https://doi.org/10.5194/egusphere-egu24-24, 2024.

On some maps of the first military survey of the Habsburg Empire, the upper direction of the sections does not face the cartographic north, but makes an angle of about 15° with it. This may be due to the fact that the sections were subsequently rotated to the magnetic north of the time. Basically, neither their projection nor their projection origin is known yet.

In my research, I am dealing with maps of Inner Austria, the Principality of Transylvania and Galicia (nowadays in Poland and Ukraine), and I am trying to determine their projection origin. For this purpose, it is assumed, based on the archival documentation of the survey, that these are Cassini projection maps. My hypothesis is that the projection origins are Graz, Cluj Napoca or Alba Julia, and Lviv, respectively. I also consider the position of Vienna in each case, since it was the main centre of the survey.

The angle of rotation was taken in part from the gufm1 historical magnetic model, which extends back to 1590, for the assumed starting points and year of mapping. In addition, as a theoretical case, I calculated the rotation angle of the map sections using coordinate geometry. I then calculated the longitude of the projection starting point for each case using univariate minimization. Since the method is invariant to latitude, the latitude can only be determined from archival data.
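
A minimal sketch of the univariate-minimization step with SciPy is given below; the forward model linking origin longitude to section rotation is a deliberately simplified stand-in for the study's coordinate geometry, and all numbers are illustrative.

# Minimal sketch: find the origin longitude whose predicted section rotation best
# matches the rotation measured on the map sheets.
from scipy.optimize import minimize_scalar

observed_rotation_deg = 15.0                 # rotation of the section north, from the maps

def predicted_rotation(origin_lon_deg):
    """Hypothetical forward model; replace with the Cassini-based geometry of the study."""
    return 1.5 * (origin_lon_deg - 14.0)     # illustrative linear stand-in

res = minimize_scalar(lambda lon: (predicted_rotation(lon) - observed_rotation_deg) ** 2,
                      bounds=(10.0, 30.0), method="bounded")
print("best-fit origin longitude:", round(res.x, 2), "deg E")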

Based on these, the starting point for Inner Austria derived from the rotation of the map was Vienna, which is not excluded by the archival sources and is partly logical, since the baseline through Graz also started from there. The map rotations for Galicia and Transylvania also confirmed the hypothesised starting points. Since both Alba Julia and Cluj Napoca lie at about the same longitude, the method cannot distinguish between them, and the archival data did not provide enough evidence. In comparison, the magnetic declination rotations yielded differences of about 1°, which may be due to an error in the magnetic model.

On this basis, I have given the assumed projections of the three maps with projection starting points, and developed a method for determining the projection starting points of the other rotated grid maps. The results suggest that there is a very high probability that the section network was rotated in the magnetic north direction, and thus provide a way to refine the magnetic declination data at that time.

With this method I was able to derive new indirect magnetic declination data for Central-Eastern Europe, which can help to improve historical magnetic field models. This is particularly valuable because we have no direct measurements from that region.

Furthermore, the difference between the angle of the section north and the declination from gufm1 is consistently 0.8-1°, which may point to a systematic data error in that region.

Supported by the ÚNKP-23-6 New National Excellence Program of the Ministry for Culture and Innovation from the source of the National Research, Development and Innovation Fund.

How to cite: Koszta, B. and Timár, G.: A possible cartographical data source for historical magnetic field improvement: The direction of the section north of the Habsburg first military survey, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-582, https://doi.org/10.5194/egusphere-egu24-582, 2024.

EGU24-1988 | ECS | Posters on site | NP4.1

Predictive ability assessment of Bayesian Causal Reasoning (BCR) on runoff temporal series 

Santiago Zazo, José Luis Molina, Carmen Patino-Alonso, and Fernando Espejo

The alteration of traditional hydrological patterns due to global warming is leading to a modification of the hydrological cycle. This situation creates a complex scenario for the sustainable management of water resources. However, it also offers an opportunity to develop innovative approaches that capture in depth the logical temporal-dependence structure of these modifications and so advance the sustainable management of water resources, mainly through reliable predictive models. In this context, Bayesian Causality (BC), addressed through Causal Reasoning (CR) and supported by Bayesian Networks (BNs), together called Bayesian Causal Reasoning (BCR), is a novel hydrological research area that can help identify those temporal interactions efficiently.

This contribution aims to assess the ability of BCR to discover the logical, non-trivial temporal-dependence structure of hydrological series, as well as its predictability. For this, a BN that conceptually synthesizes the time series is defined, and the conditional probability is propagated over time throughout the BN through an innovative Dependence Mitigation Graph. This is done by coupling an autoregressive parametric approach with a causal model. The analytical ability of BCR highlighted the logical temporal structure, latent in the time series, which defines the general behavior of the runoff. This logical structure allowed us to quantify, through a dependence matrix that summarizes the strength of the temporal dependencies, the two temporal fractions that compose the runoff: one due to time (Temporally Conditioned Runoff, TCR) and one not (Temporally Non-conditioned Runoff, TNCR). Based on this temporal conditionality, a predictive model is implemented for each temporal fraction, and its reliability is assessed from a double probabilistic and metrological perspective.

This methodological framework is applied to two unregulated Spanish sub-basins: the Voltoya river, in the Duero River Basin, and the Mijares river, in the Jucar River Basin. The two cases show clearly opposite temporal behavior (Voltoya temporally independent, Mijares dependent) and face increasingly severe drought-related problems.

The findings of this study may have important implications for understanding the temporal behavior of river-basin water resources and their adaptation. In addition, the TCR and TNCR predictive models would allow advances in the optimal dimensioning of storage infrastructure (reservoirs), with substantial economic and environmental savings. A more sustainable management of river basins through more reliable control of reservoir operations is also expected. Finally, these results open new possibilities for developing predictive hydrological models within a BCR framework.

How to cite: Zazo, S., Molina, J. L., Patino-Alonso, C., and Espejo, F.: Predictive ability assessment of Bayesian Causal Reasoning (BCR) on runoff temporal series, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-1988, https://doi.org/10.5194/egusphere-egu24-1988, 2024.

EGU24-3857 | ECS | Posters on site | NP4.1 | Highlight

Spatial-Temporal Analysis of Forest Mortality 

Sara Alibakhshi

Climate-induced forest mortality poses an increasing threat worldwide, which calls for robust approaches to generate early warning signals of upcoming forest state changes. This research explores the potential of satellite imagery, utilizing advanced spatio-temporal indicators and methodologies, to assess the state of forests preceding mortality events. Traditional approaches, such as purely temporal analyses, are limited by window-size selection and detrending choices, potentially leading to false alarms. To tackle these challenges, our study introduces two new approaches, the Spatial-Temporal Moran (STM) and Spatial-Temporal Geary (STG) approaches, both based on local spatial autocorrelation measures, which can effectively address the shortcomings inherent in traditional methods. The findings were assessed across three study sites within California national parks, and Kendall's tau was employed to quantify the significance of positive and false alarms. To facilitate the measurement of ecosystem state change, trend estimation, and identification of early warning signals, this study also provides the "stew" R package. The implications of this research extend to ecologists, conservation practitioners, and policymakers, providing them with the means to address emerging environmental challenges in global forest ecosystems.
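
The "stew" R package mentioned above implements the authors' workflow; the Python fragment below is only a rough sketch of the underlying idea, computing a local Moran-type spatial autocorrelation for each image in a stack and a Kendall trend of its spatial mean as a candidate early-warning indicator (the placeholder data and the 3x3 neighbourhood are assumptions).

    import numpy as np
    from scipy.stats import kendalltau

    def local_moran(image):
        """Local Moran-type statistic using the mean of the 8 neighbouring cells."""
        z = (image - image.mean()) / image.std()
        padded = np.pad(z, 1, mode="edge")
        shifts = [np.roll(np.roll(padded, i, 0), j, 1)
                  for i in (-1, 0, 1) for j in (-1, 0, 1) if (i, j) != (0, 0)]
        neigh = sum(shifts)[1:-1, 1:-1] / 8.0
        return z * neigh

    # stack: (time, y, x) array, e.g. vegetation index images preceding a mortality event
    stack = np.random.rand(60, 50, 50)                 # placeholder data
    indicator = np.array([local_moran(img).mean() for img in stack])
    tau, p_value = kendalltau(np.arange(indicator.size), indicator)
    print(f"Kendall tau of the spatial indicator: {tau:.2f} (p={p_value:.3f})")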

How to cite: Alibakhshi, S.: Spatial-Temporal Analysis of Forest Mortality, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-3857, https://doi.org/10.5194/egusphere-egu24-3857, 2024.

Iram Parvez1, Massimiliano Cannata2, Giorgio Boni1, Rossella Bovolenta1, Eva Riccomagno3, and Bianca Federici1

1 Department of Civil, Chemical and Environmental Engineering (DICCA), Università degli Studi di Genova, Via Montallegro 1, 16145 Genoa, Italy (iram.parvez@edu.unige.it, bianca.federici@unige.it, giorgio.boni@unige.it, rossella.bovolenta@unige.it).

2 Institute of Earth Sciences (IST), Department for Environment Constructions and Design (DACD), University of Applied Sciences and Arts of Southern Switzerland (SUPSI), CH-6952 Canobbio, Switzerland (massimiliano.cannata@supsi.ch).

3 Department of Mathematics, Università degli Studi di Genova, Via Dodecaneso 35, 16146 Genova, Italy (riccomag@dima.unige.it).

The deployment of hydrometeorological sensors significantly contributes to generating real-time big data. The quality and reliability of these large datasets pose considerable challenges, as flawed analyses and decisions can result from erroneous data. This research addresses anomaly detection in real-time data by exploring machine learning models. Time-series data are collected from IstSOS (Sensor Observation Service), an open-source software that collects, stores and disseminates sensor data. The methodology consists of Gated Recurrent Units (GRUs), a type of recurrent neural network, with corresponding prediction intervals, applied both to individual sensors and collectively across all temperature sensors within the Ticino region of Switzerland. Additionally, non-parametric methods such as the bootstrap and the mean absolute deviation are employed instead of standard prediction intervals to tackle the non-normality of the data. The results indicate that GRU-based recurrent neural networks, coupled with non-parametric forecast intervals, perform well in identifying erroneous data points. Applying the model to the multivariate sensor time series establishes a pattern, or baseline of normal behavior, for the area (Ticino). When a new sensor is installed in the same region, the recognized pattern is used as a reference to identify outliers in the data gathered from the new sensor.
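
A minimal PyTorch sketch of the idea, assuming a univariate temperature series and illustrative window, training and threshold settings (this is not the IstSOS or operational implementation): a GRU forecasts the next value from a short window, and a residual bootstrap provides a non-parametric prediction interval used to flag outliers.

    import numpy as np
    import torch
    import torch.nn as nn

    class GRUForecaster(nn.Module):
        def __init__(self, hidden=32):
            super().__init__()
            self.gru = nn.GRU(input_size=1, hidden_size=hidden, batch_first=True)
            self.head = nn.Linear(hidden, 1)
        def forward(self, x):                       # x: (batch, window, 1)
            out, _ = self.gru(x)
            return self.head(out[:, -1, :]).squeeze(-1)

    # toy hourly temperature series with one injected anomaly
    series = 15 + 5 * np.sin(np.arange(2000) * 2 * np.pi / 24) + np.random.randn(2000)
    series[1500] += 12
    window = 24
    X = np.stack([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    X_t = torch.tensor(X[:, :, None], dtype=torch.float32)
    y_t = torch.tensor(y, dtype=torch.float32)

    model = GRUForecaster()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for epoch in range(20):                         # short full-batch loop for the sketch
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(X_t), y_t)
        loss.backward()
        opt.step()

    with torch.no_grad():
        pred = model(X_t).numpy()
    residuals = y - pred
    # non-parametric 95% interval from bootstrap resamples of the residuals
    boot = np.random.choice(residuals, size=(1000, residuals.size), replace=True)
    lo = np.mean(np.percentile(boot, 2.5, axis=1))
    hi = np.mean(np.percentile(boot, 97.5, axis=1))
    outliers = np.where((y < pred + lo) | (y > pred + hi))[0] + window
    print("flagged indices:", outliers[:10])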

How to cite: Parvez, I.: Exploring Machine Learning Models to Detect Outliers in HydroMet Sensors, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-4280, https://doi.org/10.5194/egusphere-egu24-4280, 2024.

EGU24-5268 | ECS | Orals | NP4.1

Unveiling Geological Patterns: Bayesian Exploration of Zircon-Derived Time Series Data 

Hang Qian, Meng Tian, and Nan Zhang

Owing to its immunity to post-formation geological modifications, zircon is widely utilized as a chronological time capsule and provides critical time series data with the potential to unravel key events in Earth's geological history, such as supercontinent cycles. Fourier analysis, which assumes stationary periodicity, has been applied to zircon-derived time series to find the cyclicity of supercontinents, and wavelet analysis, which allows non-stationary periodicity, corroborates the results of Fourier analysis while detecting finer-scale signals. Nonetheless, both methods assume periodicity in the zircon-derived time-domain data a priori. To avoid the periodicity assumption and extract more objective information from zircon data, we opt for a Bayesian approach and treat zircon preservation as a composite stochastic process in which the number of preserved zircon grains per magmatic event obeys a logarithmic series distribution and the number of magmatic events during a geological time interval obeys a Poisson distribution. An analytical solution allows us to efficiently invert for the number and distribution(s) of changepoints hidden in the globally compiled zircon data, as well as for the zircon preservation potential (encoded as a model parameter) between two neighboring changepoints. If the distributions of changepoints temporally overlap with those of known supercontinents, our results serve as an independent, mathematically robust test of supercontinent cyclicity. Moreover, our statistical approach inherently provides a sensitivity parameter whose tuning allows changepoints to be probed at various temporal resolutions. The constructed Bayesian framework thus has significant potential to detect other types of trend swings in Earth's history, such as shifts of geodynamic regime, moving beyond the cyclicity detection that limits the application of conventional Fourier/wavelet analysis.
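
A hedged numerical sketch of the generative model described above (the rates, preservation parameters, bin counts and the crude likelihood-scan changepoint detector are illustrative assumptions, not the authors' analytical inversion):

    import numpy as np
    rng = np.random.default_rng(0)

    def simulate_bins(n_bins, event_rate, logseries_p):
        """Zircon counts per time bin: a Poisson number of magmatic events,
        each preserving a logarithmic-series-distributed number of grains."""
        counts = np.zeros(n_bins, dtype=int)
        for i in range(n_bins):
            n_events = rng.poisson(event_rate)
            if n_events > 0:
                counts[i] = rng.logseries(logseries_p, n_events).sum()
        return counts

    # two segments with different preservation potential and a changepoint at bin 60
    counts = np.concatenate([simulate_bins(60, 3.0, 0.5), simulate_bins(40, 3.0, 0.9)])

    # crude single-changepoint scan using a Poisson approximation of the bin totals
    def loglik(segment):
        lam = segment.mean() + 1e-9
        return np.sum(segment * np.log(lam) - lam)

    scores = [loglik(counts[:k]) + loglik(counts[k:]) for k in range(5, 95)]
    best = np.argmax(scores) + 5
    print("most likely changepoint near bin", best)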

How to cite: Qian, H., Tian, M., and Zhang, N.: Unveiling Geological Patterns: Bayesian Exploration of Zircon-Derived Time Series Data, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-5268, https://doi.org/10.5194/egusphere-egu24-5268, 2024.

Semi-enclosed freshwater and brackish ecosystems, characterised by restricted water outflow and prolonged residence times, often accumulate nutrients, influencing their productivity and ecological dynamics. These ecosystems exhibit significant variations in bio-physical-chemical attributes, ecological importance, and susceptibility to human impacts. Untangling the complexities of their interactions remains challenging, necessitating a deeper understanding of effective management strategies adapted to their vulnerabilities. This research focuses on the bio-physical aspects, investigating the differential effects of spring and summer light on phytoplankton communities in semi-enclosed freshwater and brackish aquatic ecosystems.

Through extensive field sampling and comprehensive environmental parameter analysis, we explore how phytoplankton respond to varying light conditions in these distinct environments. Sampling campaigns were conducted at Müggelsee, a freshwater lake on Berlin's eastern edge, and Barther Bodden, a coastal lagoon northeast of Rostock on the German Baltic Sea coast, during the springs and summers of 2022 and 2023, respectively. Our analysis integrates environmental factors such as surface light intensity, diffuse attenuation coefficients, nutrient availability, water column dynamics, meteorological data, Chlorophyll-a concentration, and phytoplankton communities. Sampling encompassed multiple depths at continuous intervals lasting three days.

Preliminary findings underscore significant differences in seasonal light availability, with summer exhibiting extended periods of substantial light penetration. These variations appear to affect phytoplankton abundance and diversity differently in each ecosystem. While analyses are still ongoing, early indications suggest distinct phytoplankton responses in species composition and community structure, influenced by the changing light levels. In 2022, the clear-water phase during spring indicated that bloom events had occurred under ice cover well before spring, while in the summer there were weak and short-lived cyanobacteria blooms. The relationship between nutrient availability and phytoplankton dynamics, however, remains uncertain according to our data.

This ongoing study contributes to understanding the role of light as a primary driver shaping phytoplankton community structures and dynamics in these environments.  Our research findings offer insights for refining predictive models, aiding in ecosystem-specific eutrophication management strategies, and supporting monitoring efforts of Harmful Algal Blooms.

How to cite: Kaharuddin, A. and Kaligatla, R.: Comparative Study of Spring and Summer Light Effects on Phytoplankton Communities in Semi-Enclosed Fresh- and Brackish Aquatic Ecosystems., EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-5733, https://doi.org/10.5194/egusphere-egu24-5733, 2024.

EGU24-6065 | ECS | Orals | NP4.1

Magnetospheric time history:  How much do we need for forecasting? 

Kendra R. Gilmore, Sarah N. Bentley, and Andy W. Smith

Forecasting the aurora and its location accurately is important to mitigate potential harm to vital infrastructure such as communications and electricity grid networks. Current auroral prediction models rely on our understanding of the interaction between the magnetosphere and the solar wind, or on geomagnetic indices. Both approaches predict well but have limitations, either for forecasting (geomagnetic-index-based models) or because of the simplifying assumptions about the complex interaction that drive the model. By applying machine learning algorithms to this problem, gaps in our understanding can be identified, investigated, and closed. Finding the important time scales for driving empirical models provides the necessary basis for our long-term goal of predicting the aurora using machine learning.

Periodicities of the Earth's magnetic field have been extensively studied on a global scale and in regional case studies. Using a suite of time series analysis techniques, including frequency analysis and the investigation of long-term changes in the median and mean, we examine the dominant periodicities of ground magnetic field measurements at selected locations. The focus of this investigation is a selected number of stations from the SuperMAG network (Gjerloev, 2012), a global network of magnetometer stations.

The periodicities retrieved from the different magnetic field components are compared to each other as well as to other locations. In the context of auroral predictions, an analysis of the dominating periodicities in the auroral boundary data derived from the IMAGE satellite (Chisham et al., 2022) provides a counterpart to the magnetic field periodicities.

Ultimately, we can constrain the length of time history that is sensible to use for forecasting.

How to cite: Gilmore, K. R., Bentley, S. N., and Smith, A. W.: Magnetospheric time history:  How much do we need for forecasting?, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-6065, https://doi.org/10.5194/egusphere-egu24-6065, 2024.

EGU24-6151 | Posters on site | NP4.1

Using information-theory metrics to detect regime changes in dynamical systems 

Javier Amezcua and Nachiketa Chakraborty

Dynamical systems can display a range of dynamical regimes (e.g. attraction to fixed points, limit cycles, intermittency, chaotic behaviour) depending on the values of the system's parameters. In this work we demonstrate how non-parametric entropy estimation codes (in particular NPEET), based on the Kraskov method, can be applied to find regime transitions in a 3D chaotic model (the Lorenz 1963 system) when varying the parameter values. These information-theory-based methods are simpler and cheaper to apply than more traditional metrics from dynamical systems (e.g. computation of Lyapunov exponents). The non-parametric nature of the method allows long time series to be handled without a prohibitive computational burden.
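
As a self-contained illustration (a standalone nearest-neighbour estimator rather than the NPEET calls used in the study), the sketch below integrates the Lorenz 1963 system for two values of the parameter r and estimates the differential entropy of the attractor samples with a Kozachenko-Leonenko-style estimator; the integration times and parameter values are assumptions for the example.

    import numpy as np
    from scipy.integrate import solve_ivp
    from scipy.spatial import cKDTree
    from scipy.special import digamma, gammaln

    def lorenz(t, s, sigma=10.0, r=28.0, b=8.0 / 3.0):
        x, y, z = s
        return [sigma * (y - x), x * (r - z) - y, x * y - b * z]

    def kl_entropy(samples):
        """Kozachenko-Leonenko (k=1) differential entropy estimate in nats."""
        n, d = samples.shape
        tree = cKDTree(samples)
        # distance to the nearest neighbour (first column is the point itself)
        dist = tree.query(samples, k=2)[0][:, 1]
        log_ball = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1)   # log volume of the unit d-ball
        return digamma(n) - digamma(1) + log_ball + d * np.mean(np.log(2 * dist + 1e-300))

    for r in (14.0, 28.0):      # below and above the transition to chaos
        sol = solve_ivp(lorenz, (0, 200), [1.0, 1.0, 1.0], args=(10.0, r),
                        t_eval=np.linspace(50, 200, 5000))        # discard the transient
        print(f"r = {r}: H approx {kl_entropy(sol.y.T):.2f} nats")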

How to cite: Amezcua, J. and Chakraborty, N.: Using information-theory metrics to detect regime changes in dynamical systems, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-6151, https://doi.org/10.5194/egusphere-egu24-6151, 2024.

EGU24-9367 | ECS | Orals | NP4.1

Fractal complexity evaluation of meteorological droughts over three Indian subdivisions using visibility Graphs 

Susan Mariam Rajesh, Muraleekrishnan Bahuleyan, Arathy Nair GR, and Adarsh Sankaran

Evaluation of scaling properties and fractal formalisms is one of the potential approaches for modelling complex series. Understanding the complexity and fractal characterization of drought index time series is essential for better preparedness against drought disasters. This study presents a novel visibility-graph-based evaluation of the fractal characteristics of droughts in three meteorological subdivisions of India. In this method, the horizontal visibility graph (HVG) and the upside-down visibility graph (UDVG) are used to evaluate the network properties of standardized precipitation index (SPI) series at 3-, 6- and 12-month time scales, representing short-, medium- and long-term droughts. The relative magnitude of the fractal estimates is controlled by the drought characteristics of wet-dry transitions. The estimated degree distributions clearly decipher the self-similar properties of droughts in all the subdivisions. For an insightful depiction of drought dynamics, the fractal exponents and spectra are evaluated by the concurrent application of the Sand Box Method (SBM) and the Chhabra and Jensen Method (CJM). The analysis was performed for the overall series as well as for the pre- and post-1976-77 global climate shift scenarios. The complexity is more evident in the short-term drought series, and the UDVG formulations yielded higher fractal exponents for different moment orders irrespective of drought type and location. Useful insights into the relationship between complex networks and fractality emerge from the study, which may help improve drought forecasting. The visibility-graph-based fractality estimation is efficient in capturing drought behaviour and has great potential for drought prediction in a changing environment.
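
For readers unfamiliar with the construction, a compact sketch of a horizontal visibility graph and its degree distribution (applied here to a random series; in the study it would be an SPI series):

    import numpy as np

    def hvg_degrees(x):
        """Degree of each node in the horizontal visibility graph of series x:
        nodes i and j (i < j) are linked if every sample strictly between them
        is lower than min(x[i], x[j])."""
        n = len(x)
        deg = np.zeros(n, dtype=int)
        for i in range(n):
            for j in range(i + 1, n):
                if np.all(x[i + 1:j] < min(x[i], x[j])):
                    deg[i] += 1
                    deg[j] += 1
        return deg

    x = np.random.randn(500)              # placeholder for an SPI series
    deg = hvg_degrees(x)
    k, counts = np.unique(deg, return_counts=True)
    p_k = counts / counts.sum()
    # for uncorrelated noise the HVG degree distribution decays as P(k) = (1/3)(2/3)^(k-2)
    print(dict(zip(k.tolist(), np.round(p_k, 3).tolist())))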

Keywords:  Drought, Fractal, SPI, Visibility Graph

How to cite: Rajesh, S. M., Bahuleyan, M., Nair GR, A., and Sankaran, A.: Fractal complexity evaluation of meteorological droughts over three Indian subdivisions using visibility Graphs, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-9367, https://doi.org/10.5194/egusphere-egu24-9367, 2024.

EGU24-9537 | Posters on site | NP4.1

Wavelet-Induced Mode Extraction procedure: Application to climatic data 

Elise Faulx, Xavier Fettweis, Georges Mabille, and Samuel Nicolay

The Wavelet-Induced Mode Extraction procedure (WIME) [2] was developed drawing inspiration from Empirical Mode Decomposition. The concept involves decomposing the signal into modes, each presenting a characteristic frequency, using continuous wavelet transform. This method has yielded intriguing results in climatology [3,4]. However, the initial algorithm did not account for the potential existence of slight frequency fluctuations within a mode, which could impact the reconstruction of the original signal [4]. The new version (https://atoms.scilab.org/toolboxes/toolbox_WIME/0.1.0) now allows for the evolution of a mode in the space-frequency half-plane, thus considering the frequency evolution of a mode [2]. A natural application of this tool is in the analysis of Milankovitch cycles, where subtle changes have been observed throughout history. The method also refines the study of solar activity, highlighting the role of the "Solar Flip-Flop." Additionally, the examination of temperature time series confirms the existence of cycles around 2.5 years. It is now possible to attempt to correlate solar activity with this observed temperature cycle, as seen in speleothem records [1].

[1] Allan, M., Deliège, A., Verheyden, S., Nicolay S. and Fagel, N. Evidence for solar influence in a Holocene speleothem record, Quaternary Science Reviews, 2018.
[2] Deliège, A. and Nicolay, S., Extracting oscillating components from nonstationary time series: A wavelet-induced method, Physical Review. E, 2017.
[3] Nicolay, S., Mabille, G., Fettweis, X. and Erpicum, M., A statistical validation for the cycles found in air temperature data using a Morlet wavelet-based method, Nonlinear Processes in Geophysics, 2010.
[4] Nicolay, S., Mabille, G., Fettweis, X. and Erpicum, M., 30 and 43 months period cycles found in air temperature time series using the Morlet wavelet, Climate Dynamics, 2009.

How to cite: Faulx, E., Fettweis, X., Mabille, G., and Nicolay, S.: Wavelet-Induced Mode Extraction procedure: Application to climatic data, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-9537, https://doi.org/10.5194/egusphere-egu24-9537, 2024.

EGU24-10258 | Orals | NP4.1

New concepts on quantifying event data 

Norbert Marwan and Tobias Braun

A wide range of geoprocesses manifest as observable events in a variety of contexts, including shifts in palaeoclimate regimes, evolutionary milestones, tectonic activities, and more. Many prominent research questions, such as synchronisation analysis or power spectrum estimation of discrete data, pose considerable challenges to linear tools. We present recent advances using a specific similarity measure for discrete data and the method of recurrence plots for different applications in the field of highly discrete event data. We illustrate their potential for palaeoclimate studies, particularly in detecting synchronisation between signals of discrete extreme events and continuous signals, estimating power spectra of spiky signals, and analysing data with irregular sampling.
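
A minimal sketch of a recurrence plot for an event-like series (the threshold and the plain absolute-difference similarity below are placeholders; the study relies on a dedicated similarity measure for discrete event sequences):

    import numpy as np
    import matplotlib.pyplot as plt

    # spiky toy series: mostly zeros with a few extreme events
    x = np.zeros(300)
    x[np.random.choice(300, 15, replace=False)] = np.random.rand(15) * 5

    eps = 0.5                                     # recurrence threshold (assumed)
    dist = np.abs(x[:, None] - x[None, :])        # pairwise distance matrix
    recurrence = (dist < eps).astype(int)         # 1 where states recur

    plt.imshow(recurrence, cmap="binary", origin="lower")
    plt.xlabel("time i"); plt.ylabel("time j"); plt.title("Recurrence plot")
    plt.show()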

How to cite: Marwan, N. and Braun, T.: New concepts on quantifying event data, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-10258, https://doi.org/10.5194/egusphere-egu24-10258, 2024.

EGU24-10415 | ECS | Orals | NP4.1

Application of Transfer Learning techniques in one day ahead PV production prediction 

Marek Lóderer, Michal Sandanus, Peter Pavlík, and Viera Rozinajová

Nowadays, photovoltaic (PV) panels are becoming more affordable, efficient, and popular due to their low carbon footprint. PV panels can be installed in many places, providing green energy to the local grid and reducing energy costs and transmission losses. Since PV production is highly dependent on weather conditions, it is extremely important to estimate the expected output in advance in order to maintain the energy balance in the grid and provide enough time to schedule load distribution. The PV production output can be predicted by various statistical and machine learning methods. In general, the more data available, the more precise the predictions. This poses a problem for recently installed PV panels, for which not enough data has been collected or the collected data are incomplete.

A possible solution to this problem is an approach called Transfer Learning, which is inherently able to deal with missing or insufficient data. Transfer Learning is a machine learning approach that transfers knowledge acquired in a source domain (in our case a PV panel with a large amount of historical data) to different target domains (PV panels with very little historical data) to solve related problems (here, providing reliable PV production predictions).

In our study, we investigate the application, benefits and drawbacks of Transfer Learning for one-day-ahead PV production prediction. The model used in the study is based on a complex neural network architecture, feature engineering and data selection. Moreover, we focus on exploring multiple approaches to adjusting weights in the target-model retraining process, which affect the minimum amount of training data required, the final prediction accuracy and the model's overall robustness. Our models use historical meteorological forecasts from Deutscher Wetterdienst (DWD) and photovoltaic measurements from the PVOutput project, which collects data from installed solar systems across the globe. Evaluation is performed on more than 100 installed PV panels in Central Europe.
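
A schematic PyTorch example of the weight-transfer step under discussion (the two-layer network, the feature count, the placeholder tensors and the freeze-the-feature-layers strategy are illustrative assumptions, not the production architecture):

    import torch
    import torch.nn as nn

    def make_model():
        # 8 input features, e.g. forecast irradiance, temperature, cloud cover, time encoding
        return nn.Sequential(nn.Linear(8, 64), nn.ReLU(),
                             nn.Linear(64, 64), nn.ReLU(),
                             nn.Linear(64, 24))        # 24 hourly outputs for the day ahead

    def fit(model, X, y, epochs, lr=1e-3):
        opt = torch.optim.Adam([p for p in model.parameters() if p.requires_grad], lr=lr)
        for _ in range(epochs):
            opt.zero_grad()
            loss = nn.functional.mse_loss(model(X), y)
            loss.backward()
            opt.step()
        return model

    # 1) pretrain on the data-rich source PV panel (placeholder tensors)
    X_src, y_src = torch.randn(5000, 8), torch.randn(5000, 24)
    source_model = fit(make_model(), X_src, y_src, epochs=200)

    # 2) transfer: copy the weights, freeze the feature layers, retrain only the head
    target_model = make_model()
    target_model.load_state_dict(source_model.state_dict())
    for layer in list(target_model)[:-1]:
        for p in layer.parameters():
            p.requires_grad = False
    X_tgt, y_tgt = torch.randn(120, 8), torch.randn(120, 24)   # a few weeks of target data
    target_model = fit(target_model, X_tgt, y_tgt, epochs=100)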

How to cite: Lóderer, M., Sandanus, M., Pavlík, P., and Rozinajová, V.: Application of Transfer Learning techniques in one day ahead PV production prediction, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-10415, https://doi.org/10.5194/egusphere-egu24-10415, 2024.

EGU24-11897 | Posters on site | NP4.1

Results of joint processing of magnetic observatory data of international Intermagnet network in a unified coordinate system 

Beibit Zhumabayev, Ivan Vassilyev, Zhasulan Mendakulov, Inna Fedulina, and Vitaliy Kapytin

At each magnetic observatory, the magnetic field is recorded in a local Cartesian coordinate system associated with the geographic coordinates of that observatory. To observe extraterrestrial magnetic field sources, such as the interplanetary magnetic field or magnetic clouds, a method of joint processing of data from magnetic observatories of the international Intermagnet network was implemented. In this method, the constant component is removed from the observations of the individual observatories, their measurement data are converted into the ecliptic coordinate system, and the results from all observatories are averaged after the coordinate transformation.

We present the first results of joint processing of measurements from the international network of Intermagnet magnetic observatories for the periods before the onset of magnetic storms of various types, during these storms, and after their end. Combining the measurements from all observatories yields a significant improvement in the signal-to-noise ratio, which makes it possible to isolate weaker external magnetic fields. A change in the shape of the magnetic field variations is shown, which can provide new knowledge about the mechanism of development of magnetic storms.

How to cite: Zhumabayev, B., Vassilyev, I., Mendakulov, Z., Fedulina, I., and Kapytin, V.: Results of joint processing of magnetic observatory data of international Intermagnet network in a unified coordinate system, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-11897, https://doi.org/10.5194/egusphere-egu24-11897, 2024.

We introduce the CLEAN algorithm to identify narrowband Ultra Low Frequency (ULF) Pc5 plasma waves in Earth's magnetosphere. The CLEAN method was first used for constructing 2D images in astronomical radio interferometry but has since been applied to a huge range of areas, including adaptations for time series analysis. The algorithm performs a nonlinear deconvolution in the frequency domain (equivalent to a least-squares fit in the time domain), allowing identification of multiple individual wave spectral peaks within the same power spectral density. The CLEAN method also produces real amplitudes instead of model fits to the peaks and retains phase information. We applied the method to GOES magnetometer data spanning 30 years to study the distribution of narrowband Pc5 ULF waves at geosynchronous orbit. We found close to 30,000 wave events in each of the vector magnetic field components in field-aligned coordinates. We discuss wave occurrence and amplitudes distributed in local time and frequency. The distribution of the waves under different solar wind conditions is also presented. With some precautions, which are applicable to other event identification methods, the CLEAN technique can be utilized to detect wave events and their harmonics in the magnetosphere and beyond. We also discuss limitations of the method, mainly the detection of unrealistic peaks due to aliasing and Gibbs phenomena.

How to cite: Inceoglu, F. and Loto'aniu, P.: Using the CLEAN Algorithm to Determine the Distribution of Ultra Low Frequency Waves at Geostationary Orbit, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-12928, https://doi.org/10.5194/egusphere-egu24-12928, 2024.

EGU24-12938 | Posters on site | NP4.1

Applying Multifractal Theory and Statistical Techniques for High Energy Volcanic Explosion Detection and Seismic Activity Monitoring in Volcanic Time Series 

Marisol Monterrubio-Velasco, Xavier Lana, Raúl Arámbula-Mendoza, and Ramón Zúñiga

Understanding volcanic activity through time series data analysis is crucial for uncovering the fundamental physical mechanisms governing this natural phenomenon.

In this study, we show the application of multifractal and fractal methodologies, along with statistical analysis, to investigate time series associated with volcanic activity. We aim to make use of these approaches to identify significant variations within the physical processes related to changes in volcanic activity. These methodologies offer the potential to identify pertinent changes preceding a high-energy explosion or a significant volcanic eruption.

In particular, we apply these methods to two case studies. First, we analyse the evolution of the multifractal structure of volcanic emissions from low-, moderate- and high-energy explosions at Volcán de Colima (México, 2013-2015). The results give quite evident signs of the imminence of possibly dangerous high-energy emissions, close to 8.0x10^8 J. Additionally, the evolution of the Gutenberg-Richter seismic law adapted to volcanic energy emissions helps confirm the results obtained with the multifractal analysis. Second, we study the time series of the Gutenberg-Richter b-parameter of seismic activity associated with volcanic emissions in Iceland, Hawaii and the Canary Islands, using the concept of disparity (degree of irregularity), the fractal Hurst exponent H, and several multifractal parameters. The results should facilitate a better understanding of the relationships between volcanic emission activity and the corresponding seismic activity.
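
For reference, the b-parameter evolution mentioned above is commonly estimated with the Aki maximum-likelihood formula; a small sketch over a sliding window (the synthetic magnitude catalogue, completeness magnitude and window size are assumptions for the example):

    import numpy as np

    def b_value(magnitudes, m_c, dm=0.1):
        """Aki (1965) maximum-likelihood b-value for events above completeness m_c,
        with a correction of half the magnitude binning dm."""
        m = magnitudes[magnitudes >= m_c]
        return np.log10(np.e) / (m.mean() - (m_c - dm / 2.0))

    # synthetic catalogue: exponentially distributed magnitudes above m_c = 1.0 (b close to 1)
    mags = 1.0 + np.random.exponential(scale=1.0 / np.log(10), size=5000)

    window = 500
    b_series = [b_value(mags[i:i + window], m_c=1.0)
                for i in range(0, len(mags) - window, 50)]
    print(np.round(b_series, 2))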

How to cite: Monterrubio-Velasco, M., Lana, X., Arámbula-Mendoza, R., and Zúñiga, R.: Applying Multifractal Theory and Statistical Techniques for High Energy Volcanic Explosion Detection and Seismic Activity Monitoring in Volcanic Time Series, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-12938, https://doi.org/10.5194/egusphere-egu24-12938, 2024.

EGU24-13593 | ECS | Posters on site | NP4.1

Characterizing Uncertainty in Spatially Interpolated Time Series of Near-Surface Air Temperature 

Conor Doherty and Weile Wang

Spatially interpolated meteorological data products are widely used in the geosciences as well as in disciplines such as epidemiology and economics. Recent work has examined methods for quantifying uncertainty in gridded estimates of near-surface air temperature that produce distributions rather than simply point estimates at each location. However, meteorological variables are correlated not only in space but also in time, and sampling without accounting for temporal autocorrelation produces unrealistic time series and potentially underestimates cumulative errors. This work first examines how uncertainty in air temperature estimates varies in time, both seasonally and at shorter timescales. It then uses data-driven, spectral, and statistical methods to better characterize uncertainty in time series of estimated air temperature values. Methods for sampling that reproduce spatial and temporal autocorrelation are presented and evaluated. The results of this work are particularly relevant to domains such as agriculture and ecology: physical processes including evapotranspiration and primary production are sensitive to variables like near-surface air temperature, and errors in these important meteorological inputs accumulate in model outputs over time.
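
A toy sketch of the sampling issue (the AR(1) parameters are assumed): drawing daily interpolation errors independently versus drawing them from an AR(1) process with the same marginal variance gives very different cumulative errors over a year.

    import numpy as np

    n_days, sigma, rho = 365, 1.5, 0.7     # error std (deg C) and lag-1 autocorrelation (assumed)
    rng = np.random.default_rng(1)

    # independent draws of the daily interpolation error
    indep = rng.normal(0.0, sigma, n_days)

    # AR(1) draws with the same marginal variance: e_t = rho*e_{t-1} + sqrt(1-rho^2)*sigma*w_t
    ar1 = np.zeros(n_days)
    for t in range(1, n_days):
        ar1[t] = rho * ar1[t - 1] + np.sqrt(1 - rho ** 2) * sigma * rng.normal()

    # cumulative (e.g. degree-day style) error over the year
    print("cumulative error, independent sampling:", round(indep.sum(), 1))
    print("cumulative error, AR(1) sampling:", round(ar1.sum(), 1))
    # the AR(1) cumulative error variance is inflated by roughly (1 + rho) / (1 - rho)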

How to cite: Doherty, C. and Wang, W.: Characterizing Uncertainty in Spatially Interpolated Time Series of Near-Surface Air Temperature, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-13593, https://doi.org/10.5194/egusphere-egu24-13593, 2024.

EGU24-13879 | ECS | Posters on site | NP4.1

Understanding the role of vegetation responses to drought in regulating autumn senescence 

Eunhye Choi and Josh Gray

Vegetation phenology is the recurring timing of plant growth, including the cessation and resumption of growth, and plays a significant role in shaping terrestrial water, nutrient, and carbon cycles. Changes in temperature and precipitation have already induced phenological changes around the globe, and these trends are likely to continue or even accelerate. While warming has advanced spring arrival in many places, the effects on autumn phenology are less clear-cut, with evidence for an earlier, delayed, or even unchanged end of the growing season (EOS). Meteorological droughts are intensifying in duration and frequency because of climate change. Droughts impact vegetation in intricate ways, contingent upon whether the ecosystem is limited by water or energy, and have the potential to influence EOS changes. Despite this, the influence of drought on EOS remains largely unexplored. This study examined moisture's role in controlling EOS through the relationship between precipitation anomalies, vegetation's sensitivity to precipitation (SPPT), and EOS. We also assess regional variations in the impact of SPPT on EOS.

The study utilized multiple vegetation and water satellite products to examine the patterns of SPPT in drought and its impact on EOS across aridity gradients and vegetation types. By collectively evaluating diverse SPPTs from various satellite datasets, this work offers a comprehensive understanding and a critical basis for assessing the impact of drought on EOS. We focused on the Northern Hemisphere from 2000 to 2020, employing robust statistical methods. This work found that, in many places, there was a stronger relationship between EOS and drought in areas with higher SPPT. Additionally, a non-linear negative relationship was identified between EOS and SPPT in drier regions, contrasting with a non-linear positive relationship observed in wetter regions. These findings were consistent across a range of satellite-derived vegetation products. Our findings provide valuable insights into the effects of SPPT on EOS during drought, enhancing our understanding of vegetation responses to drought and their consequences for EOS, and aiding in identifying drought-vulnerable areas.

How to cite: Choi, E. and Gray, J.: Understanding the role of vegetation responses to drought in regulating autumn senescence, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-13879, https://doi.org/10.5194/egusphere-egu24-13879, 2024.

EGU24-16981 | ECS | Orals | NP4.1

A machine-learning-based approach for predicting the geomagnetic secular variation 

Sho Sato and Hiroaki Toh

We present a machine-learning-based approach for predicting geomagnetic main field changes, known as secular variation (SV), over a 5-year range for use in the 14th generation of the International Geomagnetic Reference Field (IGRF-14). The training and test datasets of the machine learning (ML) models are geomagnetic field snapshots derived from magnetic observatory hourly means and from CHAMP and Swarm-A satellite data (MCM model; Ropp et al., 2020). The geomagnetic field data are not used as-is but are differenced twice before training. Because SV is strongly influenced by the geodynamo process in the Earth's outer core, challenges persist despite efforts to model and forecast the realistic nonlinear behaviors of the geodynamo (such as geomagnetic jerks) through data assimilation. We compare three physics-uninformed ML models, namely an Autoregressive (AR) model, a Vector Autoregressive (VAR) model, and a Recurrent Neural Network (RNN) model, to represent the short-term temporal evolution of the geomagnetic main field on the Earth's surface; a toy version of the AR branch is sketched below. The quality of the 5-year predictions is tested by hindcasts for the learning window from 2004.50 to 2014.25. These tests show that the forecast performance of our ML models is comparable with that of the IGRF-13 candidate models in terms of data misfits after the release epoch (2014.75). All three ML models give 5-year prediction errors of less than 100 nT, with the RNN model showing slightly better accuracy. The tests also point to overfitting: the RNN can reproduce the training data accurately while performing less well on the forecasting targets.
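
As a simplified stand-in for the AR branch of this comparison (the toy coefficient series, AR order and least-squares fit below are assumptions for illustration, not the IGRF-grade pipeline), one can difference a Gauss-coefficient series twice, fit an AR model, and integrate the forecast back:

    import numpy as np

    def fit_ar(x, order):
        """Least-squares AR(order) fit; returns coefficients for lags 1..order."""
        X = np.column_stack([x[order - k - 1:len(x) - k - 1] for k in range(order)])
        return np.linalg.lstsq(X, x[order:], rcond=None)[0]

    def forecast_ar(x, coeffs, steps):
        hist = list(x)
        for _ in range(steps):
            hist.append(np.dot(coeffs, hist[-1:-len(coeffs) - 1:-1]))
        return np.array(hist[-steps:])

    # toy quarterly series standing in for one Gauss coefficient (nT)
    t = np.arange(80)
    g = -29400 + 8.0 * t + 0.02 * t ** 2 + np.random.randn(80)

    d2 = np.diff(g, n=2)                       # difference twice, as in the abstract
    coeffs = fit_ar(d2, order=4)
    d2_fc = forecast_ar(d2, coeffs, steps=20)  # 5 years of quarterly steps

    # undo the double differencing to recover the coefficient forecast
    d1_fc = np.cumsum(d2_fc) + (g[-1] - g[-2])
    g_fc = np.cumsum(d1_fc) + g[-1]
    print(g_fc[:4])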

How to cite: Sato, S. and Toh, H.: A machine-learning-based approach for predicting the geomagnetic secular variation, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-16981, https://doi.org/10.5194/egusphere-egu24-16981, 2024.

EGU24-17344 | Posters on site | NP4.1

Introducing a new statistical theory to quantify the Gaussianity of the continuous seismic signal 

Éric Beucler, Mickaël Bonnin, and Arthur Cuvier

The quality of the seismic signal recorded at permanent and temporary stations is sometimes degraded, either abruptly or over time. The most likely cause is a high level of humidity leading to corrosion of the connectors, but environmental changes can also alter recording conditions in various frequency ranges, and not necessarily for all three components in the same way. Assuming that the continuous seismic signal can be described by a normal distribution, we present a new approach to quantify seismogram quality and to point out any time sample that deviates from this Gaussian assumption. To this end, we introduce the notion of a background Gaussian signal (BGS) to statistically describe a set of samples that follows a normal distribution. The discrete function obtained by sorting the samples in ascending order of amplitude is compared to a modified probit function to retrieve the elements composing the BGS and its statistical properties, principally the Gaussian standard deviation, which can differ from the classical standard deviation. The ratio of the two standard deviations directly quantifies the dominant Gaussianity of the continuous signal, and any variation reflects a statistical modification of the signal quality. We present examples showing daily variations in this ratio for stations known to have been affected by humidity, resulting in signal degradation. The theory developed can be used to detect subtle variations in the Gaussianity of the signal, and also to point out any samples that do not match the Gaussian assumption, which can then be used for other seismological purposes, such as coda determination.
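
A simplified numerical sketch of the idea (the paper's modified probit fit is replaced here by a plain fit of the sorted samples against normal quantiles, and the central-half selection is an assumption):

    import numpy as np
    from scipy.stats import norm

    def gaussianity_ratio(signal):
        """Compare a 'Gaussian core' standard deviation, estimated from the central part
        of the sorted-sample vs normal-quantile relation, to the classical one."""
        x = np.sort(signal)
        n = x.size
        p = (np.arange(1, n + 1) - 0.5) / n          # plotting positions
        q = norm.ppf(p)                               # probit of the empirical CDF
        core = (p > 0.25) & (p < 0.75)                # central half, assumed burst-free
        slope, _ = np.polyfit(q[core], x[core], 1)    # slope = Gaussian standard deviation
        return slope / signal.std()

    # well-behaved noise vs noise contaminated by transient disturbances
    clean = np.random.randn(86400)
    dirty = clean.copy()
    dirty[40000:41000] += 10 * np.random.randn(1000)
    print("clean ratio:", round(gaussianity_ratio(clean), 3))
    print("degraded ratio:", round(gaussianity_ratio(dirty), 3))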

How to cite: Beucler, É., Bonnin, M., and Cuvier, A.: Introducing a new statistical theory to quantify the Gaussianity of the continuous seismic signal, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-17344, https://doi.org/10.5194/egusphere-egu24-17344, 2024.

EGU24-17566 | ECS | Posters on site | NP4.1

Unveiling Climate-Induced Ocean Wave Activities Using Seismic Array Data in the North Sea Region 

Yichen Zhong, Chen Gu, Michael Fehler, German Prieto, Peng Wu, Zhi Yuan, Zhuoyu Chen, and Borui Kang

Climate events may induce abnormal ocean wave activity that can be detected by seismic arrays on nearby coastlines. We collected long-term continuous array seismic data in the Groningen area and the coastal areas of the North Sea and conducted a comprehensive analysis to extract valuable climate information hidden within the ambient noise. Through long-term spectral analysis, we identified a frequency band around 0.2 Hz that appears to be associated with swell waves within the region, exhibiting a strong correlation with the significant wave height (SWH). Additionally, wind waves at a frequency of approximately 0.4 Hz and gravity waves with periods exceeding 100 seconds were detected in the seismic ambient noise. We performed a correlation analysis between the ambient noise and various climatic indices across different frequency bands. The results revealed a significant correlation between the North Atlantic Oscillation (NAO) index and the ambient noise around 0.17 Hz.

Subsequently, we extracted the annual variation curves of SWH frequency from the ambient noise at each station around the North Sea and assembled them into a sparse spatial grid time series (SGTS). An empirical orthogonal function (EOF) analysis was conducted, and the principal component (PC) time series derived from the EOF analysis were correlated with WAVEWATCH III (WW3) model simulation data, thereby confirming the wave patterns. Moreover, we studied the spatial distribution of the SGTS. The spatial features revealed that the southern regions of the North Sea exhibit higher wind-wave energy components, influenced by the Icelandic Low pressure system and the topography, which explains the correlation between ambient noise in the region and the NAO index. Furthermore, the spatial features disclosed a correlation between the first EOF mode of the North Sea ocean waves and the third mode of sea surface temperature anomalies. This research shows the potential of utilizing existing offshore seismic monitoring systems to study global climate variation and physical oceanography.

How to cite: Zhong, Y., Gu, C., Fehler, M., Prieto, G., Wu, P., Yuan, Z., Chen, Z., and Kang, B.: Unveiling Climate-Induced Ocean Wave Activities Using Seismic Array Data in the North Sea Region, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-17566, https://doi.org/10.5194/egusphere-egu24-17566, 2024.

EGU24-18061 | ECS | Orals | NP4.1

A new methodology for time-series reconstruction of global scale historical Earth observation data 

Davide Consoli, Leandro Parente, and Martijn Witjes

Several machine learning algorithms and analytical techniques do not allow gaps or missing values in their input data. Unfortunately, Earth observation (EO) datasets, such as satellite images, are severely affected by cloud contamination and sensor artifacts that create gaps in the time series of collected images. This limits the usage of several powerful techniques for modeling and analysis. To overcome these limitations, several works in the literature propose different imputation methods to reconstruct gappy image time series, providing complete space-time datasets and enabling their usage as input for many techniques.

However, among the time-series reconstruction methods available in the literature, only a few are publicly available (open-source code), applicable without any external source of data, and suitable for application to petabyte (PB)-sized datasets like the full Landsat archive. The few methods that match all these characteristics are usually quite simple (e.g. linear interpolation) and, as a consequence, often show poor image reconstruction performance.

For this reason, we propose a new time-series reconstruction methodology designed to match all these requirements. Like some other methods in the literature, the new method, named seasonally weighted average generalization (SWAG), works purely in the time dimension, reconstructing the images by processing the time series of each pixel separately. The method reconstructs each missing value as a weighted average of the samples available in the original time series. Enforcing the annual seasonality of each band as a prior, a higher weight is given to images collected at integer multiples of one year from the missing sample. To avoid propagating land cover changes into future or past images, higher weights are given to more recent images. Finally, so that the method respects causality, only images from the past of each sample in the time series are used; see the sketch below.
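
An illustrative per-pixel sketch of such a seasonally weighted, causal gap filler (the Gaussian seasonal kernel, decay constants and toy NDVI series are assumptions made for the example; the operational SWAG implementation is the C++ code described next):

    import numpy as np

    def fill_gaps(series, dates_in_days, year=365.25, season_width=16.0, age_decay=730.0):
        """Fill NaNs with a weighted average of *past* samples; weights peak when the
        time lag is close to an integer number of years and decay with sample age."""
        filled = series.copy()
        for i in np.where(np.isnan(series))[0]:
            past = np.arange(i)
            valid = past[~np.isnan(series[past])]
            if valid.size == 0:
                continue
            lag = dates_in_days[i] - dates_in_days[valid]
            dist_to_cycle = np.abs(lag - year * np.round(lag / year))
            w = np.exp(-0.5 * (dist_to_cycle / season_width) ** 2) * np.exp(-lag / age_decay)
            if w.sum() > 0:
                filled[i] = np.sum(w * series[valid]) / w.sum()
        return filled

    # bimonthly NDVI-like series with simulated cloud gaps
    dates = np.arange(0, 6 * 365, 61.0)
    ndvi = 0.5 + 0.3 * np.sin(2 * np.pi * dates / 365.25)
    ndvi[np.random.rand(dates.size) < 0.3] = np.nan
    print(np.round(fill_gaps(ndvi, dates), 2))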

To achieve computational performance suitable for PB-sized datasets, the method has been implemented in C++ using a sequence of fast convolution operations and Hadamard products and divisions. The method has been applied to a bimonthly aggregated version of the global GLAD Landsat ARD-2 collection from 1997 to 2022, producing a 400-terabyte output dataset. The produced dataset will be used to generate maps of several biophysical parameters, such as the Fraction of Absorbed Photosynthetically Active Radiation (FAPAR), the normalized difference water index (NDWI) and the bare soil fraction (BSF). The code is available as open source, and the result is fully reproducible.

References:

Potapov, Hansen, Kommareddy, Kommareddy, Turubanova, Pickens, ... & Ying  (2020). Landsat analysis ready data for global land cover and land cover change mapping. Remote Sensing, 12(3), 426.

Julien, & Sobrino (2019). Optimizing and comparing gap-filling techniques using simulated NDVI time series from remotely sensed global data. International Journal of Applied Earth Observation and Geoinformation, 76, 93-111.

Radeloff, Roy, Wulder, Anderson, Cook, Crawford, ... & Zhu (2024). Need and vision for global medium-resolution Landsat and Sentinel-2 data products. Remote Sensing of Environment, 300, 113918.

How to cite: Consoli, D., Parente, L., and Witjes, M.: A new methodology for time-series reconstruction of global scale historical Earth observation data, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-18061, https://doi.org/10.5194/egusphere-egu24-18061, 2024.

EGU24-18197 | ECS | Orals | NP4.1 | Highlight

The regularity of climate-related extreme events under global warming 

Karim Zantout, Katja Frieler, and Jacob Schewe and the ISIMIP team

Climate variability gives rise to many different kinds of extreme impact events, including heat waves, crop failures, and wildfires. The frequency and magnitude of such events are changing under global warming. However, it is less well known to what extent such events occur with some regularity, and whether this regularity is itself changing as a result of climate change. Here, we present a novel method to systematically study the time-autocorrelation of these extreme impact events, that is, whether they occur with a certain regularity. In studies of climate change impacts, different types of events are often studied in isolation, but in reality they interact. We use ensembles of global biophysical impact simulations from the Inter-Sectoral Impact Model Intercomparison Project (ISIMIP), driven with climate models, to assess current conditions and projections. The time series analysis is based on a discrete Fourier transformation that accounts for the stochastic fluctuations of the climate model. Our results show that some climate impacts, such as crop failure, indeed exhibit a dominant frequency of recurrence, and that these regularity patterns change over time due to anthropogenic climate forcing.
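
Schematically (with a synthetic annual event indicator and a plain periodogram standing in for the noise-aware transform used in the study), the dominant recurrence frequency can be read from the spectrum of the event time series:

    import numpy as np

    years = np.arange(1901, 2101)
    # synthetic crop-failure indicator: an event roughly every 7 years plus noise
    events = ((years % 7) == 0).astype(float)
    events += 0.3 * np.random.rand(years.size)

    x = events - events.mean()
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0)          # cycles per year
    dominant = freqs[1:][np.argmax(power[1:])]      # skip the zero frequency
    print(f"dominant recurrence period approx {1.0 / dominant:.1f} years")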

How to cite: Zantout, K., Frieler, K., and Schewe, J. and the ISIMIP team: The regularity of climate-related extreme events under global warming, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-18197, https://doi.org/10.5194/egusphere-egu24-18197, 2024.

EGU24-18210 | ECS | Posters on site | NP4.1

Long-term vegetation development in context of morphodynamic processes since mid-19th century 

Katharina Ramskogler, Moritz Altmann, Sebastian Mikolka-Flöry, and Erich Tasser

Comprehensive aerial photography is only available from the mid-20th century onwards, which poses a challenge for quantitatively analyzing long-term surface changes in proglacial areas and leaves a gap of approximately 100 years back to the end of the Little Ice Age (LIA). Employing digital monoplotting and historical terrestrial images, our study reveals quantitative surface changes in a LIA lateral moraine section dating back to the second half of the 19th century, encompassing a total study period of 130 years (1890 to 2020). With this long-term analysis of the steep lateral moraines of Gepatschferner (Kauner Valley, Tyrol, Austria), we aimed to identify changes in vegetation development in the context of morphodynamic processes and the changing climate.

In 1953, there was an expansion in the area covered by vegetation, notably encompassing scree communities, alpine grassland, and dwarf shrubs. However, the destabilization of the system after 1980, triggered by rising temperatures and the resulting thawing of permafrost, led to a decline in vegetation cover by 2020. Notably, our observations indicated that, in addition to morphodynamic processes, the overarching trends in temperature and precipitation exerted a substantial influence on vegetation development. Furthermore, areas with robust vegetation cover, once stabilised, were reactivated and subjected to erosion, possibly attributed to rising temperatures post-1980.

This study demonstrates the capability of historical terrestrial images to enhance the reconstruction of vegetation development and morphodynamics in high alpine environments under climate change. However, long-term mapping of vegetation development through digital monoplotting has limitations, contingent on the accessibility and quality of historical terrestrial images, as well as the challenges posed by shadows in high alpine regions. Despite these limitations, this long-term approach offers fundamental data on vegetation development for future modelling efforts.

How to cite: Ramskogler, K., Altmann, M., Mikolka-Flöry, S., and Tasser, E.: Long-term vegetation development in context of morphodynamic processes since mid-19th century, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-18210, https://doi.org/10.5194/egusphere-egu24-18210, 2024.

EGU24-19601 | ECS | Posters on site | NP4.1

Discrimination of  geomagnetic quasi-periodic signals by using SSA Transform 

Palangio Paolo Giovanni and Santarelli Lucia


Istituto Nazionale di Geofisica e Vulcanologia, L’Aquila

Istituto Nazionale di Geofisica e Vulcanologia, Roma

 

Correspondence to:  lucia.santarelli@ingv.it

 


In this paper we present an application of the SSA transform to the detection and reconstruction of very weak geomagnetic signals hidden in noise. In the SSA transform, multiple subspaces are used to represent and reconstruct signals and noise. The analysis allows us to reconstruct, in the time domain, the different harmonic components contained in the original signal using orthogonal functions. The objective is to identify the dominant subspaces that can be attributed to the signals and the subspaces that can be attributed to the noise, assuming that all these subspaces are orthogonal to each other, which implies that the signals and noise are independent of one another. The signal subspace is mapped simultaneously onto several spaces of lower dimension, favoring the dimensions that best discriminate the patterns. Each subspace of the signal space is used to encode different subsets of functions with common characteristics, such as the same periodicities. Subspace identification was performed using singular value decomposition (SVD) techniques, known as SVD-based identification methods, classified in a subspace-oriented scheme. The quasi-periodic variations of the geomagnetic field have been investigated over the range of scales spanning from 22 years to 8.9 days, such as the Sun’s polarity reversal cycle (22 years), the sunspot cycle (11 years), the equinoctial effect (6 months), the synodic rotation of the Sun (27 days) and its harmonics. The strength of these signals varies from fractions of a nT to tens of nT. The phase and frequency variability of these cycles has been evaluated from the variations of the geomagnetic field recorded at a mid-latitude site (covering roughly 4.5 sunspot cycles). Magnetic data recorded at the L’Aquila geomagnetic observatory (geographic coordinates: 42° 23’ N, 13° 19’ E; geomagnetic coordinates: 36.3° N, 87.2° E; L-shell = 1.6) are used from 1960 to 2009.
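
A compact sketch of the basic SSA decomposition step (the window length, the toy two-cycle signal and the grouping of the leading singular components are assumptions for illustration):

    import numpy as np

    def ssa_reconstruct(x, window, components):
        """Embed x into a trajectory matrix, take its SVD and reconstruct the series
        from the selected singular components by diagonal averaging."""
        n = len(x)
        k = n - window + 1
        X = np.column_stack([x[i:i + window] for i in range(k)])      # trajectory matrix
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        Xr = sum(s[c] * np.outer(U[:, c], Vt[c]) for c in components)
        rec = np.zeros(n)
        counts = np.zeros(n)
        for j in range(k):                      # anti-diagonal averaging back to a series
            rec[j:j + window] += Xr[:, j]
            counts[j:j + window] += 1
        return rec / counts

    # toy daily signal: a 27-day cycle and a much longer cycle buried in noise
    t = np.arange(5000)
    x = (0.3 * np.sin(2 * np.pi * t / 27.0)
         + 1.0 * np.sin(2 * np.pi * t / 4015.0)
         + np.random.randn(t.size))
    trend_and_cycles = ssa_reconstruct(x, window=400, components=range(4))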

 

 

How to cite: Paolo Giovanni, P. and Lucia, S.: Discrimination of  geomagnetic quasi-periodic signals by using SSA Transform, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-19601, https://doi.org/10.5194/egusphere-egu24-19601, 2024.

EGU24-22262 | ECS | Posters on site | NP4.1

Temporal Interpolation of Sentinel-2 Multispectral Time Series in Context of Land Cover Classification with Machine Learning Algorithms 

Mate Simon, Mátyás Richter-Cserey, Vivien Pacskó, and Dániel Kristóf

Over the past decades, especially since 2014, large quantities of Earth observation (EO) data have become available at high spatial and temporal resolution, thanks to ever-developing constellations (e.g. Sentinel, Landsat) and open data policies. However, in the case of optical images, which are affected by cloud coverage and the spatially changing overlap of relative satellite orbits, creating temporally generalized and dense time series from measured data alone is challenging, especially when studying larger areas.

Several papers investigate the question of spatio-temporal gap filling and present different interpolation methods to calculate values missing from the measurements. In recent years, more products and technologies have been published in this field, for example the Copernicus HR-VPP Seasonal Trajectories (ST) product. Such temporally generalized data structures are essential for the comparative analysis of different time periods or areas and improve the reliability of data analysis methods such as the Fourier transform or correlation. Temporally harmonized input data are also necessary to improve the results of machine learning classification algorithms such as Random Forests or Convolutional Neural Networks (CNNs). These are among the most efficient methods for separating land cover categories such as arable land, forests, grasslands and built-up areas, or crop types within the arable category.

This study analyzes the efficiency of different interpolation methods applied to Sentinel-2 multispectral time series in the context of land cover classification with machine learning. We compare several types of interpolation, e.g. linear, cubic and cubic spline, and also examine and optimize more advanced methods such as Inverse Distance Weighting (IDW) and Radial Basis Functions (RBF); a sketch of the comparison is given below. We quantify the accuracy of each method by calculating the mean square error between measured and interpolated data points. The role of interpolation of the input dataset in deep learning (CNN) is investigated by comparing the overall, kappa and categorical accuracies of land cover maps created from only-measured and from interpolated time series. First results show that interpolation has a relevant positive effect on the accuracy statistics. This step is also essential for constructing robust pretrained deep learning models that are transferable between different time intervals and agro-ecological regions.
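
A minimal sketch of such a comparison on a single pixel's time series (the synthetic NDVI values, the random validation mask and the RBF settings are assumptions):

    import numpy as np
    from scipy.interpolate import interp1d, CubicSpline, Rbf

    doy = np.arange(0, 365, 5.0)                        # acquisition days
    ndvi = 0.45 + 0.3 * np.sin(2 * np.pi * (doy - 120) / 365) + 0.02 * np.random.randn(doy.size)

    hold_out = np.random.rand(doy.size) < 0.3           # simulate cloudy acquisitions
    hold_out[[0, -1]] = False                           # keep the endpoints for interpolation
    train_x, train_y = doy[~hold_out], ndvi[~hold_out]
    test_x, test_y = doy[hold_out], ndvi[hold_out]

    methods = {
        "linear": interp1d(train_x, train_y, kind="linear"),
        "cubic": interp1d(train_x, train_y, kind="cubic"),
        "cubic spline": CubicSpline(train_x, train_y),
        "RBF (thin plate)": Rbf(train_x, train_y, function="thin_plate"),
    }
    for name, f in methods.items():
        mse = np.mean((f(test_x) - test_y) ** 2)
        print(f"{name:18s} MSE = {mse:.5f}")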

The research has been implemented with the support provided by the Ministry of Culture and Innovation of Hungary from the National Research, Development and Innovation Fund, financed under the KDP-2021 funding scheme.

 

Keywords: time series analysis, Machine Learning, interpolation, Sentinel

How to cite: Simon, M., Richter-Cserey, M., Pacskó, V., and Kristóf, D.: Temporal Interpolation of Sentinel-2 Multispectral Time Series in Context of Land Cover Classification with Machine Learning Algorithms, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-22262, https://doi.org/10.5194/egusphere-egu24-22262, 2024.

ESSI2 – Data, Software and Computing Infrastructures across Earth and Space Sciences

International standards are important for communication of geoscience information across borders and between countries, and in particular for addressing multinational and global issues such as climate change, resilience to natural hazards and sustainable resource extraction.

The Commission for the Management and Application of Geoscience Information (CGI) is the International Union of Geological Sciences’ (IUGS) commission for developing, managing and governing geoscience data models and vocabularies, amongst other standards. CGI undertakes its activities through working groups arranged around different types of information standards and through its governing council. With members drawn from most continents and regions, their collaboration results in standards that are internationally applicable; GeoSciML and EarthResourceML are examples of data models developed through multinational cooperation and applied in global and regional initiatives such as OneGeology and Minerals4EU. The data models are supported by geoscience vocabularies developed and published by CGI.

As an IUGS commission, CGI has both a unique position and an opportunity regarding the governance of global geoscience information standards. Through its enduring status, CGI is not bound to finite, funding-constrained projects. Contributing projects can be very influential for standards development, but sustaining standards after a project ends can be difficult.

To date, CGI has tended to manage and govern standards it has developed or co-developed. The opportunity for CGI going forward is to take more of a leadership role across IUGS and internationally-focussed societies and agencies to host, manage and/or promote their standards. With growing expectations of FAIR Principles adherence across the global geoscience community, CGI, as a commission of the IUGS, can help enable their implementation through providing enduring, authorised, geoscience information standards and services.

How to cite: Rattenbury, M.: International Geoscience Information Standards, Management and Governance, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-1701, https://doi.org/10.5194/egusphere-egu24-1701, 2024.

EGU24-5509 | ECS | Orals | ESSI2.1

Predicting Mining activities dynamics in Ghana: A Fusion of Social beliefs and Remote Sensing 

Glorie Metsa WOWO, Pierre C. Sibiry Traore, Vijaya Joshi, Paul Cohen, Janet Mumo Mutuku, Mihai Surdeanu, Maria Alexeeva Zupon, and Keith Alcock

Forests and arable land in Ghana face a significant threat due to the mechanisation and increase of illegal gold mining (galamsey). These directly affect locals' diets and nutrition, impacting communities that rely on forest resources and small-scale farming for sustenance. In regional systems, accurately predicting future outcomes is a crucial task, with applications ranging from environmental management to agriculture. Recent advances in participatory science and modelling have highlighted the potential of building collective models based on the knowledge and beliefs of local populations who interact with the system. Such approaches have provided more accurate estimations of future outcomes compared to traditional expert-driven methods. Under the HEURISTICS project, we are exploring the causal efficacy of local knowledge, beliefs, and attitudes in local communities' decisions on transitions to agriculture, forest and Galamsey. Using unsupervised and supervised classification, different land uses and land covers (LULC) are classified for 2017–2023, including Forest, Croplands, Settlements, Water, and Galamsey/Mining Sites. In addition to EO data, open-source data from OpenStreetMap are extracted, providing valuable information on roads, rivers, water streams, and administrative boundaries. To further enrich the data, machine reading models are employed to extract beliefs from articles, ranking them based on relevance to topics such as Galamsey, mining, cities, and settlements. Additionally, we leverage text data to map public sentiment towards mining activities. By analysing the origin and sentiment of sentences, we gain insights into how people perceive different areas and how these perceptions are connected to land use. Further analysis examines the factors influencing sentiment scores, including mining proximity, boundary effects, and authority influence. Grid-level sentiment maps reveal nuanced spatial patterns and highlight areas potentially impacted by mining. We also predict future mining trajectories using machine learning models trained on historical 2017–2022 text and mining data, allowing us to make a prediction for 2023 and to identify key factors correlated with mining activities. An R-squared value of 0.94 was obtained, indicating that our approach explains 94% of the variance in mining proportion.
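The final prediction step can be illustrated with a minimal, self-contained Python sketch; the feature names, synthetic data, model choice and scores below are purely illustrative assumptions and do not reproduce the project's actual pipeline or results.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

# Hypothetical grid-level feature table: one row per grid cell, combining
# remote-sensing fractions, OSM-derived distances and text-derived sentiment
# scores (all column names are illustrative, not the project's schema).
rng = np.random.default_rng(0)
n = 500
features = pd.DataFrame({
    "forest_frac": rng.uniform(0, 1, n),
    "cropland_frac": rng.uniform(0, 1, n),
    "dist_to_river_km": rng.uniform(0, 20, n),
    "sentiment_score": rng.uniform(-1, 1, n),
})
# Synthetic target standing in for the observed mining proportion per cell.
mining_frac = (0.3 * (1 - features["forest_frac"])
               + 0.1 * np.exp(-features["dist_to_river_km"] / 5)
               + 0.05 * rng.normal(size=n))

# Train on "historical" rows, predict the held-out rows (stand-in for 2023).
train, test = slice(0, 400), slice(400, n)
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(features[train], mining_frac[train])
pred = model.predict(features[test])

print(f"R^2 on held-out cells: {r2_score(mining_frac[test], pred):.2f}")
print(dict(zip(features.columns, model.feature_importances_.round(2))))
```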

Keywords: Land use prediction, mining activity, Galamsey, sentiment analysis, remote sensing, Ghana.

How to cite: Metsa WOWO, G., Sibiry Traore, P. C., Joshi, V., Cohen, P., Mumo Mutuku, J., Surdeanu, M., Alexeeva Zupon, M., and Alcock, K.: Predicting Mining activities dynamics in Ghana: A Fusion of Social beliefs and Remote Sensing, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-5509, https://doi.org/10.5194/egusphere-egu24-5509, 2024.

EGU24-7376 | Posters on site | ESSI2.1

Building A Trustworthy Data Repository for Australian Geoscience Research Communities 

Anusuriya Devaraju, Woodman Stuart, Sam Bradley, Vincent Fazio, Neda Taherifar, Benyamin Motevalli, Jens Klump, Lesley Wyborn, and Rebecca Farrington

AuScope Australia (https://www.auscope.org.au/) focuses on delivering data, services, and tools to support the future research of the Australian geoscience research community. As a component of the “Data Lens” of the AuScope Downward-Looking Telescope, the AuScope Discovery Portal (https://portal.auscope.org.au/) harvests metadata from affiliated data catalogues to support more comprehensive access. Over time, it has become apparent that the data repositories offered by the AuScope partners and universities need to be improved for curating data from the AuScope projects. Many do not provide structured metadata suitable for harvesting into the Discovery Portal and offer limited data discovery and retention support. Consequently, most institutional data repositories do not support AuScope’s strategy toward making all data from AuScope projects or data collected with AuScope-funded instruments compliant with the FAIR (Findable, Accessible, Interoperable, and Reusable) Guiding Principles for both humans and machines (Wilkinson et al., 2016). 

The goal of the AuScope Data Repository is to preserve and offer continued access to data from its communities (e.g., NCRIS-funded data projects and Australian Geoscience research communities) working on fundamental geoscience questions and grand challenges, including climate change, natural resources security and natural hazards. Datasets submitted to the repository will be made openly available where appropriate, with attributions to promote open science. The repository is essential for geoscience research innovation in support of the AuScope 5-Year Investment Plan and Australian Academy of Science Decadal plan for Australian Geoscience: Our Planet, Australia's Future. 

This presentation will cover the repository’s scope and design, including non-technical aspects (e.g., practices, governance, and engagement) and practical aspects of the repository (e.g., persistent identification, data discovery, interoperability, workflows, security, and architectural requirements). We discuss the key considerations when setting up a data repository for scientific communities. The first release of the repository is now available online to gather early feedback from selected AuScope data providers and project affiliates. We will conclude the presentation with the next steps of the development process, including engagement activities and documented data practices.

How to cite: Devaraju, A., Stuart, W., Bradley, S., Fazio, V., Taherifar, N., Motevalli, B., Klump, J., Wyborn, L., and Farrington, R.: Building A Trustworthy Data Repository for Australian Geoscience Research Communities, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-7376, https://doi.org/10.5194/egusphere-egu24-7376, 2024.

Data about the Earth are too difficult to access, making EO exploitation inaccessible for non-experts and tedious for experts. The reasons are manifold, from intrinsic complexity to technically overloaded handling where data are presented in a generator-centric rather than a user-centric manner. Under the headline of Analysis-Ready Data (ARD), significant research is ongoing to find ways of stripping unnecessary burden from services.

While progress is being made on improving metadata, such as by CEOS, the data perspective is still underrepresented. With our research we aim to contribute to closing this gap, in particular for gridded data, i.e., rasters and datacubes. The starting point is the data and processing model offered by the ISO/OGC coverage standards; within this ecosystem, three use cases are inspected: determination of service quality parameters, automated data fusion, and ML. We find that some ARD aspects are covered, but we also spot several issues that deserve investigation and standardization effort. Broadly, these fall into the following categories:

  • conceptual clarification, ex: pixel-in-center / pixel-in-corner (see the sketch after this list);
  • enhancing existing (and otherwise proven) standards, ex: establish a framework for units of measure amenable to automatic conversion, similar to coordinate reference systems;
  • improved standards governance, ex: avoiding competing standards known to be not interoperable.
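To illustrate the first clarification need, a minimal Python sketch (an assumption-laden example, not part of the original abstract or any standard) shows how the same stored origin coordinate yields grid coordinates shifted by half a pixel depending on the assumed convention:

```python
import numpy as np

# Illustrative only: a 0.1-degree grid whose stored origin coordinate is 10.0.
# Whether that value refers to the corner or the center of the first pixel
# shifts every coordinate by half a cell, a classic interoperability trap.
res = 0.1
origin = 10.0
cols = np.arange(5)

# pixel-in-corner: the stored origin is the outer corner; centers sit half a cell inward
centers_if_corner_convention = origin + (cols + 0.5) * res
# pixel-in-center: the stored origin already is the center of the first pixel
centers_if_center_convention = origin + cols * res

print(centers_if_corner_convention)  # [10.05 10.15 10.25 10.35 10.45]
print(centers_if_center_convention)  # [10.   10.1  10.2  10.3  10.4 ]
```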

In our talk we present results achieved from work in OGC Testbed-19 and EU FAIRiCUBE. We discuss gaps found and present suggestions for improvement towards easier and more reliable consumption of EO data by humans and machines.

 

How to cite: Baumann, P.: Analysis-Ready EO Data: A Standards-Centric Perspective, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-8323, https://doi.org/10.5194/egusphere-egu24-8323, 2024.

EGU24-11538 | Posters on site | ESSI2.1

Methodology for building interoperable Research Infrastructures: Example from Geo-INQUIRE project 

Jan Michalek, Daniele Bailo, Javier Quinteros, Otto Lange, Rossana Paciello, Valerio Vinciarelli, and Kety Giuliacci and the Geo-INQUIRE project team

Building and establishing a fully interoperable Research Infrastructure (RI) that allows smooth data exchange across various scientific domains is a dream of many researchers and data managers around the world. There are various pathways to achieve it, and many attempts are currently being made. Unfortunately, no single approach fits all RIs, since there is natural heterogeneity across the different phases of the data lifecycle, from data collection to data interpretation, spread across various domains.

In the current contribution we present an example methodology for building an interoperable RI, which is being applied within the Geosphere INfrastructures for QUestions into Integrated REsearch project (Geo-INQUIRE, https://www.geo-inquire.eu/). Geo-INQUIRE was launched in October 2022 and comprises a unique consortium of 51 partners, including national research institutes, universities, national geological surveys, and European consortia. A portfolio of 150 Virtual Access (VA) and Transnational Access (TA, both virtual and on-site) installations will be offered to the scientific community across domain barriers, especially the land-sea-atmosphere environments, including the EPOS, EMSO, ARISE, ECCSEL, and ChEESE RIs. The great challenge is to find common ground across the domains and to define principles general enough for all participating RIs, yet detailed enough to allow useful interoperability. The example methodology presented here has been developed within the implementation of the European Plate Observing System (EPOS; a European Research Infrastructure Consortium since 2018) and relies on data access provision through web services (APIs). Achieving true interoperability requires not only robust and efficient web services but also rich metadata descriptions based on adequate metadata models compliant with the FAIR principles. The methodology therefore envisages data integration through an approach that combines web services, rich metadata and semantics. These three key elements are monitored and evaluated through a set of criteria distributed across three levels, which were assembled into an Implementation Level Matrix (ILM) to understand the landscape of service provision. The ILM serves as a tool for capturing the changing maturity of installations/services and tracking their readiness for interoperable integration into RIs.

Interoperability starts at the data provider level, and therefore domain-specific coordination and common development efforts are important. A good example of this is the work done in seismology. After a long process of joint work with experts in data FAIRness, a group of data centres submitted to the International Federation of Digital Seismograph Networks (FDSN) a proposal to update its community guidelines on DOIs for seismic networks. This was iterated, discussed, and finally adopted as a community standard in December 2023. The guidelines include not only topics exclusive to seismology, but also the aim of being as FAIR as possible from a multi-disciplinary perspective. Extrapolating this to the RIs and data providers from other disciplines taking part in the project, and adapting the guidelines to their reality, is an ongoing effort within the project.

How to cite: Michalek, J., Bailo, D., Quinteros, J., Lange, O., Paciello, R., Vinciarelli, V., and Giuliacci, K. and the Geo-INQUIRE project team: Methodology for building interoperable Research Infrastructures: Example from Geo-INQUIRE project, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-11538, https://doi.org/10.5194/egusphere-egu24-11538, 2024.

EGU24-12644 | Orals | ESSI2.1

FAIR and open data: state of affairs for seismological networks and infrastructures globally 

Helle Pedersen, Elizabetta D'Anastasio, Jerry Carter, Rob Casey, Jonathan Hanson, Florian Haslinger, Javier Quinteros, and Lesley Wyborn

Over the past years, awareness of the importance and usefulness of Globally Unique Persistent Resolvable Identifiers (GUPRIs), appropriate licenses, further standardisation of metadata, and general adherence to the FAIR principles has increased significantly in the international seismological community. One important milestone was the introduction of seismological network identifiers as an FDSN recommendation in 2014, which was recently updated at the end of 2023. Further advances were made, or are in development, in connection with the establishment of more formalised research infrastructures such as EPOS in Europe and AuScope in Australia, the reorganisation of IRIS and UNAVCO into EarthScope in the U.S., and national and international initiatives such as COPDESS and RDA. In Europe, the developments in seismology have taken place within, or with close links to, projects such as Geo-INQUIRE, ChEESE and Digital Twin of GEOphysical extremes (DT-GEO), building on achievements and tools from earlier projects (e.g. FAIRsFAIR), and in general as part of the European Research Infrastructure environment.

In this contribution we reflect on the current state of the use of identifiers, the application of licenses and other improvements in the FAIRness of seismological data, products and services, focusing on FDSN and ORFEUS/EIDA, EarthScope, AuScope, and the GFZ/GEOFON and RESIF data centers.

Experience gained with DOIs as seismological network identifiers is conclusive in terms of the acceptability of associating a DOI at the network level: more than 70% of FDSN-registered networks now have a DOI. In contrast, correct citation based on the DOI is only gradually gaining traction in scientific publications, due to a combination of slow uptake by researchers and the difficulty scientific journals have in setting up automatic or semi-automatic checking procedures. Additional challenges remain, e.g. when trying to implement identifiers for data collections and/or downstream products that properly support the reproducibility of scientific workflows. A simple collection of DOIs would not be enough to describe a user-defined dataset, which is characterized by a much finer granularity. Therefore, other alternatives may need to be evaluated, such as associating a time-stamped query with a new DOI describing such a dataset. This could be used for small datasets with data from different sources, or even for ML/AI training sets defined as a collection of networks.

Some standardisation and best practices have emerged with regard to the licensing of seismological data and products, in particular the use of attribution licenses such as CC-BY. A common and harmonised understanding of legal implications, intellectual property, and the consequences of specific licenses, however, still seems quite a way off.

Implementing FAIRness, and then measuring it or even reporting the level of FAIRness to funding agencies, has met with some success, at least in specific initiatives or through specific projects. One noteworthy development is the introduction of FAIR Implementation Profiles (FIPs), which allow a quantitative assessment of the achieved FAIRness.

How to cite: Pedersen, H., D'Anastasio, E., Carter, J., Casey, R., Hanson, J., Haslinger, F., Quinteros, J., and Wyborn, L.: FAIR and open data: state of affairs for seismological networks and infrastructures globally, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-12644, https://doi.org/10.5194/egusphere-egu24-12644, 2024.

EGU24-12690 | Orals | ESSI2.1

Model Atlas of the Earth (M@TE): advancing the interoperability of digital twins 

Sara Polanco, Dan Sandiford, Xiaodong Qin, Andres Rodríguez Corcho, Lauren Ilano, Christopher Alfonso, Julian Giordani, Ben Mather, Nigel Rees, and Rebecca Farrington

The field of numerical modeling of Earth’s systems is growing rapidly and is instrumental for addressing the current environmental crisis. Such models often require specialized computational resources (HPC), can take days to weeks to run, and produce large volumes of heterogeneous output data. The lack of curation of these numerical models and of community standards hinders our ability to access, interpret and build on published numerical models. Here, we present a first-of-its-kind open science framework that aims to establish a community practice to increase the usefulness of numerical modeling outputs and leverage computational resources. M@TE provides a digital platform that encapsulates the entire model development process: from setup, to model output, to analysis. This supports discovery, data preservation, reproducibility, and reuse, with flexibility for users with different levels of expertise. M@TE has a human-browsable, machine-searchable, user-friendly front end (https://mate.science/), and a back-end GitHub organization (https://github.com/ModelAtlasofTheEarth) and model output repository targeted at expert users. Contributions to M@TE are handled by GitHub automation workflows that guide contributors through the process of documenting their models, ensuring that they meet community standards, validating metadata and creating DOIs. M@TE provides a platform for a much wider appreciation of Earth processes and numerical modeling, particularly by industry stakeholders, professional geoscientists and educators. Furthermore, M@TE is creating a single platform that will advance the interoperability of digital twins required to address the current environmental crisis.

How to cite: Polanco, S., Sandiford, D., Qin, X., Rodríguez Corcho, A., Ilano, L., Alfonso, C., Giordani, J., Mather, B., Rees, N., and Farrington, R.: Model Atlas of the Earth (M@TE): advancing the interoperability of digital twins, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-12690, https://doi.org/10.5194/egusphere-egu24-12690, 2024.

EGU24-13221 | Posters on site | ESSI2.1

NASA’s Satellite Needs Working Group Management Office: Developing Solutions in an Agile, Open Science Environment 

Katrina Virts, Pontus Olofsson, Sean Gregory, Jeanne' le Roux, and Rahul Ramachandran

Every two years, the National Aeronautics and Space Administration (NASA) leads an assessment of U.S. Federal civilian agency Earth observation needs submitted through the Satellite Needs Working Group (SNWG) survey.  In four survey cycles beginning in 2016, nearly 400 high-priority satellite needs have been identified, spanning Earth Science and representing a wide variety of potential applications for Earth observation data.

During each assessment cycle, new data products and services (i.e., solutions) that meet the needs of multiple agencies are identified and proposed for funding.  The majority of solutions being developed or currently operational are global in scope, including harmonized land surface reflectance data from Landsat and Sentinel-2; composites of cloud properties derived from MODIS, VIIRS, and five geostationary satellites; dynamic surface water extent and land surface disturbance products derived from multiple optical and radar missions; a suite of low-latency products from the ICESat-2 mission; and a soil moisture product derived from the upcoming NISAR mission.

The SNWG Management Office, within the Earth Action element of NASA’s Earth Science Division, manages both the biennial SNWG survey assessment and the development of solutions, which started at full capacity with the 2020 cycle.  Each solution project is required to align with NASA’s open science policy, including developing source code in an open code repository, having an open-source software license, and making all data freely available via NASA’s Earthdata website.  The presentation will include an overview of the SNWG process and its emphasis on open science, and will highlight several operational solutions freely available to the global research and applications communities.

How to cite: Virts, K., Olofsson, P., Gregory, S., le Roux, J., and Ramachandran, R.: NASA’s Satellite Needs Working Group Management Office: Developing Solutions in an Agile, Open Science Environment, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-13221, https://doi.org/10.5194/egusphere-egu24-13221, 2024.

EGU24-13755 | Posters on site | ESSI2.1

Building the software, data and training foundations to support Australia’s climate simulator (ACCESS-NRI) 

Kelsey Druken, Aidan Heerdegen, Romain Beucher, Roger Edberg, Natalia Bateman, Victoria Allen, Claire Carouge, Martin Dix, Heidi Nettelbeck, and Andy Hogg

ACCESS-NRI is a national research infrastructure (NRI) established to support the Australian Community Climate and Earth System Simulator, or ACCESS. The ACCESS suite of software and data outputs are essential tools used to simulate past and future climate, weather and Earth systems and to support research and decision making within Australia. ACCESS-NRI's mission is to build an open collaborative infrastructure that will accelerate research in Earth system, climate and weather modelling as well as enable new research not currently possible. The facility brings together skills in software development, high-performance computing, data management and analysis to enhance the ACCESS modelling framework, making it easier to use and lowering the barrier for innovation.   

To improve usability and uptake of this complex modelling framework, the software, data and training program comprises three teams that focus on providing open and transparent processes for the development, release, and user training of the ACCESS models, tools and data. This presentation will provide an overview of the program’s establishment over its first 18 months as a new facility. This includes enabling reproducible build and deployment workflows, supporting tools to analyse and evaluate model output, data management, and the development of training tools and materials. Core to all capabilities across the organisation are openness and community engagement in our open development and decision-making processes. This presentation will also discuss the critical infrastructure foundations we have built to support this engagement and advance the impact of ACCESS.

ACCESS became a National Research Infrastructure (NRI) facility through funding from the Australian Government’s National Collaborative Research Infrastructure Strategy (NCRIS) Program and was officially launched in June 2022. The facility is a major collaborative undertaking between the Bureau of Meteorology, CSIRO and five Australian universities, in collaboration with national and international partners.

How to cite: Druken, K., Heerdegen, A., Beucher, R., Edberg, R., Bateman, N., Allen, V., Carouge, C., Dix, M., Nettelbeck, H., and Hogg, A.: Building the software, data and training foundations to support Australia’s climate simulator (ACCESS-NRI), EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-13755, https://doi.org/10.5194/egusphere-egu24-13755, 2024.

Open Science is a paradigm shift for science; open practices can remove barriers to sharing science and increase its reproducibility and transparency. As society faces global, interdisciplinary challenges like climate change, open scientific research – including open data, software, workflows, and samples – is more important than ever. AI and ML methods are increasingly used in the Earth, space, and environmental sciences to investigate these large challenges, and analysis-ready data for use in these methods is predicated on open, FAIR principles for data sharing. However, maximizing FAIR-ness and ensuring research is ‘as open as possible’ across the many Earth, space, and environmental science disciplines involves a range of challenges, including a lack of infrastructure, incentives, resources, and guidance for all participants in the research ecosystem. We believe that societies, including AGU, EGU, JpGU, communities like ESIP, and beyond, have a significant role to play in catalyzing collaboration to overcome this range of challenges. Here, we share progress on collaborative efforts involving AGU, partners, and the broader community to solve these challenges, including developing discipline-specific guidance for researchers on data and software sharing, training for researchers in leading open science practices, and guidance and partnerships for publishers interested in implementing FAIR and CARE in the publishing workflow. We look forward to partnering on global collaborations to advocate for our researchers and the open future of science.

How to cite: Stall, S. and Vrouwenvelder, K.: Open collaboration for open science: a global perspective on disciplinary challenges in the Earth, space, and environmental sciences, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-13944, https://doi.org/10.5194/egusphere-egu24-13944, 2024.

EGU24-14071 | ECS | Orals | ESSI2.1

Community-Driven Development of Tools to Improve AI-Readiness of the Open Environmental Data 

Yuhan "Douglas" Rao, Rob Redmon, and Eric Khin

As artificial intelligence (AI) and machine learning (ML) gain broad interest in the Earth and space science community, there is growing demand for AI-ready data to support the development of responsible AI/ML applications with open environmental data. Through a broad community collaboration under the Earth Science Information Partners (ESIP), we have developed an AI-readiness checklist as a community guideline for the development of AI-ready open environmental data. The checklist was initially based on an early draft of an AI-ready matrix developed by the OSTP Open Science Sub-committee but has been modified notably based on feedback from data users and AI/ML practitioners. The current version of the AI-readiness checklist can be used to holistically assess the documentation, quality, access, and pre-processing of a given dataset. The AI-readiness assessment result can then be summarized into a data card that provides human-readable metrics to help users determine whether a dataset meets their needs for AI/ML development. The next milestone of this community-driven effort is to develop a community convention, building on existing data conventions and standards, to fill the data management gap in supporting AI-ready data. In this presentation, we will also showcase a collection of AI-ready climate datasets that apply the AI-readiness checklist and data card concept to support AI/ML applications in climate sciences. The AI-readiness development process requires active community engagement with data repositories, domain scientists, and AI/ML practitioners to establish a flexible framework that ensures the rapid evolution of AI/ML technologies can be addressed in modern data management.
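As a purely hypothetical illustration of how such an assessment might be summarized into a human- and machine-readable data card (the actual checklist categories and scoring are defined by the community guideline, not by this sketch), a minimal Python example could look as follows.

```python
from dataclasses import dataclass, asdict
import json

# Hypothetical structure and category names; not the official ESIP checklist.
@dataclass
class AIReadinessCard:
    dataset: str
    documentation: float   # e.g. fraction of checklist items met, 0-1
    quality: float
    access: float
    preprocessing: float

    def summary(self) -> dict:
        d = asdict(self)
        d["overall"] = round(
            (self.documentation + self.quality + self.access + self.preprocessing) / 4, 2
        )
        return d

card = AIReadinessCard("example-climate-dataset", 0.9, 0.75, 1.0, 0.6)
print(json.dumps(card.summary(), indent=2))
```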

How to cite: Rao, Y. "., Redmon, R., and Khin, E.: Community-Driven Development of Tools to Improve AI-Readiness of the Open Environmental Data, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-14071, https://doi.org/10.5194/egusphere-egu24-14071, 2024.

EGU24-14110 | ECS | Orals | ESSI2.1

Overcoming challenges to data and software attribution throughout the research workflow: a publisher perspective 

Kristina Vrouwenvelder, Natalie Raia, and Shelley Stall

In an era when research is increasingly interdisciplinary and the use of AI and ML methods across the Earth, space, and environmental sciences continues to grow, the importance of FAIR data is clear. The community has worked to elevate the importance of outputs beyond the research article; publishers, including the American Geophysical Union, now require authors to share data and code alongside their research articles in an effort to increase the transparency and reproducibility of science and enhance data and software reuse. Yet before we can fully realize the potential of FAIR data and software policies to advance the scientific enterprise, significant challenges remain. These include the need for education and efficient workflows to ease the burden on researchers and increase their uptake of open, FAIR practices at the point of data and/or software publication as well as at article publication. Here, we will share a progress report from AGU on the effect of publisher policies on data and software sharing and discuss work by AGU and the broader community to break down barriers for researchers and to ensure that data and software creators receive appropriate attribution for their work.

How to cite: Vrouwenvelder, K., Raia, N., and Stall, S.: Overcoming challenges to data and software attribution throughout the research workflow: a publisher perspective, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-14110, https://doi.org/10.5194/egusphere-egu24-14110, 2024.

Geohazards and risks are increasing rapidly worldwide due to continuing urbanization, climate change, and high-risk critical distributed infrastructure. The longest modern instrumental records of earthquakes cover less than 100 years, while recurrence intervals of large earthquakes are hundreds of years or more. Increasingly dense observations and physics-based simulations empowered by supercomputing provide pathways for overcoming the lack of data and elucidating spatiotemporal patterns that extend our knowledge beyond sporadic case studies and average statistical laws; however, they are typically challenging to integrate.

Digital Twins are emerging in Solid Earth Science, allowing curiosity-driven science to test scientific hypotheses against observations over ranges of space-time scales not accessible for laboratory and field observations. The results can clarify processes leading to large earthquakes, improve our forecasting ability, and enhance the general understanding of earthquakes and faults.

In this presentation, I will highlight Geo-INQUIRE (www.geo-inquire.eu), DT-Geo (www.dt-geo.eu) and ChEESE-2P (www.cheese2.eu), European projects that aim to overcome cross-domain barriers and will exploit innovative data management techniques, modeling and simulation methods, developments in AI and big data, and extend existing data infrastructures to disseminate these resources to the wider scientific community. Specifically, we will provide and enhance access to selected key data, products, and services, enabling the dynamic processes within the geosphere to be monitored and modeled at new levels of spatial and temporal detail and precision.

How to cite: Gabriel, A.-A.: Enabling curiosity-driven science and digital twins for earthquake physics in the exascale era, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-14360, https://doi.org/10.5194/egusphere-egu24-14360, 2024.

EGU24-15541 | Posters on site | ESSI2.1

GEOROC 2.0: A Globally Connected Geochemical Database to Facilitate Interdisciplinary, Data-Driven Research 

Gerhard Wörner, Marthe Klöcking, Adrian Sturm, Bärbel Sarbas, Leander Kallas, Stefan Möller-McNett, Kirsten Elger, Daniel Kurzawe, and Matthias Willbold

The GEOROC database is a leading, open-access source of geochemical and isotopic datasets that provides access to curated compilations of igneous and metamorphic rock and mineral compositions from >20,600 publications. It is a data resource that supports and facilitates hundreds of new research publications each year across multiple geoscientific and related disciplines.

This presentation aims to “advertise” this data product to the geochemical community and to present our ongoing efforts to improve the service by providing FAIR (findable, accessible, interoperable and reusable) geochemical data. We will also describe some recently published research in which authors used large geochemical data compilations such as GEOROC and PetDB for innovative approaches in digital geochemistry.

To further support such research in the future, the Digital Geochemical Data Infrastructure (DIGIS) initiative is developing a new IT and data infrastructure for GEOROC 2.0 to enable modern solutions for data submission, discovery and access. GEOROC data compilations are made accessible via a web search interface and an API. In addition, DIGIS maintains a direct data pipeline between the data compiled in GEOROC and the EarthChem Portal. Hence, GEOROC represents one of six geochemical databases that can be queried and accessed synchronously within the EarthChem Portal. The DIGIS infrastructure further partners with GFZ Data Services, a domain repository for geoscience data hosted at GFZ, offering data publication services with assigned digital object identifiers (DOIs). Individual researchers can directly submit their geochemical datasets to the repository (using the EarthChem Data Templates), where they are archived for the long term. Regular thematic snapshots of the GEOROC synthesis database are archived in the GRO.data repository of the University of Göttingen.

Part of this cooperation is the development of standardised vocabularies and data reporting to enhance the interoperability of geo- and cosmochemical data systems. Harmonized data entry for the GEOROC, PetDB and Astromat synthesis databases will avoid duplication and ensure consistent data and metadata. With these efforts, and as a participant in the OneGeochemistry(1,2) initiative, DIGIS is working towards the goal of globally harmonised geochemical data to enable interdisciplinary, data-driven research.

 

References

Klöcking, M. et al. (2023). Community recommendations for geochemical data, services and analytical capabilities in the 21st century. In Geochimica et Cosmochimica Acta (Vol. 351, pp. 192–205).

Prent, A. et al. (2023) Innovating and Networking Global Geochemical Data Resources Through OneGeochemistry. Elements 19, Issue 3, pp. 136–137.

How to cite: Wörner, G., Klöcking, M., Sturm, A., Sarbas, B., Kallas, L., Möller-McNett, S., Elger, K., Kurzawe, D., and Willbold, M.: GEOROC 2.0: A Globally Connected Geochemical Database to Facilitate Interdisciplinary, Data-Driven Research, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-15541, https://doi.org/10.5194/egusphere-egu24-15541, 2024.

EGU24-17059 | Posters on site | ESSI2.1

A GeoPackage implementation for the new version of Earthquake Environmental Effects Catalogue (EEE Catalogue) in the context of GeoSciencesIR project 

Paolo Lino Manganello, Giuseppe Castorina, Maria Pia Congi, Anna Maria Blumetti, and Luca Guerrieri

The Earthquake Environmental Effects Catalogue (EEE Catalogue) is a global database of environmental and geological effects induced by recent, historical and paleoearthquakes. These coseismic effects include, among others, surface faulting, ground subsidence, ground cracks, slope movements, liquefaction, tsunamis, and hydrogeological anomalies. The observed relevance of earthquake environmental effects as a major source of damage, in addition to vibratory ground motion, confirms that knowledge of these effects can be essential for seismic hazard and intensity assessment. The EEE Catalogue can therefore be considered a helpful tool for land planning, particularly in areas of high seismicity.
The first version of the EEE Catalogue was launched in 2011 and was implemented in PHP. The structure of the catalogue is based on collecting data at three levels of increasing detail, corresponding to three different tables: Earthquake, Locality and Site. The Earthquake features provide general information on the seismic event. The Locality features contain information on a specific locality where coseismic effects have occurred. The Site features provide information at the site of each earthquake environmental effect, including detailed characteristics on the type of earthquake. The structure of the catalogue is completed by Country features, consisting of point geometries.
The focus of this work is the new release of the EEE Catalogue to populate the Research Infrastructure of GeoSciences IR. GeoSciences IR aims to create a research infrastructure for the Italian Network of Geological Surveys (RISG), coordinated by the Geological Survey of Italy (ISPRA). In this framework, it is mandatory for GeoSciences IR to comply with the FAIR (Findability, Accessibility, Interoperability, Reuse) principles to improve data sharing. Data that become part of the infrastructure must be harmonized in accordance with the INSPIRE Directive and the FAIR principles.
A GeoPackage implementation of the EEE Catalogue has been carried out. GeoPackage is an open format for geospatial information developed by the Open Geospatial Consortium (OGC). This format has several advantages, including platform independence, versatile data support, large storage capacity, and the maintenance of relations between database tables.
The structure of the EEE Catalogue has been modified and improved. Furthermore, because of the complexity of the dataset and the need to enter new records into the database, new code lists were implemented, whereby a specific database field is populated from a list of predefined values. The main challenge of the new release is semantic standardization on the basis of standard vocabularies (e.g., INSPIRE, GeoSciML).
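As a minimal sketch of how a multi-table GeoPackage with code-list-constrained fields can be written (the field names and records below are assumptions for illustration and do not reproduce the actual EEE Catalogue schema), a GeoPandas-based example might look as follows.

```python
import geopandas as gpd
from shapely.geometry import Point

# Illustrative code list and records only; not the actual EEE Catalogue schema.
EFFECT_TYPES = ["surface faulting", "liquefaction", "slope movement", "tsunami"]

earthquakes = gpd.GeoDataFrame(
    {"eq_id": [1], "name": ["Example event"], "year": [1915]},
    geometry=[Point(13.6, 42.0)], crs="EPSG:4326",
)
sites = gpd.GeoDataFrame(
    {"site_id": [10], "eq_id": [1], "effect_type": ["liquefaction"]},
    geometry=[Point(13.7, 42.1)], crs="EPSG:4326",
)
# Enforce the code list before writing, mimicking a predefined-value field.
assert sites["effect_type"].isin(EFFECT_TYPES).all()

# Each table becomes a layer in the same GeoPackage file, with relations
# preserved through the shared eq_id key.
earthquakes.to_file("eee_catalogue.gpkg", layer="earthquake", driver="GPKG")
sites.to_file("eee_catalogue.gpkg", layer="site", driver="GPKG")
```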

How to cite: Manganello, P. L., Castorina, G., Congi, M. P., Blumetti, A. M., and Guerrieri, L.: A GeoPackage implementation for the new version of Earthquake Environmental Effects Catalogue (EEE Catalogue) in the context of GeoSciencesIR project, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-17059, https://doi.org/10.5194/egusphere-egu24-17059, 2024.

EGU24-17356 | Orals | ESSI2.1

How do the CMIP7 infrastructure plans support the implementation of the IPCC data guidelines? 

Martina Stockhause, Matthew Mizielinski, Anna Pirani, Lina Sitz, Alessandro Spinuso, Mudathir Abdallah, Jesbin Baidya, Paul Durack, and Daniel Ellis

The Intergovernmental Panel on Climate Change (IPCC) regularly assesses a wide range of research results related to climate change, ranging from the physical sciences to the economic and social sciences, to provide policy makers with options for combatting the challenges of climate change. The IPCC authors analyze data across multiple domains and from multiple sources.

The IPCC data guidelines enhance the transparency of IPCC outcomes by ensuring that figure creation is traceable, that input data are cited, and that data and software are preserved long-term. The related data and metadata requested from the authors include detailed information on the datasets used in every figure, for which standardized, machine-accessible and machine-readable information needs to be supported by the input data providers.

An example of an important input data provider is the Coupled Model Intercomparison Project (CMIP), which has continuously improved its standards and data infrastructure to keep pace with the significant increase in the scale of the project over recent phases. CMIP defines a set of standards, including vocabularies for controlled metadata fields such as variable and experiment names, along with the data itself and its structure. A set of infrastructure services provides access to the data through the Earth System Grid Federation (ESGF), to descriptions of the climate models used and known errata through ES-DOC services, and to data citation information, including data usage in derivative datasets and published papers where known.

The contribution will discuss the diverse data workflows of the IPCC authors and the ways in which the CMIP infrastructure supports them. Authors access data from the primary data portals of the ESGF, but also from secondary data portals (Copernicus, Pangeo, Climate4Impact) or local data pools hosted by national institutions. The IPCC authors have faced a number of challenges, including accessing data citation and model description information together with the data, and identifying new dataset versions with significant changes. With the IPCC’s plan to utilize provenance records in AR7 to gather all information requested by the IPCC data guidelines, machine-readable information accessible through the file headers becomes essential.
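As an illustration of what such machine-readable header information looks like in practice (a minimal sketch with a hypothetical file name, not part of the CMIP7 or IPCC tooling), a few standard CMIP6-style global attributes can be read directly from a NetCDF header:

```python
import xarray as xr

# Hypothetical file name; any CMIP6-style NetCDF file carries these attributes.
ds = xr.open_dataset("tas_Amon_EXAMPLE-MODEL_historical_r1i1p1f1_gn_185001-201412.nc")

# Controlled-vocabulary fields and identifiers that a figure-provenance record
# could harvest directly from the file header:
for attr in ("source_id", "experiment_id", "variant_label",
             "mip_era", "tracking_id", "further_info_url"):
    print(f"{attr:18s}: {ds.attrs.get(attr, 'n/a')}")
```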

The entire IPCC AR7 data workflow needs to be supported by tools: the figure creation by the IPCC authors, the report editing process by the TSU, and the curation of the CMIP7 input data subset used as well as of the intermediate and final datasets created by the various DDC Partners, including bi-directional references between outputs. Virtual workspaces, such as those provided by CEDA and DKRZ for the authors in AR6, which give access to their data pools and to common software packages like ESMValTool, can support the authors in preparing their figures and providing the requested documentation and provenance information. A dedicated Figure Manager will play a central role in managing the report figures and supporting the overall data workflow, ultimately lowering the burden on the authors, the TSU staff and the DDC Partners. The information gathered in this timely manner can then be analyzed and used to harmonize dataset version usage across the chapters and reports.

How to cite: Stockhause, M., Mizielinski, M., Pirani, A., Sitz, L., Spinuso, A., Abdallah, M., Baidya, J., Durack, P., and Ellis, D.: How do the CMIP7 infrastructure plans support the implementation of the IPCC data guidelines?, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-17356, https://doi.org/10.5194/egusphere-egu24-17356, 2024.

EGU24-19102 | ECS | Posters on site | ESSI2.1

NSSDC's Practices toward FAIR Data and AI for Space Science 

Xiaoyan Hu, Ziming Zou, Jizhou Tong, and Qi Xu

In the age of big science, many of today's important scientific problems and grand challenges call for major breakthroughs through interdisciplinary joint research. Space science research and innovative applications face exactly this situation. Open science and artificial intelligence enable a new era in space science research and application, offering new opportunities as well as challenges, such as the absence of data governance theories and standards, data quality and interoperability that still need to be improved, and an insufficient supply of data- and intelligence-driven analysis models and tools.

To make good use of large-scale space science research data and effectively support cross-domain joint research, the Chinese National Space Science Data Center (NSSDC), in conjunction with several universities and research institutions, has carried out a series of practices on FAIR data implementation and AI for space science, contributing to the development of a new generation of open research infrastructure.

On the data governance and stewardship side, NSSDC actively promotes the FAIR principles in China's space science satellite missions and large-scale ground-based observation network projects, and is developing a theoretical model of scientific data governance together with a set of data standards. For intelligent data application, NSSDC is exploring the development of AI-ready space science big data along with intelligent analysis tools and models for automatic target identification, feature extraction, correlation and causal analysis, and event evolution prediction. In this process, we found that many of these AI-readiness demands coincide with the FAIR principles: achieving AI-readiness and achieving FAIRness are closely related goals. In fact, both need to deal with the scale and dimensionality challenges of domain data, to enhance the openness of scientific resources including data, models, software and scientific workflows, to adapt to the significantly increasing machine participation in the scientific research process, and to address the complexity of frontier scientific problems. Through the fusion, integration and efficient interconnection of these scientific resources, an open scientific infrastructure that supports cross-domain and cross-platform data discovery, access, analysis and mining has been established, effectively supporting joint innovation on major scientific issues. Currently, NSSDC is also exploring connections and interactions with scientific infrastructures in other related disciplines, such as astronomy, high-energy physics and Earth system science.

How to cite: Hu, X., Zou, Z., Tong, J., and Xu, Q.: NSSDC's Practices toward FAIR Data and AI for Space Science, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-19102, https://doi.org/10.5194/egusphere-egu24-19102, 2024.

EGU24-19771 | Posters on site | ESSI2.1

Facilitate the reuse of data from public authorities in research  

Hannes Thiemann, Ivonne Anders, and Martin Schupfner

Governmental organizations collect and manage diverse data types at various levels to fulfill their official duties. This includes geographic, environmental, meteorological, population, health, traffic, transport, financial, and economic data. Traditionally, access to such data was restricted, but over the past decade, there has been a global shift towards more open data policies, influenced in part by directives like GeoIDG, the PSI directive, and INSPIRE. In Germany, federal states and their offices have also embraced open data policies, with some data being made publicly available (Open Government Data) through portals like Destatis or GDI-DE. This data serves multiple purposes, such as identifying locations, analyzing environmental trends, traffic planning, health service planning, and more. Public authorities' data is increasingly utilized for scientific investigations, yet the full potential remains untapped, particularly for large datasets. Despite the high quality of governmental data, further alignment with FAIR principles (Findable, Accessible, Interoperable, and Reusable) is necessary to enhance its efficiency for reuse in research. Privacy regulations and legal frameworks may impose limitations, necessitating data anonymization or adherence to modern data standards. Nevertheless, governmental data remains a valuable resource contributing significantly to expanding knowledge across scientific disciplines.

In a pilot project funded by NFDI4Earth, the German national meteorological service (DWD) and the German Climate Computing Centre (DKRZ) collaborated to facilitate access to data from public authorities, increase data visibility as well as the number of users from different disciplines, and make these data available in standardised and FAIR formats for easy use in research, but also for other public applications. As an example, the COSMO-REA6 reanalysis dataset from DWD (Kaspar et al. 2020) was selected, which is crucial for climate modeling, analyses, and energy applications in Europe. The standardization process involved mapping public-authority standards to domain-specific standards in climate research, requiring close collaboration between DWD and DKRZ. After detailed curation and quality checks, the dataset was made accessible through the ESGF infrastructure and long-term archived in the WDCC, addressing licensing and authorship considerations.
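A single, greatly simplified standardization step might look like the following Python sketch; the file and variable names are assumptions for illustration, and the sketch does not reproduce the actual DWD/DKRZ processing chain.

```python
import xarray as xr

# Hypothetical input: a reanalysis file with an agency-specific variable name.
ds = xr.open_dataset("cosmo_rea6_T_2M_1995.nc")          # assumed file/variable names

# One simplified standardization step: rename to the climate-community short
# name and attach CF-style metadata so ESGF/WDCC tooling can interpret it.
ds = ds.rename({"T_2M": "tas"})
ds["tas"].attrs.update({
    "standard_name": "air_temperature",
    "long_name": "Near-Surface Air Temperature",
    "units": "K",
})
ds.attrs.update({
    "source": "COSMO-REA6 regional reanalysis (DWD)",
    "references": "Kaspar et al. (2020), https://doi.org/10.5194/asr-17-115-2020",
})
ds.to_netcdf("tas_cosmo-rea6_1995_standardised.nc")
```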

The project's insights and lessons learned were incorporated into a blueprint, providing guidance on making data from other authorities accessible and usable for both research and the public. Overall, the entire process can be divided into 5 sub-steps: (1) determination and classification of the need, (2) survey of the feasibility, (3) implementation, (4) feedback and follow-up, (5) dissemination. This blueprint outlines generalizable steps and aspects applicable across domains and collaborators, offering a framework for optimizing the use of governmental data in diverse fields.


References: 

Kaspar, F., et al., 2020: Regional atmospheric reanalysis activities at Deutscher Wetterdienst: review of evaluation results and application examples with a focus on renewable energy, Adv. Sci. Res., 17, 115–128, https://doi.org/10.5194/asr-17-115-2020, 2020. 



How to cite: Thiemann, H., Anders, I., and Schupfner, M.: Facilitate the reuse of data from public authorities in research , EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-19771, https://doi.org/10.5194/egusphere-egu24-19771, 2024.

EGU24-20634 | Posters virtual | ESSI2.1

Fostering Open Science in Earth Data Science Research: Insights from Earthdata Forum 

Hazem Mahmoud, Renee Key, Crystal Gummo, Brennan Bunch, Pauline Detweiler, Susan Kizer, John Kusterer, Matthew Tisdale, and Jeff Walter

In the dynamic landscape of Earth Science research, the promotion of open science principles is paramount for advancing knowledge and collaboration. The Earthdata Forum is an actively maintained and operational user forum for all participating National Aeronautics and Space Administration (NASA) Earth Observing System Data and Information System (EOSDIS) Distributed Active Archive Centers (DAACs) and the Global Change Master Directory (GCMD). The Forum serves as a cross-DAAC platform from which user communities can obtain authoritative information relating to NASA Earth Science. This abstract explores the role of the Earthdata Forum (forum.earthdata.nasa.gov) as a pivotal platform in fostering open science within the Earth Science community. The platform serves as a hub for researchers to actively engage in discussions, share datasets, and collaboratively tackle challenges in the field.

 

Key aspects discussed include the platform's contribution to data accessibility, collaboration, and knowledge sharing. Forum.earthdata.nasa.gov provides a space where researchers transparently ask questions, discuss methodologies, share insights, and seek advice from a vibrant community. The resulting collaborative environment not only facilitates the exchange of ideas but also bolsters the collective knowledge base.

 

The abstract also delves into the significance of community engagement within the platform, emphasizing how active participation contributes to the ethos of open science. Furthermore, discussions on open-source tools, policy considerations, and the sharing of educational resources underscore the multifaceted role of forum.earthdata.nasa.gov in advancing open science principles.

 

As we navigate the evolving landscape of Earth Science research, understanding the impact of platforms like the Earthdata Forum on promoting transparency, accessibility, and collaboration becomes crucial. This abstract provides insights into the platform's role in nurturing an open science culture and its implications for the broader scientific community.

 

How to cite: Mahmoud, H., Key, R., Gummo, C., Bunch, B., Detweiler, P., Kizer, S., Kusterer, J., Tisdale, M., and Walter, J.: Fostering Open Science in Earth Data Science Research: Insights from Earthdata Forum, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-20634, https://doi.org/10.5194/egusphere-egu24-20634, 2024.

Interdisciplinary science missions rely on the ability to combine data from across many research domains. Convergence of data can be achieved through adoption of the FAIR principles for data assets, making them Findable, Accessible, Interoperable and Reusable. In order to make data FAIR beyond a limited number of researchers, a broader research community has to declare which schemas, data standards, protocols and other resources are used for metadata and data. These resources, when published, are FAIR Enabling Resources (FERs). Listing which FERs are used to make a dataset FAIR helps the community towards interoperability between datasets. FAIR Implementation Profiles (FIPs) list FERs for each FAIR principle through a systematic question and answer based form and can be the basis for comparing FERs used in different data assets.

Through the comparison of different communities’ FIPs, mappings and crosswalks can be developed between datasets, resulting in interoperability between them. Employing a FIP comparison strategy enables a group to grow the size of its FAIR data assets. Comparing FIPs with regard to a specific community can help grow that community in both size and complexity, adding additional community members and their related interoperable datasets. FAIRness here evolves both at the level of data asset size and at the level of community complexity.

Elaborating on this: inter-community agreement on FER usage, or the development of mappings and crosswalks between FERs, increases the community’s FAIRness, growing its complexity and size. Growth of FAIR data assets can be achieved when multiple datasets use the same FERs and become a FAIR data collection. Additionally, the complexity of the FAIR community goes hand in hand with the growth of the FAIR data asset, as multiple groups are generally involved in the collation of multiple datasets. FAIRness also increases if FERs are aligned for data types from different instruments, resulting in their various methodologies also becoming interoperable. With FAIRness increasing between methodologies, community complexity generally increases, as it does when combining datasets.

Here we will present key outcomes from the WorldFAIR Geochemistry Work Package on how the FAIRness of a community and its constituent data assets can evolve along three pathways: FAIRness can be increased for the community (complexity), for data assets (size), and between methodologies or (sub)disciplines. With FIPs as a means to document the FERs used for a community, its data or its methodologies in a structured manner, the comparative FIPs approach can form the basis for convergence and FAIR evolution along any of the three pathways.

How to cite: Prent, A. and Farrington, R.: FAIR Convergence using FAIR Implementation Profiles and the FAIR evolution pathways concept: lessons learned from the WorldFAIR Geochemistry Work Package , EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-21810, https://doi.org/10.5194/egusphere-egu24-21810, 2024.

EGU24-1042 | ECS | Posters on site | ESSI2.2

Assessing Job Wrapping as a Strategy for Workflow Optimization on Shared HPC Platforms 

Manuel Giménez de Castro Marciani, Gladys Utrera, Miguel Castrillo, and Mario C. Acosta

Experimenting with modern ESMs inherently requires a workflow organization to handle the multiple steps comprising, but not limited to, execution, data governance, cleaning, and the coordination of multiple machines. For climate experiments, due to the long time scale of the simulations, workflows are even more critical. The community has proposed many enhancements for reducing the runtime of the models, but has long overlooked the time to response, which also takes the queue time into account. That is what we aim to optimize by wrapping jobs, which would otherwise be submitted individually, into a single one. The intricate three-way interaction between HPC system usage, scheduler policy, and the user's past usage is the main challenge addressed here when analyzing the impact of wrapping jobs.
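The wrapping idea can be illustrated with a toy Python sketch that bundles several per-chunk job scripts into one batch submission; this is a minimal sketch under assumed names and directives, not Autosubmit's actual wrapper implementation.

```python
# Toy illustration of job wrapping: N simulation chunks that would normally be
# N queued jobs are concatenated into a single batch script, so the experiment
# waits in the queue only once instead of N times.
chunks = [f"./run_chunk.sh {i}" for i in range(1, 5)]   # hypothetical per-chunk commands

header = "\n".join([
    "#!/bin/bash",
    "#SBATCH --job-name=wrapped_experiment",
    "#SBATCH --nodes=4",
    "#SBATCH --time=08:00:00",   # must cover the sum of the wrapped chunk runtimes
])
body = "\n".join(chunks)         # vertical wrapping: chunks run one after another

with open("wrapped_job.sh", "w") as f:
    f.write(header + "\n\n" + body + "\n")
# A single submission ("sbatch wrapped_job.sh") now replaces four separate queue waits.
```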

How to cite: Giménez de Castro Marciani, M., Utrera, G., Castrillo, M., and Acosta, M. C.: Assessing Job Wrapping as a Strategy for Workflow Optimization on Shared HPC Platforms, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-1042, https://doi.org/10.5194/egusphere-egu24-1042, 2024.

EGU24-2152 | Posters on site | ESSI2.2

Collaboratively developing workflows at the BSC-ES 

Gilbert Montané Pinto, Eric Ferrer, Miriam Olid, Alejandro Garcia, Genís Bonet, and Amirpasha Mozaffari

At the Earth Sciences department of the Barcelona Supercomputing Center (BSC-ES), a variety of workflows are run for many different purposes, such as executing climate and atmospheric simulations, downloading data, or evaluating performance. This implies having to deal with many different processes and time scales across different environments and machines.

To help conduct all these complex tasks, the Autosubmit workflow manager is used across the whole department as a unique framework. The fact that Autosubmit has been fully developed at the BSC-ES has led to the adoption of a co-design procedure between users, workflow developers and Autosubmit developers to fulfil the department's day-to-day needs. The synergy and close collaboration among them allows the workflow engineers to gather the specific user requirements that later become new Autosubmit features available to everyone. Thanks to this continuous interaction at all levels, an efficient and highly adaptable system has been achieved, closely aligned with the constantly evolving user needs.

Here this collaborative strategy is presented from the workflow development point of view. Some real use cases and practical examples are used to show the positive impact it has had on different operational and research projects, demonstrating how it can help achieve high scientific productivity.

How to cite: Montané Pinto, G., Ferrer, E., Olid, M., Garcia, A., Bonet, G., and Mozaffari, A.: Collaboratively developing workflows at the BSC-ES, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-2152, https://doi.org/10.5194/egusphere-egu24-2152, 2024.

EGU24-2533 | ECS | Posters on site | ESSI2.2

A workflow for the Climate Digital Twin 

Aina Gaya-Àvila, Leo Arriola i Meikle, Francesc Roura Adserias, Bruno De Paula Kinoshita, Daniel Beltrán Mora, Rohan Ahmed, Miguel Andrés-Martínez, and Miguel Castrillo

The escalating intricacy of climate models and the demand for high-resolution temporal and spatial data call for the development of advanced workflows to effectively manage the complexities associated with a Climate Digital Twin. The designed workflow, tailored to meet these challenges, is model-agnostic, allowing for simulations across various models, such as IFS-NEMO, IFS-FESOM, and ICON. Notably, its adaptability extends to diverse High-Performance Computing environments, facilitated by the containerization of data consumers. 


A user-friendly configuration structure is implemented, providing scientists with a simplified interface that conceals the inherent complexity of the model during simulations. Additionally, the workflow includes immediate and continuous data processing, promoting scalability in temporal and spatial resolution. This approach ensures the efficient handling of intricate climate models, meeting the demands for high-resolution temporal and spatial data, while enhancing user accessibility and adaptability across different computational environments. 


Furthermore, the workflow, which uses Autosubmit as the workflow manager, ensures the traceability and reproducibility of the experiments, allowing for the tracking of processes and ensuring the ability to reproduce results accurately. Finally, the workflow allows for the aggregation of tasks into larger jobs, reducing queue times on shared machines and optimizing resource usage.

How to cite: Gaya-Àvila, A., Arriola i Meikle, L., Roura Adserias, F., De Paula Kinoshita, B., Beltrán Mora, D., Ahmed, R., Andrés-Martínez, M., and Castrillo, M.: A workflow for the Climate Digital Twin, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-2533, https://doi.org/10.5194/egusphere-egu24-2533, 2024.

EGU24-3295 | ECS | Posters on site | ESSI2.2

Reproducing computational processes in service-based geo-simulation experiments 

Zhiyi Zhu and Min Chen

Geo-simulation experiments (GSEs) are experiments allowing the simulation and exploration of Earth’s surface (such as hydrological, geomorphological, atmospheric, biological, and social processes and their interactions) with the usage of geo-analysis models (hereafter called ‘models’). Computational processes represent the steps in GSEs where researchers employ these models to analyze data by computer, encompassing a suite of actions carried out by researchers. These processes form the crux of GSEs, as GSEs are ultimately implemented through the execution of computational processes. Recent advancements in computer technology have facilitated sharing models online to promote resource accessibility and the rebuilding of environmental dependencies, the lack of which are two fundamental barriers to reproduction. In particular, the trend of encapsulating models as web services online is gaining traction. While such service-oriented strategies aid in the reproduction of computational processes, they often ignore the association and interaction among researchers’ actions regarding the usage of sequential resources (model-service resources and data resources); documenting these actions can help clarify the exact order and details of resource usage. Inspired by these strategies, this study explores the organization of computational processes, which can be expressed as a collection of action nodes and related logical links (node-link ensembles). The action nodes are the abstraction of the interactions between participant entities and resource elements (i.e., model-service resource elements and data resource elements), while logical links represent the logical relationships between action nodes. In addition, the representation of actions, the formation of documentation, and the reimplementation of documentation are interconnected stages in this approach. Specifically, the accurate representation of actions facilitates the correct performance of these actions; the operation of actions can therefore be documented in a standard way, which is crucial for the successful reproduction of computational processes based on standardized documentation. A prototype system is designed to demonstrate the feasibility and practicality of the proposed approach. By employing this pragmatic approach, researchers can share their computational processes in a structured and open format, allowing peer scientists to re-execute operations with the initial resources and reimplement the initial computational processes of GSEs via the open web.
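A minimal sketch of what documenting such a node-link ensemble could look like is given below. It simply records action nodes (interactions between a participant and a model-service or data resource) and logical links between them as JSON; the identifiers, resource URLs and fields are illustrative and not those of the authors' prototype system.

```python
# Minimal sketch of documenting a computational process as action nodes plus
# logical links (a node-link ensemble). Identifiers and URLs are illustrative.
import json

actions = []
links = []

def record_action(action_id, actor, resource, resource_type, parameters=None):
    """Document one interaction between a participant and a resource element."""
    actions.append({
        "id": action_id,
        "actor": actor,
        "resource": resource,            # model-service or data resource element
        "resource_type": resource_type,  # "model-service" | "data"
        "parameters": parameters or {},
    })

def record_link(source_id, target_id, relation="precedes"):
    """Document the logical relationship between two action nodes."""
    links.append({"source": source_id, "target": target_id, "relation": relation})

# Example: invoke a hypothetical runoff model service on an input dataset.
record_action("a1", "researcher_A", "https://example.org/data/rainfall.csv", "data")
record_action("a2", "researcher_A", "https://example.org/services/runoff-model",
              "model-service", parameters={"time_step": "1h"})
record_link("a1", "a2", relation="provides-input-to")

# The resulting document can be shared so peers can re-execute the process.
print(json.dumps({"actions": actions, "links": links}, indent=2))
```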

How to cite: Zhu, Z. and Chen, M.: Reproducing computational processes in service-based geo-simulation experiments, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-3295, https://doi.org/10.5194/egusphere-egu24-3295, 2024.

EGU24-5115 | Orals | ESSI2.2

Advancing polar airborne geophysics data management at the UK Polar Data Centre  

Alice Fremand, Julien Bodart, Tom Jordan, Peter Fretwell, and Alexander Tate

In the last 50 years, the British Antarctic Survey (BAS, https://www.bas.ac.uk/) has been a key player in acquiring airborne magnetic, gravity and radio-echo sounding data in Antarctica. These data have been central to many studies of the past, present and future evolution of the Antarctic Ice Sheet but until recently they were not accessible to the community.

In the last three years, the UK Polar Data Centre (https://www.bas.ac.uk/data/uk-pdc/) has made considerable efforts to standardise these datasets to comply with the FAIR (Findable, Accessible, Interoperable and Reusable) data principles and develop the Polar Airborne Geophysics Data Portal (https://www.bas.ac.uk/project/nagdp/). Workflows from collection to publication have been updated, data formats standardised, and Jupyter Notebooks created to improve reuse and comply with the needs of the scientific community [1].

Following this experience and to promote open access, the UK Polar Data Centre led the management of 60 years of international Antarctic ice thickness data through the Bedmap3 project (https://www.bas.ac.uk/project/bedmap/), an international project supported by the Scientific Committee on Antarctic Research (SCAR). This time, it is 80+ million points of ice thickness, ice surface and bed elevation from 270+ surveys collected from 50+ international partners that have been standardised and assimilated in the Bedmap data portal (https://bedmap.scar.org/) [2].

Today, airborne data are acquired using new types of platforms including uncrewed aerial systems (UAV) adding new challenges and opportunities to set up new standards and data management practices.

As part of this presentation, we will present the different workflows and data management practices that we are developing to make Antarctic science open and FAIR.

[1] Frémand, A. C., Bodart, J. A., Jordan, T. A., Ferraccioli, F., Robinson, C., Corr, H. F. J., Peat, H. J., Bingham, R. G., and Vaughan, D. G.: British Antarctic Survey's aerogeophysical data: releasing 25 years of airborne gravity, magnetic, and radar datasets over Antarctica, Earth Syst. Sci. Data, 14, 3379–3410, https://doi.org/10.5194/essd-14-3379-2022 , 2022.

[2] Frémand, A. C., Fretwell, P., Bodart, J., Pritchard, H. D., Aitken, A., Bamber, J. L., ... & Zirizzotti, A.: Antarctic Bedmap data: Findable, Accessible, Interoperable, and Reusable (FAIR) sharing of 60 years of ice bed, surface, and thickness data, Earth Syst. Sci. Data, 15, 2695–2710, https://doi.org/10.5194/essd-15-2695-2023, 2023.

How to cite: Fremand, A., Bodart, J., Jordan, T., Fretwell, P., and Tate, A.: Advancing polar airborne geophysics data management at the UK Polar Data Centre , EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-5115, https://doi.org/10.5194/egusphere-egu24-5115, 2024.

EGU24-6350 | Posters on site | ESSI2.2

Research and Development within the Scientific Computing Branch of NOAA’s Global Systems Laboratory 

Isidora Jankov, Daniel Abdi, Naureen Bharwani, Emily Carpenter, Christopher Harrop, Christina Holt, Paul Madden, Timothy Sliwinski, Duane Rosenberg, and Ligia Bernardet

The NOAA Global Systems Laboratory, Earth Prediction Advancement Division, Scientific Computing Branch team works on approaches to facilitate development of cloud-resolving Earth system prediction systems suitable for the next generation of exascale high performance computing (HPC), including exploration of machine learning (ML) algorithms within our systems for improved performance and reduced computational cost. 

Our work is divided into two main categories: incremental, shorter-term challenges and innovative, longer-term challenges. Work related to incremental changes focuses on existing NOAA algorithms and improvement of their performance on different architectures (e.g. adapting existing codes to run on GPUs). The more innovative aspects focus on development and evaluation of new algorithms and approaches to environmental modeling that simultaneously improve prediction accuracy, performance, and portability. For this purpose we have developed the GeoFLuid Object Workflow (GeoFLOW), a C++ framework with convective (and other) dynamics, high order truncation, quantifiable dissipation, an option to use a variety of 2D and 3D grids, and excellent strong scaling and on-node properties. An evaluation of the use of ML-based emulators for different components of the earth system prediction models also forms an important part of our research. 

Finally, a large portion of our research and development activities involves building federated and unified workflows to facilitate both the effective use of distributed computing resources as well as easy configuration for nontrivial workflow applications in research and operations.

A comprehensive summary of these research and development activities will be presented.



How to cite: Jankov, I., Abdi, D., Bharwani, N., Carpenter, E., Harrop, C., Holt, C., Madden, P., Sliwinski, T., Rosenberg, D., and Bernardet, L.: Research and Development within the Scientific Computing Branch of NOAA’s Global Systems Laboratory, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-6350, https://doi.org/10.5194/egusphere-egu24-6350, 2024.

EGU24-8711 | ECS | Orals | ESSI2.2

EWoPe: Environmental Workflow Persistence methodology for multi-stage computational processes reproducibility 

Marianna Miola, Daniela Cabiddu, Simone Pittaluga, and Marino Vetuschi Zuccolini

Over the past few decades, geoscientists have progressively exploited and integrated techniques and tools from Applied Mathematics, Statistics, and Computer Science to investigate and simulate natural phenomena. Depending on the situation, the sequence of computational steps may vary, leading to intricate workflows that are difficult to reproduce and/or revisit later. To ensure that such workflows can be repeated for validation, peer review, or further investigation, it is necessary to implement strategies for workflow persistence, that is, the ability to maintain the continuity, integrity, and reproducibility of a workflow over time.

In this context, we propose an efficient strategy to support workflow persistence of Geoscience pipelines (i.e., the Environmental Workflow Persistence methodology, EWoPe). Our approach makes it possible to document each workflow step, including details about data sources, processing algorithms, parameters, and final and intermediate outputs. This documentation aids in understanding the workflow's methodology, promotes transparency, and ensures replicability. 

Our methodology views workflows as hierarchical tree data structures. In this representation, each node describes data, whether it's input data or the result of a computational step, and each arc is a computational step that uses its respective nodes as inputs to generate output nodes. The relationship between input and output can be described as either one-to-one or one-to-many, allowing the flexibility to support either singular or multiple outcomes from a single input.

The approach ensures the persistence of workflows by employing JSON (JavaScript Object Notation) encoding. JSON is a lightweight data interchange format designed for human readability and ease of both machine parsing and generation. Under this persistence management, each node within a workflow consists of two elements. One encodes the raw data itself (in one or multiple files). The other is a JSON file that describes through metadata the computational step responsible for generating the raw data, including references to the input data and parameters. Such a JSON file serves to trace and certify the source of the data, offering a starting point for retracing the workflow backward to its original input data or to an intermediate result of interest.
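A minimal sketch of such a per-node JSON record is shown below, assuming one metadata file per output that points back to the generating step, its inputs and its parameters; the field names, hashing choice and example step are illustrative, not the actual EWoPe schema.

```python
# Illustrative sketch of a per-node metadata record in the spirit of EWoPe:
# one JSON file per output node, pointing back to the computational step,
# its inputs and its parameters. File names and fields are hypothetical.
import hashlib
import json
from pathlib import Path

def sha256(path):
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def document_step(step_name, inputs, parameters, outputs, metadata_path):
    """Persist a JSON description of one computational step of the workflow."""
    record = {
        "step": step_name,
        "inputs": [{"file": f, "sha256": sha256(f)} for f in inputs],
        "parameters": parameters,
        "outputs": [{"file": f, "sha256": sha256(f)} for f in outputs],
    }
    Path(metadata_path).write_text(json.dumps(record, indent=2))
    return record

# document_step(
#     step_name="ordinary_kriging",
#     inputs=["samples.csv", "mesh.vtu"],
#     parameters={"variogram": "spherical", "range_m": 250.0},
#     outputs=["kriged_field.vtu"],
#     metadata_path="kriged_field.vtu.json",
# )
```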

Currently, the EWoPe methodology has been implemented and integrated into MUSE (Modeling Uncertainty as a Support for Environment) (Miola et al., STAG2022), a computational infrastructure to evaluate spatial uncertainty in multi-scenario applications such as environmental geochemistry, reservoir geology, or infrastructure engineering. MUSE allows running specific multi-stage workflows that involve spatial discretization algorithms, geostatistics, and stochastic simulations: the usage of the EWoPe methodology in MUSE can be seen as an example of its deployment.

By exploiting the transparency of input-output relationships and thus ensuring the reproducibility of results, the EWoPe methodology offers significant benefits to both scientists and downstream communities involved in utilizing environmental computational frameworks.

Acknowledgements: Funded by the European Union - NextGenerationEU and by the Ministry of University and Research (MUR), National Recovery and Resilience Plan (NRRP), Mission 4, Component 2, Investment 1.5, project “RAISE - Robotics and AI for Socio-economic Empowerment” (ECS00000035) and by PON "Ricerca e Innovazione" 2014-2020, Asse IV "Istruzione e ricerca per il recupero", Azione IV.5 "Dottorati su tematiche green" DM 1061/2021.

How to cite: Miola, M., Cabiddu, D., Pittaluga, S., and Vetuschi Zuccolini, M.: EWoPe: Environmental Workflow Persistence methodology for multi-stage computational processes reproducibility, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-8711, https://doi.org/10.5194/egusphere-egu24-8711, 2024.

EGU24-9381 | Posters virtual | ESSI2.2

A Multi-level Approach for Provenance Management And Exploration in Climate Workflows 

Fabrizio Antonio, Mattia Rampazzo, Ludovica Sacco, Paola Nassisi, and Sandro Fiore

Provenance and reproducibility are two key requirements for analytics workflows in Open Science contexts. Handling provenance at different levels of granularity and during the entire experiment lifecycle becomes key to properly and flexibly managing lineage information related to large-scale experiments as well as enabling reproducibility scenarios, which in turn foster re-usability, one of the FAIR guiding data principles.

This contribution focuses on a multi-level approach applied to climate analytics experiments as a way to manage provenance information in a more structured and multifaceted way, and to navigate and explore the provenance space across multiple dimensions, thus enabling the possibility to get coarse- or fine-grained information according to the actual requested level. Specifically, the yProv multi-level provenance service, a new core component within an Open Science-enabled research data lifecycle, is introduced by highlighting its design, main features and graph-based data model. Moreover, a climate model intercomparison data analysis use case is presented to showcase how to retrieve and visualize fine-grained provenance information, namely micro-provenance, compliant with the W3C PROV specifications.
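To make the idea of multiple granularity levels concrete, the sketch below nests fine-grained (micro-provenance) activity records inside a coarse-grained experiment record in a small PROV-inspired JSON structure and queries it at a chosen depth. The structure and identifiers are illustrative; they are not the yProv service's actual data model or API.

```python
# Minimal, PROV-inspired sketch of multi-level provenance: a coarse-grained
# experiment record containing finer-grained records for individual analysis
# steps. Identifiers and structure are illustrative, not the yProv data model.
provenance = {
    "entity": {"exp:multi-model-mean": {"prov:type": "climate-product"}},
    "activity": {
        "exp:intercomparison": {
            "prov:label": "CMIP6 multi-model intercomparison",
            # fine-grained (micro-provenance) level nested under the experiment
            "sub_activities": {
                "exp:regrid-model-A": {"prov:used": "cmip6:tas_model_A"},
                "exp:regrid-model-B": {"prov:used": "cmip6:tas_model_B"},
                "exp:ensemble-mean": {
                    "prov:used": ["exp:regrid-model-A", "exp:regrid-model-B"]
                },
            },
        }
    },
    "wasGeneratedBy": {
        "_:g1": {"prov:entity": "exp:multi-model-mean",
                 "prov:activity": "exp:intercomparison"}
    },
}

def activities_at(level, doc=None):
    """Return activity identifiers down to the requested depth (0 = coarse)."""
    doc = provenance["activity"] if doc is None else doc
    names = list(doc)
    if level > 0:
        for record in doc.values():
            names += activities_at(level - 1, record.get("sub_activities", {}))
    return names

print(activities_at(0))  # coarse-grained view
print(activities_at(1))  # includes micro-provenance
```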

This work was partially funded by the EU InterTwin project (Grant Agreement 101058386), the EU Climateurope2 project (Grant Agreement 101056933) and partially under the National Recovery and Resilience Plan (NRRP), Mission 4 Component 2 Investment 1.4 - Call for tender No. 1031 of 17/06/2022 of Italian Ministry for University and Research funded by the European Union – NextGenerationEU (proj. nr. CN_00000013).

How to cite: Antonio, F., Rampazzo, M., Sacco, L., Nassisi, P., and Fiore, S.: A Multi-level Approach for Provenance Management And Exploration in Climate Workflows, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-9381, https://doi.org/10.5194/egusphere-egu24-9381, 2024.

EGU24-10750 | ESSI2.2

Compression of meteorological reanalysis data files and their application to Lagrangian transport simulations 

F. Khosrawi and L. Hoffmann

Computer performance has increased immensely in recent years, but the ability to store data has hardly increased at all. The current version of the meteorological reanalysis ERA5, provided by the European Centre for Medium-Range Weather Forecasts (ECMWF), has increased in size by a factor of ∼80 compared to its predecessor ERA-Interim. This presents scientists with major challenges, especially if data covering several decades are to be stored on local computer systems. Accordingly, many compression methods have been developed in recent years with which data can be stored either lossless or lossy. Here we test three of these methods, the two lossy compression methods ZFP and Layer Packing (PCK) and the lossless compressor Zstandard (ZSTD), and investigate how the use of compressed data affects the results of Lagrangian air parcel trajectory calculations with the Lagrangian model for Massive-Parallel Trajectory Calculations (MPTRAC). We analysed 10-day forward trajectories that were globally distributed over the free troposphere and stratosphere. The largest transport deviations were derived when using ZFP with the strongest compression. Using a less strong compression, we could reduce the transport deviations and still achieve significant compression. Since ZSTD is a lossless compressor, we derive no transport deviations at all when using these compressed files, but we also do not save much disk space with this compressor (a reduction of ∼20%). The best result concerning compression efficiency and transport deviations is derived with the layer packing method PCK. The data are compressed by about 50%, but transport deviations do not exceed 40 km in the free troposphere and are even lower in the upper troposphere and stratosphere. Thus, our study shows that the PCK compression method would be valuable for applications in the atmospheric sciences and that compression of meteorological reanalysis data files can overcome the challenge of the high demand for disk space from these data sets.
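As a simple, hedged illustration of the lossless part of such a comparison, the sketch below compresses a synthetic stand-in for a reanalysis field with the zstandard Python package and reports the compression ratio; it is not the authors' processing chain, which applies ZFP, PCK and ZSTD to the actual ERA5 files.

```python
# Illustrative sketch: measure the lossless compression ratio achieved by
# Zstandard on a block of floating-point data. The synthetic field stands in
# for a reanalysis variable; this is not the authors' processing chain.
import numpy as np
import zstandard as zstd

# Smooth synthetic stand-in for a global temperature field (0.25 deg grid).
lat = np.linspace(-90.0, 90.0, 721, dtype="float32")
lon = np.linspace(0.0, 360.0, 1440, endpoint=False, dtype="float32")
field = (250.0 + 30.0 * np.cos(np.deg2rad(lat))[:, None]
         * np.ones_like(lon)[None, :]).astype("float32")
raw = field.tobytes()

compressed = zstd.ZstdCompressor(level=9).compress(raw)
restored = np.frombuffer(zstd.ZstdDecompressor().decompress(compressed),
                         dtype="float32").reshape(field.shape)

print(f"raw size:          {len(raw) / 1e6:.1f} MB")
print(f"compressed size:   {len(compressed) / 1e6:.1f} MB")
print(f"compression ratio: {len(raw) / len(compressed):.2f}x")
assert np.array_equal(field, restored)  # lossless: bit-exact round trip
```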

How to cite: Khosrawi, F. and Hoffmann, L.: Compression of meteorological reanalysis data files and their application to Lagrangian transport simulations , EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-10750, https://doi.org/10.5194/egusphere-egu24-10750, 2024.

EGU24-11167 | Posters on site | ESSI2.2

A Dynamic HPC Probabilistic Tsunami Forecast Workflow for Real-time Hazard Assessment 

Louise Cordrie, Jorge Ejarque, Carlos Sánchez-Linares, Jacopo Selva, Jorge Macías, Steven J. Gibbons, Fabrizio Bernardi, Bernardi Tonini, Rosa M Badia, Sonia Scardigno, Stefano Lorito, Fabrizio Romano, Finn Løvholt, Manuela Volpe, Alessandro D'Anca, Marc de la Asunción, and Valentina Magni

The Urgent Tsunami Computing procedures discussed herein are designed to quantify potential hazards resulting from seismically-induced tsunamis following an earthquake, with a temporal scope ranging from minutes to a few hours. The presented workflow employs comprehensive simulations, encompassing the entire tsunami propagation process, while accounting for uncertainties associated with source parameters, tsunamigenesis and wave propagation dynamics. Within the EuroHPC eFlows4HPC project, we present a High-Performance Computing (HPC) workflow tailored for urgent tsunami computation in which the Probabilistic Tsunami Forecast (PTF) code has been restructured and adapted for seamless integration into a PyCOMPSs framework. This framework enables parallel execution of tasks and includes simulations from Tsunami-HySEA numerical model within a unified computational environment. Of particular significance is the workflow's capability to incorporate new datasets, such as focal mechanism data, seismic records, or real-time tsunami observations. This functionality facilitates an "on-the-fly" update of the PTF, ensuring that the forecasting model remains responsive to the latest information. The development of this workflow involves a systematic exploration of diverse scenarios, realistic simulations, and the assimilation of incoming data. The overarching goal is to rigorously diminish uncertainties, thereby producing updated probabilistic forecasts without compromising precision and enhancing risk mitigation efforts far from the seismic source. Improved risk management, achieved by informing decision-making in emergency situations, underscores the importance of this development. We will showcase the technical advancements undertaken to tailor the workflow for HPC environments, spanning from the developers' perspective to that of the end user. Additionally, we will highlight the scientific enhancements implemented to leverage the full potential of HPC capabilities, aiming to significantly reduce result delivery times while concurrently enhancing the accuracy and precision of our forecasts.
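To give a flavour of how an ensemble of sources maps onto parallel tasks in a PyCOMPSs framework, the sketch below decorates a placeholder per-scenario simulation with @task and synchronises the results; the function bodies, names and the Tsunami-HySEA call are hypothetical and this is not the actual PTF workflow code.

```python
# Schematic sketch of expressing an ensemble of tsunami source scenarios as
# independent PyCOMPSs tasks. Function bodies and names are placeholders.
from pycompss.api.task import task
from pycompss.api.api import compss_wait_on

@task(returns=1)
def simulate_scenario(source_params):
    """Placeholder for one Tsunami-HySEA forward simulation."""
    # run_tsunami_hysea(source_params)  # hypothetical solver invocation
    return {"source": source_params, "max_wave_height": 0.0}

def run_ensemble(scenario_list):
    futures = [simulate_scenario(s) for s in scenario_list]  # tasks scheduled in parallel
    results = compss_wait_on(futures)                        # synchronisation barrier
    return {
        "n_scenarios": len(results),
        "max_height": max(r["max_wave_height"] for r in results),
    }

# run_ensemble([{"strike": 30, "dip": 15, "magnitude": 7.8},
#               {"strike": 35, "dip": 12, "magnitude": 8.0}])
```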

How to cite: Cordrie, L., Ejarque, J., Sánchez-Linares, C., Selva, J., Macías, J., Gibbons, S. J., Bernardi, F., Tonini, B., Badia, R. M., Scardigno, S., Lorito, S., Romano, F., Løvholt, F., Volpe, M., D'Anca, A., de la Asunción, M., and Magni, V.: A Dynamic HPC Probabilistic Tsunami Forecast Workflow for Real-time Hazard Assessment, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-11167, https://doi.org/10.5194/egusphere-egu24-11167, 2024.

EGU24-11774 | Posters virtual | ESSI2.2

An end-to-end workflow for climate data management and analysis integrating HPC, Big Data and Machine Learning  

Alessandro D'Anca, Sonia Scardigno, Jorge Ejarque, Gabriele Accarino, Daniele Peano, Francesco Immorlano, Davide Donno, Enrico Scoccimarro, Rosa M. Badia, and Giovanni Aloisio

The advances in Earth System Models (ESM), jointly with the availability of more powerful computing infrastructures and novel solutions for Big Data and Machine Learning (ML), are making it possible to push research in the climate change field forward. In this context, workflows are fundamental tools to automate the complex processes of model simulations, data preparation and analyses. Such tools are becoming more important as the complexity and heterogeneity of the software and computing infrastructures, as well as the data volumes to be handled, grow. However, integrating simulation-centric and data-centric processes into a single workflow can be very challenging due to their different requirements.
This work presents an end-to-end workflow covering the steps from the numerical ESM simulation run to the analysis of extreme weather events (e.g., heat waves and tropical cyclones), developed in the context of the eFlows4HPC EuroHPC project. It represents a real case study which requires components from High Performance Computing (HPC), Big Data and ML to carry out the workflow. In particular, the contribution demonstrates how the eFlows4HPC software stack can simplify the development, deployment, orchestration and execution of complex end-to-end workflows for climate science, as well as improve their portability across different computing infrastructures.

How to cite: D'Anca, A., Scardigno, S., Ejarque, J., Accarino, G., Peano, D., Immorlano, F., Donno, D., Scoccimarro, E., Badia, R. M., and Aloisio, G.: An end-to-end workflow for climate data management and analysis integrating HPC, Big Data and Machine Learning , EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-11774, https://doi.org/10.5194/egusphere-egu24-11774, 2024.

EGU24-11859 | ECS | Orals | ESSI2.2

Workflow composition for compound flooding events and adaptation measures 

Willem Tromp, Hessel Winsemius, Dirk Eilander, Albrecht Weerts, and Björn Backeberg

The modelling of compound flood events, the assessment of their impact, and the assessment of mitigation and adaptation measures are in increasing demand from local authorities and stakeholders to support their decision making. Additionally, the severity of extreme events driving compound flooding, including storms and heavy rainfall, is projected to increase under climate change. To support local communities in flood risk management, complex modelling systems involving multiple cross-disciplinary models need to be orchestrated in order to effectively and efficiently run a wide range of what-if scenarios or historical events to understand the drivers and impacts of compound floods. The large volume and variety of data needed to configure the necessary models and simulate events strain the reproducibility of modelling frameworks, while the number of events and scenarios demands increasingly powerful computing resources. Here we present a solution to these challenges using automated workflows, leveraging the Common Workflow Language standard. The presented workflows update a base model configuration for a user-specified event or scenario, and automatically rerun multiple defined scenarios. The models are executed in containers and dispatched using the StreamFlow workflow manager designed for hybrid computing infrastructures. This solution offers a single, uniform interface for configuring all models involved in the model train, while also offering a single interface for running the model chain locally or on high performance computing infrastructures. This allows researchers to leverage data and computing resources more efficiently and provides them with a larger and more accurate range of compound flood events to support local authorities and stakeholders in their decision making.

How to cite: Tromp, W., Winsemius, H., Eilander, D., Weerts, A., and Backeberg, B.: Workflow composition for compound flooding events and adaptation measures, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-11859, https://doi.org/10.5194/egusphere-egu24-11859, 2024.

EGU24-14011 | Orals | ESSI2.2

RRR: Reliability, Replicability, Reproducibility for Climate Models   

Aidan Heerdegen, Harshula Jayasuriya, Tommy Gatti, Varvara Efremova, Kelsey Druken, and Andy Hogg

It is difficult to reliably build climate models, reproduce results and so replicate scientific findings. Modern software engineering coupled with the right tools can make this easier. 

Some sources of complexity that make this a difficult problem:  

  • Climate models are an imperfect translation of extremely complex scientific understanding into computer code. Imperfect because many assumptions are made to make the problems tractable.  
  • Climate models are typically a number of separate models of different realms of the earth system, which run independently while exchanging information at their boundaries.   
  • Building multiple completely separate models and their many dependencies, all with varying standards of software engineering and architecture. 
  • Computational complexity requires high performance computing (HPC) centres, which contain exotic hardware utilising specially tuned software.  

ACCESS-NRI uses Spack, a build-from-source package manager that targets HPC and gives full build provenance and guaranteed build reproducibility. This makes building climate models easier and more reliable. Continuous integration testing of build correctness and reproducibility, model replicability, and scientific reproducibility eliminates a source of complexity and uncertainty. The model is guaranteed to produce the same results from the same code, or from modified code when those changes should not alter answers.  

Scientists can be confident that any variation in their climate model experiments is due to factors under their control, rather than changes in software dependencies, or the tools used to build the model. 

How to cite: Heerdegen, A., Jayasuriya, H., Gatti, T., Efremova, V., Druken, K., and Hogg, A.: RRR: Reliability, Replicability, Reproducibility for Climate Models  , EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-14011, https://doi.org/10.5194/egusphere-egu24-14011, 2024.

EGU24-15382 | ECS | Posters on site | ESSI2.2

Using Docker for reproducible workflows 

Mirko Mälicke, Alexander Dolich, Ashish Manoj Jaseetha, Balazs Bischof, and Lucas Reid

We propose a framework-agnostic specification for contextualizing Docker containers in environmental research. In a scientific context, containers are especially useful for combining scripts in different languages that follow different development paradigms. 

The specification standardizes inputs and outputs from and to containers to ease the development of new tools, retrace results and add a provenance context to scientific workflows. As of now, we also provide templates for the implementation of new tools developed in Python, R, Octave and NodeJS, two different server applications to run the containers in a local or remote setting, and a Python client to seamlessly include containers in existing workflows. A Flutter template is in development, which can be used as a basis to build use-case-specific applications for Windows, Linux, Mac, the Web, Android and iOS.

We present the specification itself, with a focus on ways of contributing, to align the specification with as many geoscientific use cases as possible in the future. In addition, a few insights into current implementations are given, namely the role of the compliant pre-processing tools in the generation of the CAMELS-DE dataset, as well as the presentation of results for a machine learning application for predicting soil moisture. Both applications are presented at EGU as well. We use these examples to demonstrate how the framework can increase the reproducibility of the associated workflows.
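As a minimal sketch of what invoking such a contextualized tool could look like, the snippet below uses the Docker SDK for Python to run a container with standardized input and output mounts; the image name and the /in and /out convention are assumptions made for illustration, not the published specification.

```python
# Minimal sketch of invoking a containerised tool with standardised input and
# output mounts. The image name and the /in and /out convention are assumed
# for illustration; they are not the published specification.
import docker

def run_tool(image, input_dir, output_dir, parameters=None):
    client = docker.from_env()
    logs = client.containers.run(
        image,
        environment={"TOOL_PARAMS": str(parameters or {})},
        volumes={
            str(input_dir): {"bind": "/in", "mode": "ro"},
            str(output_dir): {"bind": "/out", "mode": "rw"},
        },
        remove=True,
    )
    return logs  # container stdout, e.g. for provenance records

# run_tool("example/preprocessor:latest", "/data/raw", "/data/processed",
#          parameters={"variable": "soil_moisture"})
```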

How to cite: Mälicke, M., Dolich, A., Manoj Jaseetha, A., Bischof, B., and Reid, L.: Using Docker for reproducible workflows, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-15382, https://doi.org/10.5194/egusphere-egu24-15382, 2024.

EGU24-15849 | ECS | Posters on site | ESSI2.2

Urgent Computing Integrated Services for Earthquakes  

Rut Blanco-Prieto, Marisol Monterrubio-Velasco, Marta Pienkowska, Jorge Ejarque, Cedric Bhihe, Natalia Zamora, and Josep de la Puente

The Urgent Computing Integrated Services for Earthquakes (UCIS4EQ) introduces a fully automatic seismic workflow centered on rapidly delivering synthetic assessments of the impact of moderate to large earthquakes through physics-based forward simulations. This novel approach links High-Performance Computing (HPC), High-Performance Data Analytics (HPDA), and highly optimized numerical solvers. Its core objective lies in performing numerical simulations either during or right after an earthquake, accomplishing this task within a short timeframe, typically spanning from minutes to a few hours.

During multi-node execution, PyCOMPSs orchestrates UCIS4EQ’s distributed tasks and improves its readiness level towards providing an operational service. UCIS4EQ coordinates the execution of multiple seismic sources to account for input and model uncertainties. Its comprehensive scope provides decision-makers with numerical insights into the potential outcomes of post-earthquake emergency scenarios.

The UCIS4EQ workflow includes a fast inference service based on location-specific pre-trained machine learning models. Such learned models permit a swift analysis and estimation of the potential damage caused by an earthquake. Leveraging advanced AI capabilities endows our workflow with the ability to rapidly estimate a seismic event's impact. Ultimately, it provides valuable support for rapid decision-making during emergencies.

Through the integration of high performance computational techniques and pioneering methodologies, our hope is to see UCIS4EQ emerge as a useful instrument to make agile and well-informed post-event decisions in the face of seismic events.

With this study, we document UCIS4EQ's continuous development through a number of case studies. These case studies will shed light on the most recent developments and applications of the urgent computing seismic workflow, demonstrating its efficacy in providing rapid and precise insights into earthquake scenarios.

How to cite: Blanco-Prieto, R., Monterrubio-Velasco, M., Pienkowska, M., Ejarque, J., Bhihe, C., Zamora, N., and de la Puente, J.: Urgent Computing Integrated Services for Earthquakes , EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-15849, https://doi.org/10.5194/egusphere-egu24-15849, 2024.

EGU24-17007 | ECS | Posters on site | ESSI2.2

Building a useful dataset for ICON output 

Lukas Kluft and Tobias Kölling

Global kilometer-scale climate models produce vast amounts of output, posing challenges in efficient data utilization. For ICON, we addressed this by creating a consolidated and analysis-ready dataset in the Zarr format, departing from the previous cumbersome directory structure. This new dataset format provides a comprehensive overview of variables and time steps at a glance.

To ensure swift and ergonomic access to the dataset, we employ two key concepts: output hierarchies and multidimensional chunking. We remapped all output onto the HEALPix grid, facilitating hierarchical resolutions, and pre-computed temporal aggregations like daily and monthly averages. This enables users to seamlessly switch between resolutions, reducing computational burdens during post-processing.

Spatial chunking of high-resolution data further allows for efficient extraction of regional subsets, significantly improving the efficiency of common climate science analyses, such as time series and vertical cross-sections. While our efforts primarily integrate established strategies, the synergies achieved in resolution have shown a profound impact on the post-processing efficiency of our global kilometer-scale output.
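The access pattern this enables can be sketched as follows, assuming one Zarr store per HEALPix zoom level and temporal aggregation, opened with xarray; the store paths and variable names are placeholders rather than the actual ICON output catalogue.

```python
# Sketch of the intended access pattern: pick the HEALPix zoom level and
# temporal aggregation that match the analysis, then read only the chunks
# needed. Store paths and variable names are placeholders, not the actual
# ICON output catalogue.
import xarray as xr

def open_output(zoom, aggregation="PT3H"):
    """Open one member of the output hierarchy (coarse zoom = small dataset)."""
    return xr.open_zarr(f"icon_run/{aggregation}_z{zoom}.zarr", chunks={})

# Global overview: a coarse zoom level and monthly means are sufficient.
# coarse = open_output(zoom=5, aggregation="P1M")
# global_mean = coarse["tas"].mean("cell")

# Regional detail: high zoom, but only the spatial chunks covering the region.
# fine = open_output(zoom=10)
# region = fine["tas"].isel(cell=slice(4_000_000, 4_100_000))  # contiguous HEALPix cells
```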

In summary, our approach, creating a single analysis-ready dataset, pre-computing hierarchies, and employing spatial chunking, addresses challenges in managing and extracting meaningful insights from increasingly large model output. We successfully tested the new analysis-ready datasets during well-attended hackathons, revealing significant usability and performance improvements over a wide range of real-life applications.

How to cite: Kluft, L. and Kölling, T.: Building a useful dataset for ICON output, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-17007, https://doi.org/10.5194/egusphere-egu24-17007, 2024.

EGU24-17141 | ECS | Orals | ESSI2.2

Building useful datasets for Earth System Model output 

Tobias Kölling and Lukas Kluft

Global kilometer-scale climate models generate vast quantities of simulation output, presenting significant challenges in using this wealth of data effectively. We approach this challenge from the output user perspective, craft useful dataset requirements and show how this also simplifies data handling and reduces the “time to plot”.

At first, this strategy involves the creation of a consolidated and analysis-ready n-dimensional dataset, directly from the running model. A dataset forms a consistent set of data, which as a whole aims to inform a user about what and what not to expect from the output of a given model run. This is notably distinct from multiple independent datasets or messages, which can’t convey the overall structure and are often (slightly, but annoyingly) inconsistent. Thus, the amount of user surprise can be reduced by reducing the number of output datasets of a model run towards 1. This of course requires synthesizing relevant information from diverse model outputs, but in turn streamlines the accessibility and usability of climate simulation data.

With the goal of the user perspective of a single large dataset established, we need to ensure that access to this dataset is swift and ergonomic. At the kilometer scale, the horizontal grids of global models outgrow the size of computer screens and the capacity of the human eye, changing viable dataset usage patterns: we can either observe a coarser version of the data globally or high-resolution data locally, but not both. We make use of two concepts to cope with this fact: output hierarchies and multidimensional chunking.

By aggregating data in both temporal and spatial dimensions, while keeping the dataset structure, users can seamlessly switch between resolutions, reducing the computational burden during post-processing from the large-scale perspective. In addition, by splitting high-resolution data into compact spatiotemporal chunks, regional subsets can be extracted quickly as well. While adding the hierarchy adds a small amount of extra data to already tight disk space quotas, a good chunk design and state-of-the-art compression techniques reduce storage requirements without adding access time overhead. On top of that, the approach creates an opportunity for hierarchical storage systems: only those regions and resolutions which are actively worked on have to reside in “hot” storage.

In summary, our collaborative efforts bring together diverse existing strategies to revolutionize the output and post-processing landscape of global kilometer-scale climate models. By creating a single analysis-ready dataset, pre-computing hierarchies, employing spatial chunking, and utilizing advanced compression techniques, we aim to address challenges associated with managing and extracting meaningful insights from these vast simulations. This innovative approach enhances the efficiency of many real-life applications, which is a necessity for analysing multi-decadal kilometer-scale model output.

How to cite: Kölling, T. and Kluft, L.: Building useful datasets for Earth System Model output, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-17141, https://doi.org/10.5194/egusphere-egu24-17141, 2024.

EGU24-17154 | ECS | Posters on site | ESSI2.2

Adaptive Data Reduction Techniques for Extreme-Scale Atmospheric Models 

Niklas Böing, Johannes Holke, Chiara Hergl, Achim Basermann, and Gregor Gassner

Large-scale earth system simulations produce huge amounts of data. Due to limited I/O bandwidth and available storage space this data often needs to be reduced before writing to disk or storing permanently. Error-bounded lossy compression is an effective approach to tackle the trade-off between accuracy and storage space.

We are exploring and discussing error-bounded lossy compression based on tree-based adaptive mesh refinement (AMR) techniques. According to flexible error criteria, the simulation data are coarsened until a given error bound is reached. This reduces the number of mesh elements and data points significantly.

The error criterion may for example be an absolute or relative point-wise error. Since the compression method is closely linked to the mesh we can additionally incorporate geometry information - for example varying the error by geospatial region.
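A toy one-dimensional illustration of the underlying idea, not the cmc/t8code implementation, is sketched below: neighbouring values are merged into their mean as long as a point-wise absolute error bound is respected.

```python
# Toy 1-D illustration of error-bounded coarsening: merge pairs of neighbouring
# values into their mean whenever the point-wise absolute error stays within
# the bound. The real compressor works on adaptive t8code meshes.
import numpy as np

def coarsen(values, abs_tol):
    """One coarsening sweep; returns (coarse_values, element_widths)."""
    coarse, widths = [], []
    i = 0
    while i < len(values):
        if i + 1 < len(values):
            mean = 0.5 * (values[i] + values[i + 1])
            if abs(values[i] - mean) <= abs_tol and abs(values[i + 1] - mean) <= abs_tol:
                coarse.append(mean)   # pair replaced by a single element
                widths.append(2)
                i += 2
                continue
        coarse.append(values[i])      # keep the element unchanged
        widths.append(1)
        i += 1
    return np.array(coarse), widths

data = np.sin(np.linspace(0, np.pi, 64)) + 0.001 * np.random.default_rng(1).normal(size=64)
compressed, widths = coarsen(data, abs_tol=0.01)
print(f"{len(data)} values reduced to {len(compressed)} elements")
```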

We implement these techniques as the open source tool cmc, which is based on the parallel AMR library t8code. The compression tool can be linked to and used by arbitrary simulation applications or executed as a post-processing step. As a first example, we couple our compressor with the MESSy and MPTRAC libraries.

We show different results including the compression of ERA5 data. The compressed sample datasets show better results in terms of file size than conventional compressors such as SZ and ZFP. In addition, our method allows for a more fine-grained error control.

How to cite: Böing, N., Holke, J., Hergl, C., Basermann, A., and Gassner, G.: Adaptive Data Reduction Techniques for Extreme-Scale Atmospheric Models, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-17154, https://doi.org/10.5194/egusphere-egu24-17154, 2024.

EGU24-18057 | ECS | Posters on site | ESSI2.2

Tools and techniques for modular, portable (Machine Learning) parameterisations 

Jack Atkinson, Dominic Orchard, Elliott Kasoar, and Thomas Meltzer

Numerical models across the geophysical sciences make extensive use of physical parameterisations to represent sub-grid processes such as eddies, convection, gravity waves, and microphysics. With the advent of machine learning (ML), big data, and artificial intelligence there are increasing efforts to develop data-driven parameterisation schemes or emulate existing, but computationally-intensive, parameterisations using ML techniques.

Despite their name, parameterisations produced by traditional design approaches are, ironically, often not themselves parameters, but rather tightly integrated, enmeshed pieces of code in a larger model. Such parameterisations and practices pose a threat to the reproducibility, accuracy, ease of (re)use, and general FAIRness (Findable, Accessible, Interoperable, Reusable) [1] of the schemes being developed.

In contrast, a modular approach to parameterisations (and their receivers, e.g., GCMs), would enable them to be more easily (1) interchangeable and Interoperable, to compare different schemes and assess uncertainty due to their inherent approximate behaviour, and (2) portable and Reusable, between models, to reduce engineering effort, and (3) understandable and Accessible, by being clear about dependencies and the physical meaning of program variables, and (4) testable, to aid verification and correctness testing. 

Achieving this goal in the context of numerical modelling brings a number of scientific, engineering, and computational challenges. In this talk we aim to set out some best-practice principles for achieving modular parameterisation design for geoscience modellers. We then focus on the particular challenges around modern ML parameterisations. 

To this end we have developed a library for easily interoperating ML-based parameterisations in PyTorch with Fortran-based numerical models, called FTorch [2]. By reducing the Fortran-PyTorch Interoperability burden on researchers this framework should reduce errors that arise and increase the speed of development when compared to other approaches such as re-coding models in Fortran. FTorch aims to make emerging ML parameterisation research more Accessible to those who may not have deep expertise of ML, Fortran, and/or computer science. It also means that models developed using PyTorch can leverage its feature-rich libraries and be shared in their native format, maximising Reusability. We discuss the design principles behind FTorch in the context of modular parameterisations and demonstrate our principles and approach by coupling ML parameterisations to atmospheric and climate models.
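On the Python side, coupling through FTorch starts from a model saved in PyTorch's TorchScript format; a minimal sketch of that export step is shown below, with a placeholder network standing in for a trained emulator. The Fortran side then loads the saved file through FTorch.

```python
# Minimal sketch of the Python-side step when coupling through FTorch:
# export a trained PyTorch parameterisation to TorchScript so the Fortran
# model can load it. The tiny network here is only a placeholder emulator.
import torch

class GravityWaveEmulator(torch.nn.Module):
    def __init__(self, n_in=40, n_out=40):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(n_in, 128), torch.nn.ReLU(), torch.nn.Linear(128, n_out)
        )

    def forward(self, column_state):
        return self.net(column_state)

model = GravityWaveEmulator()
model.eval()

# Trace with a dummy input column and save as TorchScript for the Fortran side.
example = torch.zeros(1, 40)
scripted = torch.jit.trace(model, example)
scripted.save("gw_emulator.pt")
```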

In general, we present a number of considerations that could be used to make all parameterisation schemes more easily Interoperable and Re-useable by their developers.

 

[1] Barker, M. et al, Introducing the FAIR Principles for research software, Sci Data 9, 622 (2022) https://doi.org/10.1038/s41597-022-01710-x

[2] FTorch https://github.com/Cambridge-ICCS/FTorch

How to cite: Atkinson, J., Orchard, D., Kasoar, E., and Meltzer, T.: Tools and techniques for modular, portable (Machine Learning) parameterisations, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-18057, https://doi.org/10.5194/egusphere-egu24-18057, 2024.

EGU24-18724 | ECS | Orals | ESSI2.2

Reproducible Workflows and Compute Environments for Reusable Datasets, Simulations and Research Software 

Alan Correa, Anil Yildiz, and Julia Kowalski

The pursuit of reproducibility in research has long been emphasized. It is even more critical in geohazards research and practice, where model-based decision-making needs to be transparent for trustworthy applications. However, enabling reproducibility in process-based or machine learning workflows requires time, energy, and sometimes manual operations or even unavailable resources. Moreover, the diversity in modern compute environments, both in hardware and software, significantly hinders the path to reproducibility. While many researchers focus on reproducibility, we advocate that reusability holds greater value and inherently requires the former. Reusable datasets and simulations can allow for transparent and reliable decision support, analysis as well as benchmarking studies. Reusable research software can foster composition and faster development of complex projects, while avoiding the reinvention of complicated data structures and algorithms.

Establishing reproducible workflows and compute environments is vital to enable and ensure reusability. Prioritising reproducible workflows is crucial for individual use, while both reproducible compute environments and workflows are essential for broader accessibility and reuse by others. We present herein various challenges faced in establishing reproducible workflows and compute environments, along with solution strategies and recommendations based on experiences from two projects in geohazards research. We discuss an object-oriented approach to simulation workflows, automated metadata extraction and data upload, and unique identification of datasets (assets) and simulation workflows (processes) through cryptographic hashes. We investigate essential factors, such as software versioning and dependency management, reproducibility across diverse hardware used by researchers, and time to first reproduction/reuse (TTFR), to establish reproducible computational environments. Finally, we shall explore the landscape of reproducibility in compute environments, covering language-agnostic package managers, containers, and language-specific package managers supporting binary dependencies.
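One of the ingredients mentioned above, unique identification through cryptographic hashes, can be illustrated with a short sketch: hashing the dataset bytes together with the workflow parameters yields an identifier that changes whenever either changes. The exact scheme used in the two projects may differ.

```python
# Illustrative sketch of content-based identification: hash the dataset bytes
# together with the workflow parameters, so the identifier changes whenever
# either the data or the configuration changes. The projects' actual scheme
# may differ in detail.
import hashlib
import json
from pathlib import Path

def asset_id(data_path, workflow_params):
    h = hashlib.sha256()
    h.update(Path(data_path).read_bytes())                          # the dataset itself
    h.update(json.dumps(workflow_params, sort_keys=True).encode())  # the process settings
    return h.hexdigest()

# print(asset_id("slope_stability_input.csv", {"model": "debris_flow", "dt": 0.1}))
```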

How to cite: Correa, A., Yildiz, A., and Kowalski, J.: Reproducible Workflows and Compute Environments for Reusable Datasets, Simulations and Research Software, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-18724, https://doi.org/10.5194/egusphere-egu24-18724, 2024.

EGU24-19437 | Posters virtual | ESSI2.2

A fully automated reproducible self-documenting workflow implemented in our inverse regional carbon model 

Arndt Meier, Guillaume Monteil, Ute Karstens, Marko Scholze, and Alex Vermeulen

The Integrated Carbon Observation System (ICOS) and the Lund University Department of Physical Geography and Ecosystem Science have developed the Lund University Modular Inversion Algorithm (LUMIA), which is being used for the inverse modelling of carbon and isotope-resolved methane. The work is linked with past and present Horizon 2020 projects such as DICE and AVENGERS, with the overarching goal of supporting the Paris Agreement goals.

The ICOS Carbon Portal (https://data.icos-cp.eu/portal) collects, maintains and supports a large range of greenhouse gas observations as well as some inventory and model data, each of which has a unique persistent identifier, which is a key prerequisite for achieving a reproducible workflow.

Here we present a self-documenting, fully reproducible workflow for our inverse carbon model LUMIA, which is based on frameworks discussed in EU initiatives like Copernicus CAMS and CoCO2 as well as our own experiences from actual workflows routinely used in Australian court cases against illegal land use changes. We will show a live demonstration of the system, including its graphical user interfaces and the created provenance and reproducibility metadata.

How to cite: Meier, A., Monteil, G., Karstens, U., Scholze, M., and Vermeulen, A.: A fully automated reproducible self-documenting workflow implemented in our inverse regional carbon model, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-19437, https://doi.org/10.5194/egusphere-egu24-19437, 2024.

EGU24-19486 | Orals | ESSI2.2 | Highlight

FAIR Workflows and Methods for Scaling 

Sean R. Wilkinson

The FAIR Principles, originally introduced as guiding principles for scientific data management and stewardship, also apply abstractly to other digital objects such as research software and scientific workflows. When introduced to the FAIR principles, most scientists can see that the concepts behind the FAIR principles — namely, to make digital objects Findable, Accessible, Interoperable, and Reusable — will improve the quality of research artifacts. It is less common, however, that scientists immediately recognize the ways in which incorporating FAIR methods into their research can enable them to tackle problems of greater size and complexity. In short, focusing on making artifacts that are reusable in the FAIR sense makes those artifacts reusable by humans as well as machines, thus enabling computational workflows that handle scaling issues automatically and someday even self-assemble. Here, we will discuss recent community developments in FAIR computational workflows and how they can impact the earth sciences now and in the future.

How to cite: Wilkinson, S. R.: FAIR Workflows and Methods for Scaling, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-19486, https://doi.org/10.5194/egusphere-egu24-19486, 2024.

EGU24-20358 | Orals | ESSI2.2

ESM-Tools - A modular infrastructure for Earth System Modelling 

Miguel Andrés-Martínez, Nadine Wieters, Paul Gierz, Jan Streffing, Sebastian Wahl, Joakim Kjellsson, and Bernadette Fritzsch

In the last decades, the operation, maintenance and administration of Earth System Models (ESMs) have become substantially more complex due to the increasing number of available models, coupling approaches and versions, and the need for tuning across different scales and configurations. Another factor contributing to the complexity of operation is the requirement to run the models on different High Performance Computing (HPC) platforms. In this context, configuration tools, workflow managers and ESM-oriented scripting tools have become essential for administrating, distributing and operating ESMs across research groups, institutions and members of international projects, while still ensuring simulation reproducibility.

ESM-Tools is an open-source software infrastructure and configuration tool that tackles these challenges associated with the operation of ESMs. ESM-Tools enables seamlessly building and running ESMs across different HPCs in a reproducible manner. Most importantly, it is used by model developers to distribute standard simulation configurations, so that the user can effortlessly run these predefined simulations while retaining the flexibility to modify only the parameters that align with their specific needs. This lowers the technical threshold for new model users and makes the ESMs more accessible.

The source code consists of an HPC- and model-agnostic Python back-end and a set of model- and HPC-specific configuration YAML files. In this way, adding a new model, coupled model or HPC is just a matter of writing new configuration YAML files. The configuration files are highly modularized, which allows for their reutilization in new setups (e.g. new components are added, while some existing component configurations are reused). Configuration conflicts between the different files are resolved hierarchically according to their configuration category, giving priority to model- and simulation-specific configurations. ESM-Tools also provides basic workflow-management capabilities which allow for plugging in preprocessing and postprocessing tasks and running offline coupled models. The tasks of the ESM-Tools workflow can be reorganized, new tasks can be included, and single tasks can be executed independently, allowing for its integration into more advanced workflow manager software if required.
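The hierarchical resolution of configuration files can be pictured with a simplified sketch: more specific YAML files override more generic ones during a recursive merge. The file names, keys and precedence rules below are illustrative and do not reproduce ESM-Tools' actual resolution logic.

```python
# Simplified sketch of hierarchical YAML configuration merging: later (more
# specific) files override earlier ones. File names, keys and precedence are
# illustrative and not ESM-Tools' actual resolution logic.
import yaml

def deep_merge(base, override):
    """Recursively merge dictionaries, with `override` taking priority."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

def resolve(*yaml_files):
    """Apply files in order of increasing priority (machine < model < run)."""
    config = {}
    for path in yaml_files:
        with open(path) as f:
            config = deep_merge(config, yaml.safe_load(f) or {})
    return config

# config = resolve("machines/levante.yaml", "models/fesom2.yaml", "runs/pi_control.yaml")
```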

Among other coupled Earth System Models, ESM-Tools is currently used to manage and distribute the OpenIFS-based Climate Models AWI-CM3 (FESOM2 + OpenIFS, developed at AWI) and FOCI-OpenIFS (NEMO4 + OpenIFS43r3, developed at GEOMAR, running ORCA05 and ORCA12 in coupled mode with OASIS3-MCT5.0), as well as the AWI-ESM family of models (ECHAM6 + FESOM2). HPCs supported include those of the DKRZ (Hamburg, Germany), Jülich Supercomputing Center (Jülich, Germany), HLRN (Berlin and Göttingen, Germany), and the IBS Center for Climate Physics (Busan, South Korea), with plans to support LUMI (Kajaani, Finland) and desktop distributions (for educational purposes).

In this contribution we will introduce ESM-Tools and the design choices behind its architecture. Additionally, we will discuss the advantages of such a modular system, address the usability and maintainability challenges resulting from these design choices, and present our mitigation strategies.

How to cite: Andrés-Martínez, M., Wieters, N., Gierz, P., Streffing, J., Wahl, S., Kjellsson, J., and Fritzsch, B.: ESM-Tools - A modular infrastructure for Earth System Modelling, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-20358, https://doi.org/10.5194/egusphere-egu24-20358, 2024.

EGU24-20519 | Posters on site | ESSI2.2

A pragmatic approach to complex citations, closing the provenance gap between IPCC AR6 figures and CMIP6 simulations 

Charlotte Pascoe, Martina Stockhause, Graham Parton, Ellie Fisher, Molly MacRae, Beate Kreuss, and Lina Sitz

Many of the figures in the WGI contribution to the IPCC Sixth Assessment Report (AR6) are derived from the data of multiple CMIP6 simulations. For instance, a plot showing projections of global temperature change in Figure 2 of Chapter 4 of the IPCC AR6 is based on data from 183 CMIP6 simulation datasets. The figure helpfully tells us which CMIP6 experiments were used as input data but does not provide information about the models that ran the simulations. It is possible to deduce the specific input data from supplementary tables in the IPCC assessment report and from within the report’s annexes. However, these information sources are not machine-accessible, so they are difficult to use for tracing purposes; they are not sufficient to give credit, as they do not enter indexing services; and they are difficult to find, as they are not part of the printed report. Even if we gather this knowledge to create a navigable provenance network for the figure, we are still left with the unwieldy prospect of rendering 183 data citations for an outwardly simple plot.

We require a compact way to provide traceable provenance for large input data networks that makes transparent the specific input data used to create the CMIP6-based figures in IPCC AR6 and gives credit to modelling centres for the effort of running the simulations. This is the so-called complex citation challenge discussed within the RDA Complex Citation Working Group.

We present a pragmatic solution to the complex citation challenge that uses an existing public infrastructure technology, Zenodo. The work establishes traceability by collating references to a figure’s input datasets within a Zenodo record, and credit via Zenodo’s relatedWorks feature/DataCite’s relations, which link to existing data objects through Persistent Identifiers (PIDs), in this case the CMIP6 data citations. Whilst a range of PIDs exist to support connections between objects, DOIs are widely used for citations and are well connected within the wider PID graph landscape, and Zenodo provides a tool to create objects that utilise the DOI schema provided by DataCite. CMIP6 data citations have sufficient granularity to assign credit, but the granularity is not fine enough for traceability purposes; therefore Zenodo reference handle groups are used to identify specific input datasets and Zenodo connected objects provide the join between them.
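
For illustration, the sketch below shows how such a figure-level record could express both traceability and credit through DataCite-style related identifiers; the DOIs and field values are placeholders, not the actual records created in this work.

```python
# Hedged sketch: how a figure-level Zenodo/DataCite record might reference its
# CMIP6 input-data citations via relatedIdentifiers. DOIs below are placeholders,
# not the identifiers actually used for the IPCC AR6 figure.
figure_record = {
    "titles": [{"title": "Input datasets for IPCC AR6 WGI Chapter 4, Figure 2 (illustrative)"}],
    "relatedIdentifiers": [
        {
            "relatedIdentifier": "10.22033/ESGF/CMIP6.XXXX",  # placeholder CMIP6 data citation DOI
            "relatedIdentifierType": "DOI",
            "relationType": "IsDerivedFrom",  # traceability: the figure is derived from this dataset
        },
        {
            "relatedIdentifier": "10.22033/ESGF/CMIP6.YYYY",  # placeholder
            "relatedIdentifierType": "DOI",
            "relationType": "Cites",  # credit: the citation enters indexing services
        },
        # ...one entry per contributing simulation dataset (183 for the example figure)
    ],
}
```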

There is still work to be done to establish full visibility of credit referenced within the Zenodo records.  However, we hope to engage the community by presenting our pragmatic solution to the complex citation challenge, one that has the potential to provide modelling centres with a route to a more complete picture of the impact of their simulations.

How to cite: Pascoe, C., Stockhause, M., Parton, G., Fisher, E., MacRae, M., Kreuss, B., and Sitz, L.: A pragmatic approach to complex citations, closing the provenance gap between IPCC AR6 figures and CMIP6 simulations, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-20519, https://doi.org/10.5194/egusphere-egu24-20519, 2024.

EGU24-20642 | Orals | ESSI2.2

Workflow Provenance with RO-Crate in Autosubmit 

Bruno De Paula Kinoshita, Daniel Beltran Mora, Manuel G. Marciani, and Luiggi Tenorio Ku

In this talk we present current work in Autosubmit to track workflow provenance using the community maintained open standard RO-Crate. Autosubmit is an experiment and workflow manager designed to conduct climate experiments in different platforms (local, HPC, cloud), and is part of different Earth Digital Twin initiatives (Destination Earth Climate Digital Twin, and the European Digital Twin of the Ocean).

Workflow managers have a central role in receiving user input, processing it with local and remote jobs that run on different platforms and that generate output data. RO-Crate enables tracking of workflow prospective (what should happen, e.g. workflow configuration, Slurm job settings) and retrospective (what happened, e.g. log files, performance indicators) provenance. By adopting an open standard that is used by other workflow managers (e.g. Galaxy, COMPSs, Streamflow, WfExS, Sapporo, and Autosubmit) and tools (e.g. Workflow Hub, runcrate) from various domains we show that it not only improves data provenance in Autosubmit, but also interoperability with other workflow managers and tools.
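
As a rough illustration of the idea, the snippet below sketches a heavily simplified ro-crate-metadata.json that places prospective (configuration) and retrospective (log) entities side by side; the identifiers and descriptions are invented and do not reflect the exact structure Autosubmit produces.

```python
# Simplified, hypothetical RO-Crate metadata illustrating prospective and
# retrospective provenance entities; not the actual Autosubmit output.
ro_crate = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {
            "@id": "./",
            "@type": "Dataset",
            "hasPart": [{"@id": "conf/experiment.yml"}, {"@id": "logs/run_20240101.log"}],
        },
        # Prospective provenance: what should happen (workflow configuration, job settings).
        {"@id": "conf/experiment.yml", "@type": "File", "description": "Experiment configuration (illustrative)"},
        # Retrospective provenance: what happened (log files, performance indicators).
        {"@id": "logs/run_20240101.log", "@type": "File", "description": "Remote job log (illustrative)"},
    ],
}
```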

We also describe recent work to integrate RO-Crate with METACLIP, a language-independent framework for climate product provenance that was used in the IPCC Atlas. METACLIP uses ontologies such as PROV to track the provenance of climate products. We describe how that relates to RO-Crate, and how we are integrating both via JSON-LD.

How to cite: De Paula Kinoshita, B., Beltran Mora, D., G. Marciani, M., and Tenorio Ku, L.: Workflow Provenance with RO-Crate in Autosubmit, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-20642, https://doi.org/10.5194/egusphere-egu24-20642, 2024.

EGU24-20844 | Posters on site | ESSI2.2

Workflows with Autosubmit in EDITO 

Bruno De Paula Kinoshita, Daniel Beltran Mora, Manuel G. Marciani, Luiggi Tenorio Ku, and Miguel Castrillo

The European Digital Twin of the Ocean, EDITO, is an initiative of the European Commission that aims to create a virtual representation of marine and coastal environments around the globe to assess future impacts of climate change and human activities, improving the accessibility of ocean knowledge. EDITO Infra is the backbone of EDITO. It provides the infrastructure where components of the EDITO digital twin are combined and integrated.

In this work, we describe how Autosubmit is integrated and used in EDITO Infra as the back-end component of the Virtual Ocean Model Lab (VOML, a virtual co-working environment). Users of the digital twin can connect to the VOML to customize and build ocean models, run them using cloud and HPC resources, and access applications deployed in EDITO Infra that consume the output of the models. Although Autosubmit is usually run as local software that helps users leverage remote resources, in this case we demonstrate its versatility in a scenario where it is deployed using Docker and Kubernetes, tools traditionally used in cloud environments.

In this context, Autosubmit acts as a middleware between the applications and the HPC and cloud resources. It manages experiments and workflows, connecting EuroHPC systems like MareNostrum 5 and Leonardo. It provides integration with different GUIs (via a REST API), GIS systems, and other services.

How to cite: De Paula Kinoshita, B., Beltran Mora, D., G. Marciani, M., Tenorio Ku, L., and Castrillo, M.: Workflows with Autosubmit in EDITO, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-20844, https://doi.org/10.5194/egusphere-egu24-20844, 2024.

EGU24-21138 | Posters on site | ESSI2.2

Digital Twining of Geophysical Extremes 

Ramon Carbonell, Arnau Folch, Antonio Costa, Beata Orlecka-Sikora, Piero Lanucara, Finn Løvholt, Jorge Macías, Sascha Brune, Alice-Agnes Gabriel, Sara Barsotti, Joern Behrens, Jorge Gomez, Jean Schmittbuhl, Carmela Freda, Joanna Kocot, Domenico Giardini, Michael Afanasiev, Helen Glaves, and Rosa Badía

The geophysical research community has developed a relatively large number of numerical codes and scientific methodologies which are able to numerically simulate, through physics, the extreme behaviour of Earth systems (for example volcanoes, tsunamis, earthquakes, etc.). Furthermore, large volumes of data have nowadays been acquired, and even near-real-time data streams are accessible. Therefore, Earth scientists currently have at hand the possibility of monitoring these events through sophisticated approaches using the current leading-edge computational capabilities provided by pre-exascale computing infrastructures. The implementation and deployment of 12 Digital Twin Components (DTCs), addressing different aspects of geophysical extreme events, is being carried out by DT-GEO, a project funded under the Horizon Europe programme (2022-2025). Each DTC is intended as a self-contained entity embedding flagship simulation codes, Artificial Intelligence layers, large volumes of (real-time) data streams from and into data lakes, data assimilation methodologies, and overarching workflows, which are executed independently or as coupled DTCs in a centralized HPC and/or virtual cloud computing research infrastructure.

How to cite: Carbonell, R., Folch, A., Costa, A., Orlecka-Sikora, B., Lanucara, P., Løvholt, F., Macías, J., Brune, S., Gabriel, A.-A., Barsotti, S., Behrens, J., Gomez, J., Schmittbuhl, J., Freda, C., Kocot, J., Giardini, D., Afanasiev, M., Glaves, H., and Badía, R.: Digital Twining of Geophysical Extremes, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-21138, https://doi.org/10.5194/egusphere-egu24-21138, 2024.

EGU24-21636 | Posters virtual | ESSI2.2

Advancing Computational Earth Sciences: Innovations and Challenges in Scientific HPC Workflows 

Rafael Ferreira da Silva, Ketan Maheshwari, Tyler Skluzacek, Renan Souza, and Sean Wilkinson

The advancement of science is increasingly intertwined with complex computational processes [1]. Scientific workflows are at the heart of this evolution, acting as essential orchestrators for a vast range of experiments. Specifically, these workflows are central to the field of computational Earth Sciences, where they orchestrate a diverse range of activities, from cloud-based data preprocessing pipelines in environmental modeling to intricate multi-facility instrument-to-edge-to-HPC computational frameworks for seismic data analysis and geophysical simulations [2].

The emergence of continuum and cross-facility workflows marks a significant evolution in computational sciences [3]. Continuum workflows represent continuous computing access required for analysis pipelines, while cross-facility workflows extend across multiple sites, integrating experiments and computing facilities. These cross-facility workflows, crucial for real-time applications, offer resiliency and stand as solutions for the demands of continuum workflows. Addressing continuum and cross-facility computing requires a focus on data, ensuring workflow systems are equipped to handle diverse data representations and storage systems.

As we navigate the computing continuum, the pressing needs of contemporary scientific applications in Earth Sciences call for a dual approach: the recalibration of existing systems and the innovation of new workflow functionalities. This recalibration involves optimizing data-intensive operations and incorporating advanced algorithms for spatial data analysis, while innovation may entail the integration of machine learning techniques for predictive modeling and real-time data processing in earth sciences. We offer a comprehensive overview of cutting-edge advancements in this dynamic realm, with a focus on computational Earth Sciences, including managing the increasing volume and complexity of geospatial data, ensuring the reproducibility of large-scale simulations, and adapting workflows to leverage emerging computational architectures.

 

[1] Ferreira da Silva, R., Casanova, H., Chard, K., Altintas, I., Badia, R. M., Balis, B., Coleman, T., Coppens, F., Di Natale, F., Enders, B., Fahringer, T., Filgueira, R., et al. (2021). A Community Roadmap for Scientific Workflows Research and Development. 2021 IEEE Workshop on Workflows in Support of Large-Scale Science (WORKS), 81–90. DOI: 10.1109/WORKS54523.2021.00016


[2] Badia Sala, R. M., Ayguadé Parra, E., & Labarta Mancho, J. J. (2017). Workflows for science: A challenge when facing the convergence of HPC and big data. Supercomputing frontiers and innovations, 4(1), 27-47. DOI: 10.14529/jsfi170102


[3] Antypas, K. B., Bard, D. J., Blaschke, J. P., Canon, R. S., Enders, B., Shankar, M. A., ... & Wilkinson, S. R. (2021, December). Enabling discovery data science through cross-facility workflows. In 2021 IEEE International Conference on Big Data (Big Data) (pp. 3671-3680). IEEE. DOI: 10.1109/BigData52589.2021.9671421

How to cite: Ferreira da Silva, R., Maheshwari, K., Skluzacek, T., Souza, R., and Wilkinson, S.: Advancing Computational Earth Sciences: Innovations and Challenges in Scientific HPC Workflows, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-21636, https://doi.org/10.5194/egusphere-egu24-21636, 2024.

EGU24-1971 | Posters on site | ESSI2.5

Case studies towards cryosphere digital twin applications 

Iiris Liinamaa, Tero Aalto, Jari Haapala, Xinfang Zhang, Till Rasmussen, Keguang Wang, Lars Arneborg, Linn Carlstedt, Rivo Uiboupin, and Ilja Maljutenko

The NOrdic CryOSphere Digital Twin (NOCOS DT) project aims to explore and pilot digital twin technology opportunities and showcase how output from key initiatives such as the Destination Earth (DestinE) Climate Change Adaptation Digital Twin (Climate DT) could be leveraged for key sea-ice impact sectors in the Arctic and Baltic.

By enabling simulations at an unprecedented scale and resolution, DestinE and Climate DT aim to provide a more detailed representation of the Earth system. While awaiting the Climate DT data, NOCOS DT has done preparatory work using existing data sets in order to calculate novel navigational risk indicators, user-relevant  sea-ice climatologies and modeling of sea ice breakup and ridges. Additionally, the project looks at the potential use of this new information system and other climate data in marine spatial planning. NOCOS DT provides a Nordic perspective and insight that can inform Destination Earth in its future phases.

The presentation provides an overview of the project and its main outcomes with informative animations. It hopes to spark open conversation around further ways to benefit from emerging digital twin technologies and ways to leverage international initiatives such as Destination Earth for specific impact sector needs in the Baltic Sea area.

CSC – IT Center for Science coordinates the NOCOS DT consortium, which brings together the Danish Meteorological Institute (DMI), the Finnish Meteorological Institute (FMI), the Norwegian Meteorological Institute (MetNo) and the Swedish Meteorological and Hydrological Institute (SMHI), together with the Tallinn University of Technology, Department of Marine Systems (TalTech). The project is funded by the Nordic Council of Ministers.

How to cite: Liinamaa, I., Aalto, T., Haapala, J., Zhang, X., Rasmussen, T., Wang, K., Arneborg, L., Carlstedt, L., Uiboupin, R., and Maljutenko, I.: Case studies towards cryosphere digital twin applications, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-1971, https://doi.org/10.5194/egusphere-egu24-1971, 2024.

EGU24-2164 | ECS | Posters on site | ESSI2.5

The data streaming in the Climate Adaptation Digital Twin: a fundamental piece to transform climate data into climate information 

Francesc Roura-Adserias, Aina Gaya i Avila, Leo Arriola i Mikele, Miguel Andrés-Martínez, Dani Beltran Mora, Iker Gonzalez Yeregui, Katherine Grayson, Bruno De Paula Kinoshita, Rohan Ahmed, Aleksander Lacima-Nadolnik, and Miguel Castrillo

In the context of advancing towards high-resolution climate projections (1 km, sub-hourly) and the consequent large memory requirements, we are reaching the point where not all of the data produced can be stored. In this work, we present the technical infrastructure developed in the context of the Destination Earth ClimateDT project to consume the data produced by the core engines as soon as it is available, a method known as “data streaming”. This mechanism consists of three main steps that are included in an integrated workflow: the run of the climate models themselves, the applications (which convert the model output into actionable information), and the mechanism that links both sides. This solution is designed to be scalable; different applications can be run simultaneously and with as many different variables and statistics as needed, in order to fully utilize the output from the digital twin. The flexibility of the workflow allows different applications to run at their optimal frequency in a seamless way. Last but not least, the workflow integrates statistical streaming algorithms, allowing integrated applications to generate on-demand online statistics from streamed data, minimizing the memory footprint.
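
A schematic sketch of the idea, not the actual ClimateDT workflow code: an application consumes each model output window as it becomes available and keeps only a running statistic, so the full time series never has to be stored. All names and the synthetic data below are illustrative.

```python
# Illustrative streaming consumer: each output chunk is seen once and folded
# into a running statistic; the generator stands in for the delivery mechanism.
from typing import Iterator
import numpy as np


def streamed_chunks(n_steps: int, shape=(180, 360)) -> Iterator[np.ndarray]:
    """Stand-in for the mechanism delivering each new model output window."""
    rng = np.random.default_rng(0)
    for _ in range(n_steps):
        yield rng.gamma(2.0, 1.5, size=shape)  # fake hourly precipitation field


running_max = None
for chunk in streamed_chunks(n_steps=24):
    running_max = chunk if running_max is None else np.maximum(running_max, chunk)

print("daily maximum, grid mean:", float(running_max.mean()))
```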

How to cite: Roura-Adserias, F., Gaya i Avila, A., Arriola i Mikele, L., Andrés-Martínez, M., Beltran Mora, D., Gonzalez Yeregui, I., Grayson, K., De Paula Kinoshita, B., Ahmed, R., Lacima-Nadolnik, A., and Castrillo, M.: The data streaming in the Climate Adaptation Digital Twin: a fundamental piece to transform climate data into climate information, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-2164, https://doi.org/10.5194/egusphere-egu24-2164, 2024.

EGU24-2314 | Orals | ESSI2.5

EcoPro: A Digital Twin for Ecological Projections and Environmental Assessment Using Earth Systems Models and Observations 

Seungwon Lee, Peter Kalmus, Antonio Ferraz, Alex Goodman, Kyle Pearson, Gary Doran, Flynn Platt, Beichen Hu, Ayesha Katugoda, Sudip Chakraborty, Emily Kang, Jia Zhang, Sierra Dahiyat, and Kyle Cavanaugh

EcoPro serves as a digital twin designed for ecological projections, aiming to forecast how ecosystems might change in response to environmental factors. It acts as a platform facilitating access to Earth Systems Model (ESM) outputs and Earth Systems observation datasets. EcoPro enables the development of an ecological model linking ecosystem predictors with environmental drivers. It further allows the downscaling of ESM model outputs to a resolution relevant to the specific ecosystem. The platform then applies the ecological model to these downscaled ESM model outputs. The results of the ecological projection are visualized at a high resolution, providing valuable information for application users and decision-makers. Additionally, EcoPro assesses the performance of new observing systems tailored for ecological projection. This discussion outlines the design and implementation of EcoPro, along with the scientific use cases studied using the digital twin.

How to cite: Lee, S., Kalmus, P., Ferraz, A., Goodman, A., Pearson, K., Doran, G., Platt, F., Hu, B., Katugoda, A., Chakraborty, S., Kang, E., Zhang, J., Dahiyat, S., and Cavanaugh, K.: EcoPro: A Digital Twin for Ecological Projections and Environmental Assessment Using Earth Systems Models and Observations, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-2314, https://doi.org/10.5194/egusphere-egu24-2314, 2024.

EGU24-2363 | Orals | ESSI2.5

Bespoke simulation capabilities - Digital Twins for Weather-induced Extremes and Climate 

Nils Wedi, Irina Sandu, Joern Hoffmann, Thomas Geenen, and Daniel Thiemert

The objective of DestinE is to create a highly accurate replica or Digital Twin of the Earth. This constitutes a new type of information system, with unprecedented levels of detail, quality, and interactivity to support EU-policy makers and users who implement these policies to better respond and adapt to the challenges posed by environmental change.

Several thematic digital twins of the Earth system are being developed over the course of different phases of DestinE, with the first implementations focussing on extreme weather and climate adaptation. DestinE’s digital twins exploit the latest advances in digital technology, science and artificial intelligence, and the huge opportunities offered by the world-leading supercomputing capacities of the European High Performance Computing Joint Undertaking (EuroHPC JU). By combining cutting-edge Earth-system physical and data-driven models and observations, DestinE’s digital twins offer bespoke simulation capabilities that accurately simulate natural and human activity and allow scenarios to be tested that would enable more sustainable development and support European environmental policies. On-demand simulations and a comprehensive distributed data and compute infrastructure, tailored to the Big Data needs of Destination Earth for tackling climate and environment-related challenges, are located close to the EuroHPC facilities to make the most of the strategic allocation offered in support of the DestinE service.

The European Centre for Medium Range Weather Forecast (ECMWF), the European Space Agency (ESA) and the European Organisation for the Exploitation of Meteorological Satellites (EUMETSAT) are the three organisations entrusted by the EU (DG-CNECT) to achieve this unprecedented endeavour for climate, weather and computing sciences. The work involves scientists, computer analysts and domain experts across many contributing institutions and European countries working together on this common goal.

How to cite: Wedi, N., Sandu, I., Hoffmann, J., Geenen, T., and Thiemert, D.: Bespoke simulation capabilities - Digital Twins for Weather-induced Extremes and Climate, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-2363, https://doi.org/10.5194/egusphere-egu24-2363, 2024.

EGU24-3361 | Orals | ESSI2.5

Digital Twin of the North Sea and modular approach for reuse in other applications 

Tjerk Krijger, Peter Thijsse, Dick Schaap, and Jasper van den Barg

Effective use of data driven intelligence for planning and decision making requires solutions that enable stakeholders to better understand the type of information that the data systems provide. In many cases stakeholders have limited expertise on a specific technical subject, but still need to understand and interpret the data driven intelligence to be able to act on limitations, consequences and alternatives.

Three-dimensional data and environment visualization in a virtual web environment can be such an innovation: it helps to interpret data by immersing the user in a virtual world where the data is visualized realistically. And by being web-based, accessible via a normal web browser, it can reach a large audience. An example of such an environment is a planned offshore wind farm, where the user is able to move and look around freely to examine energy yields and effects on fisheries, shipping industries and ecology.

At MARIS we have developed, with Deltares and for Rijkswaterstaat, a 3D digital twin of a wind farm in the North Sea using the game-engine Unity for the web. This 3D digital twin assists marine spatial planning in the surrounding area and allows the user to freely move around in the Prinses Amalia wind farm that is made to scale. It includes real time information on the water, waves and air temperature, the wind speed and direction and the energy yields of the wind turbines, and the application on screen responds “live” to this in visualised wave height, direction, turbine speed, graphs, etc. The wind farm contains multiple 3D assets for the turbines, local fish, vessels and more that the user can highlight for more information. The application runs smoothly in the browser on regular computers making it accessible to as many people as possible. In the developments of this 3D environment, a modular approach is applied, such that parts of the application, like the combination of mean sea level and waves, can be reused in other applications.

Digital twins have been a hot topic over the last years and will remain so in the coming years, with a focus mostly on the models and data behind them. The visualisation and user interfacing are still largely understated. With our developments we want to show what kind of 3D visualizations can be achieved using the blueprint environment in Unity. Because of the modular approach, the visualizations can be extended to include other available types of data and models and serve other types of use cases and applications, such as marine protected areas - our next pilot in development under the Horizon Europe EFFECTIVE project.

How to cite: Krijger, T., Thijsse, P., Schaap, D., and van den Barg, J.: Digital Twin of the North Sea and modular approach for reuse in other applications, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-3361, https://doi.org/10.5194/egusphere-egu24-3361, 2024.

EGU24-7534 | Posters on site | ESSI2.5

Compound flood forecasting and climate adaptation Destination Earth digital twin 

Kun Yan, Albrecht Weerts, Tycho Bovenschen, Andrew Cunningham, Sanne Muis, Floortje Roelvink, Frederiek Sperna Weiland, and Tammo Zijlker

Coastal deltas are extremely susceptible to flooding from the sea, rivers, heavy rain and even more severe combinations thereof. Many coastal deltas are densely populated, and flood risk forms a serious threat that will likely increase in the future. There are two main mechanisms to reduce the devastating impacts of these floods: (1) adaptation to the increasing climate risks and (2) improved early warning and emergency response. We will present the Destination Earth digital twin on coastal compound flood inundation forecasting and climate adaptation, which provides information to support the reduction of these impacts.

Examples from five use cases are presented, showing flood inundation and flood impact maps resulting from compound combinations of surge, waves, heavy rain and riverine flooding. The results are based on an automated, complex but generic workflow that takes the high-resolution meteorological forcing from the Extremes DT or Climate DT as input to the high-resolution hydrological model (Wflow_sbm), the hydrodynamic model (Delft3D-FM) and the wave models (HurryWave and SnapWave). Those models provide the boundary conditions of the 2D flood inundation model SFINCS. Based on the calculated flood maps, impacts (flood damage and people affected) are calculated with Delft-FIAT. Results of the modelling chain and model validation will be presented to end users. The insights we obtained from our end users will provide valuable input for the design of the compound flood digital twin.

How to cite: Yan, K., Weerts, A., Bovenschen, T., Cunningham, A., Muis, S., Roelvink, F., Sperna Weiland, F., and Zijlker, T.: Compound flood forecasting and climate adaptation Destination Earth digital twin, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-7534, https://doi.org/10.5194/egusphere-egu24-7534, 2024.

EGU24-8062 | Posters on site | ESSI2.5

One-pass algorithms for streamed climate data 

Katherine Grayson, Aleks Lacima-Nadolnik, Francesc Roura Adserias, Ehsan Sharifi, Stephan Thober, and Francisco Doblas-Reyes

Projections from global climate models (GCMs) are regularly used to create information for climate adaptation policies and socio-economic decisions. As demand grows for accuracy in these projections, GCMs are being run at increasingly finer spatiotemporal resolution to better resolve physical processes and consequently reduce uncertainty associated with parametrizations (Iles et al., 2020; Palmer, 2014). Yet this increase in resolution and the consequent size of the data output make the current state-of-the-art archives (e.g., CORDEX, CMIP) unfeasible. Moreover, the current archival method has left some data consumers without their required data due to the limited number of variables stored and their lower frequency (e.g., monthly means). Initiatives like Destination Earth are investigating the novel method of data streaming, where user applications can be run as soon as the required data is produced by the climate models. Data streaming allows users to access the climate data at the highest frequency possible (e.g., hourly) and at native resolution, in near-real time as the model runs. This provides an unprecedented reduction in the time needed to access the climate data compared with the current simulation paradigm, and the possibility of using variables and frequencies not previously available.

Yet the advent of data streaming in the climate community poses its own set of challenges. Often users require climate data that spans long periods. For example, many hydrological impact models require daily, monthly or annual maximum precipitation values (Teutschbein and Seibert, 2012), while in the wind energy sector, accurate distributions of the wind speed over long periods are essential (Lledo, 2019). Obtaining statistics for periods longer than the time the climate model output is accessible can no longer be done using traditional statistical algorithms. This introduces the one-pass problem: how do we compute summaries, diagnostics or derived quantities that see each data point only once (i.e., pass through the data one time)?

We present here a detailed analysis on the use of one-pass algorithms to compute statistics on streamed climate data. Unlike traditional two-pass methods, one-pass algorithms do not have access to the full time series of data needed to estimate the statistic; instead, they process data incrementally every time that the model outputs new time steps. While these algorithms have been adopted in other fields such as online trading and machine learning, they have yet to find a foothold in climate science, mainly because they have not been necessary until now. Here we show how one-pass algorithms can be harnessed for use in Earth system digital twins, generating the statistics required by users with minimal loss in accuracy and bypassing unfeasible storage requirements.
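
As a minimal example of the class of algorithms discussed, the sketch below implements Welford's classic online algorithm for mean and variance, which updates its state each time a new chunk arrives and never revisits past data; it illustrates the principle only and is not the implementation used in the digital twin.

```python
# Welford's online algorithm: one-pass mean and variance updated per time step,
# shown as an illustration of one-pass statistics (not the project code).
import numpy as np


class OnlineMeanVar:
    def __init__(self, shape):
        self.n = 0
        self.mean = np.zeros(shape)
        self.m2 = np.zeros(shape)  # sum of squared deviations from the running mean

    def update(self, x: np.ndarray) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> np.ndarray:
        return self.m2 / (self.n - 1) if self.n > 1 else np.zeros_like(self.m2)


stat = OnlineMeanVar(shape=(180, 360))
rng = np.random.default_rng(42)
for _ in range(100):                      # each streamed field is seen exactly once
    stat.update(rng.normal(280.0, 5.0, size=(180, 360)))
print(float(stat.mean.mean()), float(stat.variance.mean()))
```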

Iles, C.E., Vautard, R., Strachan, J., Joussaume, S., Eggen, B.R., Hewitt, C.D., 2020. The benefits of increasing resolution in global and regional climate simulations for European climate extremes. Geoscientific Model Development 13.

Lledo, L., et al. 2019. Seasonal forecasts of wind power generation. Renewable Energy 143.

Palmer, T., 2014. Climate forecasting: Build high-resolution global climate models. Nature 515.

Teutschbein, C., Seibert, J., 2012. Bias correction of regional climate model simulations for hydrological climate-change impact studies: Review and evaluation of different methods. Journal of Hydrology 456-457.

How to cite: Grayson, K., Lacima-Nadolnik, A., Roura Adserias, F., Sharifi, E., Thober, S., and Doblas-Reyes, F.: One-pass algorithms for streamed climate data, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-8062, https://doi.org/10.5194/egusphere-egu24-8062, 2024.

EGU24-8325 | Orals | ESSI2.5

The interTwin project Digital Twin Engine and the alignment with Destination Earth 

Andrea Manzi, Raul Bardaji Benach, Ivan Rodero Castro, Adam Warde, and Thomas Geenen

The Horizon Europe interTwin project is developing a highly generic yet powerful Digital Twin Engine (DTE) to support interdisciplinary Digital Twins (DT). Comprising thirty-one high-profile scientific partner institutions, the project brings together infrastructure providers, technology providers, and DT use cases from Climate Research and Environmental Monitoring, High Energy and AstroParticle Physics, and Radio Astronomy. This group of experts enables the co-design of the DTE Blueprint Architecture and the prototype platform, benefiting end users like scientists and policymakers but also DT developers. It achieves this by significantly simplifying the process of creating and managing complex Digital Twin workflows.

As part of our contribution, we'll share the latest updates on our project, including the DTE Blueprint Architecture, whose second version is being released in January 2024. The interTwin Blueprint components, thanks to the collaboration with the ECMWF partner in the project, are designed to be aligned with what Destination Earth is designing and building. Therefore, we will show the activities carried out by the project to analyse the DestinE architecture and the foreseen points of interoperability.

The contribution will also cover the diverse DT use cases we currently support and describe the first software release planned for February 2024. 

InterTwin is funded by the European Union (Horizon Europe) under grant agreement No 101058386.

How to cite: Manzi, A., Bardaji Benach, R., Rodero Castro, I., Warde, A., and Geenen, T.: The interTwin project Digital Twin Engine and the alignment with Destination Earth, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-8325, https://doi.org/10.5194/egusphere-egu24-8325, 2024.

EGU24-9492 | ECS | Orals | ESSI2.5

The Backbone of the Destination Earth Climate Adaptation Digital Twin 

Kai Keller, Mario Acosta, Supriyo Ghosh, Aina Gaya Avila, Ingo Wagner, and Stella Paronuzzi

Since the first concerns were raised in the 1980s that the climate may undergo catastrophic changes caused by increasing greenhouse gas emissions [1,2], a number of multilateral efforts have been brought to life to investigate the future evolution of the climate. The most popular of such joint ventures are the Coupled Model Intercomparison Project (CMIP) and the Intergovernmental Panel on Climate Change (IPCC). The aim has always been to underpin the evidence of climate change and to provide information for policymakers to deal with the resulting consequences. However, the data from those initiatives lacks interactivity: it is generated once, based on a specific simulation protocol and scenario, and is only available at relatively low resolution and frequency due to storage limitations.

The Destination Earth initiative of the European Commission (DestinE) is building upon those efforts while trying to operationalize the generation of global climate projections, iteratively adapting the simulations to new requirements posed by data consumers. The Climate Adaptation Digital Twin (CAT) of DestinE promises interactive access to data generated by global operational Earth System Model (ESM) ensemble simulations at very high resolution (4-5 km atmosphere and 5-10 km ocean) and high frequency (including variables available at sub-hourly frequency). It incorporates what-if scenarios as well as time-slice experiments. 

To deal with the considerable amount of generated data, CAT adopts a streaming approach, where the full model output data is exposed to the data consumers in continuously progressing windows as the climate model runs, erasing older data after having stored a user-defined subset of variables. The data is exposed as a generic state vector (GSV) providing a common interface to the data coming from different models (IFS, ICON, NEMO, FESOM) and realms (atmosphere, land surface, sea ice, and ocean). At the heart of the project lies the interaction with the data consumers driving the GSV design in terms of resolution and frequencies.

In this talk, we explore the opportunities that DestinE provides and the sophisticated software machinery that is required to make it happen. We will give an overall picture of the structure and vision of CAT, and highlight the challenges posed by interactivity, reproducibility, provenance, data accessibility, synchronization, and quality monitoring. In particular, we will showcase diagnostics from decadal high-resolution climate simulations, report on the replicability of a coupled model running the ocean in a mixed-precision implementation, and detail the model workflow employed for the operationalization of the climate projections. We will further give an outlook on the different kinds of simulations that are planned and explain how the data is provided to the data consumers. 

References

[1] Hansen, J., Fung, I., Lacis, A., Rind, D., Lebedeff, S., Ruedy, R., Russell, G., & Stone, P. (1988). Global climate changes as forecast by Goddard Institute for Space Studies three-dimensional model. Journal of Geophysical Research: Atmospheres, 93(D8), 9341–9364. https://doi.org/10.1029/JD093iD08p09341

[2] Manabe, S., & Stouffer, R. J. (1980). Sensitivity of a global climate model to an increase of CO2 concentration in the atmosphere. Journal of Geophysical Research: Oceans, 85(C10), 5529–5554. https://doi.org/10.1029/JC085iC10p05529

How to cite: Keller, K., Acosta, M., Ghosh, S., Gaya Avila, A., Wagner, I., and Paronuzzi, S.: The Backbone of the Destination Earth Climate Adaptation Digital Twin, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-9492, https://doi.org/10.5194/egusphere-egu24-9492, 2024.

EGU24-11115 | ECS | Orals | ESSI2.5

Destination Earth Data Lake unlocking Big Earth Data processing 

Danaele Puechmaille, Michael Schick, Borys Saulyak, Martin Dillmann, and Lothar Wolf

The European Commission’s Destination Earth (DestinE) initiative will deploy several highly accurate thematic digital replicas of the Earth (Digital Twins) for monitoring and simulating natural and human activities, as well as their interactions. This will enable end-users and policy makers to execute “what-if” scenarios for assessing both the impact of environmental challenges (weather extremes, climate change etc.) and the efficiency of proposed solutions. DestinE is implemented in a strategic partnership between the European Space Agency (ESA), the European Centre for Medium-Range Weather Forecasts (ECMWF) and the European Organisation for the Exploitation of Meteorological Satellites (EUMETSAT).

Data Lake is one of the three components of DestinE system. The DestinE Data Lake must tackle several technical challenges. Firstly, the unprecedented volumes of data generated on a frequent basis within the scope of DestinE call for novel and efficient data access and near-data processing services, beyond the traditional “data-to-the-user” paradigm (in which users must download a multitude of files locally, extracting the required parts e.g. variables, area-of-interest etc. and afterwards using them as inputs in their algorithms).

Secondly, the DestinE Data Lake must handle a wide variety of data. In order to offer users a uniform interface to all the data they need for their applications, the DestinE Data Lake must provide access not only to the challenging volumes of Digital Twin outputs but also to federated data from various existing and upcoming data spaces, beyond traditional Earth Observation. This is managed via a user-driven data portfolio and fulfilled by a harmonised data access layer that abstracts away the heterogeneity and complexity of the underlying data sources.

Thirdly, the intense processing requirements of the DestinE Digital Twins are fulfilled by hosting them on European High-Performance Computing (EuroHPC) sites. Data produced by the Digital Twins (DTs) must be processed where it is produced, at the edge of the DestinE Data Lake. This is achieved by defining a geographically distributed reference architecture, with cloud stacks deployed in close proximity to the HPCs for efficient data exchange.

Last but not least, DestinE follows a user-centric approach, evolving in response to on-boarded use cases. This requires a flexible architecture and user-driven data portfolio/ services, which can easily evolve to emerging user needs, incorporate new services, workflows and data sources, including future Digital Twins.

How to cite: Puechmaille, D., Schick, M., Saulyak, B., Dillmann, M., and Wolf, L.: Destination Earth Data Lake unlocking Big Earth Data processing, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-11115, https://doi.org/10.5194/egusphere-egu24-11115, 2024.

EGU24-11498 | ECS | Posters on site | ESSI2.5

Evaluating Tropical Precipitation Extremes: Insights from first simulations from nextGEMS and Destination Earth Climate Digital Twin 

Natalia Nazarova, Jost von Hardenberg, Paolo Davini, Matteo Nurisso, Silvia Caprioli, and Paolo Ghinassi

Understanding tropical precipitation extremes is crucial for capturing the complexities of global climate dynamics. In this study, we employ a novel methodology to examine these extremes at spatial and temporal resolutions that have not previously been explored in global climate modeling.

 

High-resolution data from nextGEMS Cycle 3 and preliminary DestinE simulations are our primary sources. We thoroughly analyzed advanced models for different historical periods, including ICON, IFS-FESOM, and IFS-NEMO. The research also includes a range of observational data sources, such as the MSWEP, ERA5, and IMERG datasets, to create a robust framework for comparison. Our methodological approach includes zonal mean analysis and probability distribution functions (PDFs), applied to data re-gridded to both standard 1° monthly and finer high-resolution 0.1° scales. This dual-resolution strategy is key for revealing detailed patterns and extremes in tropical precipitation.
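
For readers unfamiliar with these diagnostics, the sketch below shows how a zonal mean and a tropical precipitation PDF could be computed with xarray on an already re-gridded field; the file name, variable name and bin choices are hypothetical and do not correspond to the datasets analysed here.

```python
# Illustrative diagnostics on a precipitation variable "pr" with dims (time, lat, lon),
# assumed already regridded to a regular grid; file and variable names are hypothetical.
import numpy as np
import xarray as xr

ds = xr.open_dataset("pr_regridded_0p1deg.nc")
pr = ds["pr"]

# Zonal mean: average over longitude and time, as a function of latitude.
zonal_mean = pr.mean(dim=("time", "lon"))

# Probability distribution of precipitation rates in the tropics (30S-30N).
tropics = pr.sel(lat=slice(-30, 30)).values.ravel()
hist, edges = np.histogram(tropics, bins=np.logspace(-2, 2.5, 60), density=True)

print(zonal_mean.sizes, hist[:5])
```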

 

The study uncovers notable alignments and discrepancies between model simulations and observational data. Some models show a high degree of accuracy in reflecting real-world observations, whereas models like ICON demonstrate significant biases, especially in extreme precipitation rates. Such variations in precipitation peaks and rates across different models underscore the need for adjusting simulation parameters to enhance accuracy.



How to cite: Nazarova, N., von Hardenberg, J., Davini, P., Nurisso, M., Caprioli, S., and Ghinassi, P.: Evaluating Tropical Precipitation Extremes: Insights from first simulations from nextGEMS and Destination Earth Climate Digital Twin, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-11498, https://doi.org/10.5194/egusphere-egu24-11498, 2024.

EGU24-12246 | ECS | Orals | ESSI2.5

Advancing Progress on Earth System Digital Twins through an Integrated AI-Enabled Framework 

Brandon Smith, Craig Pelissier, Grey Nearing, Carlos Cruz, Deepthi Raghunandan, Mahmoud Saeedimoghaddam, and Vanessa Valenti

The development of Earth System Digital Twins (ESDTs) represents an ongoing journey towards more accurate and integrated simulations of Earth processes. Inherently interdisciplinary, the endeavor grapples with the challenge of melding subsystems developed by experts in different fields and organizations, requiring communication between different science domains, technology stacks, and data modalities. The Coupled Reusable Earth System Tensor (CREST) framework is a key aspect of our efforts to address these difficulties: by implementing a generic abstraction layer over existing tensor libraries (e.g. TensorFlow, PyTorch, JAX), CREST provides the software foundation for building, operating, and deploying community developed ESDTs. This framework is designed to allow scientists to easily couple together process-based and data-driven models into broader digital twin workflows, while taking advantage of significant efficiency improvements from hardware accelerators.
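
As a loose illustration of what such an abstraction layer can look like (not the CREST API itself), the sketch below wraps interchangeable numpy-like backends behind a common interface; the class, module and method choices are invented for the example.

```python
# Hypothetical sketch of a backend-agnostic tensor abstraction of the kind described;
# names are invented for illustration and do not correspond to CREST's actual API.
import importlib


class TensorBackend:
    """Thin wrapper exposing a common interface over numpy-like tensor libraries."""

    def __init__(self, name: str = "numpy"):
        # e.g. "numpy", "torch", or "jax.numpy"; all expose asarray/matmul/exp.
        self.xp = importlib.import_module(name)

    def asarray(self, data):
        return self.xp.asarray(data)

    def matmul(self, a, b):
        return self.xp.matmul(a, b)

    def exp(self, a):
        return self.xp.exp(a)


backend = TensorBackend("numpy")           # swap for "torch" or "jax.numpy"
x = backend.asarray([[1.0, 2.0], [3.0, 4.0]])
print(backend.exp(backend.matmul(x, x)))
```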

CREST aims to be a step forward in combining traditional modeling techniques with emerging computational methods, particularly in the context of machine learning. Machine learning plays a foundational role in our approach, both contributing to the development of new data-driven models and aiding in efficient coupling with existing models. Through CREST, we aim to enhance model integration and foster more dynamic interactions within the modeling pipeline, primarily addressing the issues of limited support in current frameworks for gradient propagation, hardware acceleration, and federation with external models. In addition, CREST's operational capabilities will include data assimilation, end-to-end distributed model training, black-box model coupling, what-if scenario analysis, and an easy-to-use GUI for end users.

In the context of practical applications, the Terrestrial Environmental Rapid-Replication and Assimilation Hydrometeorological (TERRAHydro) system serves as an example of applying these principles in practice. Using CREST, TERRAHydro couples several hydrologic and land surface subcomponents, such as soil moisture, evapotranspiration, and net ecosystem exchange, into a coupled land surface model. AI-based land surface digital twins (LSDTs) such as TERRAHydro are expected to be efficient enough to carry out scenario analyses beyond the reach of existing traditional land surface models. Here we show results and comparisons for this application, discuss progress on CREST and TERRAHydro overall, and outline our roadmap going forward.

How to cite: Smith, B., Pelissier, C., Nearing, G., Cruz, C., Raghunandan, D., Saeedimoghaddam, M., and Valenti, V.: Advancing Progress on Earth System Digital Twins through an Integrated AI-Enabled Framework, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-12246, https://doi.org/10.5194/egusphere-egu24-12246, 2024.

Debris flows, as fast-moving and water-saturated sediment masses, are particularly hazardous in alpine areas due to their high destructive power and poor predictability. We still do not fully know under what conditions debris flows occur and how to predict them. The most common method for predicting debris flows in warning systems and hazard assessment uses precipitation intensity and duration thresholds. However, these do not provide accurate and quantitative predictions of debris-flow occurrence and are subject to high uncertainty due to limited data. Thanks to recent developments in novel monitoring technologies that have led to an unprecedented data explosion, it is now time to address these knowledge gaps in an innovative and interdisciplinary way. To this end, we develop a scalable and transferable catchment Digital Twin System (cDTS) that combines the latest knowledge from geomorphology, remote sensing, and hydrology to derive probabilistic rainfall intensity-duration (ID) thresholds from limited observations. The cDTS is a physics-informed genetic machine learning framework based on partially known physics, sparse and noisy data, and nonlinear dynamical networks. We test this framework on a small catchment in the Italian Dolomites to determine probabilistic thresholds for the occurrence of debris flows. The new rainfall thresholds follow a negative exponential function controlled by infiltration capacity instead of a power-law relationship. The cDTS is a lighthouse case for applications of digital twins in geoscience, helping improve early warning system performance by providing timely, evidence-based information to the public and policymakers.
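
To make the contrast concrete, the sketch below evaluates the two threshold shapes mentioned above, a classical power law and a negative exponential; all coefficients are invented for illustration, since the fitted values are not reported here.

```python
# Illustrative comparison of two rainfall intensity-duration threshold shapes:
# a classical power law I = a * D**(-b) and a negative exponential
# I = I0 * exp(-k * D) + c. All parameter values are invented for illustration.
import numpy as np

duration_h = np.linspace(0.5, 48, 200)          # rainfall duration D [hours]

a, b = 10.0, 0.6                                # illustrative power-law coefficients
power_law = a * duration_h ** (-b)              # intensity threshold [mm/h]

i0, k, c = 12.0, 0.15, 1.0                      # illustrative exponential coefficients
neg_exponential = i0 * np.exp(-k * duration_h) + c

print(power_law[:3], neg_exponential[:3])
```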

How to cite: Tang, H. and Francis, O.: A prototype of catchment-scale digital twin systems (cDTS) for debris-flow early warning , EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-15422, https://doi.org/10.5194/egusphere-egu24-15422, 2024.

EGU24-15500 | Orals | ESSI2.5

Advancing Marine Ecosystem Conservation with the Global Fish Tracking System on the Destination Earth Service Platform 

Tina Odaka, Anne Fouilloux, Daniel Wiesmann, Emmanuelle Autret, Mathieu Woillez, and Benjamin Ragan-Kelley

The Global Fish Tracking System (GFTS) project is dedicated to enhancing our understanding of the essential habitats of wild fish, particularly focusing on sea bass, a species of significant economic importance, and addresses a recognized information gap highlighted by the International Council for the Exploration of the Sea (ICES). The lack of accurate data on these habitats poses challenges in formulating effective conservation policies. The project builds upon a large-scale tagging experiment on adult sea bass along the French coastlines, generating biologging data, and develops innovative software tools on the Destination Earth Service Platform (DESP), based on the Pangeo ecosystem and the pangeo-fish model, to geolocate fish and estimate their movement patterns based on various data sources.

The project will make use of climate change adaptation data from the Climate Digital Twin (routine and on-demand, for some higher-resolution tracking), sea temperature observations (model, satellite, in situ) from the Copernicus Marine services (sea temperature and associated values), bathymetry (GEBCO) and biologging in-situ data obtained from tagged fish. Leveraging the Pangeo infrastructure on the Destination Earth Service Platform (DESP), tools like pangeo-fish adhere to the FAIR and TRUST principles to address challenges in estimating sea bass behaviour and movement by integrating modelling techniques and developing a Decision Support Tool (DST) for "what-if" scenario planning. The technical framework, including Xarray and Dask, facilitates scalable computations, while collaborative development on GitHub ensures an iterative, open-science approach. The model and approach developed are applicable across different scales, species, and regions, offering an adaptable platform for sustainable marine ecosystem conservation.

The impact of the project is twofold. In the short term, it introduces the GFTS and a Decision Support Tool into the DESP, leveraging advanced modelling and cloud computing to offer insights into the functioning of fish populations, aiding policy advisers in crafting effective conservation measures. The intuitive interface of the DST ensures accessible and informed decision-making. In the long term, the project establishes a foundation for sustainable marine ecosystem conservation by integrating advanced modelling, ensuring reproducibility and widespread accessibility, fostering future research, policymaking, and conservation endeavours. The project actively involves end-users, bridging the gap between complex modelling and practical decision-making, contributing to more effective fisheries management and marine conservation efforts.
In this presentation, we will introduce the project as well as its current achievements and challenges. 

How to cite: Odaka, T., Fouilloux, A., Wiesmann, D., Autret, E., Woillez, M., and Ragan-Kelley, B.: Advancing Marine Ecosystem Conservation with the Global Fish Tracking System on the Destination Earth Service Platform, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-15500, https://doi.org/10.5194/egusphere-egu24-15500, 2024.

EGU24-16329 | Posters on site | ESSI2.5

Leveraging High-Performance Computing for Enhanced Lava Flow Forecasting Workflow 

Antonio Costa, Louise Cordrie, Giovanni Macedonio, Gaetana Ganci, Annalisa Cappello, and Francesco Zuccarello

The integration of lava flow forecasting models with satellite remote sensing techniques marks a significant advancement in quantitative hazard assessment for effusive volcanic eruptions. Within the framework of the DT-Geo project, we are developing a lava flow workflow that harnesses High-Performance Computing (HPC) capabilities, aiming to improve hazard assessment through ensemble-based and data assimilation methods.

At the core of the workflow is the VLAVA code, which simulates lava flow propagation with temperature-dependent viscosity over a complex topography, erupting from one or more vents. The simulation runs for a given time period (of the order of one or more days), after which the simulated deposit is compared to the observed lava flow field and, eventually, the observations are assimilated into the model for a further simulation. The measured data include changes of the eruption source parameters and/or the extent and temperature of the lava flow field. These are derived from direct observations in the field or by remote sensing from airborne platforms, drones or satellites (e.g. Pléiades, EOS-ASTER, SEVIRI, MODIS, VIIRS, Landsat, Sentinel, etc.). Data assimilation is conducted using PDAF, a dedicated software package offering various approaches, including ensemble-based Kalman filters, nonlinear filters, and variational methods.
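
For context, the sketch below shows a textbook stochastic ensemble Kalman filter analysis step of the kind such ensemble-based methods perform; it is not PDAF code, and the state dimension, observation operator and error statistics are invented for illustration.

```python
# Generic stochastic ensemble Kalman filter analysis step (textbook form); shown
# only to illustrate the kind of update performed, not PDAF or the project code.
import numpy as np

rng = np.random.default_rng(1)
n_state, n_ens, n_obs = 100, 20, 10

X = rng.normal(size=(n_state, n_ens))            # forecast ensemble (e.g. lava thickness states)
H = np.zeros((n_obs, n_state))
H[np.arange(n_obs), np.arange(n_obs) * 10] = 1.0  # invented observation operator
R = 0.5 * np.eye(n_obs)                          # observation error covariance
y = rng.normal(size=n_obs)                       # observations (e.g. satellite-derived)

A = X - X.mean(axis=1, keepdims=True)            # ensemble anomalies
P = A @ A.T / (n_ens - 1)                        # sample forecast covariance
K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)     # Kalman gain

Y_pert = y[:, None] + rng.multivariate_normal(np.zeros(n_obs), R, size=n_ens).T
X_analysis = X + K @ (Y_pert - H @ X)            # assimilate the observations
print(X_analysis.shape)
```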

The model output provides the area potentially impacted by lava flows, including thickness and temperature distributions, both for a single scenario (utilized for estimating the impact of a lava flow) and for an ensemble of weighted scenarios (for generating probabilistic hazard maps). We present the overarching concept of the workflow and share preliminary results obtained for historical eruptions of Mount Etna.

How to cite: Costa, A., Cordrie, L., Macedonio, G., Ganci, G., Cappello, A., and Zuccarello, F.: Leveraging High-Performance Computing for Enhanced Lava Flow Forecasting Workflow, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-16329, https://doi.org/10.5194/egusphere-egu24-16329, 2024.

EGU24-17334 | Posters on site | ESSI2.5

Climate Adaptation Digital Twin transforming climate data to actionable information 

Jenni Kontkanen, Pekka Manninen, Mario Acosta, Pierre-Antoine Bretonnière, Miguel Castrillo, Paolo Davini, Francisco Doblas-Reyes, Barbara Früh, Jost von Hardenberg, Thomas Jung, Heikki Järvinen, Daniel Klocke, Devaraju Naraynappa, Sami Niemelä, Outi Sievi-Korte, and Stephan Thober

The Destination Earth (DestinE) programme develops high-precision digital twins (DTs) of the Earth to support decision-making in Europe and thus enable more sustainable development and increased resilience against environmental changes. The Climate Change Adaptation Digital Twin (Climate DT in brief) is one of the first priority DTs developed within DestinE. It will provide capabilities supporting climate change adaptation at regional and national levels on multi-decadal time scales. We introduce here the first prototype of the Climate DT, developed during Phase 1 of DestinE.

The Climate DT system is built around three one-kilometre-scale Earth system models (ESMs): ICON, IFS-NEMO and IFS-FESOM. The ESMs have been adapted to run on the fastest supercomputer in Europe, EuroHPC LUMI, which is in operation in Kajaani, Finland. The Climate DT system will also harness another state-of-the-art EuroHPC system, MareNostrum 5, which will become operational during 2024 in Barcelona, Spain. The extreme computing capacities provided by these systems enable the Climate DT to perform global climate simulations at an unprecedented scale: multi-decadal simulations on 5 km global meshes. This enables globally consistent climate information to be provided at local levels.

The Climate DT introduces a novel approach where ESMs and impact-sector applications operate as part of the same workflow and the output of the ESMs is streamed to applications in real time in a standardized form called the generic state vector (GSV). This approach enables users to access the full model state as soon as it has been produced by the ESMs. It also makes the system scalable across an unlimited number of applications and solves the technical challenge of handling huge amounts of data. Most importantly, this new climate information system enables transforming climate data into actionable information. During Phase 1 of DestinE, this approach is demonstrated through five impact-sector applications that provide information on (1) wind energy supply and demand, (2) wildfire risk and emissions, (3) river flows and freshwater availability, (4) hydrometeorological extreme events, and (5) heat stress in urban environments.

In this presentation, we will present the overview of the first prototype of Climate DT, including technical design of the system, performed simulations and implications for the users. We will also discuss the plans for the future development of the system.

How to cite: Kontkanen, J., Manninen, P., Acosta, M., Bretonnière, P.-A., Castrillo, M., Davini, P., Doblas-Reyes, F., Früh, B., von Hardenberg, J., Jung, T., Järvinen, H., Klocke, D., Naraynappa, D., Niemelä, S., Sievi-Korte, O., and Thober, S.: Climate Adaptation Digital Twin transforming climate data to actionable information, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-17334, https://doi.org/10.5194/egusphere-egu24-17334, 2024.

EGU24-17836 | Posters on site | ESSI2.5

Gradient boosting-based soil wetness for forestry climate adaptation in HarvesterSeasons service -training a model to forecast soil water index SWI from a comprehensive set of IFS model predictors in Destination Earth 

Mikko Strahlendorff, Anni Kröger, Miriam Kosmale, Golda Prakasam, Mikko Moisander, Heikki Ovaskainen, and Asko Poikela

Training a model to forecast the soil water index (SWI) from a comprehensive set of IFS model predictors in Destination Earth, to provide gradient boosting-based soil wetness for forestry climate adaptation in the HarvesterSeasons service, was an exercise that clearly improved a crucial part of the forestry climate service HarvesterSeasons.com. Forestry in the Nordic countries has to adapt to good and sustainable winter conditions becoming less and less available; dry summer conditions are being sought to compensate for the weak winters.

We present our service, the machine learning method behind the new product, and the validation of the new product. For machine learning, XGBoost (eXtreme Gradient Boosting) was used to train the Earth observation soil water index product from ERA5-Land, soilgrids.org and other features. Prediction is then enabled from the Destination Earth Extremes and Climate Adaptation Digital Twins.
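
As a hedged sketch of the kind of gradient-boosting regression described (not the actual HarvesterSeasons pipeline), the example below trains an XGBoost model to predict SWI from a handful of predictor columns; the training file and feature names are hypothetical placeholders.

```python
# Hedged sketch of gradient-boosting regression for a soil water index target;
# the CSV file and column names are invented placeholders, not the project data.
import pandas as pd
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

df = pd.read_csv("swi_training_samples.csv")     # ERA5-Land + soilgrids.org features, SWI target
features = ["precip_30d", "t2m_mean", "snow_depth", "evaporation", "clay_fraction", "sand_fraction"]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["swi"], test_size=0.2, random_state=0
)

model = xgb.XGBRegressor(n_estimators=500, max_depth=6, learning_rate=0.05, subsample=0.8)
model.fit(X_train, y_train)

print("MAE:", mean_absolute_error(y_test, model.predict(X_test)))
```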

How to cite: Strahlendorff, M., Kröger, A., Kosmale, M., Prakasam, G., Moisander, M., Ovaskainen, H., and Poikela, A.: Gradient boosting-based soil wetness for forestry climate adaptation in HarvesterSeasons service -training a model to forecast soil water index SWI from a comprehensive set of IFS model predictors in Destination Earth, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-17836, https://doi.org/10.5194/egusphere-egu24-17836, 2024.

EGU24-18293 | Posters on site | ESSI2.5

Enabling km-scale coupled climate simulations with ICON on GPUs 

Jussi Enkovaara, Claudia Frauen, Daniel Klocke, Lukas Kluft, Luis Kornblueh, Sergey Kosukhin, Tuomas Lunttila, Rene Redler, and Reiner Schnur

The Icosahedral Nonhydrostatic (ICON) weather and climate model is a modelling framework for numerical weather prediction and climate simulations. ICON is implemented mostly in Fortran 2008, with the GPU version based mainly on OpenACC. ICON is used on a large variety of hardware, ranging from classical CPU clusters to vector architectures and different GPU systems.

An ICON model configuration developed for km-scale climate simulations is used as a scientific prototype for the digital twin of the Earth for climate adaptation within the Destination Earth programme of the European Commission. Here we focus on our effort to run these coupled ICON configurations at km-scale on LUMI, an HPE Cray EX system with a GPU partition based on AMD MI250X GPUs.

We present the model configuration, performance results and scalability of the simulation system on LUMI, and compare them with results on other GPU- and CPU-based systems.

How to cite: Enkovaara, J., Frauen, C., Klocke, D., Kluft, L., Kornblueh, L., Kosukhin, S., Lunttila, T., Redler, R., and Schnur, R.: Enabling km-scale coupled climate simulations with ICON on GPUs, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-18293, https://doi.org/10.5194/egusphere-egu24-18293, 2024.

EGU24-18891 | Posters on site | ESSI2.5

The development of a Hybrid Renewable Energy Forecasting System under the Destination Earth initiative. 

Charalampos Kontoes, Dorella Papadopoulou, Nikolaos S. Bartsotas, Stelios Kazadzis, George Koutalieris, Christos Stathopoulos, Foteini N. Salta, Platon Patlakas, Kyriaki Papachristopoulou, Ilias Fountoulakis, Anagnostis Delkos, Symeon Symeonidis, and Vasileios Sinnis

The transition towards clean energy constitutes a fundamental challenge in recent EU policies as well as in the 2030 Agenda for Sustainable Development and the Paris Agreement on climate change. However, the dependence of renewable energy systems on climate and weather makes this a challenging task. Beyond the factors that need to be taken into account initially, which relate to the efficiency of the systems and their resilience to climate-related stresses, the day-to-day energy market requires accurate and detailed information on solar and wind availability at different spatial scales, as energy users range from rooftop private owners to regional and large-scale facilities. This kind of information can only be provided through a fusion of numerical models and satellite-based Earth observation platforms.

In order to accommodate this need and act as a decision support system, a Hybrid Renewable Energy Forecasting System (HYREF) is being developed for solar and wind forecasting under the framework of Destination Renewable Energy (DRE), a European Space Agency (ESA) funded project. HYREF utilises outputs from high-resolution forecasting models, Destination Earth (DestinE) Digital Twin forecast data and Data Lake data (e.g. Global Ocean 1/12° Physics Analysis and Forecast, Vegetation Indices, CORINE Land Cover, Global 10-daily Fraction of Vegetation Cover, and data from the Weather-Induced Extremes Digital Twin, among others), as well as end-user-provided historical and real-time data that allow for site-specific adaptation through probabilistic models and statistical post-processing. The service will use the newly established DestinE Service Platform (DESP). The HYREF software is flexible, will provide spatial upscaling options based on the DESP data coverage and, as a user-driven service, will evolve gradually through continuous interaction and feedback from end-users, with additional direct engagement of market stakeholders. The final product is expected to increase energy efficiency and cater to the needs of a broad spectrum of renewable energy users, from private rooftop owners to large-scale facilities, industrial users, and national transmission and distribution operators.

How to cite: Kontoes, C., Papadopoulou, D., Bartsotas, N. S., Kazadzis, S., Koutalieris, G., Stathopoulos, C., Salta, F. N., Patlakas, P., Papachristopoulou, K., Fountoulakis, I., Delkos, A., Symeonidis, S., and Sinnis, V.: The development of a Hybrid Renewable Energy Forecasting System under the Destination Earth initiative., EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-18891, https://doi.org/10.5194/egusphere-egu24-18891, 2024.

EGU24-19153 | ECS | Posters on site | ESSI2.5

Applying the DestinE Extremes digital twin to air quality forecasts and emission scenario simulations 

Anne Caroline Lange, Sabine Schröder, Philipp Franke, Michael Langguth, Elmar Friese, and Martin G. Schultz

Events of extreme air pollution pose threats to humans and the environment. To investigate air quality under extreme atmospheric situations, the DestinE air quality use case developed a comprehensive user interface that enables high-resolution air quality forecasts with diverse analysis options. The user interface encapsulates two state-of-the-art approaches: physics-based numerical simulations with the chemistry transport model EURAD-IM (European Air pollution Dispersion – Inverse Model) and data-driven machine learning forecasts with MLAir (Machine Learning on Air data). The EURAD-IM simulations are coupled to the meteorological output of the DestinE digital twin for weather extremes, which provides high-resolution information (~4.4 km). An additionally implemented machine-learning-based postprocessing even allows for the downscaling of the EURAD-IM forecast output to a resolution of 1 km. MLAir produces 4-day point forecasts at station sites using data from the Tropospheric Ozone Assessment Report (TOAR) database. The developed system is complemented by an efficient module that enables emission scenario simulations to investigate and develop air pollution mitigation strategies for future extreme events under realistic conditions.
The established user interface is demonstrated with two selected air quality extreme events in early 2017 and summer 2018. It aims to provide a new quality of air pollution information that supports the core users, i.e., environment agencies, in decision-making. For the near future, it is planned to fully embed the system into the Destination Earth Service Platform (DESP) so that it will be available to a wider community. Besides assisting policy making, the air quality products help to answer scientific questions on air quality and atmospheric chemical processes under extreme weather conditions that are expected to increase in the future.

How to cite: Lange, A. C., Schröder, S., Franke, P., Langguth, M., Friese, E., and Schultz, M. G.: Applying the DestinE Extremes digital twin to air quality forecasts and emission scenario simulations, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-19153, https://doi.org/10.5194/egusphere-egu24-19153, 2024.

EGU24-19669 | ECS | Posters on site | ESSI2.5

Destination Earth Data Lake (DEDL) Service – Access and discovery data through the Harmonized Data Access (HDA)  

Patryk Grzybowski, Marcin Ziółkowski, Aubin Lambare, Arnaud Le Carvennec, Christoph Reimer, and Michael Schick

The Destination Earth initiative (DestinE), driven by the European Organisation for the Exploitation of Meteorological Satellites (EUMETSAT), the European Space Agency (ESA) and the European Centre for Medium-Range Weather Forecasts (ECMWF), aims to create a highly accurate replica, or Digital Twin, of the Earth. The first two Digital Twins describe weather-induced and geophysical extremes, as well as climate change adaptation, but the number of Digital Twins will continue to grow. To develop new models, there is a strong need for access to data and dedicated services. One of the three key components of DestinE is the Destination Earth Data Lake (DEDL), which provides discovery, access, and big data processing services.

DEDL facilitates data storage and access through three primary types of data entry points: the Fresh Data Pool (FDP), Federated Datasets (FED), and the Digital Twins Data Warehouse. DEDL offers big data processing close to the data, which conceptually supports ML/AI applications executed on the DEDL. The DestinE data lake federates existing data holdings as well as complementary data from diverse sources such as in-situ, socio-economic, or data-space data. Thanks to DEDL, it is possible to get immediate access to data such as the Sentinel-1/2/3/5P missions, and all this data is provided as a full archive immediately available to the user. With instant access to current Earth Observation (EO) data, researchers and other users can conduct time-sensitive analyses without the delays associated with data ordering. Moreover, having a comprehensive archive of EO data enables trend analysis and the investigation of long-term changes.

In this presentation we will demonstrate how to use Harmonized Data Access (HDA), one of the tools developed within the DEDL. We will present the available datasets provided through HDA and show how to use it to search collections and products. Additionally, we will demonstrate how to obtain these datasets for use in your own environment. Because the SpatioTemporal Asset Catalog (STAC) specification is used as the metadata standard, discovering and working with data provided by DEDL is user-friendly. Thanks to HDA, a single account is enough to explore data across tens of collections and petabytes of products.
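
Because HDA exposes STAC metadata, a generic STAC client can illustrate the discovery step. The snippet below is a minimal sketch with a placeholder endpoint and collection identifier; the real HDA URL, authentication and collection names should be taken from the DEDL documentation.

```python
# Generic STAC search sketch; the endpoint and collection id are placeholders,
# not the actual DEDL HDA service.
from pystac_client import Client

catalog = Client.open("https://hda.example.org/stac")  # hypothetical endpoint

search = catalog.search(
    collections=["sentinel-2-l2a"],        # assumed collection identifier
    bbox=[12.0, 41.5, 13.0, 42.5],         # area of interest (lon/lat)
    datetime="2023-06-01/2023-06-30",
    max_items=10,
)

for item in search.items():
    print(item.id, list(item.assets))
```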

The DestinE Data Lake is an initiative that revolutionizes the handling of Earth Observation data, improving capabilities in climate research, and supporting sustainable development efforts. The principles behind the DEDL will enable data harmonization and federation on a scale beyond current capabilities. 

How to cite: Grzybowski, P., Ziółkowski, M., Lambare, A., Le Carvennec, A., Reimer, C., and Schick, M.: Destination Earth Data Lake (DEDL) Service – Access and discovery data through the Harmonized Data Access (HDA) , EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-19669, https://doi.org/10.5194/egusphere-egu24-19669, 2024.

EGU24-20023 | ECS | Posters on site | ESSI2.5

Developing a generic Adaptation Modelling Framework for Destination Earth grounded in Flood Risk Management 

Sarah Wright, Björn Backeberg, and Kathryn Roscoe

Rapid open-source physics-based flood and impact models are critical resources for adaptation planning. Due to their complexity, these models often remain inaccessible to users who rely on them for decision-making. An advanced level of technical knowledge is required to set up and run the models. This body of work aims to develop an adaptation modelling framework that automates model pre-processing, workflows, and post-processing and allows non-technical end-users to access these modelling advances, empowering government agencies, practitioners, and researchers to evaluate meaningful “what-if” scenarios, such as specific events, future conditions, or protective measures.

Via engagement with potential and current users from municipalities, regional governments, European agencies such as EEA and EC, research institutes, and communities closely linked to the EU Mission on Climate Adaptation, the research team will build upon years of research and development on the adaptation modelling framework FloodAdapt. The project will deliver a design, user documentation and demonstrator of a flexible, modular, expandable, and transferable adaptation modelling framework that can automatically modify and execute state-of-the-art and open-source flood and impact models to simulate, visualize, and assess compound flood scenarios, impacts, and risk for a wide variety of end users in different demographic and physical contexts. End users of the adaptation modelling framework will ultimately be able to obtain quality flood and impact maps, equity-focused visuals, and informative metrics to support their planning needs and facilitate genuine stakeholder engagement. The framework will also be generalized to enable other scientific disciplines to benefit from the adaptation modelling framework, extending the impact of the project.

How to cite: Wright, S., Backeberg, B., and Roscoe, K.: Developing a generic Adaptation Modelling Framework for Destination Earth grounded in Flood Risk Management, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-20023, https://doi.org/10.5194/egusphere-egu24-20023, 2024.

EGU24-20097 | Orals | ESSI2.5

The Global Extremes Digital Twin of Destination Earth: successes and challenges 

Benoît Vannière, Irina Sandu, Peter Dueben, Richard Forbes, Inna Polichtchouk, Annelize Van Niekerk, Birgit Sützl, Michail Diamantakis, Jasper Denissen, Estibaliz Gascon, Michael Maier-Gerber, Llorenç Lledo, Ivan Bastak-Duran, Aristofanis Tsiringakis, Tobias Becker, Josef Schröttle, and Ziga Zaplotnik

At the end of the first phase of the Destination Earth initiative in May 2024, ECMWF will deliver a prototype of the global component of the Weather-Induced Extremes Digital Twin (or Global Extremes DT). The Global Extremes DT will monitor worldwide extreme weather events up to 5 days ahead and at an atmospheric resolution of 5 km. Furthermore, it incorporates two impact sector models: the CaMa-Flood river routing model for predicting flood risk, and a flexible aerosol scheme that monitors selected aerosol species, contributing to air quality assessment. Since August 2023, a daily forecast has been performed with the prototype Global Extremes DT on ECMWF's Atos HPC. In this presentation, we will summarise the work done during the first phase of Destination Earth and revisit some successes and challenges encountered in predicting extreme weather events.

A large selection of extreme weather cases has been analysed to demonstrate the added value of the high-resolution DT over the ECMWF operational forecast. Clear improvements were found for near-surface fields in regions of complex terrain, the intensification of tropical cyclones, and the magnitude of orographic precipitation and subsequent flood events. Yet, we also find that standard NWP scores are not readily improved when the horizontal resolution is increased, and specific developments had to be made to adapt the physics of the model and exploit the full benefit of high resolution. Although we find that non-hydrostatic effects do not matter up to a resolution of 2.8 km, and thus do not justify the extra cost of a non-hydrostatic dynamical core, an exception arises in strong subtropical jet regimes over high orography. In that case, the orographic gravity waves are not handled correctly by the hydrostatic dynamical core. As a result, the model's mean orography needed special filtering at small scales and the model timestep had to be reduced. The degradation of NWP scores at kilometre scale also prompts the need to find new ways to assess the improvement of physical realism.

Finally, we will give an overview of the current developments at ECMWF that will be part of future versions of the global extremes DT, such as a new prognostic TKE turbulent scheme, modified settings for atmospheric convection and a revised orographic drag parameterization.  

How to cite: Vannière, B., Sandu, I., Dueben, P., Forbes, R., Polichtchouk, I., Van Niekerk, A., Sützl, B., Diamantakis, M., Denissen, J., Gascon, E., Maier-Gerber, M., Lledo, L., Bastak-Duran, I., Tsiringakis, A., Becker, T., Schröttle, J., and Zaplotnik, Z.: The Global Extremes Digital Twin of Destination Earth: successes and challenges, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-20097, https://doi.org/10.5194/egusphere-egu24-20097, 2024.

EGU24-20322 | Posters on site | ESSI2.5

Destination Earth Core Service Platform Use Cases: A Collaborative approach for DestinE validation and evolution 

Jolanda Patruno, Antonio Romeo, Claudia Vitolo, Daniel Wiesmann, David Arthurs, Simone Fratini, Theodora Papadopoulou, and Tarek Habib

DestinE is the EC's ambitious initiative to develop a digital twin of the Earth system that enables users to monitor and simulate natural and human activities in support of environmental policies and decision-making processes at European and global scales, including the green transition of the European Union, in line with the goals set out in the EC's Green Deal and Digital Strategy.

RHEA is leading the Destination Earth Use Cases project consortium to select a set of use cases to be developed and integrated within DESP, which will act as a gateway to the other DestinE components, namely the Destination Earth Data Lake, led by EUMETSAT, and the Digital Twins and Engine, which are the responsibility of ECMWF. These use cases will be developed using an Agile framework and a co-design approach with their reference communities to enable validation of the DestinE infrastructure and its evolution.

Five projects started in November 2023:

  • Global Fish Track System (GFTS), whose objective is to develop a fully functional Decision Support Tool for establishing essential fish habitat conservation areas under different scenarios.
  • DestinE Sea Ice Decision Enhancement (DESIDE), which aims to meet the needs of policy and decision makers who require information on the past, current, and forecasted sea ice and other relevant conditions for operational purposes in the Baltic Sea, European Arctic Ocean, and the rest of the polar regions.
  • CITINEXUS, which is modelling environmental, social, and economic impacts of interventions in road networks, mobility, and urban fabric, using High Frequency Location Based data and evaluating baseline conditions for human mobility, including key performance indicators like air quality, population distribution, public health, and service accessibility.
  • DestinE Renewable Energy (DRE), which aims to support evidence-based policy-making in the transition towards green energy by integrating new decision-making tools for renewable energy source (RES) producers.
  • The UrbanSquare project, which is developing a tool to assess urban climate risks, namely urban heat, flood forecasting, storm surges, sea level rise, air pollution, deterioration of infrastructure, and increased demand on resources.

The presentation will include an overview of each selected project as well as a list of opportunities to contribute to DestinE in the future.

How to cite: Patruno, J., Romeo, A., Vitolo, C., Wiesmann, D., Arthurs, D., Fratini, S., Papadopoulou, T., and Habib, T.: Destination Earth Core Service Platform Use Cases: A Collaborative approach for DestinE validation and evolution, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-20322, https://doi.org/10.5194/egusphere-egu24-20322, 2024.

EGU24-20343 | Posters on site | ESSI2.5

The DestinE Community's Journey, From User Engagement to Recommendations gathering 

Eleni Karachaliou, Efstratios Stylianidis, Aikaterini Bakousi, Zoi-Eirini Tsifodimou, Costas Bissas, Antonio Romeo, Jolanda Patruno, Rob Carrillo, and Zachary Smith

The DestinE Use Cases Project, executed by a consortium led by RHEA Group with the Aristotle University of Thessaloniki and Trust-IT, concerns the selection and implementation of a first set of Use Cases meant to demonstrate the ability of the DestinE infrastructure in general, and the DestinE Service Platform (DESP) in particular, to provide actionable information and decision support to its end-users. The project also aims to actively engage the broad community of DestinE stakeholders, gather their requirements, and encourage their direct involvement and guidance in the continuous evolution of the DestinE infrastructure towards the future phases of the initiative.

The DestinE Community leads the establishment of a network that fosters continuous interactions among users, developers, stakeholders, and partners, with the aim of enhancing DestinE capabilities and catalyzing cross-sectoral collaborations. The community serves as a platform for open and transparent opportunities for ongoing exchange. By incorporating multiple perspectives, opinions, experiences, and expertise, DestinE community members can advance their knowledge, gain value-driven experiences, and contribute to their professional growth.

Currently boasting over 1,000 members representing diverse stakeholders, the DestinE community is committed to contributing to the design and development of the initiative and its components through a co-design process. This approach ensures that the system is responsive to real user needs and introduces innovations compared to existing operational environments. Community feedback plays a crucial role in understanding user needs and preferences, determining the importance of features or functions, testing and validating developed features with end users, and fostering collaboration among individuals from varied backgrounds, leading to innovative ideas and approaches in DESP development.

To support community building, a range of activities, both online and physical, are underway. These activities aim to actively engage and involve DestinE user communities in co-design, development, co-evolution, and use of DestinE capabilities, applications, and solutions. They also facilitate gathering stakeholders' needs and recommendations, encouraging their direct involvement and guidance in the continuous evolution of DestinE infrastructure and capabilities towards future phases of the initiative. Additionally, these activities serve to demonstrate the DestinE infrastructure's capabilities through a first set of use cases.

Meetings and open discussions with key stakeholders of DestinE, especially those related to DESP, have already resulted in early recommendations. These recommendations will be translated into system requirements for the DESP platform as part of the ongoing collaborative efforts within the DestinE Community.

How to cite: Karachaliou, E., Stylianidis, E., Bakousi, A., Tsifodimou, Z.-E., Bissas, C., Romeo, A., Patruno, J., Carrillo, R., and Smith, Z.: The DestinE Community's Journey, From User Engagement to Recommendations gathering, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-20343, https://doi.org/10.5194/egusphere-egu24-20343, 2024.

EGU24-20503 | ECS | Posters on site | ESSI2.5

Earth Observation Based Digital Twin Components of the Earth System 

Edward Malina, Martin Wearing, and Diego Fernandez

Climate change represents one of the most urgent challenges facing society today, with rising sea levels, increasing ocean acidification, and more frequent and intense extreme events such as floods, heat waves and droughts impacting different sectors and ecosystems and endangering human lives and property. Population growth is also expected to amplify current pressures on critical resources such as freshwater and food, intensify the stress on land and marine ecosystems, and increase environmental pollution, impacting health and biodiversity.

The latest advances in Earth Observation (EO) science and R&D activities are opening the door to a new generation of EO data products, novel applications and scientific breakthroughs offering a novel, advanced and holistic view of the Earth system, its processes, and its interactions with human activities and ecosystems. These emerging capabilities offer unique opportunities for an enhanced and extensive use of EO technology in the development of digital twins. In particular, those EO developments, together with new advances in sectoral modelling, computing capabilities, AI and digital technologies, offer excellent building blocks to realise novel EO-based Digital Twin Components (EO DTCs) that may contribute to and maximise the impact of EO satellite technology in the design and implementation of future operational Digital Twins.

The ESA Digital Twin Earth programme aims at developing and demonstrating, to a pre-operational stage, a set of EO-based DTCs as advanced replicas of key components of the Earth system, including their interactions with human activities and ecosystems. These EO DTCs shall be designed to serve a wide variety of users, with a strong focus on valorising the role of EO capabilities. With this presentation, we highlight ESA's basic principles and functional design elements for the development of EO-based DTCs and provide a scientific and technical roadmap for the development of future DTCs, along with their interaction with the DestinE initiative. Furthermore, we will highlight the eight priority thematic domains for DTC development, based on feedback from the scientific community, and the tasks to be undertaken to achieve DTC demonstrators. These include allowing users to access, analyse, visualise and interact with advanced EO data, representing Earth system processes and feedbacks, scientifically sound integration of EO data, models, AI and hybrid methods to generate high spatial and temporal resolution datasets, and facilitating what-if scenarios.

How to cite: Malina, E., Wearing, M., and Fernandez, D.: Earth Observation Based Digital Twin Components of the Earth System, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-20503, https://doi.org/10.5194/egusphere-egu24-20503, 2024.

EGU24-21564 | Posters on site | ESSI2.5

A Digital Assistant for Digital Twin Earth 

Manolis Koubarakis, Marco Corsi, Cristian Leoni, Giorgio Pasquali, Chiara Pratola, Simone Tilia, Sergios-Anestis Kefalidis, Konstantinos Plas, Mariangela Pollali, Myrto Tsokanaridou, Jakob Heinrich Hackstein, Gencer Sümbül, and Begüm Demir

We present an AI-powered digital assistant that includes four search engines for satellite images (search by image, search by caption, visual question answering and knowledge graph question answering) that are orchestrated by a task interpreter in order to answer complex requests of users looking for Earth observation data. The digital assistant will be demonstrated in three use cases: vessel detection, water bodies dynamics and training dataset construction. The digital assistant builds on recent work of the academic project partners on deep learning techniques for satellite images, search engines for satellite images, visual question answering, question answering over knowledge graphs and linked geospatial data, and question answering engines for satellite data archives. This work is funded by the European Space Agency project "DA4DTE: Demonstrator Precursor Digital Assistant Interface For Digital Twin Earth". The project consortium is led by the Italian company eGEOS, with the National and Kapodistrian University of Athens and the TU Berlin as subcontractors.
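
To make the orchestration idea concrete, the toy sketch below shows a task interpreter routing a free-text request to one of four specialised engines. The routing rules and engine stubs are invented for illustration and do not reflect the DA4DTE implementation.

```python
# Toy task-interpreter sketch: route a request to one of four search engines.
# Routing heuristics and engine stubs are purely illustrative.
def interpret_task(request: str) -> str:
    text = request.lower()
    if "similar image" in text or "looks like" in text:
        return "search_by_image"
    if text.startswith(("how many", "is there", "what is")):
        return "visual_question_answering"
    if "within" in text or "near" in text:
        return "knowledge_graph_qa"
    return "search_by_caption"

engines = {
    "search_by_image": lambda q: f"[image search] {q}",
    "search_by_caption": lambda q: f"[caption search] {q}",
    "visual_question_answering": lambda q: f"[VQA] {q}",
    "knowledge_graph_qa": lambda q: f"[KGQA] {q}",
}

query = "Find Sentinel-2 images of vessels near Piraeus"
print(engines[interpret_task(query)](query))
```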

How to cite: Koubarakis, M., Corsi, M., Leoni, C., Pasquali, G., Pratola, C., Tilia, S., Kefalidis, S.-A., Plas, K., Pollali, M., Tsokanaridou, M., Hackstein, J. H., Sümbül, G., and Demir, B.: A Digital Assistant for Digital Twin Earth, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-21564, https://doi.org/10.5194/egusphere-egu24-21564, 2024.

EGU24-3617 | ESSI2.8

NSF Unidata Reimagined: Data Services to Advance Convergent Earth Systems Science

Mohan Ramamurthy

NSF Unidata is a community data facility for the Earth Systems Sciences (ESS), established in 1984 by U.S. universities with sponsorship from the U.S. National Science Foundation. NSF Unidata exists to engage and serve researchers and educators who are advancing the frontiers of their fields; we support their efforts by creating opportunities for community members from many backgrounds and disciplines to share data, knowledge, methods, and expertise. As part of this effort, we strive to provide well-integrated data services and software tools that address the entire geoscientific data lifecycle, from locating and retrieving useful data, through the process of analyzing and visualizing data either locally or remotely, to curating and sharing the results. NSF Unidata currently serves more than 1,500 universities and colleges worldwide, which form the core of a member community spanning thousands of government and research institutions that rely on Unidata products and services.

Dramatic changes in the technological, scientific, educational, and public policy landscape are transforming the ways our community members conduct their research and educate new generations of scientists. To meet these challenges, Unidata is reimagining how the program can best fulfill its mission. This proposal provides a description of how Unidata plans to serve its community going forward by focusing on four types of activities:

  • Providing Data and Tools: ensuring fair and equitable access to ESS and other data from a variety of sources, along with cutting-edge tools to analyze and visualize that data.
  • Reducing Barriers to Participation: building partnerships with minority-serving institutions and under-resourced groups to increase engagement and collaboration, helping to build a larger, more inclusive community of ESS practitioners.
  • Fostering Community Action: engaging community members to advance adoption of initiatives like FAIR and CARE data principles to promote Open Science concepts, strengthening ESS teaching and research.
  • Providing Innovative Technical Solutions: guiding the ESS community toward technical solutions that leverage the most useful innovations in AI/ML, modern open source software, and cloud-centric data-proximate analysis.

Within these broad categories, Unidata proposes a variety of actions guided by the concept of convergence science, wherein individuals from across many disciplines collaborate to address “Grand Challenge” questions in areas such as climate change, ocean health, and natural disaster resilience. Unidata’s part in this endeavor centers on the creation of community hubs, which will bring together varied data, software tools for analysis and visualization, and learning resources to inform the community members who gather to find innovative courses of action with respect to these complex problems.

In this presentation, I’ll describe how NSF Unidata is reimagining its future activities in delivering a comprehensive suite of products and services to advance Earth Systems Science research and education by partnering with a broad range of users in the community.

How to cite: Ramamurthy, M.: NSF Unidata Reimagined:  Data Services to Advance Convergent Earth Systems Science, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-3617, https://doi.org/10.5194/egusphere-egu24-3617, 2024.

EGU24-4067 | ECS | Posters on site | ESSI2.8

SSHADE-BandList, a novel database of absorption and Raman bands of solids  

Lucia Mandon, Bernard Schmitt, Damien Albert, Manon Furrer, Philippe Bollard, Maria Gorbacheva, Lydie Bonal, and Olivier Poch

A critical missing database for the astrophysics and planetary science community using spectroscopy data is a compilation of band parameters (e.g., position, width, intensity) of solids, for comparison with laboratory and field spectra and with observations of extraterrestrial objects. While many databases exist for gases [1], there is a scarcity for solids and liquids (mostly tables in a few books and review papers), and the mode attributions of bands are not always clear.

The Solid Spectroscopy Hosting Architecture of Databases and Expertise (SSHADE) (https://www.sshade.eu/; [2]) is hosting data from 30 research groups in spectroscopy of solids across 15 countries. It provides spectra of solids relevant to astrophysics and planetary science (ices, minerals, carbonaceous matter, meteorites…) over a wide range of wavelengths (mostly X-ray and VUV to sub-mm). The initial compilation of the "BandList" database [3], which is hosted in SSHADE, was publicly released in October 2021. It is an ongoing effort to provide the parameters (position, width, intensity, and their accuracy, shape) associated with electronic, vibration, and phonon bands of simple solids (ices, simple organics, minerals), in absorption and in Raman emission, and for different pressures and temperatures. It also includes the solid composition and isotopic species involved, as well as the mode assignment (Fig. 1). The database is compiled from an exhaustive review of the literature and laboratory measurements on well-characterized species, and as of early 2024, it consists of over 1240 bands associated with 60 different band lists, including minerals and ices in different phases. An online search tool allows users to find specific bands or lists. Results can be displayed graphically using a spectra simulator with various unit and display options (Fig. 1), and data can be exported for further analysis.
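
The spectra simulator mentioned above essentially sums the individual band entries into a synthetic spectrum. The sketch below illustrates that idea with invented band parameters and an assumed Gaussian band shape; it is not the SSHADE implementation.

```python
# Sum individual (position, FWHM, intensity) band entries into a synthetic
# spectrum. Band values are invented and a Gaussian shape is assumed.
import numpy as np
import matplotlib.pyplot as plt

bands = [            # (position [cm-1], FWHM [cm-1], peak intensity), hypothetical
    (1430.0, 40.0, 1.0),
    (880.0, 15.0, 0.4),
    (712.0, 10.0, 0.2),
]

wavenumber = np.linspace(600.0, 1600.0, 2000)
spectrum = np.zeros_like(wavenumber)
for position, fwhm, intensity in bands:
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    spectrum += intensity * np.exp(-0.5 * ((wavenumber - position) / sigma) ** 2)

plt.plot(wavenumber, spectrum)
plt.xlabel("Wavenumber (cm$^{-1}$)")
plt.ylabel("Absorbance (arbitrary units)")
plt.show()
```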

 

Figure 1. Absorption band list of natural calcite [4] from the SSHADE-BandList interface. (a) Bands displayed individually. (b) Sum of bands of whole band list.

Development of the SSHADE-BandList database interface and content will most likely continue for many years. This tool is expected to be crucial, aiding in the identification of unknown absorption bands in astrophysical and solar system objects, in the selection of the best spectra to use in radiative transfer models, and in guiding the design of new instruments.

 

References 

[1] Albert et al. (2020), Atoms 8(4)

[2] Schmitt et al. (2014), EPSC 2014

[3] Schmitt et al. (2022), EPSC 2022

[4] Leclef and Schmitt (2022), SSHADE/BANDLIST (OSUG Data Center)

How to cite: Mandon, L., Schmitt, B., Albert, D., Furrer, M., Bollard, P., Gorbacheva, M., Bonal, L., and Poch, O.: SSHADE-BandList, a novel database of absorption and Raman bands of solids , EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-4067, https://doi.org/10.5194/egusphere-egu24-4067, 2024.

EGU24-5149 | Orals | ESSI2.8

The UK Environmental Data Service; transdisciplinary data sharing using common standardised approaches, from National to European 

Matthew Harrison, Stephen Mobbs, Emma Bee, Helen Peat, Helen Snaith, Sam Pepler, Martin Juckes, and Gordon Blair and the UKRI NERC Environmental Data Service

The Natural Environment Research Council Environmental Data Service (NERC EDS) provides integrated data services across the breadth of NERC’s data holdings and coordinates closer collaboration and development between NERC’s five environmental data centres. Data is central to modern understanding of our environment. Environmental science is underpinned by access to high quality sources of data and data services. As the principal funder of environmental science in the UK, NERC has supported comprehensive data services and policies since its creation over 50 years ago. Today NERC has five Environmental Data Centres embedded within its Research Centres:
• The British Oceanographic Data Centre (BODC) provides data and services across marine science and is embedded within the National Oceanography Centre.
• The Centre for Environmental Data Analysis (CEDA) provides both atmospheric and Earth Observation data and is embedded within the National Centre for Atmospheric Science and the National Centre for Earth Observation.
• The Environmental Information Data Centre (EIDC) supports the data requirements of the terrestrial and freshwater sciences and is embedded within the UK Centre for Ecology and Hydrology.
• The UK Polar Data Centre (PDC) is responsible for all of the UK’s polar data holdings and associated services and is embedded within the British Antarctic Survey.
• The National Geoscience Data Centre (NGDC) provides geoscience and subsurface data and services and is embedded within the British Geological Survey.
Each of the five environmental data centres specialises in data within a particular sub-discipline of environmental science and serves not only NERC's science community but also a much broader community of users and stakeholders, which span research, industry, education, government and voluntary organisations. At the same time, science and its applications are becoming increasingly multi-disciplinary. Often, users of NERC data need to access the services provided by multiple data centres, and those services also need to be interoperable in a European and global context.
In order to serve an ever-growing community of data users and stakeholders, in 2018 NERC created its Environmental Data Service to coordinate across the data centres. During its first five years the EDS has provided growing coordination between NERC’s data centres, both for user services and the development of new, discipline-independent services.
NERC has recently recommissioned its data services for the period 2023-2028. As a consequence, the EDS is embarking on an ambitious plan to deliver increasingly integrated services across the full breadth of NERC science and to meet the environmental data needs of stakeholders and users across the UK and beyond. This will require further development of common back-end services and front-end adoption of FAIR practices, including standardised vocabularies and ontologies, to support both disciplinary science and wider stakeholder engagement, and increasingly an approach to transdisciplinarity that facilitates both next-generation science and wider data engagement in responding to the grand societal challenges.

How to cite: Harrison, M., Mobbs, S., Bee, E., Peat, H., Snaith, H., Pepler, S., Juckes, M., and Blair, G. and the UKRI NERC Environmental Data Service: The UK Environmental Data Service; transdisciplinary data sharing using common standardised approaches, from National to European, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-5149, https://doi.org/10.5194/egusphere-egu24-5149, 2024.

EGU24-5393 | Posters on site | ESSI2.8

BITS: BluePrints for the Integration of Terminology Services in Earth System Sciences 

Alexander Wolodkin, Anette Ganske, Angelina Kraft, Andrea Lammer, Claudia Martens, Ichrak Salhi, Markus Stocker, Hannes Thiemann, and Claus Weiland

The BITS project is building a Terminology Service (TS) for Earth System Sciences (ESS TS). As a first step it will develop this service for the subfields of climate science (results from climate simulations) and geodiversity collections (representing Earth's diversity of, i.a., rocks, fossils, soils and sediments). The project will use the existing Terminology Service of the TIB – Leibniz Information Centre for Science and Technology, which currently contains 190 ontologies, more than 1.2 million terms and over 26,000 properties from a range of domains such as architecture, chemistry, computer science, mathematics and physics. The ESS collection, which already contains relevant terminologies for the ESS and to which further relevant terminologies will be added, has by now been implemented within the TIB TS.

The ESS TS will be integrated into the two different data repositories of the German Climate Computing Center (DKRZ) and the Senckenberg - Leibniz Institution for Biodiversity and Earth System Research (SGN):

  • DKRZ will use the TS to develop a more user-friendly search for its World Data Center for Climate (WDCC) repository. The terminologies will be used to suggest additional and more relevant search terms to users (see the sketch after this list). This will help users who are unfamiliar with the terminology used by the climate community to find the right keywords for their data search and to get accurate and high-quality search results.
  • SGN will use the TS to add standardised structured metadata to geothematic Digital Specimens in their digital collections. This will increase the FAIRness of collection data, i.e. foster self-contained discovery and processing of Digital Specimens by software agents or, in short, machines (machine actionability).  
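
As a rough illustration of the search-term suggestion idea referenced above, the snippet below queries a terminology service for labels related to a free-text query. The endpoint URL and response layout are assumptions modelled on OLS-style APIs, not the documented TIB TS interface.

```python
# Hedged sketch: expand a user query with labels from a terminology service.
# Endpoint and response structure are assumed (OLS-style), not the TIB TS API.
import requests

TS_SEARCH_URL = "https://terminology.example.org/api/search"  # placeholder URL

def suggest_terms(query: str, limit: int = 5) -> list[str]:
    response = requests.get(TS_SEARCH_URL, params={"q": query, "rows": limit}, timeout=10)
    response.raise_for_status()
    docs = response.json().get("response", {}).get("docs", [])
    return [doc["label"] for doc in docs if "label" in doc]

print(suggest_terms("sea surface temperature"))
```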

The experience gained in building the ESS TS and integrating it into the repositories at DKRZ and SGN will be used to create blueprints to connect later on other Earth System Science repositories to the TS. We also aim to work closely with NFDI4Earth and the wider ESS community, and with TS4NFDI as the NFDI base service project for terminology services.

As BITS evolves, the ESS TS will be supplemented by additional components, e.g. to support FAIR semantic mappings (leveraging SGN's mapping.bio service). Feedback from the wider ESS community about expectations and needs for such a service is welcome and needed for the project. Our aim is a Terminology Service that serves as a valuable resource for researchers, students, professionals and developers in the ESS, providing them with accurate and consistent terminology to enhance their work, improve communication and data sharing, and advance knowledge in their respective fields.

How to cite: Wolodkin, A., Ganske, A., Kraft, A., Lammer, A., Martens, C., Salhi, I., Stocker, M., Thiemann, H., and Weiland, C.: BITS: BluePrints for the Integration of Terminology Services in Earth System Sciences, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-5393, https://doi.org/10.5194/egusphere-egu24-5393, 2024.

EGU24-10364 | Posters on site | ESSI2.8

Streamlining Data Pre-processing and Analysis through the V-FOR-WaTer Web Portal 

Marcus Strobl, Elnaz Azmi, Bischof Balazs, Safa Bouguezzi, Alexander Dolich, Sibylle K. Hassler, Mirko Mälicke, Ashish Manoj J, Jörg Meyer, Achim Streit, and Erwin Zehe

The rapid growth of environmental data and the complexity of data pre-processing tasks pose significant challenges to environmental scientists. Repetitive and error-prone manual data preparation methods not only consume valuable research time but also introduce potential data quality issues. Moreover, individually pre-processed datasets are hardly reproducible. The V-FOR-WaTer virtual research environment (VRE) addresses these challenges as a powerful tool that seamlessly integrates data access, data pre-processing, and data exploration capabilities.

V-FOR-WaTer has an automated data pre-processing workflow to improve data preparation by eliminating the need for manual data cleaning, standardization, harmonization, and formatting. This approach significantly reduces the risk of human error while freeing up researchers to focus on their actual research questions without being hampered by data preparation. The pre-processing tools integrated in the virtual research environment are designed to handle a wide range of data formats, ensuring consistent and reliable data preparation across diverse disciplines. This empowers researchers to seamlessly integrate data from various sources in a standardized manner.
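
As a simple illustration of the kind of steps such a workflow automates, the sketch below parses a raw station file, harmonises units and variable names, removes implausible values and resamples to a regular grid. The file name, column names and units are hypothetical and not part of the V-FOR-WaTer toolset.

```python
# Generic pre-processing sketch: parse, harmonise units, filter, resample.
# File, columns and units are hypothetical examples.
import pandas as pd

raw = pd.read_csv("discharge_station_042.csv", parse_dates=["timestamp"])
raw = raw.set_index("timestamp").sort_index()

# Harmonise units (assumed l/s -> m3/s) and standardise the variable name.
clean = pd.DataFrame({"discharge_m3s": raw["Q_ls"] / 1000.0})

# Drop physically implausible values and resample to a regular hourly grid.
clean = clean[clean["discharge_m3s"] >= 0.0]
hourly = clean.resample("1h").mean()

print(int(hourly["discharge_m3s"].isna().sum()), "gaps remain after resampling")
```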

The web portal's user-centric design facilitates data exploration and selection through map operations and filtering options, empowering researchers to efficiently identify and focus on relevant data for their analyses. The scalability and extensibility of the V-FOR-WaTer web portal ensure that it can accommodate the ever-growing volume of environmental data and adapt to the evolving research landscape. Its ability to integrate user-developed tools reflects the dynamic nature of environmental research and ensures that the virtual research environment stays up to date with the latest advancements in data processing. The comprehensive features and user-friendly interface position it as a valuable tool for environmental scientists, fostering collaboration, streamlining data analysis, and accelerating the advancement of knowledge in the field of hydrology.

How to cite: Strobl, M., Azmi, E., Balazs, B., Bouguezzi, S., Dolich, A., Hassler, S. K., Mälicke, M., Manoj J, A., Meyer, J., Streit, A., and Zehe, E.: Streamlining Data Pre-processing and Analysis through the V-FOR-WaTer Web Portal, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-10364, https://doi.org/10.5194/egusphere-egu24-10364, 2024.

EGU24-12266 | ECS | Orals | ESSI2.8

The Science Discovery Engine: Connecting Heterogeneous Scientific Data and Information  

Kaylin Bugbee, Ashish Acharya, Emily Foshee, Muthukumaran Ramasubramanian, Carson Davis, Bishwas Praveen, Kartik Nagaraja, Shravan Vishwanathan, Stephanie Wingo, and Rachel Wyatt

Transformative science often occurs at the boundaries of different disciplines. Making interdisciplinary science data, software and documentation discoverable and accessible is essential to enabling transformative science. However, connecting this diverse and heterogeneous information is often a challenge due to several factors including the dispersed and sometimes isolated nature of data and the semantic differences between topical areas. NASA’s Science Discovery Engine (SDE) has developed several approaches to tackling these challenges. The SDE is a unified, insightful search experience that enables discovery of NASA’s open science data across five topical areas: astrophysics, biological and physical sciences, Earth science, heliophysics and planetary science. In this presentation, we will discuss our efforts to develop a systematic scientific curation workflow to integrate diverse content into a single search environment. We will also share lessons learned from our work to create a metadata crosswalk across the five disciplines. 

How to cite: Bugbee, K., Acharya, A., Foshee, E., Ramasubramanian, M., Davis, C., Praveen, B., Nagaraja, K., Vishwanathan, S., Wingo, S., and Wyatt, R.: The Science Discovery Engine: Connecting Heterogeneous Scientific Data and Information , EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-12266, https://doi.org/10.5194/egusphere-egu24-12266, 2024.

EGU24-12978 | ECS | Posters on site | ESSI2.8

Advancing Data Management: A Novel Digital Ecosystem for FAIR Time Series Data Management in Earth System Sciences 

Joost Hemmen, David Schäfer, Martin Abbrent, Florian Gransee, Tobias Kuhnert, Bert Palm, Maximilian Schaldach, Christian Schulz, Martin Schrön, Thomas Schnicke, and Jan Bumberger

Robust infrastructures for managing and accessing high-volume data are an essential foundation for unraveling complex spatiotemporal processes in the Earth system sciences. Addressing multifaceted research questions demands data from diverse sources; however, isolated solutions hinder effective collaboration and knowledge advancement.

We present a novel digital ecosystem for FAIR time series data management, deeply rooted in contemporary software engineering and developed at the Helmholtz Centre for Environmental Research (UFZ) in Leipzig, Germany. Designed to flexibly address discipline-specific requirements and workflows, the system emphasizes user-centric accessibility, ensuring the reliability, efficiency, and sustainability of time series data across different domains and scales.

Our time series ecosystem includes a user-centric web-based frontend for (real-time) data flow and metadata management, a versatile data integration layer, a robust time series database, efficient object storage, near real-time quality control, and comprehensive data visualization capabilities. Supporting modern and classical data transfer protocols, the system ensures compliance with OGC standards for data access, facilitating efficient progress in the data lifecycle through high-performance computing. This fully integrated and containerized solution enables swift deployment and seamless integration with existing services.
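
One way such OGC-compliant data access can look from the outside is an OGC SensorThings API query; whether the system described here exposes SensorThings specifically is an assumption, and the service root below is a placeholder.

```python
# Sketch of pulling observations via an OGC SensorThings API; the root URL is a
# placeholder and the use of SensorThings for this system is an assumption.
import requests

ROOT = "https://sensors.example.org/v1.1"  # hypothetical SensorThings endpoint

datastreams = requests.get(
    f"{ROOT}/Datastreams",
    params={"$filter": "ObservedProperty/name eq 'soil moisture'", "$top": 1},
    timeout=10,
).json()["value"]

if datastreams:
    obs_url = f"{ROOT}/Datastreams({datastreams[0]['@iot.id']})/Observations"
    for obs in requests.get(obs_url, params={"$top": 5}, timeout=10).json()["value"]:
        print(obs["phenomenonTime"], obs["result"])
```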

Illustrating the practical application of the system, we showcase its success in managing Cosmic Ray Neutron Sensing data from the TERENO project. This success story underscores the system's effectiveness in addressing challenges associated with time series data management in earth system sciences, fostering more efficient research and facilitating informed decision-making processes.

This contribution aligns seamlessly with the session's focus on connecting RDIs. We aim to promote transferable approaches, use existing standards, and facilitate collaborations transcending barriers among RDI providers, developers, and researchers. By presenting our experiences and best practices, this presentation invites engagement and discussions to collectively address the challenges in bringing research data infrastructures together.

How to cite: Hemmen, J., Schäfer, D., Abbrent, M., Gransee, F., Kuhnert, T., Palm, B., Schaldach, M., Schulz, C., Schrön, M., Schnicke, T., and Bumberger, J.: Advancing Data Management: A Novel Digital Ecosystem for FAIR Time Series Data Management in Earth System Sciences, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-12978, https://doi.org/10.5194/egusphere-egu24-12978, 2024.

EGU24-13159 | ECS | Posters on site | ESSI2.8

MorSnowAI v1.0 : An Open-Source Python Package for Empowering Artificial Intelligence in Snow Hydrology - A Comprehensive Toolbox 

Haytam Elyoussfi, Abdelghani Boudhar, Salwa Belaqziz, Mostafa Bousbaa, Karima Nifa, Bouchra Bargam, Ismail Karaoui, Ayoub Bouihrouchane, Touria Benmira, and Abdelghani Chehbouni

Data-driven methods, such as machine learning (ML) and deep learning (DL), play a pivotal role in advancing the field of snow hydrology. These techniques harness the power of algorithms to analyze and interpret vast datasets, allowing researchers to uncover intricate patterns and relationships within the complex processes of snow dynamics. In snow hydrology, where traditional models may struggle to capture the nonlinear and dynamic nature of snow-related phenomena, data-driven methods provide a valuable alternative. However, using data-driven methods (ML and DL) requires advanced skills in various fields, such as programming and hydrological modeling. In response to these challenges, we have developed an open-source Python package named MorSnowAI that streamlines the process of building, training and testing artificial intelligence models based on machine learning and deep learning methods. MorSnowAI not only automates the building, training, and testing of artificial intelligence models but also significantly simplifies the collection of data from various sources and formats, such as reanalysis datasets (e.g., ERA5-Land) from the Copernicus Climate Data Store and remote sensing data from the MODIS, Landsat, and Sentinel missions used to calculate the Normalized Difference Snow Index (NDSI). It can also utilize local datasets as inputs for the model. Among other features, the MorSnowAI package provides pre-processing and post-processing methods that users can choose, along with visualization and analysis of the available time series. The scripts developed in the MorSnowAI package have already undergone evaluation and testing in various snow hydrology applications. For instance, these applications include predicting snow depth, streamflow, snow cover, snow water equivalent, and groundwater levels in mountainous areas of Morocco. The automated processes within MorSnowAI contribute to advancing the field, enabling researchers to focus on refining model inputs, interpreting results, and improving the overall understanding of complex hydrological systems. By bridging the gap between hydrology and advanced data-driven techniques, MorSnowAI fosters advancements in research, offering valuable insights for resource management in regions heavily influenced by snow dynamics.
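
The NDSI mentioned above is a simple band ratio. A minimal sketch, assuming co-registered green and shortwave-infrared reflectance arrays from any of the cited sensors:

```python
# Normalized Difference Snow Index: (green - swir) / (green + swir).
# Inputs are assumed to be co-registered reflectance arrays.
import numpy as np

def ndsi(green, swir):
    green = np.asarray(green, dtype=float)
    swir = np.asarray(swir, dtype=float)
    return (green - swir) / (green + swir + 1e-12)  # epsilon avoids division by zero

print(ndsi([0.55, 0.30], [0.10, 0.25]))
```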

How to cite: Elyoussfi, H., Boudhar, A., Belaqziz, S., Bousbaa, M., Nifa, K., Bargam, B., Karaoui, I., Bouihrouchane, A., Benmira, T., and Chehbouni, A.: MorSnowAI v1.0 : An Open-Source Python Package for Empowering Artificial Intelligence in Snow Hydrology - A Comprehensive Toolbox, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-13159, https://doi.org/10.5194/egusphere-egu24-13159, 2024.

EGU24-13522 | ECS | Orals | ESSI2.8

Developing cutting-edge geophysical data and software infrastructure to future-proof national scale geophysical assets for 2030 computation   

Nigel Rees, Lesley Wyborn, Rui Yang, Jo Croucher, Hannes Hollmann, Rebecca Farrington, Yue Sun, Yiling Liu, and Ben Evans

The 2030 Geophysics Collections Project was a collaborative effort between the National Computational Infrastructure (NCI), AuScope, Terrestrial Ecosystem Research Network (TERN) and the Australian Research Data Commons (ARDC) that aimed to create a nationally transparent, online geophysics data environment suitable for programmatic access on High Performance Computing (HPC) at the NCI. Key focus areas of this project included the publication of internationally standardised geophysical data on NCI’s Gadi Tier 1 research supercomputer, as well as the development of geophysics and AI-ML related specialised software environments that allow for efficient multi-physics processing, modeling and analysis at scale on HPC systems.

Raw and high-resolution versions of AuScope funded Magnetotelluric (MT), Passive Seismic (PS) and Distributed Acoustic Sensing (DAS) datasets are now accessible on HPC along with selected higher-level data products. These datasets have been structured to enable horizontal integration, allowing disparate datasets to be accessed in real-time as online web services from other repositories. Additionally, vertical integration has been established for MT data, linking the source field acquired datasets with derivative processed data products at the NCI repository, as well as linking to other derivative data products hosted by external data portals.

To support next-generation geophysical research at scale, these valuable datasets and accompanying metadata need to be captured in machine-readable formats and to leverage international standards, vocabularies and identifiers. For MT, automations were developed that generate different MT processing levels at scale in internationally compliant, high-performance data and metadata standards. By parallelising these automated processes across HPC clusters, one can rapidly generate different processing levels for entire geophysical surveys in a matter of minutes.
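
A minimal sketch of the fan-out pattern described above, using a single-node process pool as a stand-in for HPC job arrays or MPI ranks; the processing function, directory layout and file suffixes are hypothetical.

```python
# Fan a per-station processing step out over many workers. On an HPC system this
# would typically be a job array or MPI ranks; names and paths are placeholders.
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path

def process_station(raw_file: Path) -> Path:
    """Produce the next processing level for one station (placeholder body)."""
    out_file = raw_file.with_suffix(".level1.nc")
    # ... read raw time series, apply processing, write standardised output ...
    return out_file

raw_files = sorted(Path("survey_raw").glob("*.asc"))  # assumed directory layout

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=16) as pool:
        for produced in pool.map(process_station, raw_files):
            print("wrote", produced)
```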

In parallel with these data enhancements, the NCI-geophysics software environment was developed, which compiled and containerised a wide range of geophysical and data science related packages in Python, Julia and R. In addition, the NCI-AI-ML environment bundled together popular machine learning and data science packages and configured them for HPC GPU architectures. Standalone open source geophysical applications that support parallel computation have also been added to NCI’s Gadi supercomputer. 

The 2030 Geophysics Collections Project has made the first strides towards enabling a new era in Australian geophysical research, opening up the potential for rapid multi-physics geophysical analysis at scale with the computational tools available within the NCI. By establishing and continuing to build on this geophysical infrastructure, the nation will be better equipped to address the various geophysical challenges and opportunities in the decades ahead.

How to cite: Rees, N., Wyborn, L., Yang, R., Croucher, J., Hollmann, H., Farrington, R., Sun, Y., Liu, Y., and Evans, B.: Developing cutting-edge geophysical data and software infrastructure to future-proof national scale geophysical assets for 2030 computation  , EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-13522, https://doi.org/10.5194/egusphere-egu24-13522, 2024.

EGU24-13651 | Posters on site | ESSI2.8

Why and How to Increase Dataset Compression in RDIs and MIPs like CMIP7 

Charles Zender

Research data infrastructures (RDIs) like the Coupled Model Intercomparison Project (CMIP) exemplify geoscientific dataset archive organization and applied informatics. The CMIP metadata and data policies have continuously co-evolved with mature and FAIR technologies (e.g., CF, OPeNDAP, ESGF) that are, in turn, often adopted by other RDIs. Improved lossy and lossless compression support in the standard netCDF/HDF5 scientific software stack merits consideration for adoption in upcoming MIPs and RDIs like CMIP7. We have proposed a three-point plan to CMIP7 to utilize modern lossy and lossless compression to reduce its storage and power requirements (and associated greenhouse gas emissions). The plan will boost the compression ratio of CMIP-like datasets by a factor of about three relative to CMIP6, preserve all scientifically meaningful data, and retain CF compliance. We will present the plan, and discuss why and how to implement it in CMIP7 and other MIPs and RDIs.
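
For illustration, the netCDF4-python snippet below combines lossless deflate with a simple form of lossy quantisation when writing a CF-style variable. The variable, grid sizes and retained precision are illustrative choices, not the proposed CMIP7 settings.

```python
# Write a variable with zlib deflate plus lossy quantisation (keep ~2 decimal
# digits). Variable name, sizes and precision are illustrative only.
import numpy as np
from netCDF4 import Dataset

with Dataset("tas_compressed.nc", "w") as nc:
    nc.createDimension("time", 12)
    nc.createDimension("lat", 180)
    nc.createDimension("lon", 360)
    tas = nc.createVariable(
        "tas", "f4", ("time", "lat", "lon"),
        zlib=True, complevel=5, shuffle=True,
        least_significant_digit=2,   # discard bits beyond ~0.01 precision
    )
    tas.units = "K"
    tas[:] = 273.15 + 30.0 * np.random.random((12, 180, 360))
```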

How to cite: Zender, C.: Why and How to Increase Dataset Compression in RDIs and MIPs like CMIP7, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-13651, https://doi.org/10.5194/egusphere-egu24-13651, 2024.

EGU24-14586 | ESSI2.8

Archive prototype for Space Weather phenomena chains from the Sun to the Earth: CAESAR for ASPIS

Marco Molinaro and the CAESAR Team

CAESAR (Comprehensive Space Weather Studies for the ASPIS Prototype Realisation) is a project funded by ASI (Italian Space Agency) and INAF (Italian National Institute for Astrophysics) for the development of the prototype of ASPIS (ASI SPace weather InfraStructure). We report here the design considerations, challenges and final status of the creation of a database for the ASPIS prototype, which will allow for the study of the chain of phenomena from the Sun to Earth and planetary environments. The database is aimed at handling the heterogeneity of metadata and data while storing and managing the interconnections of various Space Weather events. On top of the database, interfaces for users, including a graphical web interface and an advanced Python module (ASPIS.py), have been developed to facilitate data discovery, access, and analysis. The high-level metadata that inform the discovery phase in the database have been collected using an internally developed tool, ProSpecT (Product Specification Template). This tool utilises JSON Schema and JSONForms to create a web interface that guides data providers in describing their "Products" and generates a JSON object with the necessary metadata. The metadata structure starts from the IVOA VOResource standard, tailored to suit the CAESAR project's requirements. At present, approximately 100 product descriptions in JSON format have been collected and used to create wiki-like documentation pages, besides helping to examine formats and metadata details for the implementation of the database prototype. The graphical web interface helps users discover, filter, and access the database content, while ASPIS.py also provides more advanced analysis tooling. Moreover, ASPIS.py sits on top of commonly used Python packages like SunPy, scikit-learn and matplotlib to help integrate research analysis with other tools and research domains. The database has been built with adherence to FAIR principles in mind and with the idea of making it easily interoperable with other research data infrastructures in the Space Weather or sibling research domains.
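
As an illustration of the JSON-Schema-driven product descriptions mentioned above, the sketch below validates a toy product record against a minimal schema; the fields are invented and do not reproduce the CAESAR/ProSpecT metadata model.

```python
# Validate a toy product description against a minimal JSON Schema.
# Schema fields and the example record are invented for illustration.
from jsonschema import validate

product_schema = {
    "type": "object",
    "required": ["identifier", "title", "time_coverage", "access_url"],
    "properties": {
        "identifier": {"type": "string"},
        "title": {"type": "string"},
        "time_coverage": {
            "type": "object",
            "required": ["start", "end"],
            "properties": {"start": {"type": "string"}, "end": {"type": "string"}},
        },
        "access_url": {"type": "string"},
    },
}

example_product = {
    "identifier": "example-dataset-001",
    "title": "Example in-situ magnetometer time series",
    "time_coverage": {"start": "2020-01-01T00:00:00Z", "end": "2022-12-31T23:59:59Z"},
    "access_url": "https://example.org/data/example-dataset-001",
}

validate(instance=example_product, schema=product_schema)  # raises on invalid input
print("product description is valid")
```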

How to cite: Molinaro, M. and the CAESAR Team: Archive prototype for Space Weather phenomena chains from the Sun to the Earth: CAESAR for ASPIS, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-14586, https://doi.org/10.5194/egusphere-egu24-14586, 2024.

EGU24-15031 | Posters on site | ESSI2.8

Implementing a Distributed Processing Framework for Multi-Risk Analysis - A Lessons Learned Perspective 

Nils Brinckmann, Michael Langbein, Benjamin Proß, Arne Vogt, and Elisabeth Schöpfer

Analysing individual hazards and the associated risk is a challenging task in its own right. It requires a lot of expertise built up over decades. Unfortunately, there are situations where a single hazard can trigger subsequent, often horrific, consequences. The history of international catastrophes is full of examples: the fires in San Francisco after the 1906 earthquake due to destroyed gas pipelines, the tsunami that destroyed the Fukushima nuclear power plant after the Tohoku earthquake, or the climatic effects of the Krakatau eruption in 1883.

In our RIESGOS project we have been working on a demonstrator app to analyse multi-risk scenarios - with a strong focus on the earthquake-tsunami combination. This is a use case that is very relevant in our partner countries Ecuador, Peru and Chile - and the knowledge is provided by the partner institutions of the RIESGOS consortium.

The technical approach is strongly focused on being standards-based, using OGC Web Processing Services, as well as on being distributed. This allows the specific expertise of each partner institution to be taken into account and the data and algorithms that have been built up and refined over years to be shared.
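
As a minimal sketch of how a client interacts with such a distributed, standards-based setup, the snippet below discovers and invokes a remote OGC Web Processing Service with OWSLib; the service URL, process identifier and input are placeholders, not actual RIESGOS services.

```python
# Sketch only: talk to a (placeholder) OGC WPS endpoint with OWSLib.
from owslib.wps import WebProcessingService, monitorExecution

wps = WebProcessingService("https://example.org/wps")  # placeholder URL
wps.getcapabilities()
for process in wps.processes:
    print(process.identifier, "-", process.title)

# Execute a hypothetical process with a simple literal input.
execution = wps.execute("shakemap_process", inputs=[("gmpe", "example")])
monitorExecution(execution)   # poll the asynchronous job until it finishes
print(execution.status)
```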

In this presentation we focus on a deeper insight into the implementation perspective, covering the benefits as well as strategies to overcome challenging aspects that we encountered when working with the distributed risk analysis framework. These include requirements on interoperability, deployments of bundled versions for testing and transfer, monitoring, and others.

How to cite: Brinckmann, N., Langbein, M., Proß, B., Vogt, A., and Schöpfer, E.: Implementing a Distributed Processing Framework for Multi-Risk Analysis - A Lessons Learned Perspective, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-15031, https://doi.org/10.5194/egusphere-egu24-15031, 2024.

EGU24-15740 | Orals | ESSI2.8

The European Weather Cloud (EWC) – Collaboration Platform for Meteorological Development from Research to Applications 

Roope Tervo, Joachim Saalmüller, Umberto Modigliani, Vasileios Baousis, Jörg Schulz, Mike Grant, Francesco Murdaca, Xavier Abellan, and Roberto Cuccu

The European Weather Cloud (EWC) is the cloud-based collaboration platform for meteorological application development and operations in Europe, enabling the digital transformation of the European Meteorological Infrastructure. It consists of data-proximate cloud infrastructure established by EUMETSAT and ECMWF. The EWC is open, and partners can federate access to their data or infrastructure assets.

The EWC is available for EUMETSAT and ECMWF Member and Cooperating States and EUMETSAT Satellite Application Facilities (SAFs), covering both research and operational use cases. Resources are also available for research initiatives, undertaken by one or more EUMETSAT or ECMWF Member States, via specific EUMETSAT Research and Development (R&D) calls and ECMWF Special Projects. Currently, the EWC hosts 16 R&D calls and Special Projects, lasting 1-3 years.

The EWC focuses strongly on the community, taking an iterative, user-needs-based approach in its development. Notably, research projects and operational applications use the very same environment, which smooths the transition from research to operations (R2O). The hosted services will also be augmented with the Software Marketplace, providing EWC users with the ability to easily share and exploit meteorological applications, algorithms, and machine-learning models. The EWC also provides a Rocket.chat-based discussion platform for users to discuss and work together, promoting in practice the fundamental collaborative nature of this cloud offering.

The EWC hosts over 132 diverse use cases covering, for example, data processing, data services, application development, training, EO and weather data image production, post-processing, and experimenting with cloud technologies. To name a few examples in more detail, the FEMDI project, consisting of 11 European meteorological services, develops data services employing the EWC for distributing open meteorological data to fulfil the EU Open Data directive requirements. Second, the Norwegian Meteorological Institute (MET) is piloting an infrastructure federation to create water-quality products by locating the processing chain close to the data. Lastly, numerous projects are developing machine-learning-based models in the EWC, including nowcasting, medium-term weather forecasting, and feature detection from climate data.

The main data holding accessible to the EWC users is the sum of all online data and products available at ECMWF and EUMETSAT. Services to access the data support both pull and push paradigms for long time series and time-critical access respectively. The services are supported by related functions, such as display, reformat, etc., as per applicable policies. The data offering will be augmented over time based on user needs. 

From a technological viewpoint, the initiative offers services that carry the highest benefits from cloud technology, taking the users’ needs, use cases, and existing software into account. The EWC looks forward to further developing the service from the current infrastructure-as-a-service (IaaS) model toward platform-as-a-service (PaaS). The plan consists of a Kubernetes engine, a high-throughput batch processing engine, function-as-a-service (serverless) capabilities, and several auxiliary services to support collaborative development and operations.

How to cite: Tervo, R., Saalmüller, J., Modigliani, U., Baousis, V., Schulz, J., Grant, M., Murdaca, F., Abellan, X., and Cuccu, R.: The European Weather Cloud (EWC) – Collaboration Platform for Meteorological Development from Research to Applications, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-15740, https://doi.org/10.5194/egusphere-egu24-15740, 2024.

There are many cutting-edge interdisciplinary scientific problems in the Earth and space sciences, such as solar-terrestrial complex system research and the study of celestial bodies. Cross-disciplinary data discovery and access services, as well as data analysis and fusion services, are a common need for researchers working on these problems.

This presentation introduces data services practices implemented by the NSSDC as a national-level data centre in order to enhance the findability and accessibility of data for interdisciplinary research and application. NSSDC has formed a multidisciplinary data resources system, including space astronomy, space physics and space environment, planetary science, space geoscience, etc. In order to share data, NSSDC customizes the data service system for each satellite project and ground-based observation project. To enhance discoverability, NSSDC developed a data retrieval platform providing a cross-system, cross-disciplinary, distributed data resource discovery service. Meanwhile, the data catalogues are synchronized to third-party platforms by harvesting or registration through the data retrieval platform. In addition, multidisciplinary analysis and fusion tools and IT infrastructure will be integrated into a research data infrastructure in the field of solar-terrestrial space physics and astronomy.

In particular, NSSDC has established strategic cooperation with other National Science Data Centers in the fields of astronomy and high-energy physics. For the common community of specific cross-cutting scientific problems and applications, NSSDC has engaged in practices such as the co-construction of multi-source data resources, the interconnection of data infrastructures, and the construction of data application ecosystems. Finally, this presentation will also explain NSSDC’s next plans for cooperation on new data-paradigm technology innovation with more interdisciplinary data centers.

How to cite: Xu, Q., Hu, X., and Zou, Z.: Chinese National Space Science Data Center (NSSDC)’s Data Services Practices for Interdisciplinary Research and Application, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-16205, https://doi.org/10.5194/egusphere-egu24-16205, 2024.

EGU24-16749 | Posters on site | ESSI2.8

Reaching the Data Space – Standard data procedures and defining responsibilities for common data elements 

Emanuel Soeding, Andrea Poersch, Yousef Razeghi, Dorothee Kottmeier, and Stanislav Malinovschii

At the Helmholtz Association, we strive to establish a well-formed harmonized data space, connecting information across distributed data infrastructures. This requires standardizing the description of data sets with suitable metadata to achieve interoperability and machine actionability.

One way to make connections between datasets and to avoid redundancy in metadata is the consistent use of Persistent Identifiers (PIDs). PIDs are an integral element of the FAIR principles (Wilkinson et al. 2016) and are recommended for referring to data sets. Other metadata entities such as people, organizations, projects, laboratories, repositories, publications, vocabularies, samples, instruments, licenses, and methods should also be commonly referenced by PIDs, but agreed identifiers do not yet exist for all of them. Consistently integrating the existing PIDs into data infrastructures can create a high level of interoperability, allowing connections to be built between data sets from different repositories according to common meta information. In HMC we start this process by implementing PIDs for people (ORCID) and organizations (ROR) in data infrastructures.
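
As a minimal sketch of what this looks like in practice, the snippet below embeds an ORCID and a ROR identifier in a DataCite-style creator entry; the name is fictitious, the ORCID is the widely used example record and the ROR value is a placeholder rather than a real organization ID.

```python
# Illustrative DataCite-style creator entry carrying person and organization PIDs.
import json

creator = {
    "name": "Doe, Jane",          # fictitious person
    "nameType": "Personal",
    "nameIdentifiers": [
        {
            "nameIdentifier": "https://orcid.org/0000-0002-1825-0097",  # example ORCID
            "nameIdentifierScheme": "ORCID",
            "schemeUri": "https://orcid.org",
        }
    ],
    "affiliation": [
        {
            "name": "Example Helmholtz Centre",
            "affiliationIdentifier": "https://ror.org/placeholder",  # placeholder, not a real ROR ID
            "affiliationIdentifierScheme": "ROR",
        }
    ],
}

print(json.dumps(creator, indent=2))
```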

Harmonizing PID metadata, however, is only the first step in setting up a data space. Here we shed some light on the strategies we recommend for the implementation within the Helmholtz Association and suggest which stakeholder groups should be included and held responsible for maintaining these elements in order to shape the Helmholtz Data Space. The conclusions from this process not only affect the implementation of PID metadata, but may also be used for the harmonization of vocabularies, digital objects, interfaces, licenses, quality flags and others, in order to connect our global data systems, to redefine stakeholder responsibility and to ultimately reach the data space.

How to cite: Soeding, E., Poersch, A., Razeghi, Y., Kottmeier, D., and Malinovschii, S.: Reaching the Data Space – Standard data procedures and defining responsibilities for common data elements, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-16749, https://doi.org/10.5194/egusphere-egu24-16749, 2024.

EGU24-17279 | Posters on site | ESSI2.8

Towards a user-friendly NFDI4Earth OneStop4All portal to support researchers in Earth System Sciences in Germany 

Ivonne Anders, Peter Braesicke, Auriol Degbelo, Sibylle K. Hassler, Christin Henzen, Ulrike Kleeberg, Marie Ryan, and Hannes Thiemann

The National Research Data Infrastructure (NFDI) aims to create a sustainable and networked infrastructure for research data and helps to overcome the challenges associated with the storage, management and processing, security, and provision of research data in Germany [1]. It thus plays an important role in promoting open science and the exchange of FAIR research data. One of the NFDI initiatives is NFDI4Earth, which focuses on Earth System Sciences (ESS) [2]. Within the many ESS sub-disciplines, there is a diverse range of relevant high-quality data, services, tools, software, data repositories, as well as training and learning materials. Thus, it is not easy for researchers to find these various useful resources. Additionally, there is a lack of knowledge on how to use them due to an enormous diversity of standards, platforms, etc.

The NFDI4Earth OneStop4All addresses these issues by serving as the primary user-friendly access point (Web portal) to the relevant ESS resources. It gives a coherent overview of the (distributed) resources for research data management (RDM), and data analysis/data science that are made available by the members of the NFDI4Earth as well as the Earth System Science (ESS) community. In particular, the OneStop4All provides access to data and software repositories, subject-specific RDM articles and a learning management system for open educational resources relevant to ESS researchers. In addition, it guides users through the NFDI4Earth resources according to their specific ESS RDM and data science needs and capabilities. The OneStop4All also promotes seamless access to a distributed user support network.

The design and development of the OneStop4All is centered on the needs of the users. A good user experience requires an understanding of user behaviour, goals, motivations, and expectations and incorporating this knowledge into every stage of the design process. To achieve this, we use methods from user-centered design (UCD), complemented by knowledge and experience in various ESS disciplines from the members of the NFDI4Earth consortium, their extended scientific networks and by directly involving the community. 

We present the process of developing the user interface concept for the OneStop4All with respect to usability and user experience, and give first insights into the platform.

 

References

[1] Agreement between the Federal Government and the Länder concerning the Establishment and Funding of a National Research Data Infrastructure (NFDI) of 26 November 2018.

[2] NFDI4Earth Consortium. (2022, July 7). NFDI4Earth - National Research Data Infrastructure for Earth System Sciences. Zenodo. https://doi.org/10.5281/zenodo.6806081

 

How to cite: Anders, I., Braesicke, P., Degbelo, A., Hassler, S. K., Henzen, C., Kleeberg, U., Ryan, M., and Thiemann, H.: Towards a user-friendly NFDI4Earth OneStop4All portal to support researchers in Earth System Sciences in Germany, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-17279, https://doi.org/10.5194/egusphere-egu24-17279, 2024.

EGU24-17495 | Posters on site | ESSI2.8

The importance of interlinking Research Data Infrastructures and Research Data Management initiatives 

Valentina Protopopova-Kakar, Florian Ott, Kirsten Elger, Melanie Lorenz, and Wolfgang zu Castell

A core element of the National Research Data Infrastructure (NFDI) initiative in Germany is the ambition to harmonize the research data landscape not only on a national level, but also to connect to and intertwine with international initiatives in Research Data Management (RDM). In the context of increasing the interoperability between different research data domains, metadata standardization, controlled vocabularies, application programming and the setup of different service interfaces are key areas of interest. As such, the NFDI is the German contributor to the European Open Science Cloud (EOSC) and strives to become a central contact point between German and international stakeholders. To achieve such a harmonized, interoperable and international data landscape, the NFDI Consortium for Earth System Sciences (NFDI4Earth) is open to promoting common standards to the national Earth System Science (ESS) community and to supporting the development of new RDM pathways by connecting to and actively participating in international initiatives. NFDI4Earth also strives to foster a cultural change towards increased awareness of FAIR (Findable, Accessible, Interoperable, Reusable) and Open Science principles in Germany. A user-friendly technical infrastructure, meaningful services, as well as up-to-date educational resources are all important elements of NFDI4Earth to advance the cultural shift in the ESS research community towards FAIR and open research data management. Another important part of the cultural change is to acknowledge data and software publications as scientific merit and to recognize them as part of scientific achievements.

How to cite: Protopopova-Kakar, V., Ott, F., Elger, K., Lorenz, M., and zu Castell, W.: The importance of interlinking Research Data Infrastructures and Research Data Management initiatives, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-17495, https://doi.org/10.5194/egusphere-egu24-17495, 2024.

EGU24-18037 | ECS | Posters on site | ESSI2.8

Towards more standards for sample descriptions: The FAIR WISH project 

Alexander Brauser, Kirsten Elger, Linda Baldewein, Simone Frenzel, Ulrike Kleeberg, Birgit Heim, Ben Norden, and Mareike Wieczorek

In many scientific disciplines, physical samples represent the origin of research results. They record unique events in history, support new hypotheses, and are often not reproducible. At the same time, samples are essential for reproducing and verifying research results and deriving new results by analysing existing samples with new methodology. Consequently, the inclusion of sample metadata in the digital data curation processes is an important step to provide the full provenance of research results. The largest challenge is the lack of standardisation and the large variety of sample types and individuals involved: Most samples are collected by individual researchers or small groups that may have internal agreements for sample descriptions, but these might only be used for one expedition or within a small community, and rarely reach beyond institutional boundaries.

The International Generic Sample Number (IGSN, www.igsn.org) is a globally unique, resolvable, and persistent identifier (PID) for physical samples with a dedicated metadata schema supporting discovery functionality on the internet. IGSNs allow data and publications to be linked directly to the samples from which they originate and provide contextual information about a particular sample on the internet.

The aim of the project FAIR WISH (FAIR Workflows to establish IGSN for Samples in the Helmholtz Association), funded by the Helmholtz Metadata Collaboration (HMC), was to work towards more standardisation of rich sample descriptions. Project outcomes include (i) standardised, rich and discipline-specific IGSN metadata schemes for different physical sample types within the Earth and Environmental sciences (EaE), (ii) workflows to generate machine-readable IGSN metadata from different states of digitisation and (iii) the FAIR Samples Template.

The FAIR SAMPLES Template enables metadata collection and batch upload of samples at various sample hierarchies (parent, children at different hierarchy levels) at once. The ability for individual researchers or research teams to fill in the FAIR SAMPLES Template, or to create scripts that fill it directly from databases for a wide range of sample types, makes the template flexible and widely applicable. The structured metadata, captured with the FAIR SAMPLES Template and converted into XML files, already represents an important step towards the standardisation of rich sample descriptions and their provision in machine-readable form.
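
The following simplified sketch illustrates the template-to-XML step: rows exported from a spreadsheet are turned into one XML record per sample. Column and element names are generic placeholders and do not reproduce the actual FAIR SAMPLES Template or the IGSN metadata schema.

```python
# Simplified illustration: convert tabular sample metadata into per-sample XML files.
import csv
import xml.etree.ElementTree as ET

with open("fair_samples_template.csv", newline="") as fh:   # hypothetical export
    for row in csv.DictReader(fh):
        sample = ET.Element("sample")
        ET.SubElement(sample, "igsn").text = row["igsn"]
        ET.SubElement(sample, "name").text = row["sample_name"]
        ET.SubElement(sample, "parentIgsn").text = row.get("parent_igsn", "")
        ET.SubElement(sample, "collectionMethod").text = row["collection_method"]
        ET.ElementTree(sample).write(f"{row['igsn']}.xml",
                                     encoding="utf-8", xml_declaration=True)
```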

Standardised workflows for metadata documentation and compliance with international metadata standards address the challenges associated with reproducibility of samples and their insufficient documentation. The developments within the FAIR WISH project provide a foundation for a more collaborative and integrated scientific enterprise. Future efforts in this area can build on this framework to further improve the accessibility and interoperability of sample data and advance the collective understanding of Earth's environmental processes.

How to cite: Brauser, A., Elger, K., Baldewein, L., Frenzel, S., Kleeberg, U., Heim, B., Norden, B., and Wieczorek, M.: Towards more standards for sample descriptions: The FAIR WISH project, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-18037, https://doi.org/10.5194/egusphere-egu24-18037, 2024.

EGU24-18056 | Posters on site | ESSI2.8

Executable Book for the IPCC AR6 ATLAS products 

Antonio S. Cofiño and David Dominguez Roman

Internationally-coordinated climate model intercomparison projects (MIPs) explore the uncertainties inherent to climate change science. The Multi-MIP Climate Change ATLAS repository [1] is the backbone of the Sixth IPCC Assessment Report (AR6) Atlas Chapter, which provides a region-by-region assessment of climate change including also the innovative Interactive Atlas [2]. The Interactive Atlas complements the report by providing flexible spatial and temporal analyses of regional climate change, based on different MIPs.

The IPCC AR6 promotes best practices in traceability and reproducibility of the results shown in the report, including the adoption of the Findable, Accessible, Interoperable, and Reusable (FAIR) principles for scientific data. In particular, reproducibility and reusability are central in order to ensure the transparency of the final products. The ATLAS products are generated using free software community tools, based on the climate4R framework [3], for data post-processing (data access, regridding, aggregation, bias adjustment, etc.), evaluation and quality control (when applicable). All the ATLAS code is made publicly available as notebooks and scripts [1].

The Executable Book Project (EBP) [4] is an international collaboration between several universities and open source projects to build tools that facilitate computational narratives (books, lectures, articles, etc.) using open source tools. It allows users from scientific and academic communities to: merge rich text content, output from live code, references, cross-references, equations, images, etc.; execute content and cache results; combine cached outputs and content files into a document model; build interactive (i.e. HTML) and publication-quality (PDF) outputs; and control everything from a simple interface.

In this contribution, a computational book has been created using the JupyterBook ecosystem, binding the code scripts and notebooks from the Multi-Model Intercomparison Project (Multi-MIP) Climate Change Atlas repository to improve their reproducibility and reusability.

Acknowledgement: This work is partly supported by: project CORDyS (PID2020-116595RB-I00) funded by MCIN/AEI/10.13039/501100011033; Ministry for the Ecological Transition and the Demographic Challenge (MITECO) and the European Commission NextGenerationEU (Regulation EU 2020/2094), through CSIC's Interdisciplinary Thematic Platform Clima (PTI-Clima); and, the ENES-RI and IS-ENES3 project which is funded by the European Union’s Horizon 2020 research and innovation programme under grant agreement No 824084 

[1] https://github.com/SantanderMetGroup/ATLAS
[2] http://interactive-atlas.ipcc.ch
[3] https://github.com/SantanderMetGroup/climate4R
[4] https://executablebooks.org

How to cite: Cofiño, A. S. and Dominguez Roman, D.: Executable Book for the IPCC AR6 ATLAS products, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-18056, https://doi.org/10.5194/egusphere-egu24-18056, 2024.

EGU24-18182 | Posters on site | ESSI2.8

Case study Climate Change : How Earth System Science benefits from FAIRCORE4EOSC components 

Anna-Lena Flügel, Beate Krüss, Heinrich Widmann, Hannes Thiemann, Stephan Kindermann, and Fanny Adloff

Climate Change is one of the most pressing global challenges, on which researchers from around the world and from various disciplines are working together. Due to the demands regarding the use of openly accessible data in their own as well as other research domains, providing services embedded in European and international infrastructures has always been crucial for climate model researchers. Therefore, the Horizon Europe project FAIRCORE4EOSC established the case study “Climate Change” to demonstrate how researchers, the Earth System Science (ESS) community and wider user communities can benefit from the components developed in FAIRCORE4EOSC.

Within FAIRCORE4EOSC, the German Climate Computing Center (DKRZ) examines the possibility of integrating EOSC (European Open Science Cloud) and IS-ENES (Infrastructure for the European Network for Earth System Modelling) services within the Climate Change case study to address some of the data challenges of the ESS community. For example, a huge data space exists in ENES which cannot be found in EOSC, neither at a fine-granular nor at a coarse-granular level. For some ENES data collections, DataCite DOIs are assigned, but these usually refer to thousands of data objects that need to be grouped into different levels of aggregation, for which no PIDs are currently available. Additionally, data still lack the context formed by producers, experiments, projects, devices, etc., which is crucial for interdisciplinary re-use, as well as metadata crosswalks.

To address these challenges, the Climate Change case study investigates the benefits of four FAIRCORE4EOSC components: RAiD (Research Activity Identifier Service), PIDGraph, DTR (Data Type Registry) and MSCR (Metadata Schema and Crosswalk Registry). The goal is to improve discoverability and reusability of data collections at all levels of granularity, and to link data to experiments and projects. 

In this case study, selected ENES data collections will receive identifiers using Kernel Information Types developed in FAIRCORE4EOSC as well as the DTR contents. The assignment of RAiDs to projects/experiments provides domain agnostic users with an aggregated view of the entities (data, software, people involved, etc.) from data generation by the Earth System modellers up to publication of final assessment reports by IPCC authors. These metadata will be supplied to Open Science Graphs and represented within the PIDGraph that visualises the context and interlinking for a specific research project based on DOIs and RAiDs. In addition to the identifiers, the scientific metadata are also made available. Improving information that enables meaningful crosswalks is important and supported by the features of the DTR and MSCR. The DTR offers the possibility to register and assign a PID to a data type (e.g. measurement unit, info type, schema) and ensures a machine actionable standardisation of PID metadata for data objects. The Climate Change case study will use DTRs for persistent Climate Forecast convention (CF) variable definitions. The MSCR can then be used to create machine actionable unit conversions or variable mappings based on DTR data types. This focus on improving the prerequisites for machine-aided analytics including semantic aspects is of high priority due to the commonly large data volumes and the high interdisciplinary requirements in climate science.

How to cite: Flügel, A.-L., Krüss, B., Widmann, H., Thiemann, H., Kindermann, S., and Adloff, F.: Case study Climate Change : How Earth System Science benefits from FAIRCORE4EOSC components, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-18182, https://doi.org/10.5194/egusphere-egu24-18182, 2024.

EGU24-18193 | ECS | Posters on site | ESSI2.8

Unified Vocabularies for Geo- and Cosmochemical Data Systems 

Leander Kallas, Marthe Klöcking, Lucia Profeta, Stephen Richard, Annika Johansson, Kerstin Lehnert, Manja Luzi-Helbing, Bärbel Sarbas, Hannah Sweets, Dieter Garbe-Schönberg, Matthias Willbold, and Gerhard Wörner

Global compilations of geo- and cosmochemical data are increasingly leveraged to address exciting new research questions through data-analytics and machine-learning approaches. These invaluable datasets are maintained and made accessible as synthesis databases, such as GEOROC and PetDB, catering to terrestrial igneous and metamorphic rocks; AstroMat Data Synthesis, encompassing diverse astromaterial samples; and GeoReM, a comprehensive resource for geochemical, environmental and biological reference materials. The GEOROC and PetDB databases for igneous and metamorphic rocks collectively aggregate data from thousands of publications, combining over 42 million single data values (major and trace elements, stable and radiogenic isotope ratios, radiometric ages) for bulk rock, glass, as well as minerals and their inclusions.

The diverse focus of these data systems, which include data from different sources and with different metadata, makes data integration and interoperability challenging. The DIGIS and EarthChem projects are working towards designing machine-readable unified vocabularies for their data systems to achieve full interoperability. These vocabularies, associated with primary chemical data as well as geospatial, analytical and sample metadata, encompass many categories describing geographic location, sampling technique, lithology and mineral types, geological and tectonic setting, as well as analytes, analytical methods, reference materials, and more.

Wherever possible, machine- and/or human-readable external vocabularies from respected authorities are incorporated, such as MinDat’s "Subdivisions of Rock," the International Mineralogical Association’s "List of Minerals" (Warr, 2021), and the International Union of Pure and Applied Chemistry’s chemical terminologies. For the remaining categories, a set of local vocabularies is developed by our group (e.g. analytical methods, see Richard et al. 2023). The collaborative effort between DIGIS, EarthChem, and the Astromaterials Data System is leading to an advanced vocabulary ecosystem relating samples, data, and analytical methods in geo- and cosmochemical research that reaches from local to community-driven and, eventually, global connections.

Establishing a globally accepted vocabulary not only contributes to building interoperability between our existing geo- and cosmochemistry synthesis databases, but will also help pave the way toward interoperability with the GeoReM database, linking data with analytical methods and reference materials to provide means for data quality control and assessment of analytical uncertainty.

Finally, the unified vocabularies of EarthChem, GEOROC, and GeoReM will advance the creation of a global network of geochemical data systems as promoted by the OneGeochemistry initiative (Klöcking et al., 2023; Prent et al., 2023), connecting and integrating the broadest range of geoanalytical data generated, for example, in studies of environmental samples, archeological artefacts, or geohealth matters.

We report on these goals, achievements, state of advance, and challenges and seek community engagement and feedback.

 

References

Klöcking, M. et al. (2023). Community recommendations for geochemical data, services and analytical capabilities in the 21st century. Geochimica et Cosmochimica Acta, 351, 192–205.

Prent, A. et al. (2023). Innovating and Networking Global Geochemical Data Resources Through OneGeochemistry. Elements, 19(3), 136–137.

Richard, S. et al. (2023). Analytical Methods for Geochemistry and Cosmochemistry. Concept Scheme for Analysis Methods in Geo- and Cosmochemistry. Research Vocabularies Australia.

Warr, L. N. (2021). IMA–CNMNC approved mineral symbols. Mineralogical Magazine, 85(3), 291–320.

How to cite: Kallas, L., Klöcking, M., Profeta, L., Richard, S., Johansson, A., Lehnert, K., Luzi-Helbing, M., Sarbas, B., Sweets, H., Garbe-Schönberg, D., Willbold, M., and Wörner, G.: Unified Vocabularies for Geo- and Cosmochemical Data Systems, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-18193, https://doi.org/10.5194/egusphere-egu24-18193, 2024.

EGU24-18751 | Posters on site | ESSI2.8

Benefits of Ontologies in Earth System Science 

Andrea Lammert, Claudia Martens, and Aenne Loehden

The exponential growth of data due to technological developments, along with an increased recognition of research data as relevant research output during the last decades, poses fundamental challenges in terms of interoperability, reproducibility and reuse of scientific information. Being cross-disciplinary at its core, research in Earth System Science comprises divergent domains such as Paleontology, Marine Science, Atmospheric Sciences and Molecular Biology, in addition to different types of data such as observation and simulation data. Within the various disciplines, distinct methods and terms for indexing, cataloguing, describing and finding scientific data have been developed, resulting in several controlled Vocabularies, Taxonomies and Thesauri. However, given the semantic heterogeneity across scientific domains, effective utilisation and (re)use of data is impeded, while the importance of enhanced and improved interoperability across research areas will increase even further, considering the global impact of Climate Change on literally all aspects of everyday life. There is thus a clear need to harmonise practices around the development and usage of semantics in representing and describing information and knowledge.

Using Ontologies (as a formal mechanism for defining terms and their relations) can help to address this issue, especially with regard to discovery, comprehension and metadata enrichment. If used and maintained, Ontologies also encourage metadata standardisation, ideally across Disciplines. Examples for enhanced search options include (but are not limited to): term relations for variables as well as for topics and locations; Synonyms and Homonyms; autocomplete functions for search terms; support of multiple languages. Indexing of research data can be improved using Ontologies, e.g. by proposing terms for variable names or measurement units. Depending on their richness, Ontologies ease e.g. finding, comprehension, processing, and reuse, both for human users as well as for automatic reasoning and processing.
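
As a minimal sketch of how such term relations and synonyms can be encoded, the snippet below expresses a concept, a synonym and a broader-term relation as SKOS statements with rdflib; the vocabulary namespace, concepts and labels are invented examples.

```python
# Invented example: a concept with a preferred label, a synonym (altLabel) and a
# broader-term relation, serialised as SKOS/Turtle with rdflib.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import SKOS

EX = Namespace("https://example.org/vocab/")
g = Graph()
g.bind("skos", SKOS)

tas = URIRef(EX["air_temperature"])
g.add((tas, SKOS.prefLabel, Literal("air temperature", lang="en")))
g.add((tas, SKOS.altLabel, Literal("near-surface temperature", lang="en")))  # synonym
g.add((tas, SKOS.broader, URIRef(EX["atmospheric_variable"])))               # term relation

print(g.serialize(format="turtle"))
```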

Ontologies can represent different levels of granularity, connecting domain specific Vocabularies as e.g. Climate Forecast conventions with generic Taxonomies for e.g. Scientific Disciplines or Funding Policies, thus extending the reach of scientific data to other user groups such as Journalists, Politicians or Citizens.

For a beneficial usage of semantic artefacts, sustainability is the key: any kind of terminology service must be maintained to guarantee that terms and relations are offered in a persistent way. But if they are, Vocabularies, Taxonomies, Thesauri and Ontologies can serve as a driving force for improved visibility and findability of research output within and across different research areas. Why Ontologies matter, what they are, and how they can be used will be depicted on our Poster in an easy-to-understand way.

How to cite: Lammert, A., Martens, C., and Loehden, A.: Benefits of Ontologies in Earth System Science, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-18751, https://doi.org/10.5194/egusphere-egu24-18751, 2024.

EGU24-18756 | Orals | ESSI2.8

A Preoperational Climate Service Information System: Addressing Technical Challenges and Enhancing User Engagement 

Benedikt Gräler, Johannes Schnell, Katharina Demmich, Yagmur Yildiz, Merel Vogel, Julia Kraatz, Stefano Bagli, and Paolo Mazzoli

Given the global scope of the current climate crisis, it is important that it be addressed in all sectors of society. From the increased risk of extreme weather events to the heightened variability in climate patterns, data and knowledge sharing among citizens and scientists alike is necessary for the planning of a sustainable future. Thus, the I-CISK project aims to create a human-centered, co-designed, co-created, co-implemented, and co-evaluated climate service (CS), which allows citizens, stakeholders, and decision-makers to take climate-informed decisions into their own hands.

Drawing on helpful insights from discussions with I-CISK partners and input from the seven Living Labs (LLs), the first preoperational CSs have been developed in the project’s current stage. User stories derived from these discussions guided the creation of the preoperational CSs, ensuring that the data and information displayed are tailored to the needs of end-users.

One key challenge faced during the development of the CSs was presenting weather and climate variables in a way that could be easily understood by end-users, while simultaneously addressing the questions posed by different stakeholders. Within this challenge, scale raised a significant issue; users often preferred to have data visualized on a local scale, whereas most forecast data was only available at a larger scale. This meant that forecast data had to be spatially corrected to fit this requirement. Another issue faced during development was providing visualizations that enable end-users to readily understand forecast uncertainty; since forecasts for future weather patterns are calculated using different climate models, there is a level of uncertainty when comparing various forecasts. Thus, there is not a single “truth”, and it was imperative that this be made clear when creating the preoperational CSs. To achieve this end, functional and sketch-based mock-ups were designed and discussed with end-users and within the consortium. Then, based on feedback, they were iteratively further developed.

Alongside the challenge of how to clearly visualize climate information, another key challenge was finding the most robust and relevant data sources to serve local information needs. We found that meeting the data requirements meant gathering not only forecast data but also observed historical data. With both displayed in the preoperational CSs, users were able to compare past and future weather patterns with their own personal experience. This further helped users to understand the information relayed in the CSs, and boosted their ability to assess climate predictions.

In this presentation, we present the general approach of co-designing the preoperational CSs, and what we derived from it. We will also present the technical set-up to integrate the various data sources, the Docker-based semi-automated concept to deploy the individual CS applications in the cloud, and finally, next steps to engage users in current functional CS mock-ups. This work highlights the importance of creating CSs with a human-centered approach, and demonstrates how it has been done within the I-CISK project framework.

How to cite: Gräler, B., Schnell, J., Demmich, K., Yildiz, Y., Vogel, M., Kraatz, J., Bagli, S., and Mazzoli, P.: A Preoperational Climate Service Information System: Addressing Technical Challenges and Enhancing User Engagement, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-18756, https://doi.org/10.5194/egusphere-egu24-18756, 2024.

EGU24-19436 | ECS | Posters on site | ESSI2.8

The role of research data repositories for large integrative research infrastructures 

Florian Ott, Kirsten Elger, Simone Frenzel, Alexander Brauser, and Melanie Lorenz

The ongoing digitalisation, together with new methods for inter- and transdisciplinary research (e.g., AI, ML), has triggered the development of large research infrastructures across the Earth and environmental sciences (e.g. EPOS, EnvriFAIR or the German NFDI) and led to increasing demands for seamless data integration and visualisation, which require interoperability of data formats and the use of agreed metadata standards. Especially for data-intensive disciplines in geophysics and geodesy, metadata standards are important and already in place and widely adopted (e.g. RINEX/SINEX formats for GNSS data and GeodesyML metadata for GNSS stations; mSEED format and FDSN metadata recommendations for seismological data). In addition, it becomes increasingly relevant to connect research outputs (papers, data, software, samples) with each other and with the originating researchers and institutions – in a unique and machine-readable way. The use of persistent identifiers (like DOI, ORCID, ROR, IGSN) and descriptive linked data vocabularies/ontologies in the metadata associated with research outcomes strongly supports these tasks.

In this presentation, we will elaborate on the role and potential of domain-specific research data repositories for the process described above. Domain repositories are digital archives that manage and preserve curated research data (and/or software, sample descriptions) from specific scientific disciplines. The metadata associated with the DOI-referenced objects are specific to their domain and richer than generic metadata intended to describe data across many scientific disciplines. Domain repositories often offer data curation by domain researchers and data specialists and make sure that relevant persistent identifiers are included in the standardised XML or JSON metadata for data discovery, complementing the disciplinary metadata described above.

Our example is GFZ Data Services, the domain repository for geosciences data hosted at the GFZ German Research Centre for Geosciences. GFZ Data Services has several partnerships with large international research infrastructures, like EPOS, GEOROC and the World Heat Flow Database Project, and provides data publication services to several geodetic data services of the International Association of Geodesy (ICGEM, IGETS, ISG). Our examples clearly delineate the roles of each partner and the benefit of the partnership for the overarching task of Open Science.

How to cite: Ott, F., Elger, K., Frenzel, S., Brauser, A., and Lorenz, M.: The role of research data repositories for large integrative research infrastructures, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-19436, https://doi.org/10.5194/egusphere-egu24-19436, 2024.

EGU24-19677 | Posters on site | ESSI2.8

A cascaded framework for unified access to and analysis of kilometer scale global simulations across a federation of data centers 

Kameswarrao Modali, Karsten Peters-von Gehlen, Florian Ziemen, Rajveer Saini, Simon Grasse, and Martin Schultz

As High Performance Computing (HPC) marches into the exascale era, earth system models have entered a numerical regime wherein simulations with a 1 km spatial resolution on a global scale are a reality and are currently being performed at various HPC centers across the globe. In this contribution, we provide an overview of the strategy and plans to adapt the data handling services and workflows available at the German Climate Computing Center (DKRZ) and the Jülich Supercomputing Center (JSC) to enable efficient data access, processing and sharing of output from such simulations using current and next generation Earth System Models. These activities are carried out in the framework of projects funded on an EU as well as national level, such as NextGEMS, WarmWorld and EERIE.

With the increase in spatial resolution comes the inevitable jump in the volume of the output data. In particular, the throughput due to the enhanced computing power always surpasses the capacity of single-tier storage systems made up of homogeneous hardware and necessitates multi-tier storage systems consisting of heterogeneous hardware. As a consequence, new issues arise for efficient, user-friendly data management within each site. Sharing of model outputs that may be produced at different data centers and stored across different multi-tier storage systems poses additional challenges, both in terms of technical aspects (efficient data handling, data formats, reduction of unnecessary transfers) and semantic aspects (data discovery and selection across sites). Furthermore, there is an increasing need for scientifically operational solutions, which requires the development of long-term strategies that can be sustained within the different data centers. To achieve all of this, existing workflows need to be analyzed and largely rewritten. On the upside, this will allow the introduction of new concepts and technologies, for example using the recent zarr file format instead of the more traditional netCDF format.

More specifically, in WarmWorld, the strategy is to create an overarching user interface to enable the discovery of the federated data, and to implement the backend infrastructure for handling the movement of the data across the storage tiers (SSD, HDD, tape, cloud), within as well as across the HPC centers, as necessitated by the analytical tasks. This approach will also leverage the benefits of community efforts in redesigning the way km-scale models provide their output, i.e. on hierarchical grids and in relatively small chunks.
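
As a small illustration of this direction (our sketch, not the WarmWorld implementation), the snippet below writes chunked zarr output with xarray and reads it back lazily; array sizes and chunk shapes are arbitrary, and dask and zarr are assumed to be installed alongside xarray.

```python
# Sketch only: chunked, cloud-friendly zarr output instead of one large netCDF file.
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"tas": (("time", "cell"), np.random.rand(24, 100_000).astype("float32"))},
    coords={"time": np.arange(24)},
)

# Small, regular chunks keep object-store reads cheap and enable parallel access.
ds.chunk({"time": 1, "cell": 25_000}).to_zarr("tas_km_scale.zarr", mode="w")

# Readers open the store lazily and pull only the chunks they actually need.
subset = xr.open_zarr("tas_km_scale.zarr")["tas"].isel(time=0).mean().compute()
print(float(subset))
```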

We present specific ongoing work to implement this data handling strategy across HPC centers and outline the vision for the handling of high-volume climate model simulation output in the exascale era to enable the efficient analysis of the information content from these simulations. 

How to cite: Modali, K., Peters-von Gehlen, K., Ziemen, F., Saini, R., Grasse, S., and Schultz, M.: A cascaded framework for unified access to and analysis of kilometer scale global simulations across a federation of data centers, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-19677, https://doi.org/10.5194/egusphere-egu24-19677, 2024.

EGU24-19849 | ECS | Posters on site | ESSI2.8

Geoluminate: A community-centric framework for the creation, deployment and ongoing development of decentralized geoscience data portals 

Samuel Jennings, Kirsten Elger, Sven Fuchs, Florian Neumann, Ben Norden, Simone Frenzel, Stephan Maes, and Nikolas Ott

Increasing pressure from governing bodies and funding agencies to disseminate research data in an open and FAIR (Findable, Accessible, Interoperable, and Reusable) format has led to an increase in online research portals of varying quality. The task of constructing and maintaining such portals is challenging, especially when left to individuals with limited understanding of modern web architecture. For those starting out on this endeavour, an over-abundance of online advice, coupled with the rapid evolution of “latest technologies”, can be overwhelming. The inevitable uncertainty leads to technologically isolated portals with limited interoperability, which ultimately hinders the exchange of geoscientific information.

To reduce uncertainty for new initiatives, Geoluminate (https://geoluminate.github.io/geoluminate/) – a new micro web framework – offers a simple but robust platform for the rapid creation and deployment of new geoscience research portals. The framework's simplicity ensures that even those with limited expertise in web development can create and maintain effective portals that exhibit consistency in both design and functionality. Geoluminate aims to foster interoperability, reliability and decentralization of geoscience portals by providing a consistent and stable foundation on which they are built.

Leveraging existing features of the Python-based Django Web Framework, Geoluminate offers a comfortable learning curve for those already familiar with Python programming. On top of the feature-rich ecosystem of Django, Geoluminate offers additional features specifically tailored to the needs of geoscientific research portals. Geoluminate is highly opinionated and comes “batteries included”, so that, as a research community, the focus can remain on designing data models that fit specific community needs and less on tedious implementation details.
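
The following purely hypothetical sketch shows the kind of community-specific data model a Django-based framework such as Geoluminate lets a project define (it would live inside a configured Django app); the class and field names are invented for illustration and are not part of the Geoluminate API.

```python
# Hypothetical community data model expressed as ordinary Django models.
from django.db import models

class HeatFlowSite(models.Model):
    name = models.CharField(max_length=255)
    latitude = models.FloatField()
    longitude = models.FloatField()

class HeatFlowMeasurement(models.Model):
    site = models.ForeignKey(HeatFlowSite, on_delete=models.CASCADE,
                             related_name="measurements")
    heat_flow = models.FloatField(help_text="Surface heat flow in mW/m^2")
    igsn = models.CharField(max_length=64, blank=True,
                            help_text="IGSN of the related physical sample")
```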

Currently backed by the international geothermal community as part of the World Heat Flow Database Project (http://heatflow.world/project), Geoluminate is under active development at the GFZ German Research Centre for Geosciences in Potsdam. Under the guidance of the partner repository GFZ Data Services, all data models are intrinsically tied to existing standards of metadata collection (e.g. Datacite, IGSN, ROR, ORCID) such that data publishing is easily facilitated through established pathways.

Geoluminate champions the principles of open science and collaborative knowledge dissemination. This poster presentation aims to showcase the practical implementation and benefits of Geoluminate in creating geoscience research portals that align with FAIR data principles. By fostering a community-centric approach, Geoluminate contributes to the democratization of data management, enabling researchers to actively shape and enhance the landscape of those same portals they likely utilize in their own research.

How to cite: Jennings, S., Elger, K., Fuchs, S., Neumann, F., Norden, B., Frenzel, S., Maes, S., and Ott, N.: Geoluminate: A community-centric framework for the creation, deployment and ongoing development of decentralized geoscience data portals, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-19849, https://doi.org/10.5194/egusphere-egu24-19849, 2024.

EGU24-20127 | Orals | ESSI2.8

An interlinked research data infrastructure for time-series data from the Helmholtz Research Field Earth & Environment 

Christof Lorenz, Benjamin Louisot, Sabine Barthlott, Benjamin Ertl, Linda Baldewein, Ulrike Kleeberg, Marie Ryan, Nils Brinckmann, Marc Hanisch, Roland Koppe, Marc Adolf, Claas Faber, Andreas Lehmann, David Schäfer, Ralf Kunkel, Ulrich Loup, Jürgen Sorg, and Hylke van der Schaaf

Time-series data are crucial sources of reference information in all environmental sciences. Beyond typical research applications, the consistent and timely publication of such data is increasingly important for monitoring and issuing warnings, especially in times of increasing frequency of climatic extreme events. In this context, the seven Centres from the Helmholtz Research Field Earth and Environment (E&E) operate some of the largest environmental measurement infrastructures worldwide. These infrastructures range from terrestrial observation systems in the TERENO observatories and ship-borne sensors to airborne and space-based systems, such as those integrated into the IAGOS infrastructures.

In order to streamline and standardize the usage of the huge amount of data from these infrastructures, the seven Centres have jointly initiated the STAMPLATE project. This initiative aims to adopt the Open Geospatial Consortium (OGC) SensorThings API (STA) as a consistent and modern interface tailored for time-series data. We evaluate STA for representative use cases from the environmental sciences and enhance the core data model with additional crucial metadata such as data quality, data provenance and extended sensor metadata. After centre-wide implementation, the standardized STA interface also serves community-based tools, e.g., for data visualization, data access, quality assurance/quality control (QA/QC), or the management of monitoring systems. By connecting the different STA endpoints of the participating research Centres, we establish an interlinked research data infrastructure (RDI) and a digital ecosystem around the OGC SensorThings API tailored towards environmental time-series data.
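
As a minimal sketch of how such an STA endpoint is queried, the snippet below lists Things with their Datastreams and fetches recent Observations; the base URL is a placeholder, while the entity names and query options ($expand, $orderby, $top) follow the OGC SensorThings API standard.

```python
# Sketch only: query a (placeholder) OGC SensorThings API endpoint.
import requests

STA = "https://example.org/sta/v1.1"   # placeholder endpoint

# Discover Things together with their Datastreams in one request.
things = requests.get(f"{STA}/Things", params={"$expand": "Datastreams"}).json()
for thing in things["value"]:
    print(thing["name"], "->", [ds["name"] for ds in thing["Datastreams"]])

# Fetch the ten most recent observations of one Datastream.
obs = requests.get(
    f"{STA}/Datastreams(1)/Observations",
    params={"$orderby": "phenomenonTime desc", "$top": 10},
).json()
for o in obs["value"]:
    print(o["phenomenonTime"], o["result"])
```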

In this presentation, we want to show the status of the project and give an overview of the current data inventory as well as linked tools and services. We will further demonstrate the practical application of our STA-based framework with simple and representative showcases. With our contribution, we want to promote STA for similar applications and communities beyond our research field. Ultimately, our goal is to provide an important building block towards fostering a more open, FAIR (Findable, Accessible, Interoperable, and Reusable), and harmonized research data landscape in the field of environmental sciences.

How to cite: Lorenz, C., Louisot, B., Barthlott, S., Ertl, B., Baldewein, L., Kleeberg, U., Ryan, M., Brinckmann, N., Hanisch, M., Koppe, R., Adolf, M., Faber, C., Lehmann, A., Schäfer, D., Kunkel, R., Loup, U., Sorg, J., and van der Schaaf, H.: An interlinked research data infrastructure for time-series data from the Helmholtz Research Field Earth & Environment, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-20127, https://doi.org/10.5194/egusphere-egu24-20127, 2024.

EGU24-20238 | ECS | Posters on site | ESSI2.8

When Metadata crosses Borders - Enabling Repository Providers with Joint Forces in Earth System Sciences 

Christin Henzen, Auriol Degbelo, Jonas Grieb, Robin Heß, Ralf Klammer, Roland Koppe, Christof Lorenz, and Claudia Müller

Research data in the Earth System Sciences (ESS) are managed in diverse repositories with varying aims, publishing and curation approaches, as well as technical solutions. The resulting heterogeneity often hampers the implementation of interoperability and harvesting concepts. From the perspective of researchers working on integrative, data-driven research questions across repository borders, this leads to ineffective search and reuse of the data. We consider it vital to train researchers to provide high-quality FAIR data and metadata. However, it is even more important to enable repository providers to act as multipliers, as this allows them to provide researchers with suitable repository solutions. This can be done, for example, by implementing fit-for-purpose metadata schemas and interfaces.

In Germany, several initiatives serve as umbrellas for joint activities with ESS repository providers. In a collaboration between the German national research data infrastructure for Earth System Sciences (NFDI4Earth) and the Helmholtz Metadata Collaboration (HMC), we have developed a roadmap that enables repository providers to meet the needs of researchers and technical requirements.

As an initial step, we developed recommendations in a community-driven process across NFDI4Earth and HMC. These recommendations provide common steps to foster interoperability, particularly with regard to search and harvesting. Moreover, we have identified a first set of use cases for specific types of ESS data that complement the developed recommendations, e.g. underway measurements of seawater temperature. Through regular updates in the form of community consultations and workshops, we will identify further community needs, as well as support updates and developments of metadata standards, e.g. the implementation of underway measurements in GeoDCAT. In this contribution, we will describe our recommendations, use cases, and lessons learned from the activities for a community-driven process to enable repository providers.

How to cite: Henzen, C., Degbelo, A., Grieb, J., Heß, R., Klammer, R., Koppe, R., Lorenz, C., and Müller, C.: When Metadata crosses Borders - Enabling Repository Providers with Joint Forces in Earth System Sciences, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-20238, https://doi.org/10.5194/egusphere-egu24-20238, 2024.

EGU24-20387 | Posters on site | ESSI2.8

Towards Transparent Presentation of FAIR-enabling Data Repository Functions & Characteristics 

Robert Huber, Alejandra Gonzalez Beltran, Charlotte Neidiger, Robert Ulrich, and Hervé L’Hours

Identifying, finding and gaining a sufficient overview of the functions and characteristics of data repositories and their catalogues is essential for users in the environmental and geosciences, as well as in other domains. However, achieving this is not trivial within a reasonable amount of time and effort, especially for less experienced users. This lack of transparent, human- and machine-friendly exposure of essential data repository information impacts many possible stakeholders that need up-to-date and reliable information about data repositories to serve a broad range of users. These include, for example, search engines and registries such as GEOSS, re3data or FAIRsharing. Researchers need to be able to find FAIR-enabling trustworthy repositories to deposit, curate and preserve their own digital objects, as well as to reliably find FAIR data already gathered by others in order to reuse it. Assessment bodies such as CoreTrustSeal need transparent access to data repositories’ functions and characteristics in order to facilitate their certification process. An overview of the data and metadata standards, exchange services and interfaces offered by repositories is essential for data scientists in order to effectively integrate these into their workflows.

In this study we present how seemingly self-evident information about the identity, purpose ('this is a data repository'), mandate and areas of responsibility of data repositories is exposed to humans and machines via websites and/or catalogues. Our findings are that such information is difficult to find and that, in many cases, machine-readable metadata is not clear, not relevant or missing altogether. We also show that despite all the efforts and successes in developing discipline-specific standards over the last decades, these are insufficiently linked from more domain-agnostic standards. This absence of domain-specific information in PID systems and search engines makes it to a large extent invisible in the FAIR ecosystem. In particular, relevant metadata representations or links to discipline-specific, standardised services, such as the Open Geospatial Consortium (OGC) suite of services, are rarely exposed.

In this paper, we seek to present the simple and effective methods being elaborated within the FAIR-IMPACT project to improve this situation by using existing and emerging methods and standards. To this end, we will show effective ways that repositories can expose services information and standards via typed-link-based signposting as currently summarised in the FAIRiCAT approach. We will evaluate the options for implementation across domain-specific metadata as well as domain-independent formats such as DCAT or schema.org and show how they can be used in combination with FAIRiCAT in practice. We will also present methods for exposing the FAIR status of digital objects and the FAIR-enabling and trustworthiness status of data repositories to improve cooperation and information exchange between data repositories, registries, assessment providers and certification authorities.
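To illustrate the typed-link-based signposting idea mentioned above, the following minimal Python sketch shows how a client could inspect the HTTP Link headers of a repository landing page; the URL and the relation types shown are illustrative assumptions, not the FAIRiCAT specification itself.

import requests

# Hypothetical repository landing page; signposting exposes typed links
# ("describedby", "item", "license", ...) in the HTTP Link header
LANDING_PAGE = "https://repository.example.org/dataset/123"

response = requests.head(LANDING_PAGE, allow_redirects=True, timeout=30)

# requests parses the Link header into a dict keyed by the link relation
for rel, link in response.links.items():
    print(rel, link.get("url"), link.get("type", "unknown media type"))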

How to cite: Huber, R., Gonzalez Beltran, A., Neidiger, C., Ulrich, R., and L’Hours, H.: Towards Transparent Presentation of FAIR-enabling Data Repository Functions & Characteristics, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-20387, https://doi.org/10.5194/egusphere-egu24-20387, 2024.

EGU24-20760 | Orals | ESSI2.8

Joint ESA-NASA Multi-Mission Algorithm and Analysis Platform (MAAP) 

Clément Albinet, Aimee Barciauskas, Kathleen Baynes, George W. Chang, Brian M. Freitag, Laura Innice Duncanson, Gerald F. Guala, Hua Hook, Neha Hunka, Henri Laur, Marco Lavalle, Cristiano Lopes, Alex Mandel, David F. Moroni, Tamara Queune, Sujen Shah, and Nathan Marc Thomas

The scientific community is faced with a need for greatly improved data sharing, analysis, visualization and advanced collaboration based firmly on open science principles. Recent and upcoming launches of new satellite missions with more complex and voluminous data, as well as the ever more urgent need to better understand the global carbon budget and related ecological processes, provided the immediate rationale for the ESA-NASA Multi-mission Algorithm and Analysis Platform (MAAP).

This highly collaborative joint project established a framework between ESA and NASA to share data, science algorithms and compute resources in order to foster and accelerate scientific research conducted by ESA and NASA EO data users. Presented to the public in October 2021 [1], the current version of MAAP provides a common cloud-based platform with computing capabilities co-located with the data, a collaborative coding and analysis environment, and a set of interoperable tools and algorithms developed to support, for example, the estimation and visualization of global above-ground biomass.

Data from the Global Ecosystem Dynamics Investigation (GEDI) mission on the International Space Station [2] and the Ice, Cloud, and Land Elevation Satellite-2 (ICESat-2) [3] have been instrumental pioneer products on MAAP, generating the first comprehensive map of Boreal above-ground biomass [4] and supporting the CEOS Biomass Harmonization Activity [5]. Crucially, the platform is being specifically designed to support the forthcoming ESA Biomass mission [6] and incorporate data from the upcoming NASA-ISRO SAR (NISAR) mission [7]. While these missions and the corresponding research leading up to launch, which includes airborne, field, and calibration/validation data collection and analyses, provide a wealth of information relating to global biomass, they also present data storing, processing and sharing challenges; the NISAR mission alone will produce around 40 petabytes of data per year, presenting a challenge that, without MAAP, would impose several accessibility limits on the scientific community and impact scientific progress.

Other challenges being addressed by MAAP include: 1) Enabling researchers to easily discover, process, visualize and analyze large volumes of data from both agencies; 2) Providing a wide variety of data in the same coordinate reference frame to enable comparison, analysis, data evaluation, and data generation; 3) Providing a version-controlled science algorithm development environment that supports tools, co-located data and processing resources; 4) Addressing intellectual property and sharing challenges related to collaborative algorithm development and sharing of data and algorithms.

 

REFERENCES

[1] https://www.nasa.gov/feature/nasa-esa-partnership-releases-platform-for-open-source-science-in-the-cloud

[2] https://science.nasa.gov/missions/gedi

[3] https://icesat-2.gsfc.nasa.gov/

[4] https://daac.ornl.gov/ABOVE/guides/Boreal_AGB_Density_ICESat2.html            

[5] https://iopscience.iop.org/article/10.1088/1748-9326/ad0b60

[6] T. Le Toan, S. Quegan, M. Davidson, H. Balzter, P. Paillou, K. Papathanassiou, S. Plummer, F. Rocca, S. Saatchi, H. Shugart and L. Ulander, “The BIOMASS Mission: Mapping global forest biomass to better understand the terrestrial carbon cycle”, Remote Sensing of Environment, Vol. 115, No. 11, pp. 2850-2860, June 2011.

[7] P.A. Rosen, S. Hensley, S. Shaffer, L. Veilleux, M. Chakraborty, T. Misra, R. Bhan, V. Raju Sagi and R. Satish, "The NASA-ISRO SAR mission - An international space partnership for science and societal benefit", IEEE Radar Conference (RadarCon), pp. 1610-1613, 10-15 May 2015.

How to cite: Albinet, C., Barciauskas, A., Baynes, K., Chang, G. W., Freitag, B. M., Duncanson, L. I., Guala, G. F., Hook, H., Hunka, N., Laur, H., Lavalle, M., Lopes, C., Mandel, A., Moroni, D. F., Queune, T., Shah, S., and Thomas, N. M.: Joint ESA-NASA Multi-Mission Algorithm and Analysis Platform (MAAP), EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-20760, https://doi.org/10.5194/egusphere-egu24-20760, 2024.

EGU24-1857 | Orals | ESSI2.9

A Replicable Multi-Cloud Automation Architecture for Earth Observation 

Armagan Karatosun, Claudio Pisa, Tolga Kaprol, Vasileios Baousis, and Mohanad Albughdadi

The EO4EU project aims at making the access and use of Earth Observation (EO) data easier for environmental, government and business forecasts and operations.

To reach this goal, the EO4EU Platform will soon be made officially available, leveraging existing EO data sources such as DestinE, GEOSS, INSPIRE, Copernicus and Galileo, and offering advanced tools and services, based also on machine learning techniques, to help users find, access and handle the data they are interested in. The EO4EU Platform relies on a combination of a multi-cloud computing infrastructure coupled with pre-exascale high-performance computing facilities to manage demanding processing workloads.

The EO4EU multi-cloud infrastructure is composed of IaaS resources hosted on the WEkEO and CINECA Ada clouds, on top of which a set of Kubernetes clusters dedicated to different workloads (e.g. cluster management tools, observability, or specific applications such as an inference server) is deployed. To automate the deployment and management of these clusters, with advantages in terms of minimisation of dedicated effort and human errors, we have devised an Infrastructure-as-Code (IaC) architecture based on the Terraform, Rancher and Ansible technologies.

We believe that the proposed IaC architecture, based on open-source components and extensively documented and tested in the field, can be successfully replicated by other EO initiatives leveraging cloud infrastructures.

How to cite: Karatosun, A., Pisa, C., Kaprol, T., Baousis, V., and Albughdadi, M.: A Replicable Multi-Cloud Automation Architecture for Earth Observation, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-1857, https://doi.org/10.5194/egusphere-egu24-1857, 2024.

EGU24-6216 | Posters on site | ESSI2.9

Pangeo environment in Galaxy Earth System supported by Fair-Ease 

Thierry Carval, Marie Jossé, and Jérôme Detoc

The Earth System is a complex and dynamic system that encompasses the interactions between the atmosphere, oceans, land, and biosphere. Understanding and analyzing data from Earth System Models (ESMs) is essential, for example to predict and mitigate the impacts of climate change.

Today, collaborative efforts among scientists across diverse fields are increasingly urgent. The FAIR-EASE project aims to build an interdomain digital architecture for integrated and collaborative use of environmental data. Galaxy is a main component of this architecture and will be used by several domains of study chosen by FAIR-EASE.

Galaxy, an open-source web platform, provides users with an easy and FAIR tool to access and handle multidisciplinary environmental data. By design, Galaxy manages data analyses by sharing and publishing all involved items like inputs, results, workflows, and visualisations, ensuring reproducibility by capturing the necessary information to repeat and understand data analyses.

In this context, a Pangeo environment is a highly relevant tool to use alongside Earth-System-related data and processing tools in order to create cross-domain analyses. Conveniently, a Pangeo environment is accessible in Galaxy. It can be used as a JupyterLab and allows users to manage their NetCDF data in a Pangeo environment with the use of notebooks. Multiple tutorials are available on the Galaxy Training Network to learn how to use Pangeo.
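As a minimal sketch of the kind of NetCDF handling such a notebook enables, the following Python example opens a dataset with Xarray and Dask-backed chunks and computes a simple climatology; the file name and variable name are hypothetical placeholders.

import xarray as xr

# Open a NetCDF dataset with chunking so the computation stays out-of-core
ds = xr.open_dataset("ocean_temperature.nc", chunks={"time": 12})

# Monthly climatology, evaluated lazily and then computed with Dask
monthly_mean = ds["sst"].groupby("time.month").mean("time").compute()

# Save the result so it can be shared as a new dataset in a Galaxy history
monthly_mean.to_netcdf("sst_monthly_climatology.nc")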

The Galaxy Training Network significantly contributes to enhancing the accessibility and reusability of tools and workflows. The Galaxy Training platform hosts an extensive collection of tutorials. These tutorials serve as valuable resources for individuals seeking to learn how to navigate Galaxy, employ specific functionalities like Interactive Tools or how to execute workflows for specific analyses.

In summary, Pangeo in Galaxy provides Pangeo users with an up-to-date data analysis platform that ensures reproducibility and combines training materials and tools.

On the Earth System side, a first step was the creation of a dedicated Galaxy instance for Earth System studies (earth-system.usegalaxy.eu) with dedicated data, models, processing, visualisations and tutorials. It will make Earth System modeling more accessible to researchers in different fields.

In this Galaxy subdomain we chose to include the Pangeo tools. Our hope is to implement cross-domain workflows spanning climate and Earth system sciences.

During this session our aim is to present how you can use the Pangeo environment from the Galaxy Earth System.

How to cite: Carval, T., Jossé, M., and Detoc, J.: Pangeo environment in Galaxy Earth System supported by Fair-Ease, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-6216, https://doi.org/10.5194/egusphere-egu24-6216, 2024.

EGU24-7765 | Orals | ESSI2.9

Unleashing the power of Dask with a high-throughput Trust Region Reflectance solver for raster datacubes 

Bernhard Raml, Raphael Quast, Martin Schobben, Christoph Reimer, and Wolfgang Wagner

In remote sensing applications, the ability to efficiently fit models to vast amounts of observational data is vital for deriving high-quality data products, as well as accelerating research and development. Addressing this challenge, we developed a high-performance non-linear Trust Region Reflectance solver specialised for datacubes, by integrating Python's interoperability with C++ and Dask's distributed computing capabilities. Our solution achieves high throughput both locally and potentially on any Dask-compatible backend, such as EODC's Dask Gateway. The Dask framework takes care of chunking the datacube and streaming each chunk efficiently to available workers, where our specialised solver is applied. Introducing Dask for distributed computing enables our algorithm to run on different compatible backends. This approach not only broadens operational flexibility, but also allows us to focus on enhancing the algorithm's efficiency, free from concerns about concurrency. This enabled us to implement a highly efficient solver in C++, which is optimised to run on a single core, but still utilise all available resources effectively. For the heavy lifting, such as performing singular value decompositions and matrix operations, we rely on Eigen, a powerful open-source C++ library specialised in linear algebra. To describe the spatial reference and other auxiliary data associated with our datacube, we employ the Xarray framework. Importantly, Xarray integrates seamlessly with Dask. Finally, to ensure robustness and extensibility of our framework, we applied state-of-the-art software engineering practices, including Continuous Integration and Test-Driven Development. In our work we demonstrate the significant performance gains achievable by effectively utilising available open-source frameworks, and adhering to best engineering practices. This is exemplified by our practical workflow demonstration to fit a soil moisture estimation model.
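The chunk-streaming pattern described above can be sketched in a few lines of Python; the fitting routine below is a trivial NumPy stand-in for the compiled Trust Region solver, and the datacube is synthetic, so this illustrates the pattern rather than the actual implementation.

import numpy as np
import dask.array as da

def fit_chunk(block):
    # Stand-in for the C++ solver: fit a linear trend along the time axis of a
    # (time, y, x) chunk and return the slope for each pixel
    t = np.arange(block.shape[0])
    slopes = np.polyfit(t, block.reshape(block.shape[0], -1), deg=1)[0]
    return slopes.reshape(block.shape[1:])

# Synthetic datacube: 365 time steps on a 1000 x 1000 raster, chunked spatially
cube = da.random.random((365, 1000, 1000), chunks=(365, 250, 250))

# Dask streams each chunk to a worker, where the solver runs independently;
# the time axis is consumed by the per-pixel fit
trend = da.map_blocks(fit_chunk, cube, drop_axis=0, dtype=cube.dtype)
result = trend.compute()  # works with the local scheduler or a Dask Gateway cluster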

How to cite: Raml, B., Quast, R., Schobben, M., Reimer, C., and Wagner, W.: Unleashing the power of Dask with a high-throughput Trust Region Reflectance solver for raster datacubes, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-7765, https://doi.org/10.5194/egusphere-egu24-7765, 2024.

EGU24-8058 | ESSI2.9

Beyond ESGF – Bringing regional climate model datasets to the cloud on AWS S3 using the Pangeo Forge ETL framework 

L. Buntemeyer

The Earth System Grid Federation (ESGF) data nodes are usually the first address for accessing climate model datasets from WCRP-CMIP activities. It is currently hosting datasets from several projects, e.g., CMIP6, CORDEX, Input4MIPs or Obs4MIPs. Datasets are usually hosted on different data nodes all over the world while data access is managed by any of the ESGF web portals through a web-based GUI or the ESGF Search RESTful API. The ESGF data nodes provide different access methods, e.g., https, OPeNDAP or Globus.

Beyond ESGF, there has been the Pangeo / ESGF Cloud Data Working Group that coordinates efforts related to storing and cataloging CMIP data in the cloud, e.g., in the Google cloud and in the Amazon Web Services Simple Storage Service (S3), where a large part of the WCRP-CMIP6 ensemble of global climate simulations is now available in analysis-ready cloud-optimized (ARCO) zarr format. The availability in the cloud has significantly lowered the barrier for users with limited resources and no access to an HPC environment to work with CMIP6 datasets and at the same time increases the chance for reproducibility and reusability of scientific results.

Following the Pangeo strategy, we have adapted parts of the Pangeo Forge software stack for publishing our regional climate model datasets from the EURO-CORDEX initiative on AWS S3 cloud storage. The main tools involved are Xarray, Dask, Zarr, Intake and the ETL tools of pangeo-forge-recipes. Thanks to metadata conventions similar to those of the global CMIP6 datasets, the workflows require only minor adaptations. In this talk, we will show the strategy and workflow implemented and orchestrated in GitHub Actions workflows as well as a demonstration of how to access EURO-CORDEX datasets in the cloud.
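As a minimal sketch of how such analysis-ready, cloud-optimized data can be consumed, the following Python snippet opens a Zarr store on S3 with Xarray; the bucket path and variable name are hypothetical placeholders rather than the actual EURO-CORDEX location.

import xarray as xr

# Hypothetical ARCO Zarr store on S3; anon=True assumes public read access
store = "s3://example-bucket/euro-cordex/EUR-11/historical/tas.zarr"
ds = xr.open_zarr(store, storage_options={"anon": True}, consolidated=True)

# Only the chunks needed for this lazy selection are actually transferred
tas_mean = ds["tas"].sel(time=slice("1995-01", "2000-12")).mean("time").compute()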

How to cite: Buntemeyer, L.: Beyond ESGF – Bringing regional climate model datasets to the cloud on AWS S3 using the Pangeo Forge ETL framework, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-8058, https://doi.org/10.5194/egusphere-egu24-8058, 2024.

EGU24-8343 | ECS | Posters on site | ESSI2.9 | Highlight

Implementation of a reproducible pipeline for producing seasonal Arctic sea ice forecasts 

Vanessa Stöckl, Björn Grüning, Anne Fouilloux, Jean Iaquinta, and Alejandro Coca-Castro

This work highlights the integration of IceNet (https://doi.org/10.1038/s41467-021-25257-4), a cutting-edge sea ice forecasting system leveraging numerous Python packages from the Pangeo ecosystem, into the Galaxy platform—an open-source tool designed for FAIR (Findable, Accessible, Interoperable, and Reusable) data analysis. Aligned with the Pangeo ecosystem's broader objectives, and carried out in the frame of the EuroScienceGateway project (https://eurosciencegateway.eu), this initiative embraces a collaborative approach to tackle significant geoscience data challenges. The primary aim is to democratise access to IceNet's capabilities by converting a Jupyter Notebook, published in the Environmental Data Science book (www.edsbook.org), into Galaxy Tools and crafting a reusable workflow executable through a Graphical User Interface or standardised APIs. IceNet is meant to predict Arctic sea ice concentration up to six months in advance, and it outperforms previous systems. This integration establishes a fully reproducible workflow, enabling scientists with diverse computational expertise to automate sea ice predictions. The IceNet workflow is hosted on the European Galaxy Server (https://climate.usegalaxy.eu), along with the related tools, ensuring accessibility for a wide community of researchers.

With the urgency of accurate predictions amid global warming's impact on Arctic sea ice, this work addresses challenges faced by scientists, particularly those with limited programming experience. The transparent, accessible, and reproducible pipeline for Arctic sea ice forecasting aligns with Open Science principles. The integration of IceNet into Galaxy enhances accessibility to advanced climate science tools, allowing for automated predictions that contribute to early and precise identification of potential damages from sea ice loss.

This initiative mirrors the overarching goals of the Pangeo community, advancing transparent, accessible, and reproducible research. The Galaxy-based pipeline presented serves as a testament to collaborative efforts within the Pangeo community, breaking down barriers related to computational literacy and empowering a diverse range of scientists to contribute to climate science research. The integration of IceNet into Galaxy not only provides a valuable tool for seasonal sea ice predictions but also exemplifies the potential for broad interdisciplinary collaboration within the Pangeo ecosystem.

How to cite: Stöckl, V., Grüning, B., Fouilloux, A., Iaquinta, J., and Coca-Castro, A.: Implementation of a reproducible pipeline for producing seasonal Arctic sea ice forecasts, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-8343, https://doi.org/10.5194/egusphere-egu24-8343, 2024.

EGU24-9156 | ECS | Orals | ESSI2.9

DataLabs: development of a cloud collaborative platform for open interdisciplinary geo-environmental sciences  

Michael Tso, Michael Hollaway, Faiza Samreen, Iain Walmsley, Matthew Fry, John Watkins, and Gordon Blair

In environmental science, scientists and practitioners are increasingly facing the need to create data-driven solutions to the environment's grand challenges, often needing to use data from disparate sources and advanced analytical methods, as well as drawing expertise from collaborative and cross-disciplinary teams [1]. Virtual labs allow scientists to collaboratively explore large or heterogeneous datasets, develop and share methods, and communicate their results to stakeholders and decision-makers. 

DataLabs [2] has been developed as a cloud-based collaborative platform to tackle these challenges and promote open, collaborative, interdisciplinary geo-environmental sciences. It allows users to share notebooks (e.g. JupyterLab, RStudio, and most recently VS Code), datasets and computational environments, and it promotes transparency and end-to-end reasoning about model uncertainty. It supports FAIR access to data and digital assets by providing shared data stores and discovery functionality for datasets and assets hosted on the platform’s asset catalogue. Its tailorable design allows it to be adaptable to different challenges and applications. It is also an excellent platform for large collaborative teams to work on outputs together [3] as well as for communicating results to stakeholders by allowing easy prototyping and publishing of web applications (e.g. Shiny, Panel, Voila). It is currently deployed on JASMIN [4] and is part of the UK NERC Environmental Data Service [5]. 

There are a growing number of use cases and requirements for DataLabs, and it is going to play a central part in several planned digital research infrastructure (DRI) initiatives. Future development needs of the platform to further its vision include, for example, a more intuitive onboarding experience, easier access to key datasets at source, better connectivity to other cloud platforms, and better use of workflow tools. DataLabs shares many of the features (e.g. heavy use of Pangeo core packages) and design principles of Pangeo. We would be interested in exploring commonalities and differences, sharing best practices, and growing the community of practice in this increasingly important area. 

[1]  Blair, G.S., Henrys, P., Leeson, A., Watkins, J., Eastoe, E., Jarvis, S., Young, P.J., 2019. Data Science of the Natural Environment: A Research Roadmap. Front. Environ. Sci. 7. https://doi.org/10.3389/fenvs.2019.00121  

[2] Hollaway, M.J., Dean, G., Blair, G.S., Brown, M., Henrys, P.A., Watkins, J., 2020. Tackling the Challenges of 21st-Century Open Science and Beyond: A Data Science Lab Approach. Patterns 1, 100103. https://doi.org/10.1016/j.patter.2020.100103 

[3] https://eds.ukri.org/news/impacts/datalabs-streamlines-workflow-assessing-state-nature-uk  

[4] https://jasmin.ac.uk/  

[5] https://eds.ukri.org/news/impacts/datalabs-digital-collaborative-platform-tackling-environmental-science-challenges  

How to cite: Tso, M., Hollaway, M., Samreen, F., Walmsley, I., Fry, M., Watkins, J., and Blair, G.: DataLabs: development of a cloud collaborative platform for open interdisciplinary geo-environmental sciences , EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-9156, https://doi.org/10.5194/egusphere-egu24-9156, 2024.

EGU24-9781 | Posters on site | ESSI2.9

Optimizing NetCDF performance for cloud computing : exploring a new chunking strategy 

Flavien Gouillon, Cédric Pénard, Xavier Delaunay, and Florian Wery

Owing to the increasing number of satellites and advancements in sensor resolutions, the volume of scientific data is experiencing rapid growth. NetCDF (Network Common Data Form) stands as the community standard for storing such data, necessitating the development of efficient solutions for file storage and manipulation in this format.

Object storage, emerging with cloud infrastructures, offers potential solutions for data storage and parallel access challenges. However, NetCDF may not fully harness this technology without appropriate adjustments and fine-tuning. To optimize computing and storage resource utilization, evaluating NetCDF performance on cloud infrastructures is essential. Additionally, exploring how cloud-developed software solutions contribute to enhanced overall performance for scientific data is crucial.

Offering multiple file versions with data split into chunks tailored for each use case incurs significant storage costs. Thus, we investigate methods to read portions of compressed chunks, creating virtual sub-chunks that can be read independently. A novel approach involves indexing data within NetCDF chunks compressed with deflate, enabling extraction of smaller data portions without reading the entire chunk.

This feature is very valuable in use cases such as pixel drilling or extracting small amounts of data from large files with sizable chunks. It also saves reading time, particularly in scenarios of poor network connection, such as those encountered onboard research vessels.

We conduct performance assessments of several libraries across a range of use cases to provide recommendations for the most suitable and efficient library for reading NetCDF data in different situations.

Our tests involve accessing remote NetCDF datasets (two files from the SWOT mission) available on the network via a lighttpd server and an S3 server. Additionally, simulations of degraded Internet connections, featuring high latency, packet loss, and limited bandwidth, are also performed.

We evaluate the performance of four Python libraries (the netCDF4 library, Xarray, h5py, and our chunk indexing library) for reading dataset portions through fsspec or s3fs. A comparison of reading performance using the netCDF, zarr, and nczarr data formats is also conducted on an S3 server.
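As a minimal sketch of one of the evaluated access paths, the following Python snippet reads a small portion of a remote NetCDF-4/HDF5 file with h5py over fsspec; the URL and variable name are hypothetical placeholders, not actual SWOT product paths.

import fsspec
import h5py

URL = "https://data.example.org/swot/pass_001.nc"  # hypothetical remote file

# fsspec exposes the remote file through HTTP range requests, so only the bytes
# belonging to the requested hyperslab (plus metadata) are transferred
with fsspec.open(URL, mode="rb") as f:
    with h5py.File(f, mode="r") as nc:
        window = nc["ssh_karin"][0:100, 0:100]   # "pixel drilling" style read
        print(window.shape, window.dtype)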

Preliminary findings indicate that the h5py library is the most efficient, while Xarray exhibits poor performance in reading NetCDF files. Furthermore, the NetCDF format demonstrates reasonably good performance on an S3 server, albeit lower than the zarr or nczarr formats. However, the considerable efforts required to convert petabytes of archived NetCDF files and adapt numerous software libraries for a performance improvement within the same order of magnitude raise questions about the practicality of such endeavors; the benefit is thus strongly dependent on the use case.

How to cite: Gouillon, F., Pénard, C., Delaunay, X., and Wery, F.: Optimizing NetCDF performance for cloud computing : exploring a new chunking strategy, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-9781, https://doi.org/10.5194/egusphere-egu24-9781, 2024.

EGU24-9795 | ECS | Orals | ESSI2.9

Unifying HPC and Cloud Systems; A Containerized Approach for the Integrated Forecast System (IFS) 

Cathal O'Brien, Armagan Karatosun, Adrian Hill, Paul Cresswell, Michael Sleigh, and Ioan Hadade

The IFS (Integrated Forecast System) is a global numerical weather prediction system maintained by the European Centre for Medium-Range Weather Forecasts (ECMWF). Traditionally, ECMWF’s high-performance computing facility (HPCF) is responsible for operationally supporting the IFS cycles. However, with the emergence of new cloud technologies, initiatives such as Destination Earth (DestinE), and the growth of OpenIFS users within Europe and around the globe, the need to run the IFS outside of ECMWF’s computing facilities becomes more evident. Concerning such use cases, IFSTestsuite allows the complete IFS system and its dependencies (e.g. ecCodes) to be built and tested outside of ECMWF’s HPCF and is designed to be self-contained, eliminating the need for external tools like MARS or ecCodes. Despite the need for users to perform multiple steps and the dependency of the software availability and versions on the host operating system, this indicates that there might be potential for a more generic and broader approach. 

Containerization might provide the much-needed portability and disposable environments to trigger new cycles with the desired compiler versions, or even with different compilers. In addition, pre-built container images can be executed on any platform, provided there is a compatible container runtime installed on the target system that adheres to Open Container Initiative (OCI) standards, such as Singularity or Docker. Another benefit of using container images is container image layers, which can significantly reduce the image build time. Lastly, despite their differences, both Singularity and Docker adhere to the OCI standards, and converting one container image to another is straightforward. However, despite the clear advantages, there are several crucial design choices to keep in mind. Notably, the available hardware and software stacks vary greatly across different HPC systems. When performance is important, this heterogeneous landscape limits the portability of containers. The libraries and drivers inside the container must be specially selected with regard to the hardware and software stack of a specific host system to maximize performance on that system. If this is done correctly, the performance of containerized HPC applications can match that of native applications. We demonstrate this process with the use of a hybrid containerization strategy where compatible MPI stacks and drivers are built inside the containers. The binding of host libraries into containers is also used on systems where proprietary software cannot be rebuilt inside the container.

In this study we present a containerized solution which balances portability and efficient performance, with examples of containerizing the IFS on a variety of systems, including cloud systems with generic x86-64 architecture, such as the European Weather Cloud (EWC) and Microsoft Azure, and EuroHPC systems such as Leonardo and LUMI, and we provide container image recipes for OpenIFS.

How to cite: O'Brien, C., Karatosun, A., Hill, A., Cresswell, P., Sleigh, M., and Hadade, I.: Unifying HPC and Cloud Systems; A Containerized Approach for the Integrated Forecast System (IFS), EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-9795, https://doi.org/10.5194/egusphere-egu24-9795, 2024.

EGU24-10741 | Posters on site | ESSI2.9

Harnessing the Pangeo ecosystem for delivering the cloud-based Global Fish Tracking System 

Daniel Wiesmann, Tina Odaka, Anne Fouilloux, Emmanuelle Autret, Mathieu Woillez, and Benjamin Ragan-Kelley

We present our approach of leveraging the Pangeo software stack for developing the Global Fish Tracking System (GFTS). The GFTS project tackles the challenge of accurately modelling fish movement in the ocean based on biologging data with a primary focus on Sea Bass. Modelling fish movements is essential to better understand migration strategies and site fidelity, which are critical aspects for fish stock management policy and marine life conservation efforts.

Estimating fish movements is a highly compute intensive process. It involves matching pressure and temperature data from in-situ biologging sensors with high resolution ocean temperature simulations over long time periods. The Pangeo software stack provides an ideal environment for this kind of modelling. While the primary target platform of the GFTS project is the new Destination Earth Service Platform (DESP), relying on the Pangeo ecosystem ensures that the GFTS project is a robust and portable solution that can be re-deployed on different infrastructure. 

One of the distinctive features of the GFTS project is its advanced data management approach, synergizing with the capabilities of Pangeo. Diverse datasets, including climate change adaptation digital twin data, sea temperature observations, bathymetry, and biologging in-situ data from tagged fish, are seamlessly integrated within the Pangeo environment. A dedicated software called pangeo-fish has been developed to streamline this complex modelling process. The technical framework of the GFTS project includes Pangeo core packages such as Xarray and Dask, which facilitate scalable computations.
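The core matching step, comparing a tag's temperature record with the modelled ocean temperature at the recorded depth and time, can be sketched generically with Xarray as below; this is an illustration of the idea rather than the pangeo-fish API, and all file names, variable names and dimensions are hypothetical.

import xarray as xr

ocean = xr.open_zarr("ocean_temperature.zarr")   # dims: time, depth, lat, lon
tag = xr.open_dataset("tag_0042_records.nc")     # biologging records along dim "record"

# Pointwise interpolation of the model field at every tag record (time, depth)
model_at_tag = ocean["thetao"].interp(time=tag["time"], depth=tag["depth"])

# Squared misfit between observed and modelled temperature, averaged over the
# records; low values mark locations compatible with the tag's history
misfit = ((model_at_tag - tag["temperature"]) ** 2).mean("record")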

Pangeo's added value in data management becomes apparent in its capability to optimise data access and enhance performance. The concept of "data visitation" is central to this approach. By strategically deploying Dask clusters close to the data sources, the GFTS project aims to significantly improve performance of fish track modelling when compared to traditional approaches. This optimised data access ensures that end-users can efficiently interact with large datasets, leading to more streamlined and efficient analyses.

The cloud-based delivery of the GFTS project aligns with the overarching goal of Pangeo. In addition, the GFTS includes the development of a custom interactive Decision Support Tool (DST). The DST empowers non-technical users with an intuitive interface for better understanding the results of the GFTS project, leading to more informed decision-making. The integration with Pangeo and providing intuitive access to the GFTS data is not merely a technicality; it is a commitment to FAIR (Findable, Accessible, Interoperable and Reusable), TRUST (Transparency, Responsibility, User focus, Sustainability and Technology) and open science principles. 

In short, the GFTS project, within the Pangeo ecosystem, exemplifies how advanced data management, coupled with the optimization of data access through "data visitation," can significantly enhance the performance and usability of geoscience tools. This collaborative and innovative approach not only benefits the immediate goals of the GFTS project but contributes to the evolving landscape of community-driven geoscience initiatives.

How to cite: Wiesmann, D., Odaka, T., Fouilloux, A., Autret, E., Woillez, M., and Ragan-Kelley, B.: Harnessing the Pangeo ecosystem for delivering the cloud-based Global Fish Tracking System, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-10741, https://doi.org/10.5194/egusphere-egu24-10741, 2024.

EGU24-12410 | Orals | ESSI2.9

Towards Enhancing WaaS and Data Provenance over Reana 

Iraklis Klampanos, Antonis Ganios, and Antonis Troumpoukis

Interoperability and reproducibility are critical aspects of scientific computation. The data analysis platform Reana [1], developed by CERN, enhances the interoperability and reproducibility of scientific analyses by allowing researchers to describe, execute, and share their analyses. This is achieved via the execution of standardised scientific workflows, such as CWL, within reusable containers. Moreover, it allows execution to span different types of resources, such as Cloud and HPC. 

In this session we will present ongoing work to enhance Reana’s Workflows-as-a-Service (WaaS) functionality and also support Workflow registration and discoverability. Building upon the design goals and principles of the DARE platform [2], this work aims to enhance Reana by enabling users to register and discover available workflows within the system. In addition, we will present the integration of Data Provenance based on the W3C PROV-O standard [3] allowing the tracking and recording of data lineage in a systematic and dependable way across resource types. 

In summary, key aspects of this ongoing work include:

  • Workflows-as-a-Service (WaaS): Extending Reana's service-oriented mode of operation, allowing users to register, discover, access, execute, and manage workflows by name or ID, via APIs, therefore enhancing the platform's accessibility and usability.
  • Data Provenance based on W3C PROV-O: Implementing support for recording and visualising data lineage information in compliance with the W3C PROV-O standard. This ensures transparency and traceability of data processing steps, aiding in reproducibility and understanding of scientific analyses.

This work aims to broaden Reana's functionality, aligning with best practices for reproducible and transparent scientific research. We aim to make use of the enhanced Reana-based system on the European AI-on-demand platform [4], currently under development, to address the requirements of AI innovators and researchers when studying and executing large-scale AI-infused workflows.
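As an illustration of the kind of lineage recording described above, the following minimal Python sketch uses the prov package, which implements the W3C PROV data model; identifiers and the serialization are hypothetical and do not represent the actual Reana integration.

from prov.model import ProvDocument

doc = ProvDocument()
doc.add_namespace("ex", "https://example.org/workflows/")

# Entities (data), an activity (a workflow step) and their relations
raw = doc.entity("ex:input-dataset")
derived = doc.entity("ex:analysis-result")
step = doc.activity("ex:workflow-step-preprocess")

doc.used(step, raw)                # the step consumed the input dataset
doc.wasGeneratedBy(derived, step)  # the result was produced by the step
doc.wasDerivedFrom(derived, raw)   # lineage between the two entities

# Serialize as PROV-JSON for storage alongside the workflow run record
print(doc.serialize(indent=2))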

References: 

[1] Simko et al., (2019). Reana: A system for reusable research data analyses. EPJ Web Conf., 214:06034, https://doi.org/10.1051/epjconf/201921406034

[2] Klampanos et al., (2020). DARE Platform: a Developer-Friendly and Self-Optimising Workflows-as-a-Service Framework for e-Science on the Cloud. Journal of Open Source Software, 5(54), 2664, https://doi.org/10.21105/joss.02664

[3] PROV-O: The PROV Ontology: https://www.w3.org/TR/prov-o/ (viewed 9 Jan 2024)

[4] The European AI-on-Demand platform: https://aiod.eu (viewed 9 Jan 2024)

This work has received funding from the European Union’s Horizon Europe research and innovation programme under Grant Agreement No 101070000.

How to cite: Klampanos, I., Ganios, A., and Troumpoukis, A.: Towards Enhancing WaaS and Data Provenance over Reana, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-12410, https://doi.org/10.5194/egusphere-egu24-12410, 2024.

EGU24-12669 | ECS | Orals | ESSI2.9

DeployAI to Deliver Interoperability of Cloud and HPC Resources for Earth Observation in the Context of the European AI-on-Demand Platform 

Antonis Troumpoukis, Iraklis Klampanos, and Vangelis Karkaletsis

The European AI-on-Demand Platform (AIoD, http://aiod.eu) is a vital resource for leveraging and boosting the European AI research landscape towards economic growth and societal advancement across Europe. Following and emphasising European values, such as openness, transparency, and trustworthiness for developing and using AI technologies, the AIoD platform aims to become the main one-stop shop for exchanging and building AI resources and applications within the European AI innovation ecosystem, whilst also adhering to European values. The primary goal of the DIGITAL-EUROPE CSA initiative DeployAI (DIGITAL-2022-CLOUD-AI-B-03, 01/2024-12/2027) is to build, deploy, and launch a fully operational AIoD platform, promoting trustworthy, ethical, and transparent European AI solutions for the industry, with a focus on SMEs and the public sector.

Building on Open-source and trusted software, DeployAI will provide a number of technological assets such as a comprehensive and Trustworthy AI resource catalogue and marketplace offering responsible AI resources and tools, workflow composition and execution systems for prototyping and user-friendly creation of novel services, responsible foundational models and services to foster dependable innovation, etc. In addition, and building upon the results of the ICT-49 AI4Copernicus project [1], which provided a bridge between the AIoD platform and the Copernicus ecosystem and the DIAS platforms, DeployAI will integrate impactful Earth Observation AI services into the AIoD platform. These will include (but are not limited to) satellite imagery preprocessing, land usage classification, crop type identification, super-resolution, and weather forecasting.

Furthermore, DeployAI will allow the rapid prototyping of AI applications and their deployment to a variety of Cloud/Edge/HPC infrastructures. The project will focus on establishing a cohesive interaction framework that integrates with European Data Spaces and Gaia-X initiatives, HPC systems with an emphasis on the EuroHPC context, and the European Open Science Cloud. Interfaces to European initiatives and industrial AI-capable cloud platforms will be further implemented to enable interoperability. This capability enables the execution of Earth Observation applications not only within the context of a DIAS/DAS but also within several other compute systems. This level of interoperability enhances the adaptability and accessibility of AI applications, fostering a collaborative environment where geoscientific workflows can be seamlessly executed across diverse computational infrastructures and made available to a wide audience of innovators.

[1] A. Troumpoukis et al., "Bridging the European Earth-Observation and AI Communities for Data-Intensive Innovation", 2023 IEEE Ninth International Conference on Big Data Computing Service and Applications (BigDataService), Athens, Greece, 2023, pp. 9-16, doi:10.1109/BigDataService58306.2023.00008.

This work has received funding from the European Union’s Digital Europe Programme (DIGITAL) under grant agreement No 101146490.

How to cite: Troumpoukis, A., Klampanos, I., and Karkaletsis, V.: DeployAI to Deliver Interoperability of Cloud and HPC Resources for Earth Observation in the Context of the European AI-on-Demand Platform, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-12669, https://doi.org/10.5194/egusphere-egu24-12669, 2024.

EGU24-15366 | ECS | Posters on site | ESSI2.9

Enabling seamless integration of Copernicus and in-situ data 

Iason Sotiropoulos, Athos Papanikolaou, Odysseas Sekkas, Anastasios Polydoros, Vassileios Tsetsos, Claudio Pisa, and Stamatia Rizou

BUILDSPACE aims to combine terrestrial data from buildings collected by IoT devices with aerial imaging from drones equipped with thermal cameras and location-annotated data from satellite services (i.e., EGNSS and Copernicus) to deliver innovative services: at building scale, enabling the generation of high-fidelity multi-modal digital twins, and at city scale, providing decision support services for energy demand prediction, urban heat and urban flood analysis. A pivotal element and the foundational support of the BUILDSPACE ecosystem is the Core Platform, which plays a crucial role in facilitating seamless data exchange, secure and scalable data storage, and streamlined access to data from three Copernicus services, namely the Land, Atmosphere, and Climate Change services.

The platform's underlying technology is robust, incorporating two key components: OIDC for user authentication and group authorization over the data, and a REST API to handle various file operations. OIDC stands for OpenID Connect, a standard protocol that enables secure user authentication and allows for effective management of user groups and their access permissions. On the other hand, the platform employs a REST API for seamless handling of file-related tasks, including uploading, downloading, and sharing. This combination ensures efficient and secure data exchange within the system. Additionally, the use of an S3-compatible file system ensures secure and scalable file storage, while a separate metadata storage system enhances data organization and accessibility. Currently deployed on a Kubernetes cluster, this platform offers numerous advantages, including enhanced scalability, efficient resource management, and simplified deployment processes. The implementation of the Core Platform has led to a current focus on integrating APIs from Copernicus services into the Core Platform's API. This ongoing effort aims to enhance the platform's capabilities by seamlessly incorporating external data, enriching the overall functionality and utility of the project.
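As a minimal sketch of how a client could interact with an S3-compatible object store such as the one described above, the following Python snippet uses boto3; the endpoint, bucket, credentials and file names are hypothetical placeholders, not the actual BUILDSPACE API.

import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://objectstore.example.org",  # S3-compatible endpoint
    aws_access_key_id="EXAMPLE_KEY",
    aws_secret_access_key="EXAMPLE_SECRET",
)

# Upload a drone thermal image together with simple object metadata
s3.upload_file(
    "thermal_scan_roof.tif",
    "buildspace-demo",
    "building-42/thermal_scan_roof.tif",
    ExtraArgs={"Metadata": {"sensor": "thermal", "building": "42"}},
)

# Retrieve the same object for local processing
s3.download_file("buildspace-demo", "building-42/thermal_scan_roof.tif", "local_copy.tif")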

How to cite: Sotiropoulos, I., Papanikolaou, A., Sekkas, O., Polydoros, A., Tsetsos, V., Pisa, C., and Rizou, S.: Enabling seamless integration of Copernicus and in-situ data, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-15366, https://doi.org/10.5194/egusphere-egu24-15366, 2024.

EGU24-15416 | ECS | Orals | ESSI2.9

XDGGS: Xarray Extension for Discrete Global Grid Systems (DGGS) 

Alexander Kmoch, Benoît Bovy, Justus Magin, Ryan Abernathey, Peter Strobl, Alejandro Coca-Castro, Anne Fouilloux, Daniel Loos, and Tina Odaka

Traditional geospatial representations of the globe on a 2-dimensional plane often introduce distortions in area, distance, and angles. Discrete Global Grid Systems (DGGS) mitigate these distortions and introduce a hierarchical structure of global grids. Defined by ISO standards, DGGSs serve as spatial reference systems facilitating data cube construction, enabling integration and aggregation of multi-resolution data sources. Various tessellation schemes such as hexagons and triangles cater to different needs - equal area, optimal neighborhoods, congruent parent-child relationships, ease of use, or vector field representation in modeling flows.

The fusion of Discrete Global Grid Systems (DGGS) and Datacubes represents a promising synergy for integrated handling of planetary-scale data.

The recent Pangeo community initiative at the ESA BiDS'23 conference has led to significant advancements in supporting Discrete Global Grid Systems (DGGS) within the widely used Xarray package. This collaboration resulted in the development of the Xarray extension XDGGS (https://github.com/xarray-contrib/xdggs). The aim of xdggs is to provide a unified, high-level, and user-friendly API that simplifies working with various DGGS types and their respective backend libraries, seamlessly integrating with Xarray and the Pangeo scientific computing ecosystem. Executable notebooks demonstrating the use of the xdggs package are also developed to showcase its capabilities.
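The core idea can be sketched generically with Xarray, independently of the evolving xdggs API: DGGS cell identifiers become a coordinate of the dataset, so data can be grouped and aggregated per cell. The cell IDs below are synthetic integers standing in for identifiers that a real DGGS library (e.g. H3 or HEALPix indexing) would provide.

import numpy as np
import xarray as xr

n_obs = 1_000
ds = xr.Dataset(
    {"sst": ("obs", 15 + 2 * np.random.randn(n_obs))},
    coords={
        "lat": ("obs", np.random.uniform(-60, 60, n_obs)),
        "lon": ("obs", np.random.uniform(-180, 180, n_obs)),
    },
)

# Stand-in for a DGGS encoder: map each observation to a coarse cell id
cell_ids = (np.floor(ds["lat"] / 10) * 100 + np.floor(ds["lon"] / 10)).astype("int64")
ds = ds.assign_coords(cell=("obs", cell_ids.data))

# Aggregation per cell, the core operation a DGGS-aware datacube enables
per_cell_mean = ds["sst"].groupby("cell").mean()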

This development represents a significant step forward, though continuous efforts are necessary to broaden the accessibility of DGGS for scientific and operational applications, especially in handling gridded data such as global climate and ocean modeling, satellite imagery, raster data, and maps.

Keywords: Discrete Global Grid Systems, Xarray Extension, Geospatial Data Integration, Earth Observation, Data Cube, Scientific Collaboration

How to cite: Kmoch, A., Bovy, B., Magin, J., Abernathey, R., Strobl, P., Coca-Castro, A., Fouilloux, A., Loos, D., and Odaka, T.: XDGGS: Xarray Extension for Discrete Global Grid Systems (DGGS), EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-15416, https://doi.org/10.5194/egusphere-egu24-15416, 2024.

EGU24-15872 | Posters on site | ESSI2.9

Deploying Pangeo on HPC: our experience with the Remote Sensing Deployment Analysis environmenT on SURF infrastructure 

Francesco Nattino, Meiert W. Grootes, Pranav Chandramouli, Ou Ku, Fakhereh Alidoost, and Yifat Dzigan

The Pangeo software stack includes powerful tools that have the potential to revolutionize the way in which research on big (geo)data is conducted. A few of the aspects that make them very attractive to researchers are the ease of use of the Jupyter web-based interface, the level of integration of the tools with the Dask distributed computing library, and the possibility to seamlessly move from local deployments to large-scale infrastructures. 

The Pangeo community and project Pythia are playing a key role in providing training resources and examples that showcase what is possible with these tools. These are essential to guide interested researchers with clear end goals but also to provide inspiration for new applications. 

However, configuring and setting up a Pangeo-like deployment is not always straightforward. Scientists whose primary focus is domain-specific often do not have the time to spend solving issues that are mostly ICT in nature. In this contribution, we share our experience in providing support to researchers in running use cases backed by deployments based on Jupyter and Dask at the SURF supercomputing center in the Netherlands, in what we call the Remote Sensing Deployment Analysis environmenT (RS-DAT) project. 

Despite the popularity of cloud-based deployments, which are justified by the enormous data availability at various public cloud providers, we discuss the role that HPC infrastructure still plays for researchers, due to the ease of access via merit-based allocation grants and the requirements of integration with pre-existing workflows. We present the solution that we have identified to seamlessly access datasets from the SURF dCache massive storage system, we stress how installation and deployment scripts can facilitate adoption and re-use, and we finally highlight how technical research-support staff such as Research Software Engineers can be key in bridging researchers and HPC centers. 
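A minimal sketch of the kind of deployment such scripts set up is shown below: Dask workers are launched as SLURM jobs from a Jupyter session with dask-jobqueue. The queue name, resources and walltime are hypothetical values, not SURF-specific settings.

from dask_jobqueue import SLURMCluster
from dask.distributed import Client

cluster = SLURMCluster(
    queue="normal",        # hypothetical partition name
    cores=16,              # cores per worker job
    memory="64GB",         # memory per worker job
    walltime="02:00:00",
)
cluster.scale(jobs=4)      # submit four worker jobs to the batch scheduler

client = Client(cluster)   # connect the notebook to the Dask scheduler
print(client.dashboard_link)  # monitor progress in the Dask dashboard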

How to cite: Nattino, F., Grootes, M. W., Chandramouli, P., Ku, O., Alidoost, F., and Dzigan, Y.: Deploying Pangeo on HPC: our experience with the Remote Sensing Deployment Analysis environmenT on SURF infrastructure, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-15872, https://doi.org/10.5194/egusphere-egu24-15872, 2024.

EGU24-17111 | Posters on site | ESSI2.9

Cloudifying Earth System Model Output 

Fabian Wachsmann

We introduce eerie.cloud (eerie.cloud.dkrz.de), a data server for efficient access to prominent climate data sets stored on disk at the German Climate Computing Center (DKRZ). We show how we “cloudify” data from two projects, EERIE and ERA5, and how one can benefit from it. 

The European Eddy-rich Earth System Model (EERIE) project aims to develop state-of-the-art high-resolution Earth System Models (ESM) that are able to resolve ocean mesoscale processes. These models are then used to perform simulations over centennial scales and make their output available for the global community. At present, the total volume of the EERIE data set exceeds 0.5PB  and is rapidly growing, posing challenges for data management.
ERA5 is the fifth generation ECMWF global atmospheric reanalysis. It is widely used as forcing data for climate model simulations, for model evaluation or for the analysis of climate trends. DKRZ maintains a 1.6 PB subset of ERA5 data at its native resolution.

We use Xpublish to set up the data server. Xpublish is a Python package and a plugin for Pangeo's central analysis package Xarray. Its main feature is to provide ESM output by mapping any input data to virtual zarr datasets. Users can retrieve these datasets as if they were cloud-native and cloud-optimized.
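A minimal sketch of the Xpublish pattern is shown below: an Xarray dataset is registered under a name and served as a virtual Zarr store over HTTP. The input file, dataset name and port are placeholders; the actual eerie.cloud configuration includes additional plugins and server-side features.

import xarray as xr
import xpublish

ds = xr.open_dataset("eerie_ocean_monthly.nc", chunks={})  # hypothetical input

# Register the dataset and start the FastAPI-based server; clients can then
# read it as if it were a cloud-native, cloud-optimized Zarr store
rest = xpublish.Rest({"ocean-monthly": ds})
rest.serve(host="0.0.0.0", port=9000)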

eerie.cloud features

  • Parallel access to data subsets on chunk-level
  • Interfaces to make the data more FAIR
    • User friendly content overviews with displays of xarray-like dataset representations
    • Simple browsing and loading data with an intake catalog
  • On-the-fly server-side computation 
    • Register simple xarray routines for generating customized variables
    • Compression for speeding up downloads
  • Generation of interactive geographical plots, including animations

Eerie.cloud is a solution to make EERIE data more usable by a wider community.

How to cite: Wachsmann, F.: Cloudifying Earth System Model Output, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-17111, https://doi.org/10.5194/egusphere-egu24-17111, 2024.

EGU24-17150 | ECS | Posters on site | ESSI2.9

Data access patterns of km-scale resolution models 

Janos Zimmermann, Florian Ziemen, and Tobias Kölling

Climate models produce vast amounts of output data. In the nextGEMS project, we have run the ICON model at 5 km resolution for 5 years, producing about 750 TB of output data from one simulation. To ease analysis, the data is stored at multiple temporal and spatial resolutions. The dataset is now analyzed by more than a hundred scientists on the DKRZ levante system. As disk space is limited, it is crucial to know which parts of this dataset are accessed frequently and need to be kept on disk, and which parts can be moved to the tape archive and only be fetched on request.

By storing the output as zarr files with many small files for the individual data chunks, and logging file access times, we obtained a detailed view of more than half a year of access to the nextGEMS dataset, even going to regional level for a given variable and time step. The evaluation of those access patterns offers the possibility to optimize various aspects such as caching, chunking, and archiving. Furthermore, it provides valuable information for designing future output configurations.
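A sketch of how such per-chunk access logs could be aggregated is shown below, assuming a simple CSV log with one row per chunk access; the actual nextGEMS logging format and paths may differ.

import pandas as pd

log = pd.read_csv("chunk_access_log.csv", parse_dates=["access_time"])  # columns: access_time, path

# Zarr chunk paths look like <dataset>/<variable>/<t>.<y>.<x>; split them up
parts = log["path"].str.split("/", expand=True)
log["dataset"], log["variable"] = parts[0], parts[1]

# Access counts per variable reveal hot (keep on disk) and cold (archive) data
hot_variables = log.groupby("variable").size().sort_values(ascending=False)

# Weekly counts show how access patterns evolve over the analysis period
weekly = log.groupby([log["access_time"].dt.isocalendar().week, "variable"]).size()
print(hot_variables.head(10))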

In this poster, we present the observed access patterns and discuss their implications for our chunking and archiving strategy. Leveraging an interactive visualization tool, we explore and compare access patterns, distinguishing frequently accessed subsets, sparsely accessed variables, and preferred resolutions. We furthermore provide information on how we analyzed the data access to enable other users to follow our approach.

How to cite: Zimmermann, J., Ziemen, F., and Kölling, T.: Data access patterns of km-scale resolution models, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-17150, https://doi.org/10.5194/egusphere-egu24-17150, 2024.

EGU24-18256 | Orals | ESSI2.9

Data access for km-scale resolution models 

Florian Ziemen, Tobias Kölling, and Lukas Kluft

With the transition to global, km-scale simulations, model outputs have grown in size, and efficient ways of accessing data have become more important than ever. This implies that the data storage has to be optimized for efficient read access to small sub-sets of the data, and multiple resolutions of the same data need to be provided for efficient analysis on coarse as well as fine-grained scales.

In this high-level overview presentation, we present an approach based on datasets. Each dataset represents a coherent subset of a model output (e.g. all model variables stored at daily resolution). Aiming for a minimum number of datasets leads us to enforce consistency in the model output and thus eases analysis. Each dataset is served to the user as one zarr store, independent of the actual file layout on disks or other storage media. Multiple datasets are grouped in catalogs for findability.

By serving the data via https, we can implement a middle layer between the user and the storage systems, allowing to combine different storage backends behind a unifying frontend. At the same time, this approach allows us to largely build the system on existing technologies such as web servers and caches, and efficiently serve data to users outside the compute center where the data is stored.
The approach we present is currently under development in the BMBF project WarmWorld with contributions by the H2020 project nextGEMS, and we expect it to be useful for many other projects as well.

How to cite: Ziemen, F., Kölling, T., and Kluft, L.: Data access for km-scale resolution models, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-18256, https://doi.org/10.5194/egusphere-egu24-18256, 2024.

EGU24-18585 | ECS | Posters on site | ESSI2.9

STAC catalogs for time-varying in-situ data 

Justus Magin

The ability to search a collection of datasets is an important factor for the usefulness of the data. By organizing the metadata into catalogs, we can enable dataset discovery, look up file locations and avoid access to the data files before the actual computation. SpatioTemporal Asset Catalogs (STAC) is an increasingly popular language-agnostic specification with a vibrant ecosystem of tools for geospatial data catalogs, and is tailored for raster data like satellite imagery. It allows for a search using a variety of patterns, including the spatial and temporal extent.

In-situ data is heterogeneous and would benefit from being cataloged, as well as from the ecosystem of tools. However, due to the strict separation between the spatial and temporal dimensions in STAC, the time-varying nature of in-situ data is not optimally captured. While for approximately stationary sensors like tide gauges, moorings, weather stations, and high-frequency radars this is not an issue (see https://doi.org/10.5194/egusphere-egu23-8096), it becomes troublesome for moving sensors, especially if the sensor moves at a high speed, covers large distances, or if the dataset contains a long time series.

To resolve this, we extend the STAC specification by replacing the GeoJSON geometry with the JSON encoding of the OGC Moving Features standard.
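For context, the snippet below sketches how an in-situ time series is catalogued as a STAC Item with pystac today; IDs, coordinates and the asset location are hypothetical, and the extension proposed above would replace the static GeoJSON geometry with a JSON-encoded Moving Features trajectory for moving sensors.

import pystac

item = pystac.Item(
    id="tide-gauge-example-2023",
    geometry={"type": "Point", "coordinates": [-4.495, 48.383]},
    bbox=[-4.495, 48.383, -4.495, 48.383],
    datetime=None,  # a time range is given instead of a single timestamp
    properties={
        "start_datetime": "2023-01-01T00:00:00Z",
        "end_datetime": "2023-12-31T23:59:59Z",
    },
)

# Point the item at the actual data file
item.add_asset(
    "data",
    pystac.Asset(href="https://data.example.org/tide_gauge_2023.nc",
                 media_type="application/x-netcdf"),
)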

How to cite: Magin, J.: STAC catalogs for time-varying in-situ data, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-18585, https://doi.org/10.5194/egusphere-egu24-18585, 2024.

EGU24-20779 | Orals | ESSI2.9

Project Pythia: Building an Inclusive Geoscience Community with Cookbooks 

John Clyne, Brian Rose, Orhan Eroglu, James Munroe, Ryan May, Drew Camron, Julia Kent, Amelia Snyder, Kevin Tyle, Maxwell Grover, and Robert Ford

Project Pythia is the educational arm of the Pangeo community, and provides a growing collection of community driven and developed training resources that help geoscientists navigate the Pangeo ecosystem, and the myriad complex technologies essential for today’s Big Data science challenges. Project Pythia began in 2020 with the support of a U.S. NSF EarthCube award. Much of the initial effort focused on Pythia Foundations: a collection of Jupyter Notebooks that covered essential topics such as Python language basics; managing projects with GitHub; authoring and using “binderized” Jupyter Notebooks; and many of Pangeo’s core packages such as Xarray, Pandas, and Matplotlib. Building upon Foundations, the Pythia community turned its attention toward creating Pythia Cookbooks: exemplar collections of recipes for transforming raw ingredients (publicly available, cloud-hosted data) into scientifically useful results. Built from Jupyter Notebooks, Cookbooks are explicitly tied to reproducible computational environments and supported by a rich infrastructure enabling collaborative authoring and automated health-checking – essential tools in the struggle against the widespread notebook obsolescence problem.

 

Open-access, cloud-based Cookbooks are a democratizing force for growing the capacity of current and future geoscientists to practice open science within the rapidly evolving open science ecosystem. In this talk we outline our vision of a sustainable, inclusive open geoscience community enabled by Cookbooks. With further support from the NSF, the Pythia community will accelerate the development and broad buy-in of these resources, demonstrating highly scalable versions of common analysis workflows on high-value datasets across the geosciences. Infrastructure will be deployed for performant data-proximate Cookbook authoring, testing, and use, on both commercial and public cloud platforms. Content and community will expand through annual workshops, outreach, and classroom use, with recruitment targeting under-served communities. Priorities will be guided by an independent steering board; sustainability will be achieved by nurturing a vibrant, inclusive community backed by automation that lowers barriers to participation.

How to cite: Clyne, J., Rose, B., Eroglu, O., Munroe, J., May, R., Camron, D., Kent, J., Snyder, A., Tyle, K., Grover, M., and Ford, R.: Project Pythia: Building an Inclusive Geoscience Community with Cookbooks, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-20779, https://doi.org/10.5194/egusphere-egu24-20779, 2024.

EGU24-20909 | ECS | Orals | ESSI2.9

UXarray: Extensions to Xarray to support unstructured grids 

Orhan Eroglu, Hongyu Chen, Philip Chmielowiec, John Clyne, Corrine DeCiampa, Cecile Hannay, Robert Jacob, Rajeev Jain, Richard Loft, Brian Medeiros, Lantao Sun, Paul Ullrich, and Colin Zarzycki

The arrival of kilometer-scale climate and global weather models presents substantial challenges for the analysis and visualization of the resulting data, not only because of their tremendous size but also because of the employment of unstructured grids upon which the governing equations of state are solved. Few open-source analysis and visualization software tools exist that are capable of operating directly on unstructured grid data. Those that do exist are not comprehensive in the capabilities they offer, do not scale adequately, or both. Recognizing this gap in much-needed capability, Project Raijin, funded by an NSF EarthCube award, and the DOE SEATS project launched a collaborative effort to develop an open-source Python package called UXarray.

UXarray extends the widely used Xarray package, providing support for operating directly (without regridding) on unstructured grid model outputs found in the Earth System Sciences, such as CAM-SE, MPAS, SCRIP, UGRID, and, in the future, ICON. Much like Xarray, UXarray provides fundamental analysis and visualization operators, upon which more specialized, domain-specific capabilities can be layered. This talk will present an overview of the current capabilities of UXarray, provide a roadmap for near-term future development, and describe how the Pangeo community can contribute to this ongoing effort.
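
As a rough illustration (not taken from the talk), opening an unstructured-grid dataset with UXarray follows the familiar Xarray pattern, except that a grid-topology file is supplied alongside the data file; the file names below are placeholders.

```python
# Sketch of basic UXarray usage; file names are placeholders.
import uxarray as ux

# The grid definition (e.g. an MPAS or UGRID topology file) is opened together
# with the data file, without any regridding.
uxds = ux.open_dataset("grid_definition.nc", "model_output.nc")

print(uxds.uxgrid)        # unstructured grid topology (nodes, edges, faces)
print(uxds.data_vars)     # data variables defined directly on the mesh
```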

How to cite: Eroglu, O., Chen, H., Chmielowiec, P., Clyne, J., DeCiampa, C., Hannay, C., Jacob, R., Jain, R., Loft, R., Medeiros, B., Sun, L., Ullrich, P., and Zarzycki, C.: UXarray: Extensions to Xarray to support unstructured grids, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-20909, https://doi.org/10.5194/egusphere-egu24-20909, 2024.

EGU24-5123 | ECS | Orals | ESSI2.10

RE:Growth—A Google Earth Engine toolkit analyzing secondary forest carbon dynamics in the Brazilian Amazon 

Viola Heinrich, Stephen Sitch, Thais Rosan, Celso Silva Junior, and Luiz Aragão

Secondary forests, forests naturally regrowing on previously deforested and now abandoned land, are crucial to help maintain and increase the carbon sink on land, helping to tackle the climate and ecological emergencies. There is a growing body of research on improving our understanding of the carbon dynamics of secondary forests, often using novel remote sensing techniques to map the temporal and spatial patterns of change. However, the resulting datasets are often not fully accessible to all users, either because the whole dataset is not published, or because it is presented in a specialist format. Given both the scientific and policy interest in forest-carbon dynamics, it is critical to ensure datasets are accessible to a wide range of audiences.

Here we present “RE:Growth”, a user-friendly toolkit designed in Google Earth Engine that enables users to view and download the aboveground carbon dynamics, spatially and temporally, of secondary forests in one of the largest tropical forest regions: the Brazilian Amazon. Designed to be easily updated as more temporal and spatial data are made available, “RE:Growth” provides spatial and aggregated information on carbon flux dynamics in secondary forests through time based on Earth Observation. Uniquely, the user can draw their own region or select a jurisdictional boundary in the Brazilian Amazon to focus the analysis on their region of interest. Such information can be used within the measurement, reporting and verification framework, which is critical for results-based payments at all scales. For example, the toolkit can be used to provide spatial and quantitative data to inform the spatial prioritization of secondary forest conservation and expansion in line with Brazil’s Nationally Determined Contribution (NDC) to the Paris Agreement.

How to cite: Heinrich, V., Sitch, S., Rosan, T., Silva Junior, C., and Aragão, L.: RE:Growth—A Google Earth Engine toolkit analyzing secondary forest carbon dynamics in the Brazilian Amazon, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-5123, https://doi.org/10.5194/egusphere-egu24-5123, 2024.

EGU24-6454 | ECS | Orals | ESSI2.10

Cloud-based agricultural crop water use monitoring across Saudi Arabia 

Oliver Miguel Lopez Valencia, Ting Li, Bruno Jose Luis Aragon Solorio, and Matthew Francis McCabe

Monitoring agricultural water use is essential to ensure water security, especially in regions facing water scarcity. Satellite-acquired multi-spectral images of the Earth’s surface provide crucial data to enable frequent estimations of crop water use. In large data-scarce regions, these estimations represent a key source of information for water management. The emergence of cloud-based platforms, such as Google Earth Engine (GEE), has made it feasible, accessible, and cost-effective to automate crop water use monitoring pipelines. Here, we demonstrate the potential benefits of a cloud-based crop water use estimation and monitoring framework by estimating a decade's worth of agricultural water use over Saudi Arabia. In Saudi Arabia, large-scale agricultural activities account for the majority (>80%) of water use, which is sourced primarily from non-renewable groundwater resources of the Arabian Shelf. Saudi Arabia’s large land area (>2 million km2) and the long study period (>10 years) form the basis of a case study for our cloud-based model. Previous mapping efforts provided annual maps of individual field boundary delineations, identifying more than 30,000 fields covering a total of more than 10,000 km2 of croplands distributed across several large-scale agricultural clusters within the Kingdom. As a preprocessing step, we developed an approach to generate large-scale delineations of irrigated agricultural regions over arid areas. This approach helped reduce the processing effort for field delineations while at the same time reducing the water use estimation computations. Our GEE cloud-based model implements a two-source energy balance model (TSEB) and automatically incorporates all available Landsat Collection 2 surface reflectance and surface temperature products from Landsat 7, 8, and 9, along with climate reanalysis data from the ECMWF ERA5-Land hourly product. The model can be readily applied elsewhere by defining just the date range and study geometry, while allowing more advanced users the flexibility to control parameters within the TSEB model. Total crop water use (the evapotranspiration term only, not accounting for irrigation efficiency) was estimated at between 7 and 12 BCM per year of study, with the highest use in 2016 and the lowest in 2020. The Riyadh and Al Jawf administrative regions collectively accounted for more than half of the total study cropland area, and a similar share of the water use. This study represents the convergence of a number of efforts towards developing operational crop water use monitoring, while motivating further applications in other regions and providing a rich dataset for further food and water security related studies.
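
For orientation only, a sketch of how the input collections named above can be assembled in the Earth Engine Python API is shown below; this is not the authors' code, and the study geometry and dates are placeholders.

```python
# Sketch: assembling Landsat Collection 2 Level-2 and ERA5-Land hourly inputs in GEE.
import ee
ee.Initialize()

aoi = ee.Geometry.Rectangle([38.0, 29.5, 39.0, 30.5])   # placeholder study region
start, end = "2013-01-01", "2023-01-01"                  # placeholder decade

landsat = (ee.ImageCollection("LANDSAT/LC08/C02/T1_L2")          # Landsat 8
           .merge(ee.ImageCollection("LANDSAT/LE07/C02/T1_L2"))  # Landsat 7
           .merge(ee.ImageCollection("LANDSAT/LC09/C02/T1_L2"))  # Landsat 9
           .filterBounds(aoi)
           .filterDate(start, end))

era5_land = (ee.ImageCollection("ECMWF/ERA5_LAND/HOURLY")
             .filterBounds(aoi)
             .filterDate(start, end))

print(landsat.size().getInfo(), era5_land.size().getInfo())
```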

How to cite: Lopez Valencia, O. M., Li, T., Aragon Solorio, B. J. L., and McCabe, M. F.: Cloud-based agricultural crop water use monitoring across Saudi Arabia, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-6454, https://doi.org/10.5194/egusphere-egu24-6454, 2024.

EGU24-7620 | ECS | Posters virtual | ESSI2.10

Machine learning implementation for mapping irrigated areas at fine temporal and spatial resolutions in the Limpopo River Basin  

Zolo Kiala, Karthikeyan Matheswaran, Andarcia, Mariangel Garcia, and Chris Dickens

Most of the globally available freshwater for food production is utilized for irrigation. Irrigation expansion is crucial for agricultural production, as it can increase crop yields and be a dependable adaptation measure against climate change. Accurate information on the spatial extent of irrigated areas and their dynamic shifts is therefore essential for efficiently managing already pressured water resources. The multiplicity of remotely sensed data sources and state-of-the-art machine learning techniques offers new avenues for producing more accurate irrigation maps. This study presents the results from a monthly monitoring framework for fine-scale mapping of irrigated areas in the Limpopo River Basin. The proposed framework uses high- to moderate-resolution Earth observation data, the extra-trees classifier, and a series of land cover masks to differentiate rain-fed and irrigated areas. We found that the area of irrigated land during the dry season in 2021 varied from 356,589 ha to 612,738 ha between and September. The overall accuracy of the classified maps varied from 98 to 100%. The proposed framework offers an automated, replicable and cost-effective means of mapping irrigated areas using Google Earth Engine, multisource data, and machine learning algorithms.
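
As a generic illustration of the classifier mentioned above (the actual framework runs in Google Earth Engine on its own features), an extra-trees classifier on a placeholder feature matrix could be set up roughly as follows.

```python
# Sketch of an extra-trees classifier for irrigated vs. rain-fed mapping;
# the feature matrix and labels below are synthetic placeholders.
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 12))       # e.g. monthly spectral/temporal features per pixel
y = rng.integers(0, 2, size=1000)     # 0 = rain-fed, 1 = irrigated

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = ExtraTreesClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
print("overall accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```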

How to cite: Kiala, Z., Matheswaran, K., Garcia, A. M., and Dickens, C.: Machine learning implementation for mapping irrigated areas at fine temporal and spatial resolutions in the Limpopo River Basin , EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-7620, https://doi.org/10.5194/egusphere-egu24-7620, 2024.

EGU24-8086 | ECS | Posters on site | ESSI2.10

Multiscale mapping and monitoring of green biodiversity in urban areas  

Chiara Bottaro and Giovanna Sona

Urban areas are increasingly recognized as ecosystems, where biodiversity is essential for sustaining their functionality. Healthy ecosystems offer numerous services that contribute to the well-being of human populations. The EU Biodiversity Strategy and the Nature Restoration Law call for no net loss of urban green spaces and a minimum 10% tree canopy cover in European cities. This commitment is driving a surge in city tree planting projects, with further increases expected in the coming years.

In this context, we present an investigation focused on the multiscale mapping and monitoring of urban green biodiversity, with a primary emphasis on trees. Urban trees, besides providing various ecosystem services, play a crucial role in mitigating the urban heat load during summers, thereby alleviating adverse effects on human health, and reducing energy consumption. Unfortunately, the challenging conditions within urban environments, including increased temperatures and water scarcity due to impervious surfaces, can impact the phenology and physiology of trees, often compromising their health and functionality.

In order to better explore these aspects, we introduce an application based on Google Earth Engine that aims to extract geospatial data related to tree cover and temperatures across various spatial scales in urban environments. Utilizing machine learning algorithms, the application downscales thermal infrared satellite imagery and classifies vegetation features. Users have the flexibility to investigate the relationship between temperatures and vegetation by selecting specific time windows and areas of interest, along with access to relevant spectral indices and correlation coefficients. In a pilot case study on the city of Milan, we use the application to perform a detailed analysis at the tree species level, involving the assessment of individual tree canopy temperature responses in different areas of the city.

This application aims to provide researchers, urban planners, green space managers, and other professionals with a valuable tool to comprehend the spatial dynamics of vegetation in urban environments, assess the impacts of stressors on vegetation fitness, and, in the long run, evaluate the effectiveness of mitigation efforts such as urban reforestation and tree planting.

How to cite: Bottaro, C. and Sona, G.: Multiscale mapping and monitoring of green biodiversity in urban areas , EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-8086, https://doi.org/10.5194/egusphere-egu24-8086, 2024.

EGU24-11283 | ECS | Posters on site | ESSI2.10

CataEx: How to get started with JavaScript in Google Earth Engine? 

Gisela Domej and Kacper Pluta

Google Earth Engine offers a multitude of options for retrieving and analyzing satellite imagery from various satellite missions, which streamlines GIS studies, especially with respect to the download and storage of extensive data volumes.

However, compatibility with traditional desktop or web-based GIS software remains a gap, which we attempt to bridge with the JavaScript code CataEx, a multi-functional tool that exemplifies several essential types of computations and finally exports imagery and its additionally created layers as GeoTIFFs. We present several basic functionalities of CataEx: identifying images out of collections (e.g., the least cloudy image in a time frame), the definition of a location of interest (as coordinates and/or as a polygon), cloud masking for different collections, evaluation of satellite band properties and their projections, layer creation in Google Earth Engine, index computation, pixel-based statistics and histogram plotting, layer visualization, and export to Google Drive.

CataEx is available in six versions for Landsat 4/5/7, Landsat 8/9, and Sentinel-2, each separated into top-of-atmosphere reflectance and surface reflectance collections. The code is deliberately kept simple to allow for easy brick-like recombination, adaptation, and customization of code sections and can hence be used as an example toolkit for students or beginners writing their first JavaScript routines for Google Earth Engine.

This work is funded by the Polish National Science Center (no. 2021/42/E/ST10/00186); the code is available on Zenodo (https://doi.org/10.5281/zenodo.8407939). 

How to cite: Domej, G. and Pluta, K.: CataEx: How to get started with JavaScript in Google Earth Engine?, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-11283, https://doi.org/10.5194/egusphere-egu24-11283, 2024.

EGU24-11307 | Orals | ESSI2.10

Quantifying the transboundary environmental impacts of Earth observation in the 'big data' and constellation era 

Karen Anderson, Magdalena Mleczko, Robert Brewin, Kevin Gaston, Markus Mueller, Shutler Jamie, Xiaoyu Yan, and Ruby Wilkinson

The number of Earth Observation (EO) satellites has increased exponentially over the past decade, fuelled by a shift towards constellation models that promise to deliver data at finer spatial, temporal and spectral resolutions than in the past. The result is that there are now >1000 EO satellites in orbit, a population that is rapidly increasing because of booming private-sector interest in space imaging. Flowing from this, EO data volumes have mushroomed in recent years, and data processing has migrated to the cloud, with scientists leveraging tools such as Google Earth Engine for information retrieval. Whilst considerable attention has been given to the launch and in-orbit environmental impacts of satellites (e.g. rocket emissions and space-junk risks), the specific environmental impacts of EO missions' data infrastructures and cloud computation have so far escaped critical scrutiny. It is urgent that the environmental science community address this gap, so that the environmental good of EO can withstand scrutiny.

Data centres consume high quantities of water and energy, and they may be situated in sensitive geographical settings far away from both users and launchpads (i.e. a transboundary environmental concern). There are also hidden impacts in the carbon-intensive processes of computer component manufacture, affecting places and communities far from the site of EO information retrieval. We scope the broad suite of transboundary environmental impacts that EO generates. Related to the data aspect of the EO life-cycle, we quantify the current volume of global EO data holdings (>800 PB currently, increasing by 100 PB/year). Mapping the distribution of datasets across different data centre providers, our work shows high redundancy of datasets, with collections from NASA and ESA replicated across many data centres globally. Storage of this data volume generates annual CO2-equivalent emissions summing to >4000 tonnes/year. We quantify the environmental cost of performing EO functions on the cloud compared to desktop machines, using Google Earth Engine as an exemplar and scaling emissions using the ‘Earth Engine Compute Unit’. We show how large-scale analyses executed within GEE rapidly scale to produce emissions equivalent to a single economy-class flight from London to Paris. Executing these processes on the cloud takes seconds, and these estimates do not account for emissions from microprocessor manufacture, nor do they account for users running processes multiple times (e.g. during code development). A major blind spot is that the geography of GEE data centres is hidden from users, with no choice given about where GEE processes are executed. It is important that EO providers become more transparent about the location-specific impacts of EO work and provide tools for measuring the environmental cost of cloud computation. Furthermore, the EO community, as one concerned with the fate of Earth’s environment, must now urgently and critically consider the broad suite of EO data life-cycle impacts that lie (a) beyond the launchpad and (b) on Earth rather than in space, taking action to minimise and mitigate them. This is important particularly because EO data will long outlive the satellites that provided them.
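
A back-of-envelope division of the two storage figures quoted above (derived here for orientation only, not a number reported by the authors) gives a rough per-unit intensity:

4000 t CO2e per year / 800 PB ≈ 4,000,000 kg / 800,000 TB ≈ 5 kg CO2e per TB of stored EO data per year.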

How to cite: Anderson, K., Mleczko, M., Brewin, R., Gaston, K., Mueller, M., Jamie, S., Yan, X., and Wilkinson, R.: Quantifying the transboundary environmental impacts of Earth observation in the 'big data' and constellation era, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-11307, https://doi.org/10.5194/egusphere-egu24-11307, 2024.

EGU24-11472 | ECS | Orals | ESSI2.10

Geospatial suitability mapping for targeted vegetable production in fragile African regions 

Moctar Dembélé, Mansoor Leh, Darshana Wickramasinghe, Naga Velpuri, Karamoko Sanogo, Desalegne Tegegne, Mariangel Garcia Andarcia, and Petra Schmitter

Enabling the resilience of local food systems is crucial to ensure a steady supply of nutritious food to people living in fragile and conflict-affected locations. While the majority of interventions often focus on staple crops, there is an increasing tendency by humanitarian organizations to include vegetable production solutions in their programs. However, information on land suitability for vegetable production is usually lacking or available at a coarse spatial resolution, thereby limiting targeted interventions for smallholder farmers.

This study proposes a comprehensive geospatial data-driven framework for mapping suitable areas for vegetable production in Africa using a machine learning (ML) algorithm implemented in Google Earth Engine (GEE) and a Multi-Criteria Decision Analysis (MCA) approach. Mali (West Africa) and Ethiopia (East Africa) are selected as case studies given the current fragility of both countries and the support provided by USAID's Bureau for Humanitarian Assistance (BHA). Field data on vegetable production locations were collected to train and validate the ML and MCA models. Several publicly available geospatial datasets, including FAO’s WaPOR database, were reviewed to select the predictor variables, which include information on climate, soil, topography, surface water, groundwater, socioeconomics and disaster risks. A suitability map was produced for all vegetables, and separate suitability maps were generated for the top five most cultivated vegetables in Mali and Ethiopia.

Comparison of the ML approach to the MCA approach revealed a lower performance of the former due to the limited availability of field data, thereby highlighting the benefit of expert knowledge in addition to the data-driven approach. The results show that the most suitable areas are found in the region of Segou in Mali (up to 88%), while the region of Oromia has the most suitable areas in Ethiopia (up to 85%). The resulting maps of land suitability for vegetable production serve to develop an irrigation investment targeting tool, which can be used to assist humanitarian organizations in implementing suitable irrigation solutions for vegetables.

How to cite: Dembélé, M., Leh, M., Wickramasinghe, D., Velpuri, N., Sanogo, K., Tegegne, D., Andarcia, M. G., and Schmitter, P.: Geospatial suitability mapping for targeted vegetable production in fragile African regions, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-11472, https://doi.org/10.5194/egusphere-egu24-11472, 2024.

EGU24-11492 | ECS | Posters on site | ESSI2.10

Morphodynamics of the world’s large tropical rivers drive a carbon pumping mechanism   

Luca Salerno and Carlo Camporeale

This study investigates the eco-morphodynamic activity of major rivers in the Tropics, aiming to quantify the carbon flux from riparian vegetation into inland waters. Employing a comprehensive multi-temporal analysis of satellite data, developed on the Google Earth Engine cloud computing platform and spanning 2000 to 2019 at a spatial resolution of 30 meters, we focused on all tropical rivers with a width exceeding 200 meters. Our research reveals the existence of a highly efficient Carbon Pump mechanism, in which river morphodynamics play a pivotal role in driving carbon export from the riparian zone. This, in turn, stimulates net primary production through a synergistic process involving floodplain rejuvenation and colonization.

The quantification of this unique pumping mechanism underscores its substantial contribution, alone accounting for an annual mobilization of 12 ± 0.96 million tons of carbon in these tropical rivers. We identified distinct signatures of fluvial eco-morphological activity that serve as proxies for assessing the carbon mobilization capability linked to river dynamics. The study also delves into the interplay between river migration and carbon mobilization, shedding light on the potential impacts on the carbon intensity of planned hydroelectric dams in the Neotropics.

We highlight the significance of the river-carbon nexus, emphasizing the necessity for a comprehensive approach in formulating effective water policies that consider the intricate relationships among river dynamics, carbon flux, and environmental phenomena.

 

How to cite: Salerno, L. and Camporeale, C.: Morphodynamics of the world’s large tropical rivers drive a carbon pumping mechanism  , EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-11492, https://doi.org/10.5194/egusphere-egu24-11492, 2024.

The Qinghai-Tibetan Plateau (QTP) is one of the most sensitive and vulnerable regions under global climate change. Vegetation is the key component of the QTP ecosystem and is closely related to its ecological vulnerability. Fractional vegetation cover (FVC) is an important parameter for characterizing vegetation conditions in the horizontal direction. Therefore, dense time series of spatially continuous FVC at high spatial resolution are essential for understanding the detailed spatiotemporal dynamics of vegetation across the QTP. Landsat is an ideal remote sensing data source for high-spatial-resolution FVC monitoring. However, frequent cloud cover during the growing season on the QTP makes it challenging to observe FVC continuously using Landsat alone. Spatiotemporal fusion methods integrating the advantages of Landsat and MODIS have therefore been widely developed. Currently, most spatiotemporal fusion methods assume that the relationship between Landsat and MODIS is fixed over the prediction period; for regions with strong heterogeneity and large temporal variations, however, this relationship varies over time. In addition, most methods fuse the reflectance bands separately without considering the interrelationship between bands. Therefore, this study proposes a method, based on Google Earth Engine (GEE), that blends Landsat and MODIS reflectance to generate FVC at 30 m spatial resolution and an 8-day interval. The method accounts for the dynamic relationship between MODIS and Landsat by analyzing time-series data collected over multiple years, and it introduces a novel two-band simultaneous smoothing strategy that generates smoothed and consistent time series of the red and near-infrared bands simultaneously. Compared with three previous typical methods in two challenging QTP regions with rapid vegetation change, the synthesized 30 m reflectance data generated by the proposed method yield more accurate FVC. Validation against field-measured FVC further confirms the validity of the proposed method. The generated FVC products across the QTP exhibit spatial continuity and reasonable time-series profiles. The proposed method is thus expected to provide high-quality FVC time series with high spatiotemporal resolution over multiple years for the QTP and other regions with frequent data gaps, based on GEE.
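
For context, FVC is often estimated from NDVI with the dimidiate (linear mixing) pixel model; this is a common convention rather than necessarily the retrieval used in this study, and the soil and full-vegetation endmember values below are placeholders.

```python
# Dimidiate pixel model: a common way to derive FVC from NDVI (illustrative only).
def fvc_from_ndvi(ndvi, ndvi_soil=0.05, ndvi_veg=0.86):
    """Fraction of vegetation cover from NDVI, clipped to [0, 1];
    endmember values are placeholders and are normally derived per scene."""
    fvc = (ndvi - ndvi_soil) / (ndvi_veg - ndvi_soil)
    return min(max(fvc, 0.0), 1.0)

print(fvc_from_ndvi(0.45))  # ~0.49 for NDVI = 0.45 with these endmembers
```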

How to cite: Tao, G. and Jia, K.: Generating dense time-series of spatially continuous 30m fractional vegetation cover for the Qinghai-Tibetan Plateau based on Google Earth Engine, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-13730, https://doi.org/10.5194/egusphere-egu24-13730, 2024.

The integration of Python scripting within the Google Earth Engine (GEE) code editor, enabled by the Open Earth Engine Extension (OEEex), introduces a practical and accessible approach to geospatial analysis. This development, powered by Pyodide, a Python runtime built for browser environments, allows users to write and execute Python code directly within the GEE interface without the need for a local Python installation.

This feature caters especially to those more familiar with Python than JavaScript, providing a user-friendly platform for conducting Earth observation and analysis. It also facilitates the use of popular Python libraries such as Matplotlib for data visualization and scikit-learn for machine learning, directly within the GEE ecosystem.

The presentation will highlight some of the technical implementation, emphasizing how Python scripts can be executed in a browser environment without additional setup requirements. We will discuss the integration process, the challenges faced, and the solutions developed to ensure seamless functionality.

Demonstrations will showcase the capabilities of this integration, highlighting how Python can be utilized for various geospatial processing tasks in GEE. We aim to provide a realistic overview of the extension's capabilities, its impact on enhancing the flexibility of GEE, and its potential applications in remote sensing and Earth system science research.

How to cite: Gravey, M.:  Python Scripting in Google Earth Engine Code Editor with the Open Earth Engine extension, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-15855, https://doi.org/10.5194/egusphere-egu24-15855, 2024.

The Cauvery Delta Zone (CDZ) in southern India, supported by the Cauvery River, is an essential agricultural region renowned for its high production and rich ecological variety. The CDZ, characterized by its dynamic interaction between land and water and its wide range of soil series, supports a highly productive cropping system and is referred to as the rice bowl of Tamil Nadu. The vegetation in the CDZ plays a vital role in maintaining ecological equilibrium; knowledge of the timing of its life cycle and the tracking of changes are crucial for evaluating the effects of climate change and human actions. Satellite technology, specifically Sentinel-2, presents unparalleled prospects for ecological research, offering comprehensive worldwide coverage and imagery with exceptional levels of detail. The study combines geographical analysis with satellite-derived vegetation indices (VIs) to gain insights into the agricultural dynamics of the region, focusing specifically on rice agriculture, pulse crops, and a variety of perennial crops. It investigates VIs derived from satellite data, specifically the Normalised Difference Vegetation Index (NDVI), Enhanced Vegetation Index (EVI), Biomass Index (BI), Chlorophyll Index (CI), and Differential Vegetation Index (DVI). The analysis focuses on using Sentinel-2 data to examine the spatial and temporal patterns in the CDZ. The research highlights the significance of NDVI for qualitative vegetation analysis, and the contributions of EVI, BI, and DVI to understanding vegetation health and land cover changes are investigated on a monthly basis from June 2022 to May 2023. The Google Earth Engine platform is utilized for the procedure, which involves the acquisition of Sentinel-2 data, cloud removal, pre-processing, computation of the various VIs, analysis of the results, and their export. The results show the fluctuations of the satellite-derived vegetation indices, with peak values observed in September and the lowest values in November. NDVI and DVI exhibit a strong positive association, while EVI and BI also have a strong positive correlation. Substantial month-to-month fluctuations are observed. The findings support scientific progress and facilitate informed decision-making for sustainable development by effectively balancing human activities and environmental conservation.
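
By way of illustration (not the authors' script), monthly NDVI composites of this kind can be produced in the Earth Engine Python API roughly as follows; the delta extent and cloud threshold are placeholders.

```python
# Sketch: monthly Sentinel-2 NDVI composites over a placeholder CDZ extent.
import ee
ee.Initialize()

cdz = ee.Geometry.Rectangle([78.8, 10.2, 79.9, 11.5])   # placeholder extent

def add_ndvi(img):
    return img.addBands(img.normalizedDifference(["B8", "B4"]).rename("NDVI"))

s2 = (ee.ImageCollection("COPERNICUS/S2_SR_HARMONIZED")
      .filterBounds(cdz)
      .filterDate("2022-06-01", "2023-06-01")
      .filter(ee.Filter.lt("CLOUDY_PIXEL_PERCENTAGE", 20))
      .map(add_ndvi))

def monthly_composite(m):
    # m months after June 2022, so m = 0..11 covers June 2022 to May 2023.
    start = ee.Date("2022-06-01").advance(ee.Number(m), "month")
    comp = s2.filterDate(start, start.advance(1, "month")).select("NDVI").median()
    return comp.set("month_start", start.format("YYYY-MM"))

monthly_ndvi = ee.ImageCollection(ee.List.sequence(0, 11).map(monthly_composite))
print(monthly_ndvi.aggregate_array("month_start").getInfo())
```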

How to cite: Janardhanam, N. and Saravanan, S.: Monitoring Vegetation Dynamics in the Cauvery Delta Zone (CDZ) Using Satellite-Derived Vegetation Indices with Google Earth Engine, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-17960, https://doi.org/10.5194/egusphere-egu24-17960, 2024.

EGU24-18012 | ECS | Orals | ESSI2.10

Enhancing Orchard Classification Accuracy through Object-Level Confidence Metrics: A Case Study in Automatic Orchard Identification in Valencia, Spain 

Héctor Izquierdo Sanz, Enrique Moltó garcía, and Sergio Morell Monzó

Effective agricultural management policies rely heavily on accurate crop identification across various scales. This is particularly challenging in highly fragmented agricultural landscapes, such as those found in many European regions, especially for fruit tree orchard identification. While solutions exist, the confidence in individual orchard classifications is often overlooked, despite its importance in enhancing classification accuracy (precision, recall and specificity).

Several confidence metrics at the pixel level have been proposed, based on estimating the probability of a pixel belonging to each of the possible classes. The higher the probability of class membership for a given class, the greater the confidence associated with that class. In this sense, a measure of confidence can be based on the difference in probability between the two highest values (sometimes called the distance to the second cluster).

This study introduces an innovative methodological approach to build a classification confidence metric at the object (orchard) level. Once segmentation is completed, all pixels whose confidence is not above a certain threshold are masked out. Then, each orchard is initially assigned to a class by computing the mode of the unmasked pixels inside its perimeter. In a subsequent step, a confidence metric at the orchard level is estimated, based on the number of mode-class pixels, the total number of pixels completely inside the orchard, and the proportion of mode-class and unmasked pixels within the orchard. This confidence metric allows for a balance between increased precision and a reduction in the number of classified orchards (orchards with insufficient confidence are left unclassified).
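
To make the orchard-level step concrete, a plausible sketch is given below; the abstract does not spell out exactly how the three quantities are combined, so the final score here is an illustrative weighting, not the authors' formula.

```python
# Sketch of an orchard-level confidence metric (illustrative weighting only).
import numpy as np

def orchard_label_and_confidence(pixel_classes, pixel_confidence, threshold=0.7):
    """pixel_classes: per-pixel class labels inside one orchard.
    pixel_confidence: per-pixel confidence, e.g. the margin between the two
    highest class probabilities. Pixels below the threshold are masked out."""
    pixel_classes = np.asarray(pixel_classes)
    unmasked = np.asarray(pixel_confidence) >= threshold
    n_total = pixel_classes.size                       # pixels completely inside the orchard
    n_unmasked = int(unmasked.sum())
    if n_unmasked == 0:
        return None, 0.0                               # orchard left unclassified

    classes, counts = np.unique(pixel_classes[unmasked], return_counts=True)
    mode_class = classes[np.argmax(counts)]            # orchard label = mode of unmasked pixels
    n_mode = int(counts.max())

    # Illustrative combination of mode-pixel share and unmasked-pixel share.
    confidence = 0.5 * (n_mode / n_unmasked) + 0.5 * (n_unmasked / n_total)
    return mode_class, confidence
```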

The proposed method, fully implemented in Google Earth Engine, was tested in a highly fragmented area in Valencia (Spain). The system’s performance was assessed using a Random Forest classification algorithm on Fourier coefficients of spectral index time series at the pixel level, together with a specific spatial cross-validation procedure. By setting a 70% orchard classification confidence level, the mean overall accuracy increased from 88.74 ± 3.03% to 93.58 ± 2.85%, and the Kappa index from 0.78 ± 0.06 to 0.87 ± 0.05, albeit at the cost of leaving 12.60 ± 7.18% of orchards unclassified.

How to cite: Izquierdo Sanz, H., Moltó garcía, E., and Morell Monzó, S.: Enhancing Orchard Classification Accuracy through Object-Level Confidence Metrics: A Case Study in Automatic Orchard Identification in Valencia, Spain, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-18012, https://doi.org/10.5194/egusphere-egu24-18012, 2024.

EGU24-19017 | Orals | ESSI2.10

Comprehensive Hybrid Approach for LULC classification using GEE-API and ML 

Vinay Shivamurthy, Satyam Pawale, Yash Chandrashekar Shetty, and Sudarshan T Bhat
This research offers a new hybrid strategy for Land Use and Land Cover (LULC) classification that overcomes the constraints of k-Nearest Neighbors (KNN) through the use of Support Vector Machines (SVM). Our study, which makes use of the Google Earth Engine (GEE) API in conjunction with Colab, focuses on custom preprocessing that enhances data richness and context. We perform custom preprocessing, including feature scaling and the fusion of spatial, temporal, spectral, and radiometric dimensions, in order to enhance the data obtained from satellite imagery, incorporating spatial and temporal composites, spectral fusion, and radiometric calibration. Our approach uses Rasterio for satellite images and Shapely for vector data. GeoPandas supports smooth management of geographic data in the GeoJSON format, strengthening compatibility with Google Earth Engine, while Digital Elevation Models (DEMs) and Landgren software enrich the LULC analysis.
The hybrid approach addresses the inefficiencies of k-Nearest Neighbours (KNN) through the incorporation of Support Vector Machines (SVMs). The drawbacks of KNN, including computational intensity, sensitivity to irrelevant features, susceptibility to noise, and the need for optimal hyperparameter selection, are mitigated by leveraging SVM's strengths. SVM, valued for its computational efficiency, robustness to noise and outliers, and relevance-driven decision boundary learning, is a successful complement to KNN. The combined approach encompasses pre-processing with SVM to enhance data quality, learning the decision boundary with SVM, and selectively applying KNN in localized regions of interest. Continuous refinement of the hybrid model via validation enables a balanced use of SVM's robustness and KNN's flexibility. The proposed hybrid technique is a promising option that could enhance the efficiency and performance of LULC classification tasks, catering to the specific characteristics of the dataset and the analysis goals.
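
A condensed sketch of this SVM-plus-KNN idea is shown below, assuming scikit-learn; the probability threshold, features and class labels are placeholders, and the actual study pipeline (GEE API, Colab, custom preprocessing) is not reproduced here.

```python
# Sketch: SVM as the primary classifier, KNN applied only where the SVM is least certain.
import numpy as np
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
X_train = rng.normal(size=(500, 8))      # placeholder fused spectral/temporal features
y_train = rng.integers(0, 4, size=500)   # placeholder LULC classes
X_new = rng.normal(size=(100, 8))        # pixels to classify

svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
svm.fit(X_train, y_train)

proba = svm.predict_proba(X_new)
labels = svm.classes_[np.argmax(proba, axis=1)]

# Where the SVM's winning probability is weak, fall back to a local KNN vote.
uncertain = proba.max(axis=1) < 0.6
if uncertain.any():
    knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
    labels[uncertain] = knn.predict(X_new[uncertain])
```
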
Python libraries (Folium, Matplotlib, Seaborn) enable integration, allowing users to produce distinctive visualizations adapted to the specifications of remote sensing products designed for specific applications. Folium is used for producing interactive geographical maps, Matplotlib delivers configurable static plots, and Seaborn focuses on statistical data visualization. This combination facilitates comprehensive exploration of complex satellite image collections using a variety of visualization approaches.
Overall, this hybrid method, aided by improved preprocessing, data fusion, and visualization tools, presents a promising strategy for improving the efficiency and effectiveness of LULC classification while adapting to the particular characteristics of the dataset and the analysis objectives.
 
Keywords: LULC, Hybrid classification, SVM, KNN, Data fusion, Geospatial analysis, visualization

How to cite: Shivamurthy, V., Pawale, S., Chandrashekar Shetty, Y., and Bhat, S. T.: Comprehensive Hybrid Approach for LULC classification using GEE-API and ML, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-19017, https://doi.org/10.5194/egusphere-egu24-19017, 2024.

EGU24-19202 | Posters on site | ESSI2.10

Multi-temporal Remote Sensing Data Analysis for Devastating Flood Study in Northern Libya 

Roman Shults and Ashraf Farahat

Floods are considered the most dangerous and destructive geohazards, leading to human casualties and severe economic consequences. Every year, a large number of regions around the world suffer from devastating floods, and the estimation of flood aftermaths is a high priority for the global community. One such flood took place in northern Libya in September 2023. The presented study aims to evaluate the aftermath of the Libyan flood using the capabilities of Google Earth Engine. The primary task is obtaining and analyzing data that provide high accuracy and detail for the study region; among the data of importance for assessing a flood and its aftermath are remote sensing images. The last decades have seen an increased interest in remote sensing data thanks to data accessibility and the variety of free, open-source computational platforms. Data from Landsat, Sentinel, and similar missions are ubiquitous and well studied. On the other hand, software such as Google Earth Engine and QGIS has a powerful toolbox for different solutions. The goal stated in the paper relates to image classification and change detection problems, for which the mentioned software provides various solutions based on machine-learning approaches. A variety of data have been used to reach the paper’s goal. The first stage of the study was the development of a workflow for data analysis, which includes three parallel processes. Sentinel-2 data were used for image classification using multispectral bands; different supervised classification methods were examined, including random forest, support vector machines, naïve Bayes, and CART, with different sets of hyperparameters. GEOEYE-1 and WorldView-2 images of four cities, Dernah, Susah, Al-Bayda, and Brega, were investigated for change detection algorithms. In addition, different NDVIs were calculated to facilitate the recognition of damaged regions. At the final stage, the analysis results were fused using the QGIS platform to obtain the damage estimation for the studied regions. As the main output, the area changes for the primary classes and the maps portraying these changes were obtained, and recommendations for data usage and further processing in Google Earth Engine were developed.
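
For reference, a generic supervised-classification sketch in the Earth Engine Python API is shown below (using one of the several classifiers mentioned above); the area, band list and training points are placeholders, not the study's actual inputs.

```python
# Sketch: random forest classification of a Sentinel-2 composite in GEE.
import ee
ee.Initialize()

aoi = ee.Geometry.Rectangle([22.4, 32.6, 23.0, 33.0])   # placeholder area in northern Libya
bands = ["B2", "B3", "B4", "B8", "B11", "B12"]

image = (ee.ImageCollection("COPERNICUS/S2_SR_HARMONIZED")
         .filterBounds(aoi)
         .filterDate("2023-09-15", "2023-10-15")
         .median()
         .select(bands))

# 'training_points' would be a FeatureCollection of labelled samples with a 'class' property.
training_points = ee.FeatureCollection([])               # placeholder

samples = image.sampleRegions(collection=training_points, properties=["class"], scale=10)
classifier = ee.Classifier.smileRandomForest(100).train(samples, "class", bands)
classified = image.classify(classifier)
```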

How to cite: Shults, R. and Farahat, A.: Multi-temporal Remote Sensing Data Analysis for Devastating Flood Study in Northern Libya, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-19202, https://doi.org/10.5194/egusphere-egu24-19202, 2024.

EGU24-19440 | ECS | Orals | ESSI2.10

The application of Google Earth Engine to monitor sugarcane development in India. 

Neha Joshi, Daniel Simms, and Paul Burgess

The Google Earth Engine platform has transformed access to long-term scientific datasets, as more than thirty years of image data are readily available for analysis on the cloud. Despite this rapid access to long-term data and the cloud computing resources of GEE, there is still a requirement to remove noise (e.g., from cloud and haze) and correct for calibration effects between image types in order to process raw data into information about changes in the Earth’s surface. Using the Python Earth Engine API, we developed an algorithm to combine time-series datasets from high-spatial-resolution sensors (Landsat-8 and Sentinel-2) and filter noise whilst still retaining high temporal resolution. A second algorithm was then developed to automate the decomposition of the pre-processed time series into individual agricultural seasons, for the extraction of phenological stages for sugarcane fields across India. Our approach was developed and validated on over 800 sugarcane field parcels and overcomes the limitation of previous machine-learning methods for sugarcane monitoring with remote sensing, which rely on information on planting and harvesting. Fully automated monitoring of sugarcane is possible over wide areas without the need to download image datasets or process time-series data locally. This approach can significantly improve the sustainability of sugarcane production by optimising the harvest to maintain efficiency in the supply of sugarcane to mills during the crushing season and by reducing waste through avoiding the harvest of immature cane. The use of GEE means that this approach can be easily modified for use with other crops and in other geographical areas to improve satellite-based monitoring of crops. Such tools are essential for realising the goals of sustainable development, and time-series analysis can be used to help producers demonstrate that commodities have not come from recently deforested land (for example, under the EU regulation on deforestation-free products).
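
A rough sketch of merging Landsat-8 and Sentinel-2 NDVI into one time series with the Python Earth Engine API is shown below; it is not the authors' algorithm, the field geometry is a placeholder, and cross-sensor calibration and band scaling are omitted.

```python
# Sketch: merged Landsat-8 / Sentinel-2 NDVI time series for one field.
import ee
ee.Initialize()

field = ee.Geometry.Point([77.6, 27.2]).buffer(100)   # placeholder field location

def ndvi_l8(img):
    # Band scaling factors for Collection 2 Level-2 are omitted for brevity.
    ndvi = img.normalizedDifference(["SR_B5", "SR_B4"]).rename("NDVI")
    return ndvi.set("system:time_start", img.get("system:time_start"))

def ndvi_s2(img):
    ndvi = img.normalizedDifference(["B8", "B4"]).rename("NDVI")
    return ndvi.set("system:time_start", img.get("system:time_start"))

l8 = (ee.ImageCollection("LANDSAT/LC08/C02/T1_L2")
      .filterBounds(field).filterDate("2018-01-01", "2023-01-01").map(ndvi_l8))
s2 = (ee.ImageCollection("COPERNICUS/S2_SR_HARMONIZED")
      .filterBounds(field).filterDate("2018-01-01", "2023-01-01").map(ndvi_s2))

merged = l8.merge(s2).sort("system:time_start")
rows = merged.getRegion(field, scale=30).getInfo()    # [id, lon, lat, time, NDVI] per observation
print(len(rows) - 1, "observations in the merged series")
```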

How to cite: Joshi, N., Simms, D., and Burgess, P.: The application of Google Earth Engine to monitor sugarcane development in India., EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-19440, https://doi.org/10.5194/egusphere-egu24-19440, 2024.

EGU24-20371 | Orals | ESSI2.10

Two powerful Google Earth Engine (GEE) Apps for the worldwide high-temperature features monitoring and investigation 

Mariapia Faruolo, Nicola Pergola, Nicola Genzano, and Francesco Marchese

Two innovative and powerful Google Earth Engine (GEE) Apps have recently been developed to identify and map volcanic thermal features and to investigate gas flaring sources at global scale in daylight conditions. Both GEE Apps are based on the Normalized Hotspot Indices (NHI; Marchese et al., 2019), which analyze the Near-Infrared (NIR) and Short-Wave Infrared (SWIR) radiances from the Multispectral Instrument (MSI) and the Operational Land Imager (OLI/OLI-2), aboard the Sentinel-2 and Landsat 8/9 satellites respectively, to detect high-temperature features. The NHI tool enables the analysis of volcanic thermal anomalies through plots of hotspot pixel number, total SWIR radiance and total hotspot area. In addition, an automated module of the tool reports volcanoes that have been active over the past 48 hours (https://sites.google.com/view/nhi-tool/home-page). DAFI (Daytime Approach for gas Flaring Investigation), by performing a multi-temporal analysis of the NHI, identifies gas flares on an annual basis, both onshore and offshore, providing information about the gas flaring sources in terms of persistence of thermal activity and through the computation of radiative power (https://sites.google.com/view/flaringsitesinventory). These systems demonstrate the relevance of the GEE platform in supporting the analysis, monitoring and characterization of hot targets (both natural and industrial) thanks to its massive computational resources and the availability of extended datasets of multi-source satellite observations.
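
For reference, the two indices underlying NHI can be written out as below (following our reading of Marchese et al., 2019); the band assignments shown are for Sentinel-2 MSI (B8A = NIR, B11 = SWIR1, B12 = SWIR2), and the positive-index hotspot rule is a simplification of the full detection scheme.

```python
# Normalized Hotspot Indices, written out for illustration (after Marchese et al., 2019).
def nhi_swir(swir2, swir1):
    """Contrast between the two SWIR radiances (e.g. Sentinel-2 B12 vs. B11)."""
    return (swir2 - swir1) / (swir2 + swir1)

def nhi_swnir(swir1, nir):
    """Contrast between SWIR1 and NIR radiances (e.g. Sentinel-2 B11 vs. B8A)."""
    return (swir1 - nir) / (swir1 + nir)

def is_hot_pixel(nir, swir1, swir2):
    # Simplified rule: a pixel is flagged when either index is positive.
    return nhi_swir(swir2, swir1) > 0 or nhi_swnir(swir1, nir) > 0
```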

How to cite: Faruolo, M., Pergola, N., Genzano, N., and Marchese, F.: Two powerful Google Earth Engine (GEE) Apps for the worldwide high-temperature features monitoring and investigation, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-20371, https://doi.org/10.5194/egusphere-egu24-20371, 2024.

This session delves into the evolving landscape of climate regulations and voluntary commitments facing supply chains, with a specific focus on land-based supply chains such as agriculture and forestry. Recognizing the pivotal role of the first mile in contributing over 50% of greenhouse gas emissions and environmental externalities, the discussion centers on addressing the challenges posed by scope 3 emissions and the distributed responsibility inherent in the supply chain.

Epoch introduces the SCo2-API, a comprehensive web application leveraging Google Earth Engine and Google Cloud. This platform facilitates the management of geospatial assets within the supply chain, encompassing plot data, agricultural practices, supply shed, and payment details. The SCo2-API further provides API endpoints to derive on-demand, spatio-temporally relevant sustainability metrics.

The presented metrics include deforestation monitoring for compliance with European Union Deforestation Regulation (EUDR), Land Use Change (LUC), and Land Management (non-LUC) emissions estimates. These metrics serve to identify intervention hotspots and monitor environmental co-benefits such as carbon removals, water use, and biodiversity resulting from landscape restoration interventions.

Additionally, the SCo2-API incorporates advanced capabilities such as automated sampling design and minimum sampling density requirements for field data collection. These features are crucial for validating and enhancing confidence in the sustainability metrics generated. The framework also integrates payment capabilities to support Payments for Ecosystem Services (PES) schemes based on validated sustainability metrics.

Algorithmically, the SCo2-API ensures near-real-time results through a suite of scientific workflows. These include time series change detection using the Continuous Change Detection and Classification (CCDC) algorithm (Zhu et al., 2014), machine learning models predicting above-ground biomass and canopy heights using GEDI and ICESat data, canopy height heterogeneity as a proxy for landscape diversity (Rocchini et al., 2018), and evapotranspiration modeling as a proxy for water use (Melton et al., 2022). All scientific workflows are based on open datasets and satellite collections, aligning with open-source principles to ensure reproducibility and auditability of generated figures.

This session aims to explore the scientific and technological dimensions of the SCo2-API framework, providing insights into its applications for advancing supply chain sustainability and meeting regulatory and voluntary commitments.

How to cite: Ouellette, W. and Surti, J.: Geospatial Analytics for Enhanced Supply Chain Sustainability: Introducing the SCo2-API Framework, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-21280, https://doi.org/10.5194/egusphere-egu24-21280, 2024.

EGU24-628 | ECS | Posters on site | ESSI2.11

AGEMERA AI: Innovative AI solution for responsible resource exploration 

Barbara Stimac Tumara and Taras Matselyukh

In the context of the AGEMERA project (European Union's Horizon Europe research and innovation programme under grant agreement No 101058178), OPT/NET introduced the AGEMERA Platform as an extension to its flagship product, MONITORED AITM. AGEMERA's core objectives include unlocking the EU's resource potential, enhancing public awareness of critical raw materials, and fostering environmentally and socially responsible mineral exploration.

OPT/NET solutions deliver an exceptional user experience through an intuitive AI-driven interface, streamlining the complex processes of remote sensing data acquisition and data fusion to boost productivity by merging AI processing and automation with human problem-solving skills. The AI engine optimally harnesses heterogeneous Earth Observation data (predominantly from the Copernicus Programme) and analysis-ready data from multiple heterogeneous and multimodal sources, promoting responsible mineral exploration and addressing environmental concerns, resulting in an interactive, user-friendly web platform.

The platform is structured around three main modules:

  • OCLI 

Serving as the backend, OCLI manages data mining processes and automates functions such as data acquisition, preprocessing, and image processing through workflows known as AI Knowledge Packs (AIKPs). OCLI collects relevant satellite data products, prepares analysis-ready data (ARD) cubes, and acts as the foundation for multi-source data fusion.

  • AGEMERA Geo-Suite Graphical User Interface 

As the frontend module, this web-based platform offers a general data repository and visualization in Cloud Optimized data formats. Powered by OGC-standards-compliant web services, it enables interactive exploration of extensive datasets by the public as well as mine site operators and owners. The conversational AI agent assists in easy navigation through the extensive collection of data and insights in the platform, while the data itself is secured with various authentication and authorization access levels.

  • Data Repository

Comprising a bulk data warehouse enriched with INSPIRE-nomenclature-compliant metadata, the Data Repository employs a Cloud Object Storage Service as the primary storage solution. This service provides web-based access to the object storage, ensuring efficient data access from any web browser.

OPT/NET's AI platform, exemplified in the AGEMERA project, stands as a pioneering solution seamlessly integrating AI processing, heterogeneous data fusion, human problem-solving, and advanced data management for generation of impactful insights and responsible resource exploration.

How to cite: Stimac Tumara, B. and Matselyukh, T.: AGEMERA AI: Innovative AI solution for responsible resource exploration, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-628, https://doi.org/10.5194/egusphere-egu24-628, 2024.

The Horizon project OCEANIDS aims at building user-driven applications and tools, which act as an enabling technological layer for regional authorities and stakeholders in order to achieve a more resilient and inclusive systemic pathway to a Blue Economy in coastal regions. Under a single-access-window platform for Climate-Informed Maritime Spatial Planning, the project will allow a more integrated seascape management of coastal regions. The overarching concept is to collect, harmonize and curate existing climate data services, making data accessible, reusable and interoperable for the development of local adaptation strategies. On top of the Euro Data Cube (1) and the Copernicus Coastal Hub (2), the OCEANIDS data cubes will be fused with EO data and further processed and analyzed using advanced deep learning and AI methods and techniques. The purpose of these deep learning and AI software tools is to extract meaningful information from the high-dimensional datasets and to efficiently structure the collected data into a semantically enriched framework through supervised and unsupervised learning. Generative adversarial networks (GANs) are exploited to fill in data inconsistencies and gaps where they exist, so that the OCEANIDS data cubes are well structured and consistent, and optimal processing, analysis and classification performance is therefore achieved.

A holistic hazard & risk assessment platform will be elaborated, assimilating data from all available sources in OCEANIDS, including asset exposure datasets, population statistics, long-term hazard simulations, short-term hazard forecasting, vulnerability information (both historical and simulation based) as well as impact assessment data from past and forecasted events. The modelling framework for assessing the magnitude of impacts will ensure accurate propagation of aleatory and epistemic uncertainties from all pertinent sources, e.g., data, methods, models, and parameters, all the way to the final quantification of risk. The applied modelling and simulation tools will estimate the state of assets, either single or in portfolios, depending on their currently reported state and/or the states of interconnected assets, where available. The state of an interconnected asset is thus a result of the nature of the hazard pressure affecting the originating asset, the characteristics of the asset under consideration (risk mitigation, means of immediate response, safety equipment) and the type of interconnection between the assets. This approach is the basis for accurately quantifying the risk over a region, allowing the improvement and optimization of the safety of the complex infrastructures related to their operation processes and their inside and outside interactions, while offering actionable metrics for regional planning, insurance, and natural catastrophe prevention/mitigation.

The OCEANIDS Decision Support Platform will be implemented to give reliable recommendations to end-users regarding spatiotemporal changes and the impact of climate change on the environment. The Digital Twin Earth model will be used to monitor climate change, perform simulations of Earth’s interconnected systems and human behaviour, and help support environmental policies. This high-level front-end platform will provide end-users with an assessment of hazards in their respective regions, leading to subsequent multi-scale planning.

(1) https://eurodatacube.com/

(2) https://www.coastal.hub.copernicus.eu/

How to cite: Charalampopoulou, V., Marinou, E., Kontopoulos, C., and Kolokoussis, P.: OCEANIDS: Utilizing Copernicus Satellite Imagery Open Data Infrastructures for User-driven applications and Tools for Climate-Informed Maritime Spatial Planning and Integrated Seascape Management, towards a Resilient & Inclusive Blue Economy., EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-2538, https://doi.org/10.5194/egusphere-egu24-2538, 2024.

EGU24-4848 | ECS | Posters on site | ESSI2.11

Global Civil Earth Observation Satellite Semantic Database 

Ming Lin, Meng Jin, Juanzi Li, and Yuqi Bai

The advancement of satellite remote sensing technology has transformed our capacity to monitor and address global challenges. This technology provides global coverage, frequent observation revisits, and consistent monitoring, thus providing critical data support. Since the first Earth observation satellite was launched in the 1960s, more than a thousand Earth observation satellites have been deployed by various countries and organizations. However, the substantial accumulation of Earth observation assets is maintained independently by different organizations using varying methodologies. This poses a significant challenge in effectively utilizing and maximizing the value of these global observation resources.

This study introduces GEOSatDB, a comprehensive semantic database specifically tailored for civil Earth observation satellites. The foundation of the database is an ontology model conforming to standards set by the International Organization for Standardization (ISO) and the World Wide Web Consortium (W3C). This conformity enables data integration and promotes the reuse of accumulated knowledge. Our approach advocates a novel method for integrating Earth observation satellite information from diverse sources. It notably incorporates a structured prompt strategy utilizing a large language model to derive detailed sensor information from vast volumes of unstructured text.

To demonstrate the capabilities of GEOSatDB, we performed a comprehensive analysis of the distribution of Earth observation satellites in 195 countries. This analysis unveiled the global distribution and diversity of these assets. Furthermore, two distinct case studies showcase the practical application and robust data mining potential of GEOSatDB. With information on 2,340 remote sensing satellites and 1,018 sensors, this database represents a significant advancement in semantically sharing and applying Earth observation resources. Its establishment encourages enhanced international cooperation and more efficient environmental monitoring and management.

How to cite: Lin, M., Jin, M., Li, J., and Bai, Y.: Global Civil Earth Observation Satellite Semantic Database, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-4848, https://doi.org/10.5194/egusphere-egu24-4848, 2024.

EGU24-4942 | Orals | ESSI2.11 | Highlight

Copernicus Data Space Ecosystem - Platform That Enables Federated Earth Observation Services and Applications 

Grega Milcinski, Jan Musial, Jacek Leszczenski, Dennis Clarijs, and Jurry de la Mar

The Copernicus Data Space Ecosystem (CDSE), a pivotal initiative of the European Space Agency (ESA) and funded by the European Commission, represents a groundbreaking advancement in Earth Observation (EO). It is an integral part of the ambitious Copernicus Programme, which has revolutionized EO data access and utilization across various sectors. The CDSE is more than a mere data repository; it is a collaborative platform where knowledge, expertise, and resources are shared and enhanced. This ecosystem approach is transforming the EO landscape, encouraging a more inclusive and participatory model where researchers, policymakers, businesses, and citizen scientists contribute to and benefit from the wealth of EO data. The CDSE stands as a beacon of international cooperation and a driver for sustainable development, setting new standards for leveraging EO data in addressing some of the most pressing global challenges.

Central to the CDSE is its commitment to federation and cooperation, offering a flexible framework for integration and participation to data providers, remote sensing experts, and application developers. This approach fosters innovation and collaboration, ensuring that the ecosystem benefits from a diverse range of contributions. Data providers can upload their data to the ecosystem’s object storage, making it available to CDSE users under free or commercial conditions. This not only enhances the CDSE’s data repository but also expands the providers' reach. Remote sensing experts contribute their algorithms and workflows to the openEO Algorithm Plaza, a collaborative space for sharing and enhancing EO technologies and applications. Application developers can utilize CDSE's infrastructure and data to create solutions for specific needs like agricultural monitoring, demonstrating the ecosystem’s potential for application development.

The CDSE's principle of federation extends beyond individual contributors to collaborations with other platform providers, enhancing the range of services and capabilities available to users. These partnerships are crucial for the ecosystem's growth, ensuring access to the best tools and data. The governance of the Data Space Ecosystem incorporates several third-party commercial providers and is designed to grow with additional contributions. It is based on European values of data sovereignty and privacy, ensuring trust, collaboration, and sustainability. The ecosystem maintains a clear distinction between publicly funded services and third-party offerings, both free and commercial. This open and federated approach makes CDSE a comprehensive solution for EO needs and solidifies its position as a leader in the EO community.

How to cite: Milcinski, G., Musial, J., Leszczenski, J., Clarijs, D., and de la Mar, J.: Copernicus Data Space Ecosystem - Platform That Enables Federated Earth Observation Services and Applications, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-4942, https://doi.org/10.5194/egusphere-egu24-4942, 2024.

EGU24-9662 | ECS | Orals | ESSI2.11

Urban Data Space to Support Green Deal Priority Actions 

Raniero Beber, Piergiorgio Cipriano, Oscar Corcho, Michel Gabriel, Giacomo Martirano, Francesca Noardo, Daniela Poli, Fabio Remondino, and Danny Vandenbroucke

Data spaces are a new data sharing paradigm based on a distributed ecosystem, allowing data owners to maintain control over how their data are used while unlocking the data's potential in line with the Findability, Accessibility, Interoperability and Reusability (FAIR) principles. For urban areas, data spaces provide geospatial, environmental and climate data to everyone. Various international and interdisciplinary projects and initiatives are developing data spaces, allowing policy-makers, researchers, citizens and the private sector to access high-quality, interoperable data in order to generate actionable knowledge, support the green transition of cities and enable more effective decision-making processes.

The Urban Data Space for Green Deal - USAGE - EU project aims to offer innovative governance mechanisms, consolidated frameworks, best practices, AI-based tools and data analytics to share, access and use city-level data from satellite and airborne platforms, Internet of Things (IoT) sensors, and authoritative and crowdsourced sources.

Within USAGE, a series of geospatial, thematic and other datasets have been newly acquired or generated over four pilot areas and will be shared through a standards-based web ecosystem to test and evaluate solutions (i) to better understand issues and trends on how our planet and its climate are changing; (ii) to support decision making intended to mitigate the effects of changes and environmental issues on life; and (iii) to address the role that humans play in these changes, e.g., with behaviour adaptation and mitigation actions.

Urban areas are the focus of USAGE since most human activities are concentrated there, making them the main source of some of the issues considered Green Deal priorities (e.g. energy use, pollution, climate change effects); solutions in USAGE are developed by an interdisciplinary team analysing geospatial data and meeting multiple and diverse local requirements.

In this work we will present the relevant datasets collected in our pilot areas, reporting processing methodologies and applications of analysis-ready and decision-ready geospatial data. In particular we will report experiences related to:

- detection of urban heat islands (UHI) and production of UHI maps, utilizing open data like high-resolution satellite imagery, meteorological ground sensor data, surface properties and a hybrid model based on machine learning and geostatistics;

- generation of semantic 3D city models for photovoltaic solar potential estimation and energy efficiency purposes;

- generation of high-resolution thematic maps (surface materials, ecological indexes, etc.) from hyperspectral airborne imagery using a multi-level machine learning approach and supported by training data provided by the municipalities;

- realization of canopy thematic layers combining 3D point clouds and hyperspectral images to monitor health and growth of trees over time, to estimate biomass and to map species distribution;

- initiation of multi-temporal (night/days, summer/winter) thermal analyses based on high-resolution aerial thermal images, deriving proper land surface temperatures (LST) by correcting raw sensor data with thematic maps of surface materials. 

The presentation will highlight the importance of shared urban data spaces to enable visualization, sharing, fusion and processing of environmental and Earth Observation data from heterogeneous sources, ultimately facilitating more effective decision-making processes, besides advances in scientific research.

How to cite: Beber, R., Cipriano, P., Corcho, O., Gabriel, M., Martirano, G., Noardo, F., Poli, D., Remondino, F., and Vandenbroucke, D.: Urban Data Space to Support Green Deal Priority Actions, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-9662, https://doi.org/10.5194/egusphere-egu24-9662, 2024.

EGU24-9921 | Orals | ESSI2.11

Towards a GEOSS Digital Ecosystem: the GEOSS Platform Plus Vision 

Mattia Santoro, Alessandro Scremin, Gregory Giuliani, Joost van Bemmelen, Eliana Li Santi, Daniele Giordani, Małgorzata Herman, Aleksandra Marczuk, and Piotr Krupa

The GEOSS Platform Plus (GPP) project, funded by the European Union’s Horizon 2020 Framework Programme, aims to contribute to the implementation of the Global Earth Observation System of Systems (GEOSS) by evolving the European GEOSS Platform components to allow access to tailor-made information and actionable knowledge.

In this context, the GPP project is defining a vision for evolving GEOSS based on the concept of a Digital Ecosystem (DE). Designing GEOSS as a DE requires, first, the identification of its ecosystem service(s). As advocated in several documents, including the “Concept Paper for the Next Phase of the GEOSS Infrastructure” released by the GEOSS Infrastructure Development Task Team (GIDTT) and the concept of Earth Intelligence introduced in the GEO Post-2025 Strategy, GEOSS should shift from the traditional data sharing paradigm to a model in which knowledge generation and sharing is facilitated.

Therefore, we can identify the GEOSS Ecosystem Service as, though not limited to, the provisioning of trusted EO-based knowledge for decision-making. Considering the GEO mission and user base, if such an ecosystem service is provided by GEOSS, a virtuous cycle can be initiated. End-users will utilize the tools and services of the GEOSS DE because they find what they need and can use the desired information to obtain actionable knowledge; intermediate users (e.g., developers, scientists, data intermediaries) will build new tools on top of (and contribute to) the DE to attract new end users and contribute to the benefits of the GEO communities; providers will have more interest in participating in (belonging to) the DE to spread the use of their resources and enlarge their user base.

Such a framework enables a more flexible environment, which facilitates (and encourages) the extension of GEOSS capabilities through new components and strengthens the involvement and contribution of the GEO communities at large. To demonstrate this concept with practical examples, in this presentation we will introduce some of the use cases developed by the GPP project: (i) the SDG 15.3.1 Land Degradation use case, and (ii) the Nutrient Pollution in European Inland and Coastal Waters use case. These use cases use (and extend) the current GEOSS Platform for the generation of actionable knowledge (and beyond, towards Earth Intelligence).

How to cite: Santoro, M., Scremin, A., Giuliani, G., van Bemmelen, J., Li Santi, E., Giordani, D., Herman, M., Marczuk, A., and Krupa, P.: Towards a GEOSS Digital Ecosystem: the GEOSS Platform Plus Vision, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-9921, https://doi.org/10.5194/egusphere-egu24-9921, 2024.

EGU24-10109 | Posters on site | ESSI2.11

Overview of the Copernicus Data Space Ecosystem APIs 

Jan Musial, Jacek Leszczenski, Jedrzej Bojanowski, Grega Milcinski, Anja Vrecko, Dennis Clarijs, Jeroen Dries, and Uwe Marquard

The Copernicus Data Space Ecosystem introduces an innovative approach to accessing large Earth Observation (EO) datasets through streaming services and APIs. This study elaborates on the advantages of the novel data discovery, streaming and access APIs within the Data Space, emphasizing key components such as the STAC catalogue, the OData and OpenSearch query APIs, the Browser, the S3 interface, JupyterHub, openEO, and the Sentinel Hub APIs.


OData (Open Data Protocol) is an ISO/IEC-approved standard outlining best practices for designing and utilizing REST APIs. It harmonizes various elements such as request and response headers, status codes, HTTP methods, URL conventions, media types, payload formats, and query options. The OpenSearch catalogue enables searching for Copernicus data through a standardized web service; the OpenSearch specification serves as the reference for technical details. The results from this web service are delivered as GeoJSON feature collections, with each feature representing an Earth observation 'product' and referencing the location of the actual data.
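
As a minimal illustration of how the OData endpoint is queried in practice, the following Python sketch searches the catalogue for recent Sentinel-2 products. The endpoint URL and filter fields follow the publicly documented OData conventions of the Copernicus Data Space Ecosystem and should be checked against the current service documentation.

```python
# Minimal sketch of an OData product search; endpoint and filter syntax
# assume the publicly documented Copernicus Data Space OData conventions.
import requests

ODATA_URL = "https://catalogue.dataspace.copernicus.eu/odata/v1/Products"

params = {
    # OData $filter: restrict by collection name and sensing start date.
    "$filter": (
        "Collection/Name eq 'SENTINEL-2' "
        "and ContentDate/Start gt 2024-03-01T00:00:00.000Z "
        "and ContentDate/Start lt 2024-03-10T00:00:00.000Z"
    ),
    "$orderby": "ContentDate/Start desc",
    "$top": 5,
}

response = requests.get(ODATA_URL, params=params, timeout=60)
response.raise_for_status()

for product in response.json().get("value", []):
    print(product["Name"], product["ContentDate"]["Start"])
```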


STAC (Spatio-Temporal Asset Catalog) emerges as a cutting-edge EO data management standard with robust data discovery and access capabilities. Built on simple JSON entries with resource-pointing URLs, it boasts flexibility in the types of data stored and accommodates new extensions and capabilities seamlessly. The hierarchical structure includes collections (e.g., Sentinel-2 L2A), subdivided into items representing satellite products. These items further break down into assets corresponding to spectral bands. This detailed STAC catalogue facilitates user searches for specific spectral bands and enables streaming of spatial data subsets using HTTP range requests.
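
A hedged sketch of such a STAC search with the pystac-client library is shown below; the catalogue URL and the collection identifier are assumptions that should be verified against the STAC landing page of the service.

```python
# Illustrative STAC search; URL and collection id are assumptions.
from pystac_client import Client

catalog = Client.open("https://catalogue.dataspace.copernicus.eu/stac")

search = catalog.search(
    collections=["SENTINEL-2"],         # assumed collection identifier
    bbox=[11.0, 46.0, 12.0, 47.0],      # lon/lat bounding box
    datetime="2024-06-01/2024-06-30",
    max_items=5,
)

for item in search.items():
    # Each item exposes per-band assets whose HREFs can be streamed with
    # HTTP range requests, as described above.
    print(item.id, list(item.assets))
```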


JupyterHub is a central platform for managing Jupyter Notebooks, simplifying tasks like resource allocation and user authentication. Copernicus Data Space Ecosystem users get free access, but with limited resources. Jupyter Notebooks offer a user-friendly interface for EO data prototyping, integrating with Python SDKs like Sentinel Hub and OpenEO. They grant direct access to the EO data repository, eliminating dependency hassles with pre-configured kernels for immediate prototyping. 


Sentinel Hub, a satellite imagery processing service, excels in on-the-fly actions like gridding, re-projection and mosaicking. It efficiently fetches data for web applications or analytical processes such as machine learning, based on satellite data without replication or pre-processing. Several APIs, including OGC services, contribute to the Data Space Browser for advanced data visualization. Key APIs within the Copernicus Data Space Ecosystem include the BYOC (Bring Your Own COG) API, the (Batch) Processing API and the (Batch) Statistical API. These APIs empower users to perform diverse tasks, from data ingestion to advanced statistical analyses.

openEO revolutionizes geospatial data processing and analysis by offering a unified, interoperable platform for developers, researchers, and scientists. This framework empowers users to address complex geospatial challenges collaboratively, leveraging distributed computing environments and cloud-based resources. The collaborative nature of openEO enables seamless sharing of code, workflows, and data processing methods across platforms, advancing the accessibility, scalability, and reproducibility of Earth Observation (EO) data analysis. openEO's intuitive client libraries in Python, R, and JavaScript simplify the analysis of diverse datasets. In addition to the SDKs, openEO features the Web Editor, an online tool for visually constructing workflows.
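
As an illustration of the openEO workflow, the sketch below uses the openEO Python client against the Copernicus Data Space openEO endpoint; the collection and band names are the commonly documented ones and should be verified with connection.list_collections().

```python
# Minimal openEO sketch: NDVI temporal mean over a small area.
import openeo

connection = openeo.connect("https://openeo.dataspace.copernicus.eu")
connection.authenticate_oidc()  # interactive OIDC login

cube = connection.load_collection(
    "SENTINEL2_L2A",                      # assumed collection id
    spatial_extent={"west": 11.0, "south": 46.0, "east": 11.2, "north": 46.2},
    temporal_extent=["2024-06-01", "2024-06-30"],
    bands=["B04", "B08"],
)

# Compute NDVI, average over time and download the result as GeoTIFF.
ndvi = cube.ndvi(nir="B08", red="B04")
ndvi.reduce_dimension(dimension="t", reducer="mean").download("ndvi_mean.tif")
```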

How to cite: Musial, J., Leszczenski, J., Bojanowski, J., Milcinski, G., Vrecko, A., Clarijs, D., Dries, J., and Marquard, U.: Overview of the Copernicus Data Space Ecosystem APIs, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-10109, https://doi.org/10.5194/egusphere-egu24-10109, 2024.

EGU24-17472 | ECS | Posters on site | ESSI2.11

A Data Space environment for big data and ML-based climate applications in the European Open Science Cloud 

Donatello Elia, Fabrizio Antonio, Sandro Fiore, Emanuele Donno, Gabriele Accarino, Paola Nassisi, and Giovanni Aloisio

Several scientific fields, including climate science, have undergone radical changes in recent years due to the increase in data volumes and the emergence of data science and Machine Learning (ML) approaches. In this context, providing fast data access and analytics has become of paramount importance. The data space concept has emerged to address some of the key challenges and support scientific communities towards a more sustainable and FAIR use of data.

The ENES Data Space (EDS) represents a domain-specific example of this concept for the climate community, developed under the umbrella of the European Open Science Cloud (EOSC) initiative of the European Commission. EDS provides an open, scalable, cloud-enabled data science environment for climate data analysis on top of the EOSC Compute Platform, made available through a user-friendly JupyterLab GUI. The service integrates into a single environment climate datasets from well-known initiatives (e.g., CMIP6), compute resources, and data science and ML tools. Launched in the context of the EGI-ACE project, it is accessible through the EOSC Catalogue and Marketplace (https://marketplace.eosc-portal.eu/services/enes-data-space) and also provides a web portal (https://enesdataspace.vm.fedcloud.eu) including information, tutorials and training materials. It has recently been added to the Data Spaces radar, an initiative launched by the IDSA (International Data Space Association) with the main goal of mapping data spaces from several domains into one easy-to-use tool.

The EDS is being employed in climate applications targeting big data processing, interactive analytics and visualization, and recently it has been extended to support more advanced scientific applications based on ML. In particular, it has been enhanced to provide a cloud-based development and testing environment for the implementation of data-driven Digital Twin applications for extreme climate events in the context of the interTwin project. Finally, the ENES Data Space will also be one of the pilots in EOSC Beyond, a Horizon Europe project in which the EDS will integrate and validate the new EOSC Core capabilities developed by the project.

This work has been supported in part by interTwin; interTwin is funded by the European Union (Horizon Europe) under grant agreement No 101058386.

How to cite: Elia, D., Antonio, F., Fiore, S., Donno, E., Accarino, G., Nassisi, P., and Aloisio, G.: A Data Space environment for big data and ML-based climate applications in the European Open Science Cloud, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-17472, https://doi.org/10.5194/egusphere-egu24-17472, 2024.

EGU24-18047 | Posters on site | ESSI2.11

OGC Web APIs to Make Data Available in the Green Deal Data Space 

Joan Masó, Alba Brobia, Malte Zamzov, Ivette Serral, Thomas Hodson, Raul Palma, Lucy Bastin, and Victoria Lush

The Green Deal Data Space is born in the big data paradigm, where sensors produce constant streams of Earth observation data (remote sensing or in-situ). The traditional, manual organization of data in layers is no longer efficient, as data are constantly evolving and mixed together in new ways. There is a need for a new organization of the data that favors users and, in particular, can be considered ready to use. The AD4GD project proposes a threefold solution: a new information model, dynamic multidimensional datacubes, and OGC APIs for data query and filtering.

The use of data in the context of the Green Deal is difficult due to the wide heterogeneity of data sources (often expressed in different data models), the different semantics used to represent data, and the lack of sufficient interoperability mechanisms to connect existing data models. The solution is a framework composed of a suite of ontologies implemented in line with best practices, reusing existing standards and well-scoped models as much as possible and establishing alignments between them to enable their interoperability and the integration of existing data. This approach was successfully applied to object-based data stored in an RDF triple store (e.g. vector data and sensor data) in the agricultural sector.

The traditional representation of static two-dimensional layers needs to be replaced by a dynamic view where time becomes an extra dimension. In addition, other dimensions emerge, such as height in atmospheric and oceanic models, frequency in hyperspectral remote sensing data, or species in a biodiversity distribution model. These dimensions define a data cube that offers an entry to a constantly evolving world. The dimensions of the datacube, as well as the values stored in the cube cells (a.k.a. attributes), should be connected to the concepts in the information model.

In the big data paradigm, access to individual layers is difficult due to the dynamic nature of the data. Instead, we need a set of modern APIs as an entry point to the data in the data space. The OGC APIs offer a set of building blocks that can be combined with other web API design principles to build the Green Deal Data Space data access APIs. Users will be able to get the necessary data for modeling and simulating reality using the OGC API endpoints described in the OpenAPI description document. The OGC APIs include the concept of collections as an initial filtering mechanism, and a filtering extension using CQL2 is in advanced draft status. It is fundamental that the OGC APIs are also capable of generating detailed metadata about the extracted subset, including data sources, producer information and data quality estimations.
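
As a sketch of how a client would use such building blocks, the request below queries a hypothetical OGC API - Features endpoint with a CQL2 text filter; the server URL, collection name and filter properties are placeholders, not actual AD4GD endpoints.

```python
# Hypothetical OGC API - Features request with a CQL2 text filter.
import requests

BASE = "https://example.org/ogcapi"            # placeholder endpoint
COLLECTION = "water-quality-observations"      # placeholder collection

params = {
    "limit": 100,
    "bbox": "1.9,41.2,2.3,41.5",
    "datetime": "2023-01-01T00:00:00Z/2023-12-31T23:59:59Z",
    # Filtering extension (draft): CQL2 expressed as text.
    "filter": "parameter = 'dissolved_oxygen' AND value < 6.0",
    "filter-lang": "cql2-text",
}

r = requests.get(f"{BASE}/collections/{COLLECTION}/items", params=params, timeout=60)
r.raise_for_status()
print(r.json().get("numberReturned"), "features returned")
```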

AD4GD is a Horizon Europe project co-funded by the European Union, Switzerland and the United Kingdom.

How to cite: Masó, J., Brobia, A., Zamzov, M., Serral, I., Hodson, T., Palma, R., Bastin, L., and Lush, V.: OGC Web APIs to Make Data Available in the Green Deal Data Space, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-18047, https://doi.org/10.5194/egusphere-egu24-18047, 2024.

The integrity of natural ecosystems, including terrestrial ones, and their connectivity is one of the main concerns of current European and global green policies, e.g., the European Green Deal. Thus, public administration managers need reliable, long-term information to better monitor ecosystem evolution and inform decision making. Data spaces are intended to become the EC's comprehensive solution for integrating data from different sources, with the aim of generating and providing more ready-to-use knowledge on climate change, circular economy, pollution, biodiversity, and deforestation.

The AD4GD project researches the co-creation of the European Green Deal Data Space as an open space for FAIR data and standards-based services, tested in three pilot cases that provide testbeds in terms of data, standards, sharing and interoperability. One of these pilots is focused on ecological terrestrial connectivity in Catalonia (NE Spain).

The challenges are: (1) monitoring ecological connectivity in terrestrial ecosystems through the integration of state-of-the-art multi-sensor remote sensing imagery, ecological models, in-situ biodiversity observations and sensors; and (2) forecasting ecological connectivity to help to define effective actions to reduce terrestrial biodiversity loss.

To this end, solutions are being proposed and tested to integrate data from different sources using modern standards: raster-based land cover maps and connectivity maps structured as data cubes (Open Data Cube and Rasdaman), GBIF species occurrences exposed via OGC STAplus, as well as data from low-cost automatic sensors (camera traps with species identification software). All these data are related together through semantic tagging, with references pointing to vocabularies of GEO Essential Variables stored in the OGC RAINBOW Definition Server.

AD4GD is a Horizon Europe project co-funded by the European Union, Switzerland and the United Kingdom.

How to cite: Serral, I., Bastin, L., Kriukov, V., and Masó, J.: Data Spaces as an exploratory solution for big data biodiversity challenges in support of the European Green Deal: the case of terrestrial habitat connectivity, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-19101, https://doi.org/10.5194/egusphere-egu24-19101, 2024.

EGU24-20031 | Posters on site | ESSI2.11

Optimizing Country-Scale Crop Type Classification: A Comprehensive Study on Preprocessing Techniques and Deep Learning Classifiers within the Copernicus Data Space Ecosystem 

Jędrzej Bojanowski, Jan Musiał, Jan Małaśnicki, Jacek Chojnacki, and Wojciech Bylica

Identification of crops over large areas is necessary for monitoring agricultural production, establishing food security policies, or controlling compliance with the Common Agricultural Policy. Data acquired by the Sentinel-1 and Sentinel-2 satellite constellations, with their combination of high temporal and spatial resolution, have already demonstrated, separately or fused, their ability to classify crops. However, here we propose a methodology for crop type classification based on Sentinel-2 data, focusing on a few specific aspects that have often not been addressed simultaneously in previous works:

  • Comparison of processing steps when a single field representation is generated, i.e. multiple-pixel representation (Pixel-Set Encoder, https://doi.org/10.48550/arXiv.1911.07757) vs zonal statistics
  • Application of various machine and deep learning classifiers, from Random Forest as a baseline, through Long Short-Term Memory and Gated Recurrent Unit networks, to novel Transformers
  • Provision of a reliable full probability distribution over all crops for each classified crop field, allowing flexible, user-specific minimization of omission or commission errors
  • Applicability in operational mode on a large, beyond-country scale, i.e. for tens of millions of crop fields
  • Efficient use of open EO-data cloud infrastructures such as the Copernicus Data Space Ecosystem or CREODIAS

The methodology has already been implemented and validated for Austria and Poland using the Land Parcel Information System as a reference, and the list of countries will continue to grow. For Poland, 41 crops were successfully classified in 2022, with precision ranging from 0.64 (pepper) to 0.97 (maize, beetroot, winter rape) and an overall classification accuracy (kappa) of 0.88. Applying a threshold of 0.85 on the probabilities, however, yielded a kappa of 0.95 and a range of accuracy (precision) for crops from 0.88 to 1.00, with 26 crops classified with an accuracy above 0.95. For Austria, for the years 2018-2021, a kappa coefficient of 0.85 to 0.9 was obtained without applying thresholds on the probabilities. For Poland, classification was also carried out during the growing season, reaching kappa values of 0.78 (May), 0.84 (June), 0.87 (July) and 0.88 at the end of the season.
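
As an illustration of how such per-field probabilities can be used (a toy sketch, not the authors' code), the snippet below leaves fields unclassified when the top-class probability falls below a user-chosen threshold, trading coverage for precision.

```python
# Toy sketch: threshold per-field class probabilities to trade coverage
# for precision; the probabilities here are random placeholders.
import numpy as np

rng = np.random.default_rng(0)
n_fields, n_crops = 1000, 41
probs = rng.dirichlet(np.full(n_crops, 0.3), size=n_fields)

threshold = 0.85
top_class = probs.argmax(axis=1)
top_prob = probs.max(axis=1)

keep = top_prob >= threshold
labels = np.where(keep, top_class, -1)      # -1 marks 'unclassified'
print(f"{keep.mean():.1%} of fields classified above the {threshold} threshold")
```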

With this approach, we also demonstrate how to use the Copernicus Data Space Ecosystem to efficiently extract information from big volumes of satellite data - here from Sentinel-2 and tens of millions of agricultural parcels.

How to cite: Bojanowski, J., Musiał, J., Małaśnicki, J., Chojnacki, J., and Bylica, W.: Optimizing Country-Scale Crop Type Classification: A Comprehensive Study on Preprocessing Techniques and Deep Learning Classifiers within the Copernicus Data Space Ecosystem, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-20031, https://doi.org/10.5194/egusphere-egu24-20031, 2024.

The Copernicus Data Space Ecosystem provides open access to the petabyte-scale EO data repository and to a wide range of tools and services, limited by some predefined quotas. For users who would like to develop commercial services, or who need larger quotas or unlimited access to services, the CREODIAS platform is the solution. In this study, an example of such a (pre)commercial service will be presented, which publishes the Copernicus Sentinel-1 and Sentinel-2 products in the form of a Web Map Service (WMS) and a Web Coverage Service (WCS). The architecture of the services, based on a Kubernetes cluster, allows horizontal scaling of a service with the number of user requests. The WMS/WCS services to be presented combine data discovery, access, (pre-)processing, publishing (rendering) and dissemination capabilities within a single RESTful (Representational State Transfer) query. This gives a user great flexibility in terms of on-the-fly data extraction over a specific AOI (Area Of Interest), mosaicking, reprojection, simple band processing (cloud masking, normalized difference vegetation index) and rendering. The performance of the Copernicus Data Space Ecosystem and CREODIAS platform, combined with efficient software (Postgres 15 with the PostGIS extension, MapServer with a GDAL backend and the Kakadu JPEG2000 driver), allows a WMS/WCS service response time below 1 second on average. This, in turn, gives the potential for massive parallelization of computations through the horizontal scaling of the Kubernetes cluster.
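
For illustration, a standard WMS 1.3.0 GetMap request of the kind such a service answers is sketched below; the server URL and layer name are hypothetical placeholders.

```python
# Hypothetical WMS 1.3.0 GetMap request for a true-colour Sentinel-2 mosaic.
import requests

WMS_URL = "https://example.org/wms"            # placeholder endpoint
params = {
    "SERVICE": "WMS",
    "VERSION": "1.3.0",
    "REQUEST": "GetMap",
    "LAYERS": "SENTINEL2_TRUE_COLOR",          # placeholder layer name
    "STYLES": "",
    "CRS": "EPSG:4326",
    "BBOX": "46.0,11.0,46.5,11.5",             # lat,lon order for EPSG:4326 in WMS 1.3.0
    "WIDTH": 512,
    "HEIGHT": 512,
    "FORMAT": "image/png",
    "TIME": "2024-06-15",                      # optional time dimension
}

r = requests.get(WMS_URL, params=params, timeout=60)
r.raise_for_status()
with open("truecolor.png", "wb") as f:
    f.write(r.content)
```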
 
How to cite: Niemyjski, M. and Musiał, J.: Scaling up the Copernicus Data Space Ecosystem services using the CREODIAS cloud computing platform example of the WMS server deployed on the Kubernetes cluster, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-20084, https://doi.org/10.5194/egusphere-egu24-20084, 2024.

EGU24-20830 | Orals | ESSI2.11

The European Green Deal Data Space roadmap 

Sébastien Denvil, Marta Gutierrez, Mark Dietrich, Nevena Raczko, Mattia Santoro, Christian Briese, and Charis Chatzikyriakou

The European Commission has a program to accelerate the Digital Transition and is putting forward a vision based on cloud, common European Data Spaces and AI. As the data space paradigm unfolds across Europe, the Green Deal Data Space emerges. Its foundational pillars are to be built by the GREAT project.

GREAT, the Green Deal Data Space Foundation and its Community of Practice, has the ambitious goal of defining how data with the potential to help combat climate- and environment-related challenges, in line with the European Green Deal, can be shared more broadly across stakeholders, sectors and boundaries, according to European values such as fair access, privacy and security.

The Green Deal Data Space stands at the intersection of two major European policy initiatives: the EU Strategy for Data and the European Green Deal. The GDDS will be designed and implemented to exploit the potential of data to effectively support the Green Deal priority actions, empowering policy makers, businesses, researchers, and citizens, from Europe and around the world, to jointly tackle issues such as climate change, circular economy, zero pollution, biodiversity protection, deforestation and compliance assurance.

Out of the many European Green Deal strategic actions, the GREAT project focussed on three priorities (Biodiversity 2030, Zero Pollution and Climate Change), in order to effectively capture the diversity of requirements across the full range of the European Green Deal. These three initiatives are interlinked with other EGD strategic actions, approximate the full scope of the GDDS, and complement actions being addressed by other thematic data spaces (such as the “Farm to Fork Strategy” being addressed by the agricultural data space). During the final stage of the roadmap elaboration, circular economy aspects, together with the digital product passport concept, were considered so that, in addition to the binding targets of the Green Deal policies, circular economy and digital product passport pilots and use cases should be instrumental in driving implementation decisions.

The implementation roadmap will guide the efforts of multiple actors to converge toward a blueprint technical architecture, a data governance scheme that enables innovative business cases, and an inventory of high-value datasets. Data sharing by design and data sovereignty are some of the main principles that will apply from the early stages. This talk will present the final version of the roadmap, its pillars, goals and strategic actions, and will seek feedback and collaboration from the EGU community.

How to cite: Denvil, S., Gutierrez, M., Dietrich, M., Raczko, N., Santoro, M., Briese, C., and Chatzikyriakou, C.: The European Green Deal Data Space roadmap, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-20830, https://doi.org/10.5194/egusphere-egu24-20830, 2024.

EGU24-20886 | Orals | ESSI2.11

The importance of data management principles for efficient data spaces 

Bente Lilja Bye and Marie-Francoise Voidrot

The European Commission has launched the European strategy for data, introducing an exciting vision of data spaces for many domains. This data space concept and its expected benefits have been so attractive that we already observe many ongoing initiatives building data spaces with large thematic overlaps, as well as data spaces at different geographic scales and covering different periods of time. Interoperability across data spaces is therefore of increasing importance, and in order to prepare for the necessary interoperability between the numerous thematic and domain-specific data spaces, a minimum requirement to begin with is good data management.

In this presentation, an overview of key data management principles, such as the Group on Earth Observations (GEO) data sharing and data management principles and the FAIR, CARE, and TRUST principles, will be given. A comparison, based on experience from the GEO dialogue series in 2022 and 2023, of how the principles in various ways support the creation of efficient, usable data spaces will be offered. The focus will be on interoperability throughout the preparation, creation and management of data spaces, and on how this is pivotal to avoiding data fragmentation and thereby contributing to the efficient implementation of part of the European data strategy, namely the data spaces concept.

How to cite: Bye, B. L. and Voidrot, M.-F.: The importance of data management principles for efficient data spaces, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-20886, https://doi.org/10.5194/egusphere-egu24-20886, 2024.

Traditionally, processing and analysing satellite imagery has not been a straightforward task, with a lot of specialist knowledge required to make your data analysis-ready. In addition, limited computing power made it feasible to process and analyse only individual satellite image acquisitions.

This was not an issue, as data even 10 years ago were not as numerous as they are today. For example, back in 2013, you would likely use Landsat 8, which has a 16-day revisit time at 30 m resolution. By 2021, however, we had access to Sentinel-2, with a revisit time of 5 days at 10 m resolution. And if a scientist wished to develop any multitemporal application, they would likely rely on supercomputers funded and managed by their institution.

However, with advances in computing power and improvements in satellite revisit times, there is now so much data available that processing imagery on your desktop or laptop is no longer an effective approach.

The introduction of the Copernicus Data Space Ecosystem is now making cloud processing methods accessible and open to earth observation scientists across Europe and the world. With its scalability, ease of use, and powerful data processing capabilities, the Copernicus Data Space Ecosystem is designed to be flexible and adaptable to the needs of all types of users, from researchers to intergovernmental institutions. 

In this talk, we will run a comparison exercise, generating a multi-temporal satellite-imagery-derived product with a traditional workflow before demonstrating an optimized version of the workflow using cloud processing APIs. A use case will be presented in which a simple harvest detection map is derived using the Normalized Difference Vegetation Index (NDVI) and the Bare Soil Index (BSI). Using the difference in NDVI between the most recent and least recent timestamp, the BSI in the most recent timestamp, and thresholding these values, we classify pixels as either harvested or not harvested.
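
A minimal numpy sketch of this thresholding logic is given below; the index formulas are the standard NDVI and BSI definitions, while the threshold values are placeholders rather than the ones used in the talk.

```python
# Toy sketch of the NDVI-difference / BSI thresholding described above.
import numpy as np

def ndvi(nir, red):
    return (nir - red) / (nir + red + 1e-10)

def bsi(swir, red, nir, blue):
    # Bare Soil Index from SWIR, red, NIR and blue reflectances.
    return ((swir + red) - (nir + blue)) / ((swir + red) + (nir + blue) + 1e-10)

def harvest_mask(bands_early, bands_recent, ndvi_drop=0.3, bsi_min=0.0):
    """bands_early/bands_recent: dicts of reflectance arrays ('blue', 'red',
    'nir', 'swir') for the earliest and most recent timestamps."""
    d_ndvi = (ndvi(bands_early["nir"], bands_early["red"])
              - ndvi(bands_recent["nir"], bands_recent["red"]))
    recent_bsi = bsi(bands_recent["swir"], bands_recent["red"],
                     bands_recent["nir"], bands_recent["blue"])
    # Harvested where vegetation dropped strongly and bare soil is exposed.
    return (d_ndvi > ndvi_drop) & (recent_bsi > bsi_min)
```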

The same workflow will then be run using the Sentinel Hub APIs that are available to all users of the Copernicus Data Space Ecosystem. The lines of code, amount of data downloaded and the time taken to process the data will then be compared demonstrating the efficiencies that can be gained from moving your satellite imagery processing to the cloud.

 

How to cite: Ray, W.: Why it's time to switch your EO data processing and analysis to the cloud , EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-21701, https://doi.org/10.5194/egusphere-egu24-21701, 2024.

EGU24-21708 | Orals | ESSI2.11

Due diligence for deforestation-free supply chains with Sentinel-2 imagery from the Copernicus Data Space Ecosystem 

Heiko Balzter, Nezha Acil, Ivan Reading, Konstantina Bika, Toby Drakesmith, Chris McNeill, Sarah Cheesbrough, and Justin Byrne

At COP26, the Glasgow Leaders' Declaration committed the majority of the world's nations to ending deforestation by 2030. On 29 June 2023, the EU Regulation on deforestation-free products (EU) 2023/1115 entered into force. The main driver of deforestation and forest degradation is the expansion of agricultural land for the production of commodities like cattle, wood, cocoa, soy, palm oil, coffee, rubber and derived products. Any trader wishing to sell these commodities on the EU single market, or export from within it, must prove that the products do not originate from recently deforested land or have contributed to forest degradation. Satellite imagery provides the means of addressing the implementation of the EU Regulation on deforestation-free supply chains, and of strengthening forest governance through the provision of timely information to national forest services. We present the PyEO near-real-time forest alert system based on Sentinel-2, a current operational application to reduce illegal logging in Kenya, and its potential, developed by the ForestMind project, to support importers and exporters in demonstrating deforestation-free supply chains.

The software implementation used the Python for Earth Observation (PyEO) library to automatically extract information on forest loss from Sentinel-2 satellite imagery. It queries the Copernicus Data Space Ecosystem for new imagery, downloads the automatically selected Sentinel-2 images, applies a previously trained random forest machine learning model to detect forest loss, and generates a multi-layer analyst report.
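
As a generic illustration of this classification step (not PyEO's actual implementation), the sketch below applies a pre-trained random forest to a stacked before/after Sentinel-2 image; the band layout and class codes are assumptions.

```python
# Generic sketch: apply a pre-trained random forest to a two-date band stack.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def classify_forest_loss(model: RandomForestClassifier,
                         before: np.ndarray, after: np.ndarray) -> np.ndarray:
    """before/after: (bands, rows, cols) reflectance arrays from two dates.
    Returns a boolean mask of pixels assigned to the (assumed) 'loss' class 1."""
    stack = np.concatenate([before, after], axis=0)     # features from both dates
    n_features, rows, cols = stack.shape
    pixels = stack.reshape(n_features, -1).T            # (n_pixels, n_features)
    labels = model.predict(pixels).reshape(rows, cols)
    return labels == 1
```
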
For forest law enforcement in Kenya, the latest forest alerts are sifted and prioritised by the Kenya Forest Service's Forest Information Centre in Nairobi and delegated to forest rangers in the field for investigation. Forest rangers navigate to the field site inside the forest reserve, accompanied by a local community scout, and report back to head office with their observations and whether any arrests for illegal logging were made. Since its introduction in Kwale County in 2019, over 2000 forest alerts have been investigated. The dominant cause of the deforestation is illegal logging, followed by charcoal production.

For the due diligence application, a Forest Analyst can then use the analyst report and additional software tools to create company reports suitable for communication to importers and exporters for monitoring the impact of their supply chains on deforestation and forest degradation.

 

How to cite: Balzter, H., Acil, N., Reading, I., Bika, K., Drakesmith, T., McNeill, C., Cheesbrough, S., and Byrne, J.: Due diligence for deforestation-free supply chains with Sentinel-2 imagery from the Copernicus Data Space Ecosystem, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-21708, https://doi.org/10.5194/egusphere-egu24-21708, 2024.

EGU24-21722 | Posters on site | ESSI2.11

EarthCODE  

Garin Smith, Eddie Glover, Alasdair Kyle, Stefan Meißl, Annekatrien Debien, and Anca Anghelea

The EarthCODE vision will provide a cloud-based, user-centric development environment which can be used to support ESA science activities and projects. Our solution is built around existing open-source solutions and building blocks, primarily the Open Science Catalogue, EOxHub and the Exploitation Platform Common Architecture (EOEPCA+). EarthCODE capability will be hosted on the Copernicus Data Space Ecosystem (CDSE). Because EarthCODE will adopt a federated approach, it will also facilitate processing on other platforms, and these may include Deep Earth System Data Lab (DeepESDL), ESA EURO Data Cube (EDC), Open EO Cloud / Open EO Platform and perhaps AIOPEN / AI4DTE.

1. GOALS

EarthCODE will support FAIR Open Science. The main use cases will allow users to “Access Compute Resource” in the cloud and “Conduct Science” using their preferred approach to create reproducible workflows. “Manage Workflows” will allow users to execute workflows on a cloud platform of their choice and enable federated workflow and federated data solutions. “Publish Results” will allow scientists to share experiments and data within the community, and “Interact with Community” will enable a collaborative approach throughout this lifecycle. EarthCODE will allow scientists to concentrate on science and collaboration and hide the complexity of managing and implementing workflows when required.

2. OPEN SCIENCE PLATFORMS
EarthCODE will provide users with a portal to the underlying platform services. Broadly speaking, this will include FAIR Open Science services that support the delivery of scientific workflows, Infrastructure Services that support the execution of workflows, and Community Services to support underlying collaboration and exploitation.

3. BUILDING BLOCKS

EarthCODE will be created using existing building blocks. These will include EOEPCA+, EOxHub, the Open Science Catalogue and other existing platform services.

4. PLATFORM FREEDOM

EarthCODE will be designed using open standards to help facilitate platform freedom and platform interoperability. This means that more than one type of Development Tooling can be used to conduct science and more than one Integrated Platform service can be used to Manage Workflows.

5. PLATFORM INTEGRATION
Using the above approach, EarthCODE will facilitate the use of data and services from a wide variety of platforms, promoting integration with different platforms while still providing freedom of platform implementation. STAC will be used at the heart of the Science Catalogue to describe resources.

Platform integration will also be supported by a DevOps best practice approach to facilitate integration.

6. WORKFLOW AND DATA INTEGRATION

EarthCODE will prioritise the management of workflow and data modelling to ensure successful platform integration and successful collaboration. This approach will further support the integration of platforms and data. FAIR Data principles will be applied.

ACKNOWLEDGEMENTS

This work is funded by the European Space Agency.

 

How to cite: Smith, G., Glover, E., Kyle, A., Meißl, S., Debien, A., and Anghelea, A.: EarthCODE , EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-21722, https://doi.org/10.5194/egusphere-egu24-21722, 2024.

EGU24-21733 | Orals | ESSI2.11

Cubes & Clouds – A Massive Open Online Course for Cloud Native Open Data Sciences in Earth Observation  

Peter James Zellner, Rufai Omowunmi Balogun, Robert Eckhardt, Henryk Hodam, Tyna Dolezalova, Stefan Meißl, Jonas Eberle, Michele Claus, Mattia Callegari, Alexander Jacob, and Anca Anghelea

Earth Observation scientists are confronted with unprecedented data volumes that are constantly growing as the number of satellite missions increases and as spatial and temporal resolutions improve. Traditional working modes, such as downloading satellite data and processing it on a local computer, no longer apply, and EO science is moving quickly towards cloud technologies and open science practices. Even though these new technologies and practices are evolving quickly and are becoming community standards, there is not much educational material available to educate the next generation of EO scientists.

The Massive Open Online Course Cubes & Clouds - Cloud Native Open Data Sciences for Earth Observation teaches the concepts of data cubes, cloud platforms and open science in the context of Earth observation. It targets Earth science students and researchers who want to bring their technical capabilities up to the newest standards in EO computing, as well as data scientists who want to dive into the world of EO and apply their technical background to a new field. Prerequisites are a general knowledge of EO and Python programming. The course explains the concepts of data cubes, EO cloud platforms and open science by applying them to a typical EO workflow, from data discovery and data processing up to sharing the results in an open and FAIR (Findable, Accessible, Interoperable, Reusable) way. An engaging mixture of videos, animated content, lectures, hands-on exercises and quizzes conveys the content. After finishing, participants will understand the theoretical concepts of cloud-native EO processing and will have gained practical experience by conducting an end-to-end EO workflow. They will be capable of independently using cloud platforms to approach EO-related research questions and confident in sharing research by adhering to the concepts of open science.
The hands-on exercises are carried out on the EO cloud platform Copernicus Data Space Ecosystem and leverage the Copernicus data available through the STAC catalogue and the cloud processing API openEO. In the final exercise, the participants collaborate on a community mapping project adhering to open science and FAIR standards. A snow cover map is jointly created, where every participant maps a small area of the Alps and submits it to a STAC catalogue and web viewer. Ultimately, the map grows in space and time, and every participant has contributed, proving they are capable of EO cloud computing and open science practices.
The talk will guide through the topics covered in the MOOC and show how they are presented on the EO College e-learning platform; the links to the exercises carried out on CDSE will be explored, and the open science aspect will be shown in the community mapping project and in the invitation to collaborate on the course's completely open GitHub repository.

 

 

How to cite: Zellner, P. J., Balogun, R. O., Eckhardt, R., Hodam, H., Dolezalova, T., Meißl, S., Eberle, J., Claus, M., Callegari, M., Jacob, A., and Anghelea, A.: Cubes & Clouds – A Massive Open Online Course for Cloud Native Open Data Sciences in Earth Observation , EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-21733, https://doi.org/10.5194/egusphere-egu24-21733, 2024.

EGU24-22217 | ECS | Posters on site | ESSI2.11

Support of air pollution monitoring using Sentinel-5P data – how to improve the air quality measurement network 

Patryk Grzybowski, Krzysztof Markowicz, and Jan Musiał

Nitrogen dioxide (NO2) pollution is one of the most dangerous threats to public health. It is responsible for respiratory diseases, cardiovascular diseases, asthma, and many others. The objective of the presented study was to verify the potential of the Sentinel-5 Precursor Tropospheric NO2 Column Number Density (NO2 TVCD), provided by the Copernicus Data Space Ecosystem, to support air pollution monitoring in Poland. The implementation of the objective was divided into three stages:

  • to develop a model that estimates ground-level NO2 concentration;
  • to find the spatial representativeness area (SR area) of the existing ground stations;
  • to propose locations for new ground stations.

A model estimating ground-level NO2 concentration was developed in order to obtain the spatial distribution of NO2 pollution over the whole of Poland. The input data were the NO2 TVCD provided by the Copernicus Data Space Ecosystem, meteorological factors from ERA5 provided by the European Centre for Medium-Range Weather Forecasts (ECMWF) (air temperature, wind speed, planetary boundary layer height, pressure, atmospheric circulation type, wind direction, solar radiation flux), and anthropogenic conditions (nightlight intensity, population, road density). Due to the need for high computing power and a constantly available working environment, CREODIAS resources were used. Several machine learning approaches were tested, among which random forest was found to be the most accurate. The model demonstrated MAE values of 3.4 μg/m3 (MAPE ~37%) and 3.2 μg/m3 (MAPE ~31%) for the hourly and weekly estimates, respectively.
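
A minimal sketch of the kind of random-forest regression described above is given below; the file name, feature names and hyperparameters are illustrative placeholders, not the study's configuration.

```python
# Illustrative random-forest regression of ground-level NO2 on satellite,
# meteorological and anthropogenic predictors (placeholder feature names).
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

FEATURES = ["no2_tvcd", "t2m", "wind_speed", "pblh", "pressure",
            "solar_flux", "nightlights", "population", "road_density"]

df = pd.read_csv("no2_training_samples.csv")   # hypothetical matched samples
X_train, X_test, y_train, y_test = train_test_split(
    df[FEATURES], df["no2_ground"], test_size=0.2, random_state=42)

model = RandomForestRegressor(n_estimators=500, n_jobs=-1, random_state=42)
model.fit(X_train, y_train)
print("MAE:", mean_absolute_error(y_test, model.predict(X_test)))
```
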
Obtaining NO2 ground concentrations for the whole of Poland allowed the spatial autocorrelation of the air pollution phenomenon to be investigated and the SR areas to be determined. Five methods were used:

  • Global and Local Moran's I: strong spatial autocorrelation was found (Global Moran's I = 0.99, p-value < 0.05); also, ~1% of Poland's NO2 pollution does not depend on the neighbouring area;
  • Correlation with respect to distance: it was observed that ca. 10% of Poland's population exposed to high levels of pollution (higher than the yearly World Health Organization (WHO) recommendation of 10 μg/m3) is not covered by an SR area;
  • Semivariance: it was found that ca. 12% of Poland's population exposed to high levels of pollution is not covered by an SR area;
  • Similarity threshold: it was found that ca. 7% of Poland's population exposed to high levels of pollution is not covered by an SR area.


Based on the findings above, it was possible to determine the number and locations of the new stations. Depending on the method, from 7 to 22 new stations would need to be established in order to cover the whole population threatened by high NO2 concentrations.

To sum up, the combination of Sentinel-5P data with meteorological and anthropogenic data provided a solution that could be very useful for the design or improvement of air quality measurement networks.

 

How to cite: Grzybowski, P., Markowicz, K., and Musiał, J.: Support of air pollution monitoring using Sentinel-5P data – how to improve air quality measurement network, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-22217, https://doi.org/10.5194/egusphere-egu24-22217, 2024.

EGU24-298 | ECS | Posters on site | CL5.6

Sensitivity of the CNRM-CM6-1 ocean-climate model to freshwater inputs from Antarctica 

Joachim Piret, Sarah Berthet, and Aurore Voldoire

Meltwater fluxes from Antarctica are in general poorly represented in ocean models in terms of quantity and spatio-temporal variability. These meltwater fluxes impact the stratification and the circulation of the Southern Ocean, which is a key component of the climate system. In particular, the opening of deep water polynyas depends, among other factors, on ocean stratification. In turn, these polynyas impact, for instance, the ventilation of the Southern Ocean and the ocean-atmosphere exchanges of heat.

 
In this study, we explore how the ocean and sea-ice components of the CNRM-CM6-1 climate model are affected by the spatial distribution and magnitude of meltwater fluxes through three sensitivity experiments. In the first experiment, only a constant basal melting (restricted to the coast) is used as forcing. In the second experiment, only melting from monthly drifting icebergs is used. Finally, in the third experiment, both melting fluxes are used to force the model.


In our experiments, several deep water polynyas are diagnosed in the Weddell Sea and offshore of Prydz Bay. We find that these polynyas are places of deep-water formation, impacting water mass properties over the entire water column. In this study, we analyze how the magnitude, occurrence and frequency of occurrence of these polynyas are affected by the representation of the meltwater fluxes from Antarctica. We also diagnose a deep water polynya around Maud Rise with features similar to the giant polynya observed every winter between 1974 and 1976.

 

Understanding the opening of these polynyas is challenging, since it requires an analysis of the stability of the water column and disentangling the role of the external forcing (i.e. the meltwater fluxes) from the model's internal variability.

How to cite: Piret, J., Berthet, S., and Voldoire, A.: Sensitivity of the CNRM-CM6-1 ocean-climate model to freshwater inputs from Antarctica, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-298, https://doi.org/10.5194/egusphere-egu24-298, 2024.

EGU24-1617 | Posters on site | CL5.6

ACCESS-NRI: Supporting Climate Science through Robust Model Evaluation and Community-Driven Strategies 

Romain Beucher, Michael G. Tetley, Yousong Zeng, Felicity Chun, Dougal Squire, Owen Kaluza, Kelsey Druken, and Andy Hogg

The Australian Earth System Simulator (ACCESS-NRI) is a national research infrastructure designed to support the development and research of the Australian Community Climate and Earth System Simulator (ACCESS). With a strategic goal to enhance the quality and performance of the ACCESS suite of model configurations, ACCESS-NRI supports the open development and release of a Model Evaluation and Diagnostics (MED) framework for the Australian Earth system modeling community. 

In climate science, the evaluation of computational models plays a pivotal role in assessing their performance and reliability in simulating the Earth's complex climate system. This process involves a comprehensive analysis of model predictions against observed data to determine accuracy and utility. Through meticulous model evaluation, scientists gain insights into real-world climate processes, identify model strengths and weaknesses, and refine models to enhance predictive capabilities. 

Integral to international climate change assessments, climate models are crucial for shaping policies and guiding investments in climate action. Accurate simulations, reliant on precise modeling of climate physics and realistic forcing conditions, such as CO2 emissions, are fundamental to these endeavors. However, the development of these models is a protracted, iterative process requiring constant evaluation of performance and accuracy. Users seek to compare model simulations derived from diverse configurations, necessitating dedicated tools, workflows, and access to diverse data sources within high-performance computing environments. 

The development of tools and methods is integral to this initiative, providing essential support for the incorporation of model evaluation within code development cycles. The presentation explores community-driven strategies designed to streamline the evaluation of the ACCESS suite of models, addressing specific requirements and constraints associated with the World Climate Research Programme’s (WCRP) Coupled Model Intercomparison Project phase 7 (CMIP7) and its future iterations. Emphasis is placed on the incorporation of model evaluation within code development cycles. 

 

How to cite: Beucher, R., Tetley, M. G., Zeng, Y., Chun, F., Squire, D., Kaluza, O., Druken, K., and Hogg, A.: ACCESS-NRI: Supporting Climate Science through Robust Model Evaluation and Community-Driven Strategies, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-1617, https://doi.org/10.5194/egusphere-egu24-1617, 2024.

EGU24-1740 | ECS | Posters on site | CL5.6

Introduction of tropospheric chemistry in the CNRM Earth System Model (ESM): towards methane emission-driven capability. 

Martin Cussac, Martine Michou, Pierre Nabat, Béatrice Josse, and Pelletier Sophie

Methane (CH4) is the second most important greenhouse gas after carbon dioxide (CO2), and improving the representation of its cycle in climate models is a key step to reduce uncertainties in climate projections. Its main sink is chemical removal through oxidation by hydroxyl radicals (OH) in the troposphere, which are produced during the photolysis of ozone (O3) in the presence of water vapour. This process is an example of the complex interactions between methane and the climate system, highlighting the necessity of an explicit and interactive representation of atmospheric composition in Earth System Models. Here we present the introduction of two tropospheric/stratospheric chemical schemes of varying complexity in ARPEGE-Climat 7.0, the future version of the atmospheric component of CNRM-ESM, and their impact on the representation of methane. This work includes the addition and modification of multiple processes at stake in the troposphere, for instance emissions, deposition, or the production of NOx by lightning strikes. We first present an evaluation of tropospheric air composition in our model including all the aforementioned developments. Diagnostics from both chemical schemes in AMIP-type simulations are compared to observations and to state-of-the-art atmospheric composition reanalyses such as the CAMS reanalysis. We highlight, in particular, the performance of both chemical schemes in terms of biases and seasonal cycles of major tropospheric tracers like O3, CO or NO2. We also compute the ozone ERF from RFMIP-type simulations and compare it to previous estimates. Secondly, we present an evaluation of the behaviour of pre-industrial simulations in a methane “emission-driven” mode. These simulations are compared to more classical “concentration-driven” simulations in terms of global methane budget and methane chemical lifetime.

How to cite: Cussac, M., Michou, M., Nabat, P., Josse, B., and Sophie, P.: Introduction of tropospheric chemistry in the CNRM Earth System Model (ESM): towards methane emission-driven capability., EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-1740, https://doi.org/10.5194/egusphere-egu24-1740, 2024.

EGU24-1912 | ECS | Posters on site | CL5.6

Evaluation of the EC-EARTH3-VEG climate model in reproducing the evolution of maximum and minimum air temperatures in Brazil 

Fernanda De Amorim Formigoni, Catarina De Abreu Heil, Juliana Da Costa Mendes Reis, Priscila Esposte Coutinho, Lívia Sancho, and Marcio Cataldi

According to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change (IPCC), published in 2023, global surface temperatures between 2011 and 2020 increased by 1.1°C compared to the period 1850-1900. This upward trend in temperature is related to changes in observed climate patterns, which will potentially lead to a greater incidence of extreme weather events and ecosystem changes, as well as impacting the health of human populations. In Brazil, this scenario could also result in biodiversity losses, reduced agricultural productivity and changes in the availability of water resources, with consequences for the country's economy and energy security. Considering this context, this study aims to evaluate the ability of the EC-Earth3-Veg model, part of the Coupled Model Intercomparison Project Phase 6 (CMIP6), to reproduce the evolution of maximum and minimum 2 m air temperatures in Brazil during the historical period, in intervals between 1961 and 2014, comparing it with ERA5 reanalysis data. The CMIP6 data were interpolated to the ERA5 grid to carry out the desired analysis.

Based on this, it was observed that EC-Earth3-Veg was able to reproduce the historical climatology for Brazil but showed climatological differences from ERA5 in the four periods observed. A common feature among the periods analyzed is that the north of the country is warmer, with maximum temperatures in summer and autumn. In winter and spring, the same happens in the eastern part of the North, the Center-West, and the northern part of the Southeast of Brazil. In the fall, the Northeast and Center-West show cooler maxima. The period in which the model's results were closest to ERA5 was from 1961 to 1990, especially for minimum temperatures in summer and winter. Even so, in the fall and spring of this period, the model showed warming relative to the minima in the South and, in all the quarterly cut-outs, cooling in the minima near the Northeast.

In general, certain regional and seasonal patterns were observed, such as the warming of maximum and minimum temperatures in the southern region, which may indicate that the model's horizontal resolution limits its ability to represent an atmosphere characteristic of Brazil. Therefore, to improve the model's performance in simulating the climate of Brazil, it is necessary to correct the bias and to use EC-Earth3-Veg in conjunction with other models to reduce systematic errors. We aim to replicate this work for other models.
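
As an illustrative sketch of this interpolation-and-comparison step (file, variable and dimension names are placeholders, assuming both datasets use 'lat'/'lon' coordinates), seasonal-mean biases could be computed with xarray as follows.

```python
# Illustrative xarray sketch: regrid a CMIP6 field to the ERA5 grid and
# compute seasonal-mean biases over a common period.
import xarray as xr

model = xr.open_dataset("tasmax_EC-Earth3-Veg_historical.nc")["tasmax"]
era5 = xr.open_dataset("era5_tmax.nc")["tmax"]

# Bilinear interpolation of the model field onto the ERA5 grid.
model_on_era5 = model.interp(lat=era5["lat"], lon=era5["lon"], method="linear")

period = slice("1961", "1990")
bias = (model_on_era5.sel(time=period).groupby("time.season").mean("time")
        - era5.sel(time=period).groupby("time.season").mean("time"))
print(bias)
```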

How to cite: De Amorim Formigoni, F., De Abreu Heil, C., Da Costa Mendes Reis, J., Esposte Coutinho, P., Sancho, L., and Cataldi, M.: Evaluation of the EC-EARTH3-VEG climate model in reproducing the evolution of maximum and minimum air temperatures in Brazil, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-1912, https://doi.org/10.5194/egusphere-egu24-1912, 2024.

EGU24-2831 | Orals | CL5.6

Efficient spin-up of Earth System Models using sequence acceleration 

Samar Khatiwala and Eleanor Burke

The ocean and land carbon cycles play a critical role in the climate system and are key components of the Earth System Models (ESMs) used to project future changes in the environment. However, their slow adjustment time also hinders effective use of ESMs because of the enormous computational resources required to integrate them to a pre-industrial quasi-equilibrium, a prerequisite for performing any simulations with these models and, in particular, for identifying the human impact on climate. Here, a novel solution to this "spin-up" problem, regarded as one of the grand challenges in climate science, is shown to accelerate the equilibration of state-of-the-art marine biogeochemical models typical of those embedded in ESMs by over an order of magnitude. Based on a "sequence acceleration" method originally developed in the context of electronic structure problems, the new technique can be applied in a "black box" fashion to any existing model. Even under the challenging protocols used to spin up ESMs for the IPCC Coupled Model Intercomparison Project, which can take up to two years on some of the most powerful supercomputers, the new algorithm can reduce simulation times by a factor of 5, with preliminary results suggesting that complex land surface models can be similarly accelerated. The ability to efficiently spin up ESMs would enable, for the first time, a quantification of major parametric uncertainties in these models, lead to more accurate estimates of metrics such as climate sensitivity, and allow increased model resolution beyond what is currently feasible.
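The abstract does not name the specific scheme, but Anderson acceleration (Anderson mixing) is the classic sequence-acceleration technique from electronic-structure theory for fixed-point iterations; the sketch below illustrates the idea on a toy, slowly converging relaxation. The function name, history length and toy problem are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def anderson_accelerate(G, x0, m=5, tol=1e-10, max_iter=1000):
    """Accelerate the fixed-point iteration x <- G(x) with Anderson mixing.

    Keeps the last few iterates and residuals and combines them through a
    small least-squares problem to extrapolate towards the fixed point
    (in the spin-up analogy, the equilibrated model state).
    """
    x = np.asarray(x0, dtype=float)
    X, F = [], []                              # histories of iterates and residuals
    for k in range(max_iter):
        gx = G(x)
        f = gx - x                             # residual of the fixed-point map
        if np.linalg.norm(f) < tol:
            return x, k
        X.append(x.copy())
        F.append(f.copy())
        if len(X) > m + 1:                     # keep a short history window
            X.pop(0)
            F.pop(0)
        if len(X) == 1:
            x = gx                             # plain (Picard) step to start
            continue
        # Differences of residuals and iterates over the history window
        dF = np.column_stack([F[i + 1] - F[i] for i in range(len(F) - 1)])
        dX = np.column_stack([X[i + 1] - X[i] for i in range(len(X) - 1)])
        gamma, *_ = np.linalg.lstsq(dF, f, rcond=None)
        x = gx - (dX + dF) @ gamma             # Anderson extrapolation (undamped)
    return x, max_iter

# Toy demonstration: a slowly converging linear relaxation G(x) = A x + b,
# whose plain iteration would need thousands of steps to reach equilibrium.
n = 200
A = np.diag(np.linspace(0.5, 0.99, n))         # eigenvalues close to 1 => slow spin-up
b = np.ones(n)
x_eq, iters = anderson_accelerate(lambda x: A @ x + b, np.zeros(n))
print(f"converged in {iters} iterations")      # far fewer than the plain iteration needs
```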

How to cite: Khatiwala, S. and Burke, E.: Efficient spin-up of Earth System Models using sequence acceleration, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-2831, https://doi.org/10.5194/egusphere-egu24-2831, 2024.

EGU24-3432 | ECS | Orals | CL5.6

Evaluation of precipitation and ITCZ characteristics in CMIP6 models 

Andreas Karpasitis, George Zittis, and Panos Hadjinicolaou

The Intertropical Convergence Zone (ITCZ) is a band of low pressure near the equator, where the trade winds converge. It is usually accompanied by cloudiness and heavy precipitation, and it migrates northward and southward, following the sun in different seasons. Climate models often misrepresent key atmospheric processes, including the ITCZ's position, width and strength. As a result, biases in the modeled precipitation are also common in tropical and sub-tropical regions, such as the Indian subcontinent and parts of South America. Here, we assess the skill of four state-of-the-art Earth System Models in representing key ITCZ characteristics and the associated precipitation. The four ESMs under investigation are EC-EARTH, CNRM-ESM, IPSL-ESM, and UKESM. Besides the CMIP6 version of the aforementioned models, we also aim to evaluate post-CMIP6 simulations, which are currently under development in the framework of the OptimESM Horizon Europe project (https://optimesm-he.eu/). These post-CMIP6 models include advancements in the representation of physical, biogeochemical and biophysical processes. As a reference dataset, we use ERA5 reanalysis data. Firstly, we divide the world into eight longitudinal zones and then calculate the zonal averages. For each season, we define the ITCZ location as the latitude where there is a peak in the 500 hPa vertical velocity, while we consider the edges of the ITCZ to be at the latitudes where the 500 hPa vertical velocity becomes zero. The strength of the ITCZ is defined as the value of the rainfall peak associated with the peak in the vertical velocity field. The analysis is performed on an annual basis, for each year from 1981 through 2010, and the corresponding peaks are clustered. The long-term characteristics of the ITCZ from the ESMs are compared to those from ERA5 to understand the processes that drive precipitation biases in the global tropics.
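The location/edge/strength definitions above translate into a short diagnostic; the sketch below is a schematic illustration of that recipe applied to zonal-mean profiles. The variable names, the 30°S-30°N search window and the sign convention for vertical velocity are assumptions for the example, not the authors' code.

```python
import numpy as np

def itcz_characteristics(lat, w500, precip):
    """Locate the ITCZ from zonal-mean profiles: the ITCZ latitude is the
    peak in 500 hPa vertical velocity, its edges are the nearest latitudes
    where that velocity crosses zero, and its strength is the rainfall at
    the peak.  `w500` is assumed to be positive for ascent (e.g. -omega).
    """
    lat, w500, precip = map(np.asarray, (lat, w500, precip))
    tropics = (lat > -30) & (lat < 30)           # search window (assumption)
    i_peak = np.flatnonzero(tropics)[np.argmax(w500[tropics])]
    itcz_lat = lat[i_peak]

    # Edges: nearest sign changes of w500 south and north of the peak
    crossings = np.flatnonzero(np.diff(np.sign(w500)) != 0)
    south = crossings[crossings < i_peak]
    north = crossings[crossings >= i_peak]
    edges = (lat[south[-1]] if south.size else np.nan,
             lat[north[0] + 1] if north.size else np.nan)

    strength = precip[i_peak]                    # rainfall peak at the ITCZ
    return itcz_lat, edges, strength
```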

How to cite: Karpasitis, A., Zittis, G., and Hadjinicolaou, P.: Evaluation of precipitation and ITCZ characteristics in CMIP6 models, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-3432, https://doi.org/10.5194/egusphere-egu24-3432, 2024.

EGU24-3938 | Posters on site | CL5.6

Live monitoring, diagnostics and data management for the ACCESS suite of Earth-system simulations. 

Michael Tetley, Dougal Squire, and Romain Beucher

As global Earth-System simulations rapidly increase in complexity as a direct result of available compute resources and advancing scientific model development, monitoring long-running models and managing associated big data outputs has become increasingly difficult. This has inspired a need for open community-focussed software tools that provide intuitive and efficient HPC-native functionality to accurately evaluate live model performance and behaviour, perform diagnostic scientific analyses, compare current results with alternative or reference models, and generate standardised data outputs. Addressing these challenges, we present a new community-driven open-source project Model Live Diagnostics (MLD) which integrates the ACCESS-NRI Intake Catalog API, together forming part of The Australian Earth-System Simulator (ACCESS-NRI) Model Evaluation and Diagnostics (MED) framework supporting climate science research within the Australian Community Climate and Earth System Simulator (ACCESS).

 

MLD is an open-source Python package optimised for use in JupyterLab sessions on the Australian NCI supercomputer Gadi. The primary purpose of MLD is to provide a simple, easy-to-use and accessible framework for the ACCESS modelling community to check, monitor, visualise and evaluate live running Earth-System model behaviour and progress. Utilising the ACCESS-NRI Intake Catalog API, MLD monitors a given model output data directory on Gadi, dynamically building an Intake catalog of the most up-to-date data and automatically generating interactive plots to visualise model variables. From these data, MLD provides functions to perform lightweight diagnostic calculations, compare and plot results against reference models, and export data in standard Xarray format for integration into user workflows. MLD can be installed manually via Conda or PyPI, it comes pre-installed in existing Conda environments on Gadi, and the full source code is available on the project GitHub repository. MLD currently supports Earth System Model 1.5 (ACCESS-ESM1.5), Model for Ocean/Sea Ice (ACCESS-OM2) and Coupled Model 2 (ACCESS-CM2).

How to cite: Tetley, M., Squire, D., and Beucher, R.: Live monitoring, diagnostics and data management for the ACCESS suite of Earth-system simulations., EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-3938, https://doi.org/10.5194/egusphere-egu24-3938, 2024.

EGU24-4037 | Orals | CL5.6

Revisiting the performance of CMIP6 models in the Arctic: Concerns on benchmarking climate models 

Shuting Yang, Tian Tian, Jacob L. Hoyer, Pia Nielsen-Englyst, and Suman Singha

Climate models are known to have difficulty in simulating the present-day climate in the Arctic. Many studies, including the most recent Intergovernmental Panel on Climate Change Sixth Assessment Report (IPCC AR6), report that, compared with reanalysis datasets such as ERA5 used as reference, climate models participating in the past phases of the Coupled Model Intercomparison Project (CMIP) simulate an Arctic that is too cold. However, recent studies reveal substantial warm biases over the sea ice surface in global atmospheric reanalyses due to the missing representation of physical processes such as the snow layer on top of the sea ice.

In this work we revisit the so-called long-standing climate model bias in the Arctic by using a new, satellite-derived near-surface air temperature (T2m) dataset for the Arctic sea ice region as an alternative reference to the commonly used ERA5 reanalysis. This observational T2m dataset is derived from the satellite-based DMI/CMEMS daily gap-free (so-called L4) sea surface temperature and sea ice surface temperature climate data record, spanning from 1 January 1982 to 31 May 2021 and covering the Arctic region (>58°N). We show that, in comparison with the new observational dataset, the ERA5 reanalysis exhibits widespread warm biases exceeding 2°C over sea ice in the central Arctic, particularly during winter when the warm bias may be as large as 6-10°C. In contrast, the CMIP6 model ensemble demonstrates reasonable performance, with an annual mean bias of less than ±1°C in the same region. We also find that the CMIP6 model mean slightly outperforms ERA5 in capturing the observed warming trend over the central Arctic region, which is fully covered by sea ice with concentrations of more than 70%. Outside of this region, it is evident that ERA5 aligns well with observations, while CMIP6 models show a large cold bias in the North Atlantic marginal ice zone, consistent with well-documented results from the past.

Our results challenge the current assessment of climate models in the central Arctic, suggesting that relying on existing reanalysis datasets as a reference may underestimate the credibility of climate models in this region. It is thus imperative to integrate new observational data for benchmarking climate models in the Arctic region.

How to cite: Yang, S., Tian, T., Hoyer, J. L., Nielsen-Englyst, P., and Singha, S.: Revisiting the performance of CMIP6 models in the Arctic: Concerns on benchmarking climate models, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-4037, https://doi.org/10.5194/egusphere-egu24-4037, 2024.

EGU24-6106 | ECS | Orals | CL5.6

On the implementation of Leith viscosities in NEMO: Results from a forced global ocean model 

Thomas Wilder and Till Kuhlbrodt

In CMIP6, eddy-permitting models (notably HadGEM3-GC3.1 N216ORCA025) were found to perform poorly against their coarse-resolution counterparts, particularly in the Southern Ocean. Compared with N96ORCA1 (1 degree), ORCA025 exhibited an enhanced warm bias, weakened Antarctic Circumpolar Transport (~ 60 Sv), overactive Antarctic gyres, and lower Antarctic sea-ice extent and volume. The poor performance of eddy-permitting models has been attributed to their difficulty in representing mesoscale processes at higher latitudes. Despite these shortcomings, eddy-permitting models remain desirable for their capacity to resolve meridional transports of heat and carbon, and ice-ocean interactions.

The objective of this work is to improve the representation of mesoscale processes in ORCA025 through the implementation of two viscosity schemes: 2D Leith and Quasi-Geostrophic Leith. These viscosity schemes have been shown to improve interior mixing by mesoscale eddies in eddy-rich models. Both schemes offer a parameterised viscosity coefficient that is flow- and scale-aware, in contrast to the typical constant biharmonic viscosity employed in N216ORCA025.

In a forced ocean configuration (GOSI9p8.0 ORCA025), both Leith schemes demonstrate a marked impact on the ocean's circulation and its properties compared to biharmonic viscosity. Here, the Leith schemes dampen the eddy kinetic energy field and reduce the Antarctic circumpolar transport, with corresponding changes in the temperature and salinity fields. Additional simulations, both forced and coupled, are ongoing and may provide further insights into the different impacts of the Leith viscosity schemes on physical processes in eddy-permitting Earth system models.

How to cite: Wilder, T. and Kuhlbrodt, T.: On the implementation of Leith viscosities in NEMO: Results from a forced global ocean model, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-6106, https://doi.org/10.5194/egusphere-egu24-6106, 2024.

EGU24-7132 | Orals | CL5.6

Introduction to an open-source tool for collective Earth System Model evaluation and benchmarking: PCMDI Metrics Package (PMP) 

Jiwoo Lee, Ana Ordonez, Peter Gleckler, Paul Ullrich, Yann Planton, Bo Dong, Kristin Chang, Paul Durack, Elina Valkonen, and Julie Caron

The PCMDI Metrics Package (PMP) is an open-source Python-based framework that enables objective "quick-look" comparisons and benchmarking of Earth System Models (ESMs) against observation-based reference datasets. The PMP, which is focused primarily on atmospheric quantities, has been used for routine and systematic evaluation of thousands of simulations from the Coupled Model Intercomparison Project (CMIP). Ongoing work aims for seamless application of the tool to the next generation of CMIP (CMIP7), with an aspiration to aid modeling groups during their development cycle. The latest version of PMP offers a diverse suite of evaluation capabilities covering large- to global-scale climatology and the annual cycle, tropical and extratropical modes of variability (e.g., ENSO and MJO), regional monsoons, cloud feedback, and high-frequency characteristics (e.g., extremes). Current work is expanding the scope of PMP to include the evaluation of the following: (1) the Quasi-Biennial Oscillation (QBO) and its teleconnection to the MJO, (2) atmospheric blocking and atmospheric rivers, leveraging machine-learning-based pattern detection algorithms, and (3) polar and high-latitude regions, by implementing sectional sea-ice area metrics. The PMP is also advancing its evaluation capabilities to help evaluate higher-resolution simulations such as those from HighResMIP, cloud-resolving E3SM experiments, and regionally downscaled products. This presentation will highlight the motivation for routine model evaluation, introduce the PMP, share progress on current polar metrics, and discuss future plans and opportunities to connect with ongoing efforts.

How to cite: Lee, J., Ordonez, A., Gleckler, P., Ullrich, P., Planton, Y., Dong, B., Chang, K., Durack, P., Valkonen, E., and Caron, J.: Introduction to an open-source tool for collective Earth System Model evaluation and benchmarking: PCMDI Metrics Package (PMP), EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-7132, https://doi.org/10.5194/egusphere-egu24-7132, 2024.

EGU24-8181 | CL5.6

Discrepancies in temporal pCO2 variability from Earth System Models and pCO2-products related to high-latitude mixed layer dynamics and equatorial upwelling 

The air-sea CO2 flux FCO2 is an important component of the global carbon budget, and understanding its response to climate change is crucial to adjust mitigation pathways. Multi-linear regression supports the expectation that the balance between the CO2 partial pressures of air and the sea surface (pCO2) is the most important driver of temporal FCO2 variability. Discrepancies between state-of-the-art Earth System Models (ESMs) and gridded pCO2-products suggest that systematic biases exist across an ensemble of ESMs. In the equatorial regions, upwelling variability of carbon-rich water is biased in ESMs, as modeled and observed sea surface temperatures are generally uncorrelated. In the high latitudes, the climate-change-induced trend towards lighter sea water is overestimated in ESMs, which yields, in contrast to observations, shallower mixed layers over the contemporary period and hence a suppressed carbon supply from depth. While mixed layer depth variability and trends appear biased throughout the global ocean, this is not a determining factor for pCO2 variability in subtropical gyres. The results highlight the importance of accurately modeling hydrographic properties to obtain robust estimates of FCO2 and its variability.

How to cite: Danek, C. and Hauck, J.: Discrepancies in temporal pCO2 variability from Earth System Models and pCO2-products related to high-latitude mixed layer dynamics and equatorial upwelling, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-8181, https://doi.org/10.5194/egusphere-egu24-8181, 2024.

EGU24-8947 | ECS | Posters on site | CL5.6

Monitoring and Benchmarking of Earth System Model Simulations with ESMValTool 

Manuel Schlund, Axel Lauer, Lisa Bock, and Birgit Hassler

Earth system models (ESMs) are important tools to improve our understanding of present-day climate and to project climate change under different plausible future scenarios. For this, ESMs are continuously improved and extended, resulting in more complex models. Particularly during the model development phase, it is important to continuously monitor how well the historical climate is reproduced and to systematically analyze, evaluate, understand, and document possible shortcomings. For this, putting model biases relative to observations into the context of deviations shown by other state-of-the-art models greatly helps to assess which biases need to be addressed with higher priority. Here, we introduce the new capability of the Earth System Model Evaluation Tool (ESMValTool) to monitor running simulations or benchmark existing simulations against observations in the context of results from the Coupled Model Intercomparison Project (CMIP). ESMValTool is an open-source, community-developed diagnostics and performance metrics tool for the evaluation and analysis of ESMs. To benchmark model output, ESMValTool calculates metrics such as the root mean square error (RMSE) or the coefficient of determination (R2) relative to reference datasets. Each metric is directly compared to the same metric calculated for an ensemble of models such as CMIP6, which provides a statistical measure for the range of values that can be considered typical for state-of-the-art models. Results are displayed in different types of plots, such as map plots (using stippling and hatching) or time series (via uncertainty bands). Automatic downloading of CMIP results from the Earth System Grid Federation (ESGF) makes the application of ESMValTool for benchmarking individual model simulations, for example in preparation for CMIP7, easy and user-friendly.
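As a schematic of the benchmarking idea described above (not the ESMValTool API itself), the sketch below computes an area-weighted RMSE for a candidate simulation and places it within the distribution of RMSEs obtained from a CMIP ensemble. All function and variable names are illustrative assumptions.

```python
import numpy as np

def benchmark_rmse(model, reference, cmip_ensemble, weights):
    """Place a candidate model within an ensemble of RMSEs.

    `model` and `reference` are xarray DataArrays on a common grid,
    `cmip_ensemble` is a list of DataArrays from an ensemble such as CMIP6,
    and `weights` holds area weights (e.g. cos(latitude)).
    """
    def rmse(field):
        # Area-weighted root mean square error against the reference dataset
        return float(np.sqrt(((field - reference) ** 2).weighted(weights).mean()))

    model_rmse = rmse(model)
    ensemble_rmses = np.array([rmse(member) for member in cmip_ensemble])
    # Fraction of ensemble members the candidate outperforms: a simple way to
    # judge whether its bias is "typical" for state-of-the-art models.
    rank = float((ensemble_rmses > model_rmse).mean())
    return model_rmse, ensemble_rmses, rank
```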

How to cite: Schlund, M., Lauer, A., Bock, L., and Hassler, B.: Monitoring and Benchmarking of Earth System Model Simulations with ESMValTool, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-8947, https://doi.org/10.5194/egusphere-egu24-8947, 2024.

EGU24-9963 | Orals | CL5.6 | Highlight

Exploring the Role of Interactive Methane in Earth System Models used for Climate Projections 

Fiona O'Connor, Gerd Folberth, Nicola Gedney, and Chris Jones

Methane plays a crucial role in the Earth System as a greenhouse gas and a tropospheric ozone precursor. However, in Phase 6 of the Coupled Model Intercomparison Project (CMIP6), Earth System Models predominantly relied on prescribed surface methane concentrations derived from historical observations or predefined future pathways. This study uses novel Earth System Model capability to investigate the impact of an emissions-driven methane cycle, including interactive wetland emissions. Specifically, we explore the influence of interactive methane on the effective radiative forcing of carbon dioxide and the model’s transient and equilibrium climate responses to changes in carbon dioxide.

The response of the climate to external forcings is intricately linked to climate feedbacks. With the inclusion of an interactive methane cycle in Earth System Models, understanding how changes in carbon dioxide and climate affect the methane cycle becomes imperative. This work critically re-evaluates the CMIP6 assessment of methane feedbacks and, for the first time, disentangles both the biophysical and radiative effects of carbon dioxide on wetland emissions and methane lifetime. 

By enabling the interaction of the biophysical and radiative effects of carbon dioxide with natural methane emissions, concentrations, and climate responses, this presentation highlights the necessity of incorporating interactive methane components in Earth System Models. Notably, this approach provides scientists with the means to assess the direct implications of methane emission reduction policies and climate feedbacks on meeting global climate and air quality targets.

How to cite: O'Connor, F., Folberth, G., Gedney, N., and Jones, C.: Exploring the Role of Interactive Methane in Earth System Models used for Climate Projections, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-9963, https://doi.org/10.5194/egusphere-egu24-9963, 2024.

EGU24-10399 | Posters on site | CL5.6

Extreme cold events in Europe under a reduced AMOC 

Susanna Corti, Virna Loana Meccia, Claudia Simolo, and Katinka Bellomo

There is a consensus that a weakened Atlantic Meridional Overturning Circulation (AMOC) decreases mean surface temperature in the Northern Hemisphere, both over the ocean and the continents. However, the impacts of a reduced AMOC on cold extreme events have not yet been examined. We analyse the impacts of a reduced AMOC strength on extreme cold events over Europe using targeted sensitivity experiments with the EC-Earth3 climate model. Starting from a fully coupled ocean-atmosphere simulation in which the AMOC was artificially reduced, a set of atmosphere-only integrations with prescribed sea surface temperature and sea-ice cover was conducted to evaluate the effects of weakly and strongly reduced AMOC strength. Despite the overall cooling, a reduced AMOC leads to fewer winter cold spells in Europe. We find that the weakened AMOC intensifies the near-surface meridional temperature gradient over the North Atlantic and Europe, thus providing the energy to boost the jet stream. A stronger jet stream leads to less atmospheric blocking, reducing the frequency of cold spells over Europe. Although limited to the output of one model, our results indicate that a reduced AMOC strength may play a role in shaping cold spells under future climate change by modulating the strength of the jet stream and the frequency of atmospheric blocking.

How to cite: Corti, S., Meccia, V. L., Simolo, C., and Bellomo, K.: Extreme cold events in Europe under a reduced AMOC, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-10399, https://doi.org/10.5194/egusphere-egu24-10399, 2024.

EGU24-10541 | ECS | Posters on site | CL5.6

Computationally efficient evaluation of Earth System Models in the presence of complex uncertainties 

Nils Weitzel, Muriel Racky, Laura Braschoß, and Kira Rehfeld

Comparing Earth System Model (ESM) simulations with in-situ, lab, and remote sensing measurements often involves analytically intractable uncertainty structures, for example due to observational uncertainties, internal variability in the climate system, and limitations of ESMs. With increasing model resolution, ensemble sizes, simulation lengths, and numbers of observations, this can create computational bottlenecks. Making use of Monte Carlo techniques in a data cube architecture, we present a Python package for efficient propagation of complex uncertainties in model-data comparison. Additionally, the package contains functionalities for visualizations of uncertainties and the computation of probabilistic score functions.

Our focus is on measurement operators, in particular so-called proxy system models that map ESM output onto proxy measurements for transient paleoclimate simulations. Proxy system models contain multiple sources of autocorrelated and non-Gaussian uncertainties due to complex proxy-climate relationships, chronological uncertainties, and processes perturbing the recorded climate signal during the sedimentation of the proxy. Thereby, we connect data cube methods for processing climate simulations with analysis techniques for point data such as those stemming from time series of paleoclimate proxy records. We demonstrate our approach by quantifying the discrepancies of temperature and forest cover changes between global proxy networks and transient simulations from the Last Glacial Maximum to present-day. Given the ongoing shift in the paleoclimate modelling community from equilibrium time-slice towards long transient simulations, our work can help integrate the evaluation of simulations from the Paleoclimate Modelling Intercomparison Project (PMIP) into CMIP7 model benchmarking. Additionally, the implemented methods are transferable to other types of observations that are subject to analytically intractable uncertainty structures.
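A toy illustration of the Monte Carlo idea (not the package's actual interface): perturb an observed series with autocorrelated AR(1) noise and propagate it into a distribution of a simple comparison metric. The parameter values and names below are assumptions chosen for the example.

```python
import numpy as np

def mc_model_data_discrepancy(model_ts, obs_ts, obs_sigma, phi=0.7,
                              n_draws=1000, seed=0):
    """Monte Carlo propagation of autocorrelated observational uncertainty.

    Draws AR(1) noise realisations (marginal std `obs_sigma`, lag-1
    autocorrelation `phi`), adds them to the observed series, and returns the
    resulting distribution of RMSE between model and perturbed observations.
    """
    rng = np.random.default_rng(seed)
    n = obs_ts.size
    rmses = np.empty(n_draws)
    for k in range(n_draws):
        # Build one AR(1) noise realisation
        eps = rng.normal(0.0, obs_sigma * np.sqrt(1 - phi**2), size=n)
        noise = np.empty(n)
        noise[0] = rng.normal(0.0, obs_sigma)
        for t in range(1, n):
            noise[t] = phi * noise[t - 1] + eps[t]
        rmses[k] = np.sqrt(np.mean((model_ts - (obs_ts + noise)) ** 2))
    return rmses    # summarise e.g. via np.percentile(rmses, [5, 50, 95])
```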

How to cite: Weitzel, N., Racky, M., Braschoß, L., and Rehfeld, K.: Computationally efficient evaluation of Earth System Models in the presence of complex uncertainties, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-10541, https://doi.org/10.5194/egusphere-egu24-10541, 2024.

EGU24-11464 | Posters on site | CL5.6 | Highlight

Are early warning signals present for climate tipping points detected in CMIP6?   

Tim Lenton, Paul Ritchie, and Chris Boulton

Many elements of the climate system are believed to be at risk of tipping in the near future due to ongoing climate change. Abrupt shifts or tipping points have been found to be prevalent in several of the latest generation of climate models (CMIP6) under a range of future emission scenarios. However, by observing the time series alone it is notoriously difficult to predict an upcoming tipping point. Therefore, so-called early warning indicators are needed to try to forewarn of an approaching tipping point. Two commonly used early warning signals, designed to detect critical slowing down prior to the tipping point, are an observed increase in autocorrelation and an observed increase in variance. In this presentation, we assess the reliability and performance of these indicators for a range of tipping points, scenarios and models. In examples where the indicators perform poorly, we consider the potential for system-specific indicators.

How to cite: Lenton, T., Ritchie, P., and Boulton, C.: Are early warning signals present for climate tipping points detected in CMIP6?  , EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-11464, https://doi.org/10.5194/egusphere-egu24-11464, 2024.

EGU24-12558 | Posters virtual | CL5.6

Recent developments in the Earth System Model evaluation tool 

Bouwe Andela, Birgit Hassler, and Manuel Schlund

ESMValTool (Earth System Model eValuation Tool) is open-source, community-developed software for the evaluation of Earth system models, mainly in the context of multi-model analyses, e.g. the Coupled Model Intercomparison Project (CMIP). ESMValTool provides a large number of “recipes” that reproduce published figures, for example, some of the figures found in reports of the Intergovernmental Panel on Climate Change (IPCC). ESMValCore, the framework powering ESMValTool, provides capabilities that make it easy to work with data produced for CMIP as well as related observational and reanalysis data, e.g. discovering, downloading, and preprocessing these data. Here, we present new features that have been added to ESMValCore and ESMValTool in the past year.

Improved computational performance: it is now possible to use Dask Distributed to run the tool and almost all preprocessor functions are now using Dask arrays, resulting in lower memory use and faster computations. This enables the analysis of higher-resolution datasets, such as those expected for the next round of CMIP. Further performance improvements are planned this year as part of the ESiWACE3 service project.

New recipes and better-looking results: several new analyses have been added, including recipes for reproducing figures from the IPCC’s Fifth and Sixth Assessment Reports and generic recipes for monitoring and evaluating Earth System Model simulations. The webpage displaying the results of a recipe run now looks better and allows for interactive filtering.

More datasets: more observational and reanalysis datasets can now be converted to the CMIP data request standard using the tool. Grids used in the Coordinated Regional Climate Downscaling Experiment (CORDEX) are now better supported.

Communication: the ESMValTool tutorial at https://tutorial.esmvaltool.org has been updated, more Jupyter notebooks are available to demonstrate the use of ESMValCore, and there is a new, open-source website at https://esmvaltool.org.

How to cite: Andela, B., Hassler, B., and Schlund, M.: Recent developments in the Earth System Model evaluation tool, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-12558, https://doi.org/10.5194/egusphere-egu24-12558, 2024.

EGU24-12905 | Orals | CL5.6

A new ESMValTool metric using point source observations from AeroNET 

Catherine Hardacre, Klaus Zimmermann, Joakim Löw, and Jane Mulcahy

ESMValTool is an open-source, community-developed diagnostics and performance metrics tool for the evaluation and analysis of Earth System Models (ESMs). Key to model evaluation with ESMValTool is the use of observational data, which must comply with Climate Model Output Rewriter (CMOR) standards. ESMValTool provides methods to generate CMOR-compliant datasets, but these are designed to process gridded observational datasets, such as those from satellites, and it is currently more difficult to develop point-source datasets. Here we present a new ESMValTool metric for evaluating model output against an observation-based climatology of aerosol optical depth (AOD) from the Aerosol Robotic Network (AeroNET). This metric includes a downloader and formatter to generate CMOR-compliant datasets for the observational AOD time series from all AeroNET stations, which are collated into a single NetCDF file. A new ESMValTool recipe and diagnostic then process and evaluate the model output against the observational AOD dataset at model grid cells co-located with the AeroNET stations. Model output is processed in the recipe using available pre-processors to generate multi-annual seasonal means. The observational AOD time series from the AeroNET stations are processed in the diagnostic to generate multi-annual seasonal means, or 'climatologies'. Because the AOD time series from the AeroNET stations can be incomplete, filtering criteria are applied to the data from each station to ensure sufficient temporal coverage according to the user's requirements. We evaluate AOD at 440 nm simulated by CMIP6 historical ensemble members against the AOD climatologies using the new ESMValTool metric. We also demonstrate how changing the filtering criteria can modify the observational climatologies, and thus the evaluation metrics. The new method extends atmospheric composition evaluation in the ESMValTool framework by adding a key aerosol metric. We hope that the techniques used to develop this metric can be applied to other point-source observation datasets.
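For illustration only (this is not the new recipe's code), the pandas sketch below shows the kind of coverage filtering and multi-annual seasonal averaging described above for a single station time series. The thresholds and names are placeholder assumptions a user would adapt.

```python
import pandas as pd

def aod_station_climatology(aod, min_days_per_season=30, min_years=5):
    """Build a multi-annual seasonal-mean AOD climatology for one station.

    `aod` is a pandas Series of (roughly daily) AOD values indexed by time.
    Seasons with too few observations, or seasons sampled in too few years,
    are discarded before averaging.
    """
    # Seasonal (DJF/MAM/JJA/SON) means and observation counts per quarter
    seasonal = aod.resample("QS-DEC").agg(["mean", "count"])
    seasonal = seasonal[seasonal["count"] >= min_days_per_season]

    # Label each quarter by its season and average across years
    season_label = seasonal.index.month.map({12: "DJF", 3: "MAM", 6: "JJA", 9: "SON"})
    grouped = seasonal.groupby(season_label)["mean"]
    climatology = grouped.mean()[grouped.count() >= min_years]
    return climatology
```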

How to cite: Hardacre, C., Zimmermann, K., Löw, J., and Mulcahy, J.: A new ESMValTool metric using point source observations from AeroNET, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-12905, https://doi.org/10.5194/egusphere-egu24-12905, 2024.

EGU24-16708 | Posters on site | CL5.6

Towards a coupled nitrogen cycle representation in NorESM – ocean biogeochemistry 

Joeran Maerz, Dirk Jan Leo Olivié, Tomas Torsvik, and Christoph Heinze

The nitrogen cycle is substantially perturbed by human activities, with potential negative consequences for biogeochemical cycles, the climate and society. Within the ESM2025 project, we therefore aim at an improved representation of an interactive, emission-driven nitrogen cycle in the Norwegian Earth system model (NorESM), to help provide information about future societal impacts.

We here focus on the ocean biogeochemistry component iHAMOCC of NorESM, where major upgrades have been carried out with a particular focus on processes related to the highly potent greenhouse gas N2O. We included two additional tracers, NH4+ and NO2-, which enable an explicit representation of the major canonical ocean nitrogen cycle pathways in both the water column and the sediment. The substantially improved atmospheric chemistry of NorESM, developed in parallel, allowed us to establish the technical capability to interactively couple air-sea N2O and NH3 fluxes, as well as to receive atmospheric dry and wet deposition fluxes in the form of NHx and NOy. Concomitantly, interactive atmosphere-land N2O and nitrogen deposition fluxes were implemented, further increasing NorESM's capability for coupled nitrogen cycle simulations. The improved NorESM atmosphere and ocean components are currently being fine-tuned and spun up individually, in preparation for the first interactively coupled simulations.

Preliminary ocean-only simulations forced with a climatological atmosphere, some of which are still in a transient phase, show a reasonable oceanic N2O emission pattern that is also quantitatively close to recent reconstructions by Yang et al. 2020 (4.2 TgN/yr, doi:10.1073/pnas.1921914117), while the global ammonia emissions are at the lower end of current estimates (2-27 TgN/yr). With the current improved oceanic nitrogen cycle representation, N2O production during nitrification in well-ventilated areas is closely linked to primary production through the subsequent decay and ammonification of organic nitrogen. By contrast, the transition zones of oxygen deficit zones (ODZs) host key microbial processes of both aerobic and anaerobic N2O production and anaerobic N2O consumption, making those regions hotspots of nitrogen cycling relevant to N2O. In the sediments, productive ocean and shelf regions feature higher N2O sediment-water column fluxes per unit area than deep-sea regions, in line with current observational knowledge. In total, however, the sediments globally contribute significantly less to N2O production than the water column. The future evolution of export fluxes and ODZs can hence be expected to determine the oceanic N2O release in response to ongoing climate change.

With these recent developments in NorESM, we have improved the representation of nitrogen-cycle-relevant processes and established the technical capability to simulate the nitrogen cycle in an emission-driven, interactively coupled mode across major Earth system components, while we also envisage improving NorESM's representation of land-ocean nitrogen transport.

How to cite: Maerz, J., Olivié, D. J. L., Torsvik, T., and Heinze, C.: Towards a coupled nitrogen cycle representation in NorESM – ocean biogeochemistry, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-16708, https://doi.org/10.5194/egusphere-egu24-16708, 2024.

EGU24-19283 | CL5.6

Understanding uncertainties in the global Earth climate response with reduced complexity climate models 

Simple climate models provide a flexible, computationally cost-effective way to emulate the more complex and higher-resolution Earth system models. As ESMs are improved, adding new processes that were not explicitly included before, so too can the simple climate models be refined to reflect changes in our understanding of the climate response to changing emissions. New developments in the modelling of peatlands, wetlands, permafrost, and negative emissions scenarios have provided new data with which to test the simple models MAGICC and FAIR. By comparing their projections under the same scenarios used by more complex models, the reduced-complexity models' limitations and uncertainties can be shown, and they can thus be improved to better capture the new knowledge. Here, we focus on peatlands, comparing the results of a new module in the model OSCAR with the current output from MAGICC and FAIR, quantifying the impact that explicit peatland processes have on global temperature. Negative emissions scenarios are also considered, all as part of a broader project to understand overshoot pathways: scenarios in which the global temperature anomaly exceeds 1.5°C but later returns below that mark. These results will show the value and capability of the simple climate models as they continue to be refined to emulate the larger models.

How to cite: Sloughter, T. and Rogelj, J.: Understanding uncertainties in the global Earth climate response with reduced complexity climate models, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-19283, https://doi.org/10.5194/egusphere-egu24-19283, 2024.

EGU24-20894 | Posters on site | CL5.6

Methodological Developments in the International Land Model Benchmarking Effort 

Nathan Collier, Forrest Hoffman, and Dave Lawrence

As earth system models (ESMs) become increasingly complex, there is a growing need for comprehensive and multi-faceted evaluation of model performance. The International Land Model Benchmarking (ILAMB) project is a model-data intercomparison and integration project designed to improve the performance of land models and improve the design of new measurement campaigns to reduce uncertainties associated with key land surface processes. While the effort has been established for over a decade, we continue to make developments and improvements in order to incorporate new datasets as well as adapt our scoring methodology to be more useful for model developers in identifying and diagnosing model errors.

The version 2.7 release of the ILAMB Python software includes new datasets for gross primary productivity and latent/sensible heat flux (WECANN: Water, Energy, and Carbon with Artificial Neural Networks), growing season net carbon flux (Loechli2023), biomass (ESACCI: European Space Agency Biomass Climate Change Initiative, and XuSaatchi2021), methane (Fluxnet), and surface soil moisture (Wang2021). In addition to these land-focused datasets, ILAMB v2.7 marks the first release of the International Ocean Model Benchmarking (IOMB) package. While the codebase remains the same as that used for the land, this release encodes the datasets and configuration files to be used in benchmarking ocean models.

Finally, we present a shift in the ILAMB scoring methodology. In order to make errors comparable across different areas of the globe, ILAMB originally employed a normalization of errors by the variability of the reference quantity at a particular location. For many variables, this choice tends to strongly weight performance in the tropics and consequently does not give a balanced perspective on model performance across the globe. To restore this balance, we propose a shift to normalizing errors by regional error quantiles. We select regions which conform to the traditional understanding of biomes in order to focus on areas where we expect errors to be commensurate in order of magnitude. We then choose a normalizing quantity by taking a quantile of the errors in these biomes across a selection of CMIP5- and CMIP6-era models. In this way, we contextualize the scores by using the performance of recent generations of models.
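A schematic of the proposed normalization (illustrative only, not ILAMB source code): within each region, the model's errors are scaled by a chosen quantile of the errors made by a reference ensemble of CMIP5/CMIP6-era models, and then mapped to a score. The quantile level, the exponential score mapping and all names are assumptions for the sketch.

```python
import numpy as np

def quantile_normalized_scores(errors, region_mask, reference_errors, q=0.98):
    """Score a model's errors relative to regional ensemble error quantiles.

    errors           : (ny, nx) absolute error of the model being scored
    region_mask      : (ny, nx) integer biome/region labels
    reference_errors : (n_models, ny, nx) errors from the reference ensemble
    """
    score = np.full(errors.shape, np.nan, dtype=float)
    for region in np.unique(region_mask):
        sel = region_mask == region
        # Normalising constant: q-th quantile of ensemble errors in this region
        norm = np.nanquantile(reference_errors[:, sel], q)
        # Relative error mapped to a score in (0, 1]; exp(-x) is one common choice
        score[sel] = np.exp(-errors[sel] / norm)
    return score
```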

How to cite: Collier, N., Hoffman, F., and Lawrence, D.: Methodological Developments in the International Land Model Benchmarking Effort, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-20894, https://doi.org/10.5194/egusphere-egu24-20894, 2024.

EGU24-22316 | Orals | CL5.6 | Highlight

Climate Model Benchmarking for CMIP7 – A CMIP Task Team 

Forrest Hoffman and Birgit Hassler

The goal of the Coupled Model Intercomparison Project (CMIP) is to better understand past, present, and future climate changes in a multi-model context. Based on the outcomes of the CMIP phase 6 (CMIP6) Community Survey, the CMIP panel is seeking to identify ways to increase the project's scientific and societal relevance, improve accessibility, and widen participation. To achieve these goals, a number of Task Teams were established to support the design, scope, and definition of the next phase of CMIP and the evolution of CMIP infrastructure and its future operationalization.

An important prerequisite for providing reliable climate information from climate and Earth system models is to understand their capabilities and limitations. Thus, systematically and comprehensively evaluating the models with the best available observations and reanalysis data is essential. For CMIP7 new evaluation challenges stemming from models with higher resolution and enhanced complexity need to be rigorously addressed. These challenges are both technical (e.g., memory limits, increasingly unstructured and regional grids), and scientific. In particular, innovative diagnostics, including the support of machine learning-based analysis of CMIP simulations, must be developed.

The Climate Model Benchmarking Task Team aims to provide a vision and concrete guidance for establishing a systematic, open, and rapid performance assessment of the expected large number of models participating in CMIP7, including a variety of informative diagnostics and performance metrics. The goal is to fully integrate evaluation tools into the CMIP publication workflow and to publish their diagnostic outputs alongside the model output on the ESGF, ideally displayed through an easily accessible website. To accomplish this, existing evaluation tools need to be further developed and applied to historical and other CMIP7 simulations. We expect to produce an increasingly systematic characterization of the models which, compared with earlier phases of CMIP, will more quickly and openly identify the strengths and weaknesses of simulation results. This will also reveal whether long-standing model errors remain evident in newer models and will assist modelling groups in improving their models. This framework will be designed to readily incorporate updates, including new observations and a multitude of additional diagnostics and metrics as they become available from the research community, and will be developed as fully open-source software with high documentation standards.

How to cite: Hoffman, F. and Hassler, B.: Climate Model Benchmarking for CMIP7 – A CMIP Task Team, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-22316, https://doi.org/10.5194/egusphere-egu24-22316, 2024.

ESSI3 – Open Science Informatics for Earth and Space Sciences

EGU24-271 | PICO | ESSI3.3

Open Science Collaboration across Earth Observation Platforms 

Ingo Simonis, Marie-Francoise Voidrot, Rachel Opitz, and Piotr Zaborowski
Collaborative Open Science is essential to addressing complex challenges whose solutions prioritize integrity and require cross-domain integrations. Today, building workflows, processes, and data flows across domains and sectors remains technically difficult and practically resource intensive, creating barriers to whole-systems change. While organizations increasingly aim to demonstrate accountability, they often lack the tools to take action effectively. By making it simple to connect data and platforms together in transparent, reusable and reproducible workflows, the OGC Open Science Persistent Demonstrator (OSPD) aims to enable responsible innovation through collaborative open science. The OSPD focuses specifically on using geospatial and earth observation (EO) data to enable and demonstrate solutions that create capacity for novel research and accelerate the practical implementation of this research.
Collaborative Open Science and FAIR (Findable, Accessible, Interoperable, Reusable) data are widely recognized as critical tools for taking advantage of the opportunities created through addressing complex social and environmental challenges. To date, many millions have been invested in hundreds of initiatives to enable access to analytical tools, provide data management, data integration and exchange, translate research results, and support reproduction and testing of workflows for new applications. These investments have resulted in a plethora of new data, protocols, tools and workflows, but these resources frequently remain siloed, difficult to use, and poorly understood, and as a result they are falling short of their full potential for wider impact and their long term value is limited.
This presentation will illustrate how the OGC OSPD Initiative, through its design, development and testing activities, provides answers to leading questions such as:
  • How can we design Open Science workflows that enable integration across platforms designed for diverse applications used in different domains to increase their value?
  • How can we lower barriers for end users (decision makers, managers in industry, scientists, community groups) who need to create Open Science workflows, processes, and data flows across domains and sectors?
  • How can Open Science workflows and platforms enable collaboration between stakeholders in different domains and sectors?
  • How can we empower organizations to demonstrate accountability in their analytical workflows, data, and representations of information through Open Science?
  • What Open Science tools do organizations need to take action effectively?
  • How can Open Science and FAIR data standards practically support accountability?
  • How can we make it simple to connect data and platforms together in transparent, reusable and reproducible (FAIR) workflows?
  • What are the specific challenges of using geospatial, earth observation (EO), and complementary data in this context?

How to cite: Simonis, I., Voidrot, M.-F., Opitz, R., and Zaborowski, P.: Open Science Collaboration across Earth Observation Platforms, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-271, https://doi.org/10.5194/egusphere-egu24-271, 2024.

EGU24-1445 | ECS | PICO | ESSI3.3

A Software Toolkit for Advancing our Understanding of Land Surface Interactions: Recent developments to the SimSphere SVAT model 

Christina Lekka, George P. Petropoulos, Vasileios Anagnostopoulos, Spyridon E. Detsikas, Petros Katsafados, and Efthimios Karympalis

Mathematical models are widely used today to study the intricate physical processes and interactions among the different components of the Earth's system. Such models are often used synergistically with Earth Observation (EO) data, allowing spatiotemporal estimates of key parameters characterising land surface interactions to be derived. This synergy combines the horizontal coverage and spectral resolution of EO data with the vertical coverage and fine temporal continuity of the models. SimSphere is a mathematical model belonging to the family of Soil Vegetation Atmosphere Transfer (SVAT) models. As a software toolkit, it has been developed in Java, and it is used either as a stand-alone application or synergistically with EO data. The model's use is constantly expanding worldwide, both as an educational and as a research tool. Herein we present recent advancements introduced to SimSphere. We have comprehensively tested and updated the model code and added new functionalities, which are illustrated using a variety of case studies. For example, we demonstrate the new functionality that allows the model to be applied over complex/heterogeneous landscapes, in experimental settings in various European ecosystems. The present study contributes to ongoing efforts by the model's user community and is also very timely, given the increasing interest in SimSphere, particularly towards the development of EO-based operational products characterising the Earth's water cycle. The research presented herein has been conducted in the framework of the project LISTEN-EO (DeveLoping new awareness and Innovative toolS to support efficient waTer rEsources management Exploiting geoinformatiOn technologies), funded by the Hellenic Foundation for Research and Innovation programme (ID 015898). 

Keywords: SVAT, SimSphere, Earth Observation, land surface interactions, LISTEN-EO

How to cite: Lekka, C., Petropoulos, G. P., Anagnostopoulos, V., Detsikas, S. E., Katsafados, P., and Karympalis, E.: A Software Toolkit for Advancing our Understanding of Land Surface Interactions: Recent developments to the SimSphere SVAT model, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-1445, https://doi.org/10.5194/egusphere-egu24-1445, 2024.

EGU24-1707 | PICO | ESSI3.3

Topic Analysis and Classification of EGU Conference Abstracts 

Jens Klump, Chau Nguyen, John Hille, and Michael Stewart

The corpus of Abstracts from the EGU General Assemblies 2000 - 2023 covers a wide range of Earth, planetary and space sciences topics, each with multiple subtopics. The abstracts are all in English, fairly uniform in length, cover one broad subject area, and are licenced under a permissive licence that allows further processing (CC BY 4.0), making this a high-quality text corpus for studies using natural language processing (NLP) and for the finetuning of Large Language Models (LLM). Our study makes use of openly available NLP software libraries and LLMs.

In the first phase of this study, we were interested in finding out how well abstracts map to the topics covered by EGU Divisions and whether co-organisation of sessions contributes to or dilutes topics. The abstracts are available only in unstructured formats such as Portable Document Format (PDF) or plain text in XML extracts from the conference database. They are identified by abstract numbers but carry no information on the session or division where they were originally presented. We reconstructed this information from the online conference programme.

To be able to employ a supervised learning approach of matching abstracts to topics, we defined the topics to be synonymous with the 23 scientific divisions of the EGU, using the division and co-listed divisions as topic labels.

We fine-tuned the Bidirectional Encoder Representations from Transformers (BERT) and the slightly simplified DistilBERT language models for our topic modelling exercise. We also compared the machine classifications against a random association of abstracts and topics. Preliminary results obtained from our experiments show that a machine learning model performs well in classifying the conference abstracts (accuracy = 0.66). The accuracy varies between divisions (0.40 for NP to 0.96 for G) and improves when taking co-organisation between divisions into account. Starting from one year of abstracts (EGU 2015), we plan to expand our analysis to cover all abstracts from all EGU General Assemblies (EGU 2000 - 2024).
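As an illustration of this kind of fine-tuning workflow (a sketch under assumed inputs, not the authors' code), the snippet below trains DistilBERT to assign one of 23 division labels to abstract texts using the Hugging Face libraries; `abstracts` and `labels` are assumed to have been extracted from the conference programme beforehand.

```python
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# `abstracts` is a list of abstract texts, `labels` the division index (0-22);
# both are assumed to exist already (e.g. reconstructed from the programme).
dataset = Dataset.from_dict({"text": abstracts, "label": labels})

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, padding="max_length"),
    batched=True,
).train_test_split(test_size=0.2, seed=42)

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=23)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="egu-topics", num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
)
trainer.train()
print(trainer.evaluate())  # reports eval loss; accuracy needs a compute_metrics fn
```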

How to cite: Klump, J., Nguyen, C., Hille, J., and Stewart, M.: Topic Analysis and Classification of EGU Conference Abstracts, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-1707, https://doi.org/10.5194/egusphere-egu24-1707, 2024.

EGU24-5658 | ECS | PICO | ESSI3.3

TropiDash: a comprehensive open-source dashboard for Tropical Cyclone data visualization and analysis 

Laura Paredes-Fortuny, Filippo Dainelli, and Paolo Colombo

Tropical Cyclones (TCs) are synoptic-scale storm systems rapidly rotating around a center of low atmospheric pressure which primarily derive their energy from exchanges of heat and moisture between the air and sea. These cyclones are among the most impactful geophysical phenomena, inflicting substantial economic damages and numerous fatalities. Key hazards associated with TCs include intense winds, extreme rainfall, and storm surges, which frequently result in extensive coastal flooding. Because of the severe consequences of their impacts, precise monitoring of these events and effective preparation for their occurrences are crucial to ensure the safety and resilience of populations and infrastructure.

 

For successful monitoring and preparation, the access to relevant factors associated with TC forecasts, such as risk projections and impact variables, must be adequate and user-friendly, enabling users to rapidly locate and comprehend the information they seek. To achieve this objective, visual tools and dashboards that concentrate interdisciplinary information and data from diverse sources serve as powerful summarization methods. Summary dashboards and tools facilitate easy access to information for all users ranging from experts and policymakers to common citizens. They consist of a platform offering a comprehensive overview of the situation, supporting informed decision-making. Current open-source tools for consulting TC data have limitations. They tend to be highly specialized, offering a limited selection of maps or graphs that cover only a portion of TC-related information. They also often lack interactivity, which restricts the user experience and the search for specific information. Furthermore, these tools can be complex to use due to inadequate documentation or challenges in presenting multiple pieces of information concurrently.

 

In this work, we introduce a novel free open-source dashboard designed to surpass the limitations of existing tools, displaying a comprehensive set of information regarding TC hazards. TropiDash offers several strengths that enhance user experience and accessibility. Developed in the widely used Jupyter Notebook programming environment, it is easily accessible either through the installation guide on its GitHub repository or by launching its Binder environment. The dashboard features a user-friendly interface built on Python widgets and the Voilà protocol. It aggregates data from various sources spanning multiple domains: from cyclone properties, such as track forecasts and strike probability maps, to atmospheric variable fields (wind speed and direction, temperature, precipitation), to risk and vulnerability information, such as cyclone risk, coastal flood risk and population density. All this is made available while providing the user with a wide range of interactivity, from choosing the cyclone, to selecting the variables of interest, to roaming over the interactive maps.
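A minimal sketch of the widget-plus-map dashboard pattern described above (ipywidgets and ipyleaflet rendered in a notebook or served through Voilà); the cyclone names and track coordinates are placeholders, not TropiDash's actual data model.

```python
import ipywidgets as widgets
from ipyleaflet import Map, Polyline

tracks = {                         # hypothetical forecast tracks: (lat, lon) pairs
    "Cyclone A": [(12.0, -45.0), (13.5, -47.0), (15.2, -49.5)],
    "Cyclone B": [(-10.0, 80.0), (-11.2, 78.5), (-12.8, 76.0)],
}

m = Map(center=(0, 0), zoom=3)
selector = widgets.Dropdown(options=list(tracks), description="Cyclone")
layer = None

def show_track(change):
    """Redraw the selected cyclone track on the map."""
    global layer
    if layer is not None:
        m.remove_layer(layer)
    layer = Polyline(locations=tracks[change["new"]], color="red", fill=False)
    m.add_layer(layer)
    m.center = tracks[change["new"]][0]

selector.observe(show_track, names="value")
show_track({"new": selector.value})

# Displaying this VBox in a notebook (or serving the notebook with `voila`)
# yields an interactive mini-dashboard.
widgets.VBox([selector, m])
```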

 

The first version of TropiDash was realized in the context of Code for Earth 2023,  a program for the development of open-source software organized by the European Centre for Medium-Range Weather Forecasts. Here we present an improved and optimized version. 

How to cite: Paredes-Fortuny, L., Dainelli, F., and Colombo, P.: TropiDash: a comprehensive open-source dashboard for Tropical Cyclone data visualization and analysis, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-5658, https://doi.org/10.5194/egusphere-egu24-5658, 2024.

EGU24-6254 | ESSI3.3

Open geospatial research software in 2024: Assessing service quality with technology readiness levels 

To initiate, maintain and accelerate behavioral change towards Open and FAIR practices, tangible benefits for scientific communities, and especially for early career scientists, are a critical success factor. The realization of such benefits, by due credit, funding, or other means, requires underlying workflows, enabled by infrastructures and standards that are operational, reliable and trusted. Many education efforts are under way to educate and motivate researchers to embrace and participate in Open and FAIR efforts, including in the open geospatial community software projects of the OSGeo foundation. Still, from the perspective of a developer of research software, the current general service quality of offerings for PID-/citation-based credit remains limited, fickle, partially unpredictable and frustrating. This presentation demonstrates these challenges with real-world examples from OSGeo open geospatial projects, such as QGIS, GRASS GIS and proj, and the resulting PID references in publications. Further, a service-centered approach is introduced to enable both end users and Open/FAIR communities to assess overall service quality through Technology Readiness Levels (TRL), to improve the user experience by building trust, and to focus further development resources.

How to cite: Löwe, P.: Open geospatial research software in 2024: Assessing service quality with technology readiness levels , EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-6254, https://doi.org/10.5194/egusphere-egu24-6254, 2024.

EGU24-8146 | ECS | PICO | ESSI3.3

Xarray-regrid: regridding with ease 

Bart Schilperoort, Claire Donnelly, Yang Liu, and Gerbrand Koren

In geosciences different sources of data are often on different grids. These can be at different resolutions, but also have the grid centers at different locations. To be able to use these different sources of data in models or analyses, they have to be re-projected to a common grid. Popular tools for this are the command-line tool ‘Climate Data Operators’ (CDO) and the Earth System Modeling Framework (ESMF).

These tools work well but have some downsides: CDO is a command-line tool and as such the regridded data has to be written to disk. ESMPy, the Python package for ESMF, is only available on Linux and Mac OSX, and does not support out-of-core computing. Both tools rely on binary dependencies, which can make them more difficult to install. Additionally, many geoscientists already use xarray for analyzing and processing (netCDF) data.

For this use case we developed xarray-regrid, a lightweight xarray plugin which can regrid (rectilinear) data using the linear, nearest-neighbor, cubic, and conservative methods. The code is open source and modularly designed to facilitate the addition of alternative methods. Xarray-regrid is fully implemented in Python and therefore can be used on any platform. Using Dask, the computation is fully parallelized and can be performed out-of-core. This allows for fast processing of large datasets without running out of memory.

Xarray-regrid is available on the Python Package Index (pip install xarray-regrid), and its source code is available on GitHub at https://github.com/EXCITED-CO2/xarray-regrid.
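
As a minimal sketch of how this accessor-style workflow can look in practice (assuming the method names advertised in the project README; check the installed version for exact signatures), regridding onto a user-defined 1-degree target grid takes only a few lines:

```python
# Hedged sketch: the .regrid accessor methods below follow the xarray-regrid
# README; verify them against the version you install.
import numpy as np
import xarray as xr
import xarray_regrid  # noqa: F401  (import registers the .regrid accessor)

# Source data on an arbitrary rectilinear grid (placeholder random values).
source = xr.Dataset(
    {"t2m": (("lat", "lon"), np.random.rand(73, 144))},
    coords={"lat": np.linspace(-90, 90, 73), "lon": np.linspace(0, 357.5, 144)},
)

# Target grid: a plain xarray Dataset carrying only the desired coordinates.
target = xr.Dataset(
    coords={"lat": np.arange(-89.5, 90, 1.0), "lon": np.arange(0.5, 360, 1.0)}
)

t2m_linear = source.regrid.linear(target)                            # bilinear
t2m_cons = source.regrid.conservative(target, latitude_coord="lat")  # conservative
```

With Dask-backed inputs the same calls stay lazy and out-of-core, which is what enables processing of large datasets without running out of memory.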

How to cite: Schilperoort, B., Donnelly, C., Liu, Y., and Koren, G.: Xarray-regrid: regridding with ease, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-8146, https://doi.org/10.5194/egusphere-egu24-8146, 2024.

EGU24-12004 | PICO | ESSI3.3

NASA’s Open Science Platform VEDA (Visualization, Exploration and Data Analytics) 

Zac Deziel, Aimee Barciauskas, Jonas Solvsteen, Manil Maskey, Brian Freitag, Slesa Adhikari, Anthony Boyd, Alexandra Kirk, David Bitner, and Vincent Sarago

VEDA is an open-source science cyberinfrastructure for data processing, visualization, exploration, and geographic information systems (GIS) capabilities (https://www.earthdata.nasa.gov/esds/veda, https://www.earthdata.nasa.gov/dashboard/). NASA has always had open data policies, so its data has always been openly accessible to anyone, but NASA has not consistently exposed it through friendly interfaces or analytics platforms. VEDA attempts to make NASA's Earth data mean more.

As VEDA supplies data and computing services through its dashboard and JupyterHub applications and engages with communities such as EGU, it is a critical component of NASA’s open science initiative. VEDA’s adoption of existing and emerging standards such as STAC, Cloud-Optimized GeoTIFFs, Zarr, the Features API, and the Tiles API ensures interoperability and reusability.

In the past year, VEDA has expanded its impact in 3 ways: (1) the reuse of its infrastructure to stand up the multi-agency Greenhouse Gas Center (https://earth.gov/ghgcenter, announced at COP28) and NASA’s Earth Information Center (https://earth.gov/), (2) the reuse of data APIs across applications, such as VEDA data in NASA’s Enterprise GIS, and (3) the generalization of the data system architecture into a free and open source framework called eoAPI. 

VEDA has also maintained and deepened its connections to the Multi-Mission Algorithm and Analysis Platform (MAAP). MAAP is a research data infrastructure (RDI) for above-ground biomass estimation. MAAP is reusing and contributing to the eoAPI data system and plans to integrate the analytics components (JupyterHub and data processing system) further.

Now that VEDA has stood up the GHG Center and the EIC, it has become a project where innovation happens. The VEDA team, composed of NASA project leads, scientists, designers, and developers, constantly works to resolve old and new challenges in managing EO architectures. For example, the team designs and implements interfaces to manage STAC metadata. eoAPI is a result of this innovative environment.

eoAPI is a new, open-source, installable combination of a data catalog and associated services for Earth observation and related data, built with a cloud-infrastructure-first approach. eoAPI combines STAC data ingestion, data hosting (pgSTAC), and querying services (stac-fastapi) with raster (Titiler) and vector services (TiPg). eoAPI is built for reuse and has been used beyond VEDA, the GHG Center, and the EIC to deliver the MS Planetary Computer and AWS ASDI data catalogs and applications for the International Federation of the Red Cross and Mercy Corps.
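
Because the eoAPI backend exposes a standard STAC API, any STAC client can query it. A minimal sketch with pystac-client is shown below; the endpoint URL and collection name are illustrative placeholders, not actual VEDA identifiers:

```python
# Hedged sketch: placeholder STAC endpoint and collection id; consult the VEDA
# documentation for the real STAC API address and available collections.
from pystac_client import Client

catalog = Client.open("https://stac.example.nasa.gov/api/stac")  # placeholder endpoint
search = catalog.search(
    collections=["no2-monthly"],         # hypothetical collection id
    bbox=[-10.0, 35.0, 30.0, 60.0],      # rough European bounding box
    datetime="2020-01-01/2020-12-31",
    max_items=10,
)
for item in search.items():
    print(item.id, list(item.assets))
```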

This presentation will demo the current capabilities of eoAPI and VEDA and discuss how these capabilities were designed and architected with the central goals of science delivery, reproducible science, and interoperability to support the re-use of data and APIs across the Earth Science ecosystem of tools. The presentation will close with VEDA and eoAPI’s plans.

How to cite: Deziel, Z., Barciauskas, A., Solvsteen, J., Maskey, M., Freitag, B., Adhikari, S., Boyd, A., Kirk, A., Bitner, D., and Sarago, V.: NASA’s Open Science Platform VEDA (Visualization, Exploration and Data Analytics), EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-12004, https://doi.org/10.5194/egusphere-egu24-12004, 2024.

Rust Geodesy (RG) is an open source library, written in Rust [1], for experiments with geodetic transformations, software, and standards [2], [3]. RG originated from attempts to demonstrate architectural innovations for potential improvement of the ubiquitous transformation system PROJ, with which it consequentially shares many characteristics.
In parallel, however, RG has also evolved into a quite capable geodetic tool in its own right. And over the last few releases it has expanded from the "geometrical geodesy" background of PROJ, into supporting a number of operations from the realm of physical geodesy (deflections of the vertical, normal gravity models, gravity reduction, etc.), while still maintaining the key architectural feature of run time construction of complex operations from pipelines of simpler operators.
But in particular, the RG design has been nudged towards supporting the development and maintenance of geodetic transformations, as reflected by these characteristics:
  • A clear and compact syntax for specification of processing pipelines
  • ...but also syntactical backward compatibility and interoperability, through additional support for PROJ's older and more verbose syntax
  • Extensibility through straightforward, tight integration between system-supplied and user-written operators
  • ...but also support for loose integration with arbitrary ancillary software, through support of plain text operator definitions and grid files
  • ...and ad-hoc abstractions through support for run-time defined user macros
  • Seamless interoperability with arbitrarily complex application program data structures, i.e. integrating with the user program, rather than forcing the use of library provided data structures, and
  • Support of roundtrip consistency checks
The RG data flow architecture is based on the foundational concept of "coordinate sets" from the OGC/ISO geospatial standards series [4]. Hence, in contrast to PROJ operators, which operate on a single coordinate tuple, RG operators operate on an entire set of coordinate tuples at a time. While this may seem immaterial at the source code level, it gives the compiler a wider context for introducing vectorisation, leveraging the SIMD instruction sets of modern computers to transform more than one coordinate tuple at a time.
Recently, SIMD support has also arrived in the WebAssembly (Wasm) implementations of the major web platforms [5], and when compiled to Wasm, RG has proven to be a compact, lightweight and practical library for use on the web [6], [7]. So with RG's combined forays into the realms of Wasm and physical geodesy, the vista of "generic geodesy in the browser" is now more than just a mirage.
 
[1] Steve Klabnik and Carol Nichols, 2022: The Rust Programming Language, 2nd edition, 560 pp., San Francisco, CA, USA: No Starch Press
[2] Thomas Knudsen, 2021: Ruminations on Rust Geodesy: Overall architecture and philosophy.
URL: https://github.com/busstoptaktik/geodesy/blob/main/ruminations/000-rumination.md
[3] Thomas Knudsen: Geodesy. URL https://github.com/busstoptaktik/geodesy
[4] Roger Lott (ed), 2019: OGC Abstract Specification Topic 2: Referencing by coordinates.
URL https://docs.ogc.org/as/18-005r4/18-005r4.html
[5] WebAssembly Feature Extensions. URL: https://webassembly.org/features/
[6] Kyle Barron, 2023: Prototyping GeoRust + GeoArrow in WebAssembly. Efficient, vectorized geospatial operations in the browser,
URL https://observablehq.com/@kylebarron/prototyping-georust-geoarrow-in-webassembly
[7] Sean Rennie, 2023: Testing geodesy-wasm,
URL https://observablehq.com/d/3ff9d9b8f0b5168a

How to cite: Knudsen, T.: Generic geodesy in the browser? Recent developments in Rust Geodesy, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-12024, https://doi.org/10.5194/egusphere-egu24-12024, 2024.

EGU24-12786 | ECS | PICO | ESSI3.3

Quantifying water security using hyperresolution hydrological modelling on top of an Open Data Cube (ODC) 

Luis Felipe Patino Velasquez, Dr. Elizabeth Lewis, Prof. Jon Mills, and Dr. Stephen Birkinshaw

For many areas across the globe, physically-based hydrological models play a fundamental role in devising comprehensive and robust plans for future climate change adaptation and preparedness, informing water management and flood initiatives. Now that advances in satellite and sensor technology, coupled with the development of cloud computing, have enabled the advancement of hydrology as a data-intensive science, there is considerable impetus and interest in using these emerging technologies to develop new insights that contribute to fundamental aspects of the hydrological sciences. Whilst increasing volumes of Earth Observation (EO) data coupled with advances in cloud computing have enabled the enhancement of hydrological modelling, one of the remaining challenges is ensuring a seamless data pipeline to the final hydrological prediction. This poses a significant set of questions about the use of EO data for hydrology. The current research is situated at the junction of three areas: physically-based hydrological modelling, satellite EO data, and the implementation of the Earth Observation Data Cube (EODC) paradigm. This presentation will outline the development and use of an open-source modelling workflow that integrates analysis-ready data (ARD), through the implementation of the Open Data Cube (ODC) data exploitation architecture, with a physically-based, spatially-distributed hydrological model (SHETRAN), as a glimpse into the relevance of EO data cube solutions in lowering the technology and EO data barriers. This enables users to harness existing open-source EO datasets and software at minimum cost and effort, with the objective of enabling a more open and reproducible hydrological science.

How to cite: Patino Velasquez, L. F., Lewis, Dr. E., Mills, P. J., and Birkinshaw, Dr. S.: Quantifying water security using hyperresolution hydrological modelling on top of an Open Data Cube (ODC), EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-12786, https://doi.org/10.5194/egusphere-egu24-12786, 2024.

EGU24-15972 | PICO | ESSI3.3

The LUE open source software for building numerical simulation models on HPC 

Derek Karssenberg, Oliver Schmitz, and Kor de Jong

When developing large-scale numerical earth system models, knowledge of a broad range of programming technologies is traditionally required to support hardware from laptops up to supercomputers; this is knowledge that scientists specialized in a particular geosciences domain mostly do not have and often do not want to acquire. Their emphasis is on describing and implementing the processes rather than, for instance, dealing with the parallelization of model equations. Moreover, when model characteristics or domain extents change, the chosen parallelisation technique may already be obsolete or require significant refactoring to adapt to the new situation. We develop the open-source LUE modelling framework, a software environment allowing domain scientists – who may not be familiar with the development of high-performance applications – to develop numerical simulation models that seamlessly scale when adding additional hardware resources. LUE comprises a data model for the storage of field-based and agent-based data, and provides a broad range of map algebra operations as building blocks for model construction. Each spatial operation is implemented in C++ using HPX, a library and runtime environment providing asynchronous execution of interdependent tasks on both shared-memory and distributed computing systems. LUE provides a Python module and therefore a single high-level API, whether models are run on laptops or HPC systems. In our presentation we demonstrate two capabilities of LUE. First, using the built-in operations we implemented a spatially distributed hydrological model including surface water routing. The model runs for the African continent at 100 metres spatial and hourly temporal resolution. Secondly, to demonstrate the extensibility, we utilise LUE's focal operation framework to implement an individual kernel calculating greenness visibility exposure. Our PICO presentation will also include future extensions of the framework, in particular for agent-based modelling and the integration of machine learning model components.

How to cite: Karssenberg, D., Schmitz, O., and de Jong, K.: The LUE open source software for building numerical simulation models on HPC, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-15972, https://doi.org/10.5194/egusphere-egu24-15972, 2024.

EGU24-19155 | PICO | ESSI3.3

FAIR Environmental Data through a STAC-Driven Inter-Institutional Data Catalog Infrastructure – Status quo of the Cat4KIT-project 

Mostafa Hadizadeh, Christof Lorenz, Sabine Barthlott, Romy Fösig, Katharina Loewe, Corinna Rebmann, Benjamin Ertl, Robert Ulrich, and Felix Bach

In the rapidly advancing domain of environmental research, the deployment of a comprehensive, state-of-the-art Research Data Management (RDM) framework is increasingly pivotal. Such a framework is key to ensuring FAIR data, laying the groundwork for transparent and reproducible earth system sciences.

Today, datasets associated with research articles are commonly published via prominent data repositories like Pangaea or Zenodo. Conversely, data used in actual day-to-day research and inter-institutional projects tends to be shared through basic cloud storage solutions or, even worse, via email. This practice, however, often conflicts with the FAIR principles, as much of this data ends up in private, restricted systems and local storage, limiting its broader accessibility and use.

In response to this challenge, our research project Cat4KIT aims to establish a cross-institutional catalog and Research Data Management framework. The Cat4KIT framework is, hence, an important building block towards the FAIRification of environmental data. It not only streamlines the process of ensuring availability and accessibility of large-scale environmental datasets but also significantly enhances their value for interdisciplinary research and informed decision-making in environmental policy.

The Cat4KIT system comprises four essential elements: data service provision, (meta)data harvesting, catalogue service, and user-friendly data presentation. The data service provision module is tailored to facilitate access to data within typical storage systems by using well-defined and standardized community interfaces via tools like the THREDDS data server, Intake catalogues, and the OGC SensorThings API. In this way, we ensure seamless data retrieval and management for typical use cases in environmental sciences.

(Meta)data harvesting via our so-called DS2STAC-package entails collecting metadata from various data services, followed by creating STAC-metadata and integrating it into our STAC-API-based catalog service.
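
A minimal sketch of this harvest-then-publish step, assuming pystac as the metadata library (the identifiers, geometry and asset URL below are illustrative placeholders rather than actual DS2STAC output), could look as follows:

```python
# Hedged sketch: placeholder ids, coordinates and asset URL; DS2STAC may
# structure the resulting STAC Items differently.
from datetime import datetime, timezone

import pystac

item = pystac.Item(
    id="example-soil-moisture-2020",                           # placeholder identifier
    geometry={"type": "Point", "coordinates": [8.43, 49.01]},  # placeholder location
    bbox=[8.43, 49.01, 8.43, 49.01],
    datetime=datetime(2020, 1, 1, tzinfo=timezone.utc),
    properties={"variable": "soil_moisture"},                  # harvested attributes
)
item.add_asset(
    "data",
    pystac.Asset(
        href="https://thredds.example.org/dodsC/soil_moisture_2020.nc",  # placeholder
        media_type="application/x-netcdf",
        roles=["data"],
    ),
)
# The resulting Item dictionaries are what get ingested into the STAC-API catalog.
print(item.to_dict()["id"])
```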

This catalog service module synergizes diverse datasets into a cohesive, searchable spatial catalog, enhancing data discoverability and utility via our Cat4KIT UI.

Finally, our framework's data portal is tailored to elevate data accessibility and comprehensibility for a wide audience, including researchers, enabling them to efficiently search, filter, and navigate through data from decentralized research data infrastructures.

One notable characteristic of Cat4KIT is its reliance on open-source solutions and strict adherence to community standards. This guarantees not just the framework's ability to function well with current data systems but also its simple adaptation and expansion to meet future needs. Our presentation demonstrates the technical structure of Cat4KIT, examining the development and integration of each module to adhere to the FAIR principles. Additionally, it showcases examples to illustrate the practical use of the framework in real-life situations, emphasizing its efficacy in enhancing data management practices within KIT and its potential relevance for other research organizations.

How to cite: Hadizadeh, M., Lorenz, C., Barthlott, S., Fösig, R., Loewe, K., Rebmann, C., Ertl, B., Ulrich, R., and Bach, F.: FAIR Environmental Data through a STAC-Driven Inter-Institutional Data Catalog Infrastructure – Status quo of the Cat4KIT-project, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-19155, https://doi.org/10.5194/egusphere-egu24-19155, 2024.

EGU24-19421 | PICO | ESSI3.3

Implementing National Copernicus services for hydrology and natural hazard monitoring at NVE using Open Source tools Apache Airflow and actinia 

Stefan Blumentrath, Yngve Are Antonsen, Aron Widforss, Niklas Fossli Gjersø, Rune Verpe Engeset, and Solveig Havstad Winsvold

The Norwegian Water Resources and Energy Directorate (NVE) is tasked with management of water- and energy resources in Norway, as well as reducing the risk of damages associated with landslides and flooding. Copernicus satellite data can provide valuable insight for those tasks.

The vast amount of Copernicus data, however, requires scalable and robust solutions for processing. Standardized and modular workflows help safeguard the maintainability and efficiency of service delivery. In order to implement operational Copernicus services at NVE at national scale, the Open Source OSGeo community project actinia was introduced together with the Open Source Apache Airflow software as the delivery platform.

actinia (https://actinia-org.github.io/) is a REST API for scalable, distributed, and high performance processing of time series of satellite images, as well as geographical raster and vector data. It is a modular system that uses mainly GRASS GIS for computational tasks.

Apache Airflow (https://airflow.apache.org/) is an orchestration solution that allows users to programmatically author, schedule and monitor workflows.
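
A minimal sketch of how such an orchestration can be wired up (the actinia URL, endpoint path, credentials and process chain below are illustrative placeholders, not the configuration used at NVE) is:

```python
# Hedged sketch: Airflow 2.x DAG submitting a placeholder process chain to a
# placeholder actinia instance; check the actinia REST API docs for real routes.
from datetime import datetime

import requests
from airflow import DAG
from airflow.operators.python import PythonOperator

ACTINIA_URL = "https://actinia.example.org/api/v3"  # placeholder


def submit_snow_cover_job(**context):
    process_chain = {
        "version": "1",
        "list": [
            {
                "id": "ndsi",
                "module": "i.snow",  # hypothetical GRASS GIS module name
                "inputs": [{"param": "input", "value": "S2_scene"}],
            }
        ],
    }
    resp = requests.post(
        f"{ACTINIA_URL}/locations/norway_utm33/processing_async_export",
        json=process_chain,
        auth=("user", "password"),  # placeholder credentials
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()  # contains a status URL that a follow-up task can poll


with DAG(
    dag_id="snow_cover_monitoring",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    PythonOperator(task_id="submit_actinia_job", python_callable=submit_snow_cover_job)
```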

In the presentation, we will illustrate how Apache Airflow and actinia work together and present selected examples of current and future applications operationalized on the platform. Those applications currently cover:

  • Avalanches
  • Flooding
  • Snow cover
  • Lake ice

More services related to NVE's areas of responsibility are being investigated, such as landslides, slush flows, glacier lake outburst floods, and specific land cover changes.

Finally, we discuss challenges and opportunities of using Open Source Software tools and collaborative science approaches at NVE in national, operational services.

How to cite: Blumentrath, S., Are Antonsen, Y., Widforss, A., Fossli Gjersø, N., Verpe Engeset, R., and Havstad Winsvold, S.: Implementing National Copernicus services for hydrology and natural hazard monitoring at NVE using Open Source tools Apache Airflow and actinia, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-19421, https://doi.org/10.5194/egusphere-egu24-19421, 2024.

EGU24-20137 | PICO | ESSI3.3

ALAMEDA – A scalable multi-domain metadata management platform 

Felix Mühlbauer, Martin Hammitzsch, Marc Hanisch, Gunnar Pruß, Rainer Häner, and Oliver Rach

Modern Earth sciences produce a continuously increasing amount of data. These data consist of measurements/observations and descriptive information (metadata), including semantic classifications (semantics). Depending on the geoscientific parameter, metadata are stored in a variety of different databases, standards and semantics, which hampers interoperability in terms of limited data access and exchange, searchability and comparability. Examples of common data types with very different structures and metadata needs are maps, geochemical data derived from field samples, or time series data measured with a sensor at a point, such as precipitation or soil moisture.

So far, there is a large gap between the capabilities of databases to capture metadata and their practical use. ALAMEDA is designed as a modular, structured metadata management platform for the curation, compilation, administration, visualization, storage and sharing of meta information of lab-, field- and modelling datasets. As a pilot application for stable isotope and soil moisture data, ALAMEDA will enable searching, accessing and comparing meta information across organization, system and domain boundaries.

ALAMEDA covers 5 major categories: observation & measurements, sample & data history, sensors & devices, methods & processing, and environmental characteristics (spatial & temporal). These categories are hierarchically structured, interlinkable and filled with specific metadata attributes (e.g. name, data, location, methods for sample preparation, measuring and data processing, etc.). For the pilot, all meta information will be provided by existing and well-established data management tools (e.g. mDIS, SMS, LI2, etc.).

In ALAMEDA, all information is brought together and will be available via web interfaces. Furthermore, the project focuses on features such as metadata curation with intuitive graphical user interfaces, the adoption of well-established standards, the use of domain-controlled vocabularies and the provision of interfaces for a standards-based dissemination of aggregated information. Finally, ALAMEDA should be integrated into the DataHub (Hub-Terra).

Currently, the project is in its final phase, and we will present the developed concepts and software as well as lessons learned.

How to cite: Mühlbauer, F., Hammitzsch, M., Hanisch, M., Pruß, G., Häner, R., and Rach, O.: ALAMEDA – A scalable multi-domain metadata management platform, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-20137, https://doi.org/10.5194/egusphere-egu24-20137, 2024.

Geospatial users have long been constructing immersive 3D environments for diverse applications such as urban planning, environmental and geological studies, 3D analysis, and more recently, replicating the physical world as a digital twin. In this PICO presentation, we aim to illustrate the dynamic evolution of Indexed 3D Scene Layers (I3S), an OGC Community Standard designed for efficient streaming and storage of substantial geospatial content. I3S has rapidly adapted to encompass new use cases and techniques, pushing the boundaries of geospatial visualization and analysis.

I3S facilitates the efficient transmission of diverse 3D geospatial data types, ranging from discrete 3D objects with attributes and integrated surface meshes to extensive point cloud data covering expansive geographic regions. Moreover, it excels in streaming highly detailed Building Information Model (BIM) content to web browsers, mobile applications, and desktop platforms.

The most recent enhancement to OGC's I3S streaming standard, Building Scene Layer (BSL), introduces a sophisticated framework for effective tiling of massive BIM content. BSL leverages Bounding Volume Hierarchy (BVH) and geometric error driven selection and display criteria, incorporates attribute-driven filtering, and employs various graphics optimizations. These advancements collectively enable the seamless streaming of otherwise voluminous Building Information Model (BIM) 3D assets.

During this session, we will spotlight the practical implementation of I3S BSL across diverse ecosystems, including loaders.gl and CesiumJS. This flexibility empowers users to select their preferred front-end application based on specific requirements and preferences.

How to cite: Belayneh, T.: Democratizing BIM Data Access in Digital Twins Through OGC I3S 3D Streaming Standard, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-20842, https://doi.org/10.5194/egusphere-egu24-20842, 2024.

EGU24-1555 | ECS | Orals | ESSI3.5

eLTER and its role of providing in-situ data to large scale research projects for modelling biodiversity dynamics 

Christoph Wohner, Alessandro Oggioni, Paolo Tagliolato, Franziska Taubert, Thomas Banitz, Sarah Venier, Philip Trembath, and Johannes Peterseil

The integrated European Long-Term Ecosystem, critical zone and socio-ecological Research (eLTER) is an emerging pan-European, in-situ Research Infrastructure (RI). Once fully established, it will serve multiple scientific communities with high-level central facilities and distributed, well-instrumented eLTER sites. In the Horizon Europe project Biodiversity Digital Twin (BioDT), eLTER already plays the role of a provider of European datasets, in particular for the Grassland Dynamics prototype digital twin. Here, GRASSMIND, an individual- and process-based grassland model designed for simulating the structure and dynamics of species-rich herbaceous communities, including these communities’ responses to climate and management, is to be upscaled to model different local grassland sites across Europe. As the eLTER in-situ site network also comprises such grassland sites, the site registry DEIMS-SDR (deims.org) was used to identify relevant sites and contact the respective site managers and researchers to mobilise data. This selection process was aided by the machine-actionable data endpoints of eLTER, which are also accessible via the Python and R packages deimsPy and ReLTER, enabling script-based extraction and analysis. Collected and mobilised data is to be published on the persistent data storage B2Share and made centrally accessible through the eLTER central data node. Metadata about the resources is also available in RDF format, making them interlinked and accessible via a SPARQL endpoint.
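
As an illustration of this script-based access (a minimal sketch; the function names follow the deimsPy documentation and should be verified against the installed version, and the site identifier is a placeholder), site records can be pulled directly from DEIMS-SDR:

```python
# Hedged sketch: deimsPy function names as documented on PyPI; the DEIMS.iD
# below is a placeholder, not a real grassland site record.
import deims

sites = deims.getListOfSites()  # all sites registered in DEIMS-SDR
print(len(sites), "eLTER sites found")

record = deims.getSiteById("00000000-0000-0000-0000-000000000000")  # placeholder id
```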

The data provided will enable stronger validation and improvement of the grassland simulations, and thus lead to better scientific insights and grassland management recommendations.

How to cite: Wohner, C., Oggioni, A., Tagliolato, P., Taubert, F., Banitz, T., Venier, S., Trembath, P., and Peterseil, J.: eLTER and its role of providing in-situ data to large scale research projects for modelling biodiversity dynamics, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-1555, https://doi.org/10.5194/egusphere-egu24-1555, 2024.

EGU24-1724 | ECS | Orals | ESSI3.5

Improving the Findability of Digital Objects in Climate Science by adopting the FDO concept 

Marco Kulüke, Karsten Peters-von Gehlen, and Ivonne Anders

Climate science relies heavily on the effective creation, management, sharing, and analysis of massive and diverse datasets. As the digital landscape evolves, there is a growing need to establish a framework that ensures FAIRness in handling climate science digital objects. In particular, the machine-to-machine actionability of digital objects will be a crucial step towards future AI-assisted workflows. Motivated by a use case, this contribution proposes the adoption of the FAIR Digital Object (FDO) concept to address the challenges associated with the growing range of interdisciplinary reuse scenarios for climate model simulation output.

FDOs are encapsulations of data and their metadata, made accessible via persistent identifiers (PIDs) in such a way that data and their context remain a complete unit as the FDO travels through cyberspace and time. They represent a paradigm shift in data management, emphasizing the machine-actionability principles of FAIRness and the requirements enabling cross-disciplinary research. The FDO concept can be applied to various digital objects, including data, documents and software, within different research disciplines and industry areas.

The aim of this work is to commit to an FDO standard in climate science that enables standardized and therefore automated data analysis workflows and facilitates the extraction and analysis of relevant weather and climate data by all stakeholders involved. The current work expands on the efforts made to enable broad reuse of CMIP6 climate model data and focuses on requirements identified to enable automated processing of climate simulation output and their possible implementation strategies. The exemplary use case of an automated, prototypical climate model data analysis workflow will showcase the obstacles occurring when analyzing currently available climate model data. In particular, the findability of digital objects required for a particular research question in climate science or a related field proves to be challenging. In order to mitigate this issue, we propose the following strategies: (1) Enriching the PID profiles of climate model data in accordance with the FDO concept and taking into account the needs of the climate science community will lead to improved findability of digital objects, especially for machines. (2) Defining a standardized, unique association between climate model variables and their meaningful long names will increase the findability of climate model data, especially for researchers in other disciplines. (3) Furthermore, combining the FDO concept with existing data management solutions, such as intake-esm catalogs, can lead to improved data handling in line with prevailing community practices.
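
Strategy (3) can be made concrete with a small sketch (hedged: the catalog URL points to the publicly documented Pangeo CMIP6 catalog as an example, and the facet names follow the CMIP6 controlled vocabulary; other catalogs may differ):

```python
# Hedged sketch: example catalog URL and CMIP6 facet names; adapt to the
# catalog and vocabulary actually in use.
import intake  # intake-esm registers the esm_datastore driver when installed

cat = intake.open_esm_datastore(
    "https://storage.googleapis.com/cmip6/pangeo-cmip6.json"  # example catalog
)

# Search by the short variable_id; a standardized link between variable_id and
# meaningful long names (strategy 2) would make this step easier for outsiders.
subset = cat.search(variable_id="tas", experiment_id="historical", table_id="Amon")
datasets = subset.to_dataset_dict()  # dict of xarray Datasets, one per grouping
```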

Ultimately, implementing an FDO standard will benefit the climate science community in several ways: the reusability of the data will facilitate the cost-effective use of existing, computationally expensive climate model data; improved data citation practices will promote data sharing; and high transparency will increase the reproducibility of research workflows and consolidate scientific results.

How to cite: Kulüke, M., Peters-von Gehlen, K., and Anders, I.: Improving the Findability of Digital Objects in Climate Science by adopting the FDO concept, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-1724, https://doi.org/10.5194/egusphere-egu24-1724, 2024.

EGU24-3441 | Orals | ESSI3.5

How EOSC became our best ally? 

Anne Fouilloux

In this presentation, we will share firsthand experiences and insights gained from navigating the EOSC (European Open Science Cloud), offering a glimpse into how EOSC influences our day-to-day work and how it has become an invaluable ally for our team. We belong to the Nordic e-Infrastructure Collaboration on Earth System Modeling Tools (NICEST), a small community composed of researchers, Research Software Engineers (RSEs), and engineers from Norway, Sweden, Finland, Denmark, and Estonia, working in different organisations such as national meteorological services, national compute/storage infrastructure providers and support services, universities and other research institutes either working directly on climate or supporting related activities. The NICEST community strengthens the Nordic position in climate modelling by addressing e-infra challenges, leveraging Earth System Models (ESMs) to understand climate processes, adapt to global change, and mitigate impacts.

Our presentation extends beyond the technical aspects, offering a narrative of collaborative discovery that illustrates how EOSC has transformed into an indispensable companion, enabling our team to embody the principles of FAIR (Findable, Accessible, Interoperable, and Reusable) and open science. Throughout the session, we will highlight the operational intricacies of frameworks like EOSC, emphasising our nuanced approach to leveraging these frameworks for maximum impact.

This personal narrative is not just about success stories; it explores the challenges we've faced and the lessons we've learned. We place a special emphasis on our evolving understanding of effectively exploiting specific EOSC services, transforming it into more than just infrastructure but a trusted friend in our professional lives.

As we reflect on our collaborative journey, we'll share stories of triumphs, challenges, and the unique bond that has developed between our team and those contributing to EOSC's development. We'll explain how we've moved from being mere users to active contributors, contributing to the deployment of our own services to serve our community. Today, our aim is to actively participate in the construction of EOSC, demonstrating that through collaboration and co-design, we can significantly contribute to its ongoing evolution. This collaboration becomes truly effective when each actor recognizes the value of others, enabling us to pool efforts and enhance efficiency together.



How to cite: Fouilloux, A.: How EOSC became our best ally?, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-3441, https://doi.org/10.5194/egusphere-egu24-3441, 2024.

EGU24-3937 | Orals | ESSI3.5

New horizons for the Data Store Infrastructure at ECMWF 

Angel Lopez Alos, Baudouin Raoult, Ricardo Correa, Andre Obregon, Chris Stewart, James Varndell, Edward Comyn-Platt, Eduardo Damasio da-Costa, and Marcus Zanacchi

Since its official launch in 2018 supporting the implementation of the Copernicus Climate Change Service (C3S), the Climate Data Store (CDS) software infrastructure has evolved in many ways, driven by an expanding catalogue of resources, a growing user community and the evolution of technologies and standards. In 2020, a twin instance, the Atmosphere Data Store (ADS), was released in support of the Copernicus Atmosphere Monitoring Service (CAMS). Since then, the infrastructure has been named the Climate and Atmosphere Data Store (CADS). Combined, the CDS and ADS nowadays serve more than 270k registered users, delivering over 130 TB of data per day on average in the form of more than 700k processed requests.

In 2024, a modernized CADS will take over: a configurable framework built on cloud-oriented, state-of-the-art technologies, providing more scalable, wider, and more open access to data and services. This will foster engagement with a broader user community and facilitate interaction with different platforms in the future EU Green Deal Data Space.

Despite these changes, the CADS foundational principles of simplicity and consistency remain, along with FAIRness. A rigorous content management methodology is at the core of the system, supported by automatic deployment tools and configuration files that range from web portal content to metadata, interactive forms, dynamic constraints, documentation, adaptors, and quality control. This versatile mechanism provides huge flexibility for adaptation to different standards and the FAIR principles.

In addition to improved capabilities for discovery, search and retrieval, the modernized system brings new or re-engineered components aiming to improve the usability of resources, such as compliant OGC APIs, an integrated and interactive Evaluation and Quality Control (EQC) function, open-source expert Python packages for climate and meteorological purposes (earthkit) that can be deployed and run anywhere, and serverless Analysis-Ready Cloud Optimized (ARCO) data and metadata services supporting responsive WMS/WMTS interfaces.
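
For programmatic retrieval, the familiar cdsapi client remains the entry point (a minimal sketch; the dataset name and request keys follow the current ERA5 single-levels dataset, and continued cdsapi compatibility on the modernized CADS is an assumption to verify against the updated documentation):

```python
# Hedged sketch: ERA5 request keys as used on the current CDS; requires a
# configured ~/.cdsapirc with URL and API key.
import cdsapi

client = cdsapi.Client()
client.retrieve(
    "reanalysis-era5-single-levels",
    {
        "product_type": "reanalysis",
        "variable": "2m_temperature",
        "year": "2023",
        "month": "01",
        "day": "01",
        "time": "12:00",
        "format": "netcdf",
    },
    "era5_t2m_20230101.nc",  # local target file
)
```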

Modernization also involves the underlying cloud infrastructure, which, aligned with ECMWF's Strategy for a Common Cloud Infrastructure (CCI), brings extended compute and storage resources and, more importantly, closer and more efficient access to ECMWF resources, data, and services.

All new capabilities combined power a new generation of interactive user applications, training material, EQC functions, and efficient mechanisms for accessing large data volumes, driven among others by ML/AI requirements.

Here we describe the new horizons that the modernized Data Store infrastructure opens to users, introduce the broad spectrum of functionalities, open-source code, and material currently available, and open for debate the expectations and requirements that will foster the future evolution of the different components of the infrastructure.

How to cite: Lopez Alos, A., Raoult, B., Correa, R., Obregon, A., Stewart, C., Varndell, J., Comyn-Platt, E., Damasio da-Costa, E., and Zanacchi, M.: New horizons for the Data Store Infrastructure at ECMWF, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-3937, https://doi.org/10.5194/egusphere-egu24-3937, 2024.

EGU24-5989 | Orals | ESSI3.5

The EPOS open source platform for multidisciplinary data integration and data analysis in solid Earth science 

Daniele Bailo, Rossana Paciello, Helen Glaves, Jean-Baptiste Roquencourt, Jakob Molander, Alessandro Spinuso, Tor Langeland, Jan Michalek, Otto Lange, Agata Sangianantoni, Carine Bruyninx, and Carmela Freda and the EPOS Group

Established as a European Research Infrastructure Consortium (ERIC) in 2018, the European Plate Observing System (EPOS) Research Infrastructure represents a significant advancement in solid Earth sciences. Its aim is to harmonize and integrate data, services, and computational resources across diverse solid Earth science domains. These include Seismology, Near-Fault Observatories, GNSS Data and Products, Volcano Observations, Satellite Data, Geomagnetic Observations, Anthropogenic Hazards, Geological Information and Modeling, Multi-Scale Laboratories, Tsunami Research, each leveraging EPOS for the integration of domain-specific data and services into a wider European multi-disciplinary context.

The EPOS platform (https://www.epos-eu.org/dataportal) provides access to harmonized and quality-controlled data from thematic solid Earth science services through over 250 interoperable multidisciplinary services. The platform adopts a microservice-based architecture serving RESTful APIs, ensuring seamless interoperability between thematic core services (TCS) and the integrated core services central hub (ICS-C). The ICS-C, as the central system underpinning the EPOS platform, enables interoperability by adopting a multidimensional approach using metadata, semantics, and web services. Released under a GPL license as open-source software (https://epos-eu.github.io/epos-open-source/), EPOS adheres to the FAIR Principles, fostering interdisciplinary collaboration and technological advancement in Earth sciences and beyond.

In addition to data access, the EPOS platform also integrates complementary visualization tools and computational services. These Integrated Core Services - Distributed (ICS-D) enhance the user experience by simplifying complex interactions, offering functionalities like visualization, coding, and processing for data analysis, including machine learning applications.

This presentation will explore how the EPOS platform facilitates the entire research data lifecycle, connecting integrated multidisciplinary data provision to remote data analysis environments. By leveraging third-party cloud and supercomputing facilities equipped with specialized APIs (e.g. SWIRRL, https://gitlab.com/KNMI-OSS/swirrl/swirrl-api), we will demonstrate how EPOS seamlessly integrates with external services for reproducible data analysis and visualization, relying on common workflows to gather and pre-process the data. External service examples include Jupyter Notebooks developed by domain-specific communities, with which users can immediately analyze and process the data online. This adaptability streamlines scientific research and also promotes data reusability and collaboration within the portal, showcasing the EPOS platform's role in advancing Earth sciences research.

How to cite: Bailo, D., Paciello, R., Glaves, H., Roquencourt, J.-B., Molander, J., Spinuso, A., Langeland, T., Michalek, J., Lange, O., Sangianantoni, A., Bruyninx, C., and Freda, C. and the EPOS Group: The EPOS open source platform for multidisciplinary data integration and data analysis in solid Earth science, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-5989, https://doi.org/10.5194/egusphere-egu24-5989, 2024.

EGU24-6798 | ECS | Posters on site | ESSI3.5

Novel environmental big data grid integration and interoperability model 

Daoye Zhu, Yuhong He, and Kent Moore

Currently, effectively managing, retrieving, and applying environmental big data (EBD) presents a considerable challenge owing to the abundant influx of heterogeneous, fragmented, and real-time information. The existing network domain name system lacks the spatial attribute mining necessary for handling EBD, while the geographic region name system proves inadequate in achieving EBD interoperability. EBD integration faces challenges arising from diverse sources and formats. Interoperability gaps hinder seamless collaboration among systems, impacting the efficiency of data analysis.

To address the need for unified organization of EBD, precise man-machine collaborative spatial cognition, and EBD interoperability, this paper introduces the EBD grid region name model based on the GeoSOT global subdivision grid framework (EGRN-GeoSOT). EGRN-GeoSOT effectively manages location identification codes from various sources, ensuring the independence of location identification while facilitating correlation, seamless integration, and spatial interoperability of EBD. The model comprises the grid integration method of EBD (GIGE) and the grid interoperability method of EBD (GIOE), providing an approach to enhance the organization and interoperability of diverse environmental datasets. By discretizing the Earth's surface into a uniform grid, GIGE enables standardized geospatial referencing, simplifying data integration from various sources. The integration process involves the aggregation of disparate environmental data types, including satellite imagery, sensor readings, and climate model outputs. GIGE creates a unified representation of the environment, allowing for a comprehensive understanding of complex interactions and patterns. GIOE ensures interoperability by providing a common spatial language, facilitating the fusion of heterogeneous environmental datasets. The multi-scale characteristic of GeoSOT allows for scalable adaptability to emerging environmental monitoring needs.
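
The general idea behind such hierarchical grid identifiers can be illustrated with a toy sketch (this is deliberately not the GeoSOT subdivision scheme or the EGRN-GeoSOT encoding, only an illustration of prefix-based cell codes):

```python
# Illustrative toy quadtree code for a lat/lon point; NOT the actual GeoSOT
# or EGRN-GeoSOT algorithm.
def grid_code(lat: float, lon: float, level: int = 8) -> str:
    """Encode a point into a hierarchical quadrant string of the given depth."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    digits = []
    for _ in range(level):
        lat_mid = (lat_lo + lat_hi) / 2
        lon_mid = (lon_lo + lon_hi) / 2
        quadrant = (2 if lat >= lat_mid else 0) + (1 if lon >= lon_mid else 0)
        digits.append(str(quadrant))
        lat_lo, lat_hi = (lat_mid, lat_hi) if lat >= lat_mid else (lat_lo, lat_mid)
        lon_lo, lon_hi = (lon_mid, lon_hi) if lon >= lon_mid else (lon_lo, lon_mid)
    return "".join(digits)


# Records from different sources sharing a code prefix fall into the same coarse
# cell, which is what makes prefix-based integration and retrieval efficient.
print(grid_code(43.55, -79.66, level=10))
```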

EGRN-GeoSOT establishes a standardized framework that enhances integration, promotes interoperability, and empowers collaborative environmental analysis. To verify the feasibility and retrieval efficiency of EGRN-GeoSOT, Oracle and PostgreSQL databases were combined and the retrieval efficiency and database capacity were compared with the corresponding spatial databases, Oracle Spatial and PostgreSQL + PostGIS, respectively. The experimental results showed that EGRN-GeoSOT not only ensures a reasonable capacity consumption of the database but also has higher retrieval efficiency for EBD.

How to cite: Zhu, D., He, Y., and Moore, K.: Novel environmental big data grid integration and interoperability model, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-6798, https://doi.org/10.5194/egusphere-egu24-6798, 2024.

EGU24-6810 | Posters on site | ESSI3.5

OpenAQ: Harmonizing Billions of Air Quality Measurements into an Open and FAIR Database 

Chris Hagerbaumer, Colleen Marciel Fontelera Rosales, Russ Biggs, and Gabe Fosse

OpenAQ is the largest open-source, open-access repository of air quality data in the world, integrating and hosting over 50 billion measurements from air monitors and sensors at more than 59,000 ground-level locations across 153 countries. The OpenAQ platform supports data on a variety of pollutants in different temporal frequencies. The platform is a one-stop solution for accessing air quality data in a consistent and harmonized format, thereby facilitating findability, accessibility, interoperability and reusability. OpenAQ utilizes modern cloud computing architectures and open-source data tools to maintain a highly scalable data pipeline, which can be resource- and computationally intensive, thus requiring thoughtful and efficient data management and engineering practices. Being an open-source platform that is grounded in community, OpenAQ strives to be transparent, responsible, user-focused, sustainable and technologically-driven.
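
As an illustration of this harmonized access (a minimal sketch; the endpoint and query parameters follow the public v2 REST API, and newer API versions may use different routes and require an API key), a few lines of Python are enough to pull measurements:

```python
# Hedged sketch: OpenAQ v2-style request; check https://docs.openaq.org for the
# current API version, routes and authentication requirements.
import requests

resp = requests.get(
    "https://api.openaq.org/v2/measurements",
    params={"parameter": "pm25", "country": "ID", "limit": 100},
    timeout=30,
)
resp.raise_for_status()
for rec in resp.json()["results"]:
    # Every record arrives in the same harmonized schema regardless of the
    # originating monitor or sensor network.
    print(rec["location"], rec["date"]["utc"], rec["value"], rec["unit"])
```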

OpenAQ supports innovation and collaboration in the air quality space by: 

  • Ingesting and sharing data on an open, low-bandwidth platform to ensure data is broadly accessible
  • Providing tools to help interpret the data and create visualizations for users with varied technical skills
  • Providing a user guide and trainings on how to use the OpenAQ platform for community-level pilot purposes and beyond
  • Catalyzing specific analyses through intentional outreach to a broad community of data stakeholders

OpenAQ has been widely used for research, informing nearly 300 scientific and data-oriented publications/proceedings. OpenAQ trainings and workshops around the world have resulted in community statements demanding increased coverage and frequency of air quality monitoring, the donation of air quality monitoring equipment to local communities, and adoption of APIs to make open-source city data available. As one example, our work with the Clean Air Catalyst supports pilots to clean the air in Jakarta (Indonesia), Indore (India) and Nairobi (Kenya). As another example, our Community Ambassador program trains emerging air quality leaders in low- and middle-income countries to utilize open data to spur community action to fight air pollution. 

Our poster describes how OpenAQ ingests and harmonizes heterogeneous air quality data at scale and how we conduct outreach to increase impactful usage of the hosted data.

How to cite: Hagerbaumer, C., Rosales, C. M. F., Biggs, R., and Fosse, G.: OpenAQ: Harmonizing Billions of Air Quality Measurements into an Open and FAIR Database, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-6810, https://doi.org/10.5194/egusphere-egu24-6810, 2024.

EGU24-7290 | Posters on site | ESSI3.5

Blue-Cloud 2026, services to deliver, access and analyse FAIR & Open marine data 

Dick M. A. Schaap, Tjerk Krijger, Sara Pittonet, and Pasquale Pagano

The pilot Blue-Cloud project, as part of ‘The Future of Seas and Oceans Flagship Initiative’ of EU HORIZON 2020, combined the interests of developing a thematic marine EOSC cloud and serving the Blue Economy, Marine Environment and Marine Knowledge agendas. It deployed a versatile cyber platform with smart federation of multidisciplinary data repositories, analytical tools, and computing facilities in support of exploring and demonstrating the potential of cloud-based open science for ocean sustainability, the UN Decade of the Oceans, and the G7 Future of the Oceans. The pilot Blue-Cloud delivered:

  • Blue-Cloud Data Discovery & Access service (DD&AS), federating key European data management infrastructures, to facilitate users in finding and retrieving multi-disciplinary datasets from multiple repositories
  • Blue-Cloud Virtual Research Environment infrastructure (VRE) providing a range of services and facilitating orchestration of computing and analytical services for constructing, hosting and operating Virtual Labs for specific applications
  • Five multi-disciplinary Blue-Cloud Virtual Labs (VLabs), configured with specific analytical workflows, targeting major scientific challenges, and serving as real-life Demonstrators, which can be adopted and adapted for other inputs and analyses.    

Since early 2023, Blue-Cloud 2026 aims at a further evolution into a Federated European Ecosystem to deliver FAIR & Open data and analytical services, instrumental for deepening research of oceans, EU seas, coastal & inland waters.

The DD&AS already federates leading Blue Data Infrastructures, such as EMODnet, SeaDataNet, Argo, EuroArgo, ICOS, SOCAT, EcoTaxa, ELIXIR-ENA, and EurOBIS, and facilitates common discovery and access to more than 10 million marine datasets for physics, chemistry, geology, bathymetry, biology, biodiversity, and genomics. It is fully based on machine-to-machine brokering interactions with web services as provided and operated by the Blue Data Infrastructures. As part of Blue-Cloud 2026 it will expand by federating more leading European Aquatic Data Infrastructures, work on improving the FAIRness of the underpinning web services, incorporating semantic brokering, and adding data subsetting query services.

The Blue-Cloud VRE, powered by D4Science, facilitates collaborative research by offering computing, storage, analytical, and generic services for constructing, hosting and operating analytical workflows for specific applications. Blue-Cloud 2026 will expand the VRE by federating multiple e-infrastructures as provided by EGI, Copernicus WEkEO, and EUDAT. In this way, it will also open connectivity to applications developed in other EU projects such as iMAGINE (AI applications for the marine domain) and EGI-ACE (applications for ocean use cases).

During EGU we will share insight into the solutions regarding semantics supporting interoperability and harmonised data access. This will be illustrated in particular via the development of new Blue-Cloud analytical Big Data “WorkBenches” that are generating harmonised and validated data collections of Essential Ocean Variables (EOVs) in physics (temperature and salinity), chemistry (nutrients, chlorophyll, oxygen) and biology (plankton taxonomy, functions and biomass). Access to harmonised subsets of the BDIs' data collections will be supported by new tools like BEACON and the I-Adopt framework. The EOV collections are highly relevant for analysing the state of the environment. In this way, Blue-Cloud 2026 will provide a core data service for the Digital Twin of the Ocean, EMODnet, Copernicus, and various research communities.

How to cite: Schaap, D. M. A., Krijger, T., Pittonet, S., and Pagano, P.: Blue-Cloud 2026, services to deliver, access and analyse FAIR & Open marine data, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-7290, https://doi.org/10.5194/egusphere-egu24-7290, 2024.

EGU24-7391 | Posters on site | ESSI3.5

Fostering cross-disciplinary research - Training, workshops and summer schools of Geo-INQUIRE EU-project 

Mariusz Majdanski, Iris Christadler, Giuseppe Puglisi, Jan Michalek, Stefanie Weege, Fabrice Cotton, Angelo Strollo, Mateus Prestes, Helle Pedersen, Laurentiu Danciu, Marc Urvois, Stefano Lorito, Daniele Bailo, Otto Lange, and Gaetano Festa

The Geo-INQUIRE (Geosphere INfrastructure for QUestions into Integrated REsearch) project, supported by the Horizon Europe Programme, is aimed at enhancing the Earth Sciences Research Infrastructures and services to make data and high-level products accessible to the broad Geoscience scientific community. Geo-INQUIRE’s goal is to encourage curiosity-driven studies into understanding the Geosystem processes at the interface between the solid Earth, the oceans and the atmosphere using big data collections, high-performance computing methods and cutting-edge facilities.

The project has a strong focus on supporting the dynamic development of the actual use of research infrastructures. Training, networking, and community-building activities will be key to fostering it. The methodology empowers the participation of both young and experienced researchers, including those from often underrepresented communities, incorporates new and intersectional perspectives, addresses current major environmental and economic challenges, and fertilises curiosity-driven, cross-disciplinary research.

The project dissemination activities include a series of open online trainings and more specialised on-site workshops focused on data, data products and software solutions. Researchers, early-career scientists and students are the communities that will be able to explore the various fields of geosphere-related science, including those not directly related to their own field, through the connections provided by Research Infrastructures. Through lectures and use cases, we expect to show and teach them how to use data and information coming from cross-disciplinary RIs. We would like to increase awareness of the capacity and capabilities of “other” RIs, as well as of data integration and the importance of FAIR principles. The training offer is constantly updated on the project web page www.geo-inquire.eu.

In addition, two summer schools will be organised, dedicated to cross-disciplinary interactions of the solid Earth with marine science and with atmospheric physics. The first school will be organised in autumn 2024 in the Gulf of Corinth (Greece), and the second one in autumn 2025 in Catania, Sicily (Italy).

The applications for training activities will be evaluated by a panel that reviews the technical and scientific feasibility of the proposed application project, ensuring equal opportunities and diversity in terms of gender, geographical distribution and career stage. The data and products generated during the Transnational Accesses to research facilities will be made available to the scientific community through the project's strict adherence to FAIR principles.

Geo-INQUIRE is funded by the European Commission under project number 101058518 within the HORIZON-INFRA-2021-SERV-01 call.

How to cite: Majdanski, M., Christadler, I., Puglisi, G., Michalek, J., Weege, S., Cotton, F., Strollo, A., Prestes, M., Pedersen, H., Danciu, L., Urvois, M., Lorito, S., Bailo, D., Lange, O., and Festa, G.: Fostering cross-disciplinary research - Training, workshops and summer schools of Geo-INQUIRE EU-project, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-7391, https://doi.org/10.5194/egusphere-egu24-7391, 2024.

EGU24-8255 | ECS | Orals | ESSI3.5

The Castanhão EPISODE - the case study of Reservoir Induced Seismicity (RIS) in NE Brazil. 

Helena Ciechowska, Łukasz Rudziński, Beata Orlecka-Sikora, Alessandro Vuan, Anastasios Kostoglou, and Aderson Farias do Nascimento

Man-made alterations to the environment can become a source of seismic activity, and the Castanhão region (Ceará, NE Brazil) is an example of this. The Castanhão Reservoir was created as a result of dam construction on the Jaguaribe River, which triggered the occurrence of earthquake swarms at the site.

In the following study, we aim to analyze the data and understand the mechanism behind the seismic activity in the Castanhão region. Such a study requires an interdisciplinary approach, employing data from various disciplines such as seismology, geology, geomechanics, and hydrology. The starting data set contains continuous waveforms recorded at 6 seismological stations from January to December 2010. Two detection algorithms were applied for earthquake detection. Initial detection was performed with the STA/LTA algorithm, which allowed for the preparation of 53 templates with a good S/N ratio. Further, in the frequency range from 5 to 100 Hz, the input templates were used to match self-similar events and augment the initial catalog.
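
A minimal sketch of the initial STA/LTA step, assuming an ObsPy-based implementation (the file name, filter band, window lengths and trigger thresholds below are illustrative placeholders, not the values used in the study), is:

```python
# Hedged sketch: placeholder station file and detection parameters.
from obspy import read
from obspy.signal.trigger import classic_sta_lta, trigger_onset

stream = read("castanhao_station.2010.001.mseed")  # placeholder file name
trace = stream[0]
trace.filter("bandpass", freqmin=5.0, freqmax=45.0)  # placeholder band below Nyquist

df = trace.stats.sampling_rate
cft = classic_sta_lta(trace.data, int(0.5 * df), int(10.0 * df))  # 0.5 s STA, 10 s LTA
onsets = trigger_onset(cft, 3.5, 1.0)  # on/off thresholds (placeholders)
print(f"{len(onsets)} candidate detections")
```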

Due to the station coverage and the low magnitude of the events, the detailed analysis is performed on 187 events out of the more than 300 detected during PyMPA template matching. The localization was performed using the Hypo71 software, and the analysis of mechanisms is done with the KiwiTool.

The Castanhão EPISODE is planned to be made available on the EPISODES Platform of EPOS Thematic Core Services Anthropogenic Hazards.

How to cite: Ciechowska, H., Rudziński, Ł., Orlecka-Sikora, B., Vuan, A., Kostoglou, A., and Nascimento, A. F. D.: The Castanhão EPISODE - the case study of Reservoir Induced Seismicity (RIS) in NE Brazil., EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-8255, https://doi.org/10.5194/egusphere-egu24-8255, 2024.

EGU24-8465 | Orals | ESSI3.5

ENVRI-Hub-NEXT, the open-access platform of the environmental sciences community in Europe 

Ulrich Bundke, Daniele Bailo, Thierry Carval, Luca Cervone, Dario De Nart, Claudio Dema, Tiziana Ferrari, Andreas Petzold, Peter Thijsse, Alex Vermeulen, and Zhiming Zhao

Easy and fast access to reliable, long-term, and high-quality environmental data is fundamental for advancing our scientific understanding of the Earth system, including its complex feedback mechanisms, as well as for developing mitigation and adaptation strategies, for fact-based decision-making, and for the development of environment-friendly innovations. In response to the continuously growing demand for environmental scientific knowledge, the ESFRI-listed environmental research infrastructures (ENVRIs/RIs) in Europe have formed a strong community of principal producers and providers of environmental research data and services from the four subdomains of the Earth system (Atmosphere, Marine, Solid Earth and Biodiversity/Ecosystems) through the cluster projects ENVRI (2011-2014), ENVRIplus (2015-2019), and ENVRI-FAIR (2019-2023). The further integration of ENVRIs across the subdomains is considered critical for leveraging the full potential of the ENVRI cluster for integrated environmental research. This step will be taken by ENVRI-Hub NEXT.

To transform the challenging task of integrated Earth observation into a concept towards a global climate observation system, the World Meteorological Organisation (WMO) has specified a set of Essential Climate Variables (ECV) relevant for the continuous monitoring of the state of the climate. ECV datasets provide the empirical evidence needed to understand and predict the evolution of climate, guide mitigation and adaptation measures, assess risks, enable attribution of climatic events to the underlying causes, and underpin climate services. ENVRIs are critical for monitoring and understanding changes in ECVs, as has been identified by the ESFRI Strategy Working Group on Environment in their recent Landscape Analysis of the Environment Domain.

The recently finished cluster project ENVRI-FAIR has launched an open access hub for interdisciplinary environmental research assets utilising the European Open Science Cloud (EOSC). The ENVRI-Hub is designed as a federated system to harmonise subdomain- or RI-specific access platforms and offers a user-centered platform that simplifies the complexity and diversity of the ENVRI landscape while preserving the structure of the individual RIs needed to fulfil the requirements of their designated communities. Building on the ENVRI-Hub, ENVRI-Hub NEXT aims at creating a robust conceptual and technical framework that will empower the ENVRI Science Cluster to provide interdisciplinary services that enable cross-RI exploitation of data, guided by the science-based framework of ECVs.

This presentation will summarise the status of the ENVRI-Hub and the plans for ENVRI-Hub NEXT.

Acknowledgement:

ENVRI-HUB-NEXT has received funding from the European Union’s Horizon Europe Framework Programme under grant agreement No 101131141.

ENVRI-FAIR has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 824068.

This work is only possible with the collaboration of the ENVRI-HUB-NEXT partners and thanks to the joint efforts of the whole ENVRI-Hub team.

How to cite: Bundke, U., Bailo, D., Carval, T., Cervone, L., De Nart, D., Dema, C., Ferrari, T., Petzold, A., Thijsse, P., Vermeulen, A., and Zhao, Z.: ENVRI-Hub-NEXT, the open-access platform of the environmental sciences community in Europe, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-8465, https://doi.org/10.5194/egusphere-egu24-8465, 2024.

EGU24-9244 | Orals | ESSI3.5

Analysing open climate data - a case study using the MATLAB Integration for Jupyter on the ENES Data Space environment 

Kostas Leptokaropoulos, Shubo Chakrabarti, and Fabrizio Antonio

The increasing volume and complexity of Earth and environmental data requires an efficient, interdisciplinary collaboration between scientists and data providers. This can be achieved by utilising research infrastructures providing advanced e-services exploiting data integration and interoperability, seamless machine-to-machine data exchange and HPC/cloud facilities.

In this contribution we will present a case study of geodata import, analysis and visualization, carried out on the ENES Data Space (https://enesdataspace.vm.fedcloud.eu), a cloud-enabled data science environment for climate data analysis built on top of the European Open Science Cloud (EOSC) Compute Platform. After joining the service by using an institutional or social media account, the site users can launch JupyterLab where they have access to a personal workspace as well as compute resources, tools and ready-to-use climate datasets, comprising past data recordings and future projections, mainly from the CMIP (Coupled Model Intercomparison Project) international effort. In this example, global precipitation data from CMCC experiments will be used. The analysis will be carried out within the ENES workspace in two different ways:

First, we will launch MATLAB Online from a web browser directly from the ENES Data Space JupyterLab where a Live Script (.mlx) will import, filter, and manipulate the data, create maps, compare results and perform hypothesis testing to evaluate the statistical significance of different outcomes. Live Scripts are notebooks that allow clear communication of research methods and objectives, combining data, hyperlinks, text and code and can include UI (User Interface) tools for point-and-click data processing and visualization, without the need for advanced programming skills.

Second, we will demonstrate the same process running the MATLAB kernel from a Jupyter notebook (.ipynb) in the same JupyterLab.

In both cases results can be exported in multiple formats (e.g., PDF, markdown, LaTeX, etc.), downloaded and shared with other researchers, students, and fellow educators. The entire process is carried out in MATLAB within the ENES Data Space environment with no need to install software or download data on the users’ local (non-cloud) devices.
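
As a rough open-source analogue of the analysis steps described above (import, subset, compare, hypothesis test), the sketch below uses Python with xarray and SciPy rather than the MATLAB Live Script used in the case study; the file name, variable name ("pr") and region boxes are placeholders, not the actual ENES Data Space dataset layout.

```python
# Rough Python/xarray analogue of the workflow described above; file path,
# variable name and regions are invented for illustration only.
import xarray as xr
from scipy import stats

ds = xr.open_dataset("cmcc_precipitation.nc")       # hypothetical CMIP-style file
pr = ds["pr"]

# Area-mean precipitation over two illustrative boxes
box_a = pr.sel(lat=slice(35, 45), lon=slice(10, 20)).mean(dim=("lat", "lon"))
box_b = pr.sel(lat=slice(45, 55), lon=slice(10, 20)).mean(dim=("lat", "lon"))

# Two-sample t-test on the annual means of the two regions
t_stat, p_value = stats.ttest_ind(
    box_a.groupby("time.year").mean().values,
    box_b.groupby("time.year").mean().values,
)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```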

How to cite: Leptokaropoulos, K., Chakrabarti, S., and Antonio, F.: Analysing open climate data - a case study using the MATLAB Integration for Jupyter on the ENES Data Space environment, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-9244, https://doi.org/10.5194/egusphere-egu24-9244, 2024.

EGU24-9486 | ECS | Posters on site | ESSI3.5

Best practices for using and reporting subsurface geological/geophysical data in defining and documenting seismogenic faults. 

Giuseppe Vico, Rita Chiara Taccone, Francesco Emanuele Maesano, Mara Monica Tiberti, and Roberto Basili

Earthquakes of engineering significance (magnitude 5 and above) are generated by pre-existing, relatively mature geological faults. These faults generally span a length from a few to several tens or hundreds of kilometers and can break the entire Earth’s crust.   

Defining the three-dimensional configuration of such seismogenic faults is crucial for developing applications for earthquake hazard analyses at different spatial scales and, in turn, contributing robust information to promoting earthquake risk mitigation strategies.

The reconstruction of geological fault surfaces is a typical multidisciplinary study involving a large variety of data types and processing methods that, inevitably, imply various degrees of geometric simplifications depending on the available data. Among them, the most powerful, although expensive, approaches are the techniques developed for hydrocarbon exploration, namely seismic reflection (2D-3D) data combined with logs of drilled wells, which can illuminate the Earth’s subsurface at several kilometers depth. The mining and oil and gas industries have historically collected a large amount of this data, which remained classified depending on the regulations of the country from which they obtained the license for exploration. As time passes, and with the waning of fossil fuel exploitation, the exploration licenses expire or are not renovated, and more of such data becomes available to amalgamate with data collected by research institutions or public/private ventures using public funding. 

Despite the vast literature on and applications of hydrocarbon exploration data, no standard procedure exists for documenting the use of such data in characterizing seismogenic faults. In this respect, scientists face challenges posed by the intersection of industry data with public research outputs, with important societal implications and barriers to ensuring FAIRness. To this end, we devised a workflow detailing the best practices to follow in the various steps geologists undertake in using hydrocarbon exploration data, starting from the source of the raw/processed data (public vs confidential) and ending with the final geological fault model. The workflow output is then ready to be integrated with the information and data from other scientific disciplines (e.g., seismology, paleoseismology, tectonic geomorphology, geodesy, geomechanical modeling, earthquake statistics) to obtain the most reliable seismogenic fault model. As proof of concept, we will present a simplified version of a software tool that guides the user in incorporating the workflow's various elements into a structured database of seismogenic faults.

How to cite: Vico, G., Taccone, R. C., Maesano, F. E., Tiberti, M. M., and Basili, R.: Best practices for using and reporting subsurface geological/geophysical data in defining and documenting seismogenic faults., EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-9486, https://doi.org/10.5194/egusphere-egu24-9486, 2024.

EGU24-10536 | Posters on site | ESSI3.5

CDGP data center, new data for interdisciplinarity research  

Salsabyl Benlalam, Benoit Derode, Fabien Engels, Marc Grunberg, and Jean Schmittbuhl

The Data Center for Deep Geothermal Energy (CDGP-Centre de Données de Géothermie Profonde, https://cdgp.u-strasbg.fr/) was launched in 2016 and is managed by the Interdisciplinary Thematic Institute for Geosciences for the energy system Transition (ITI GeoT, https://geot.unistra.fr/), with the purpose of archiving, preserving and distributing deep geothermal data in the Alsace region (France) for the scientific community and R&D activities. The CDGP is furthermore an internal node of EPOS TCS Anthropogenic Hazards (https://www.epos-eu.org/tcs/anthropogenic-hazards); the data it provides on geothermal sites in Alsace are therefore also available on the EPISODES platform (https://episodesplatform.eu/), which enables users to process and analyze the data they download. The CDGP collects high-quality data from different phases of deep geothermal projects, especially from the exploration and development phases. The aim of this service is to provide downloadable multi-disciplinary data, ranging from industrial hydraulic information to seismic records and catalogs, including, for example, geological logs and fault maps. The data are thoroughly filtered, controlled and validated by analysts, and are grouped into “episodes”, each referring to a set of relevant geophysical data correlated over time and establishing links between anthropogenic seismicity and an industrial activity.

As part of the European Geo-INQUIRE project (GA n. 101058518, https://www.geo-inquire.eu/), we are now expanding the types of data that we distribute. The raw data (RINEX) from GNSS stations monitoring the surface deformation around geothermal sites are now available on the website. In a next step, we will add complementary information and metadata (e.g. precise positions, velocities, strain) to the database, thanks to our collaboration with EPOS TCS GNSS. We are currently developing strategies with EPOS TCS GIM (Geological Information and Modeling) to provide geological maps and borehole data for the “episode” sites. The aim is to use the TCS GIM services currently under development and benefit from the synergy between the various leading projects.

Specific procedures have also been implemented since the beginning of the project to respect international requirements for data management. FAIR recommendations, for example, are followed to distribute data that are Findable, Accessible, Interoperable, and Reusable. 

How to cite: Benlalam, S., Derode, B., Engels, F., Grunberg, M., and Schmittbuhl, J.: CDGP data center, new data for interdisciplinarity research , EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-10536, https://doi.org/10.5194/egusphere-egu24-10536, 2024.

EGU24-10742 | Posters on site | ESSI3.5

EU-financed transnational access in Geo-INQUIRE: an opportunity for researchers to develop leading-edge science at selected test-beds and research facilities across Europe. 

Shane Murphy, Gaetano Festa, Stefano Lorito, Volker Röhling, Fabrice Cotton, Angelo Strollo, Marc Urvois, Andrey Babeyko, Daniele Bailo, Jan Michalek, Otto Lange, Javier Quinteros, Mariusz Majdanski, Iris Christadler, Mateus Prestes, and Stefanie Weege

The Geo-INQUIRE (Geosphere INfrastructure for QUestions into Integrated REsearch) project, supported by the Horizon Europe Programme, is aimed at enhancing services to make data and high-level products accessible to the broad Geoscience scientific community. Geo-INQUIRE’s goal is to encourage curiosity-driven studies into understanding the geosphere dynamics at the interface between the solid Earth, the oceans and the atmosphere using long data streams, high-performance computing and cutting-edge facilities. 

The Geo-INQUIRE Transnational Access (TA) covers both virtual and on-site access to a variety of state-of-the-art laboratories, facilities, experimental sites (testbeds) and computational resources with the aim of enabling the development of excellent ground-breaking science. Six research infrastructures located across Europe, referred to as “testbeds”, will provide locations for users to perform experiments in a variety of environments from the Earth’s surface (both on land and at sea) to the subsurface, and over different spatial scales: from small-scale experiments in laboratories to kilometric submarine fibre cables. These sites are: the Bedretto Laboratory (Switzerland); the Ella-Link Geolab (Portugal); the Liguria-Nice-Monaco submarine infrastructure (Italy/France); the Irpinia Near-Fault Observatory (Italy); the Eastern Sicily facility (Italy); and the Corinth Rift Laboratory (Greece). In addition, ECCSEL-ERIC is providing access to 5 of its research facilities focussing on CO2 Capture, Utilisation, Transport and Storage. The facilities providing access are: Svelvik CO2 Field Lab (Norway), PITOP Borehole Geophysical Test Site (Italy), Sotacarbo Fault Laboratory (Italy), Catenoy experimental site and gas-water-rock interaction Laboratory in Oise (France), and the Mobile Seismic Array (the Netherlands), which is fully mobile and can be deployed anywhere in the world.

TA will be also offered for software and workflows belonging to the EPOS-ERIC and the ChEESE Centre of Excellence for Exascale in Solid Earth. These are grounded on simulation of seismic waves and rupture dynamics in complex media, tsunamis, subaerial and submarine landslides. HPC-based Probabilistic Tsunami, Seismic and Volcanic Hazard workflows are offered to assess hazard at high-resolution with extensive uncertainty exploration. Support and collaboration will be offered to the awardees to facilitate the access and usage of HPC resources for tackling geoscience problems. 

Geo-INQUIRE will grant TA to researchers to develop their own lab-based or numerical experiments with the aim of advancing scientific knowledge of Earth processes while fostering cross-disciplinary research across Europe. The data and products generated during the TAs will be made available to the scientific community via the project’s strict adherence to FAIR principles. 

To be granted access, researchers submit a proposal to one of the TA calls, which will be issued three times during the project's lifetime. The first call was launched on 9 January. Calls will be advertised on the Geo-INQUIRE website https://www.geo-inquire.eu/ and through the existing community channels.

How to cite: Murphy, S., Festa, G., Lorito, S., Röhling, V., Cotton, F., Strollo, A., Urvois, M., Babeyko, A., Bailo, D., Michalek, J., Lange, O., Quinteros, J., Majdanski, M., Christadler, I., Prestes, M., and Weege, S.: EU-financed transnational access in Geo-INQUIRE: an opportunity for researchers to develop leading-edge science at selected test-beds and research facilities across Europe., EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-10742, https://doi.org/10.5194/egusphere-egu24-10742, 2024.

EGU24-11000 | Posters on site | ESSI3.5

Multidisciplinary integration of FAIR Research Infrastructures in the Geo-INQUIRE initiative: the EPOS – EMSO case 

Kety Giuliacci, Daniele Bailo, Jan Michalek, Rossana Paciello, Valerio Vinciarelli, Claudio Goffi, Angelo Strollo, Fabrice Cotton, Harald Nedrebø, Sven Peter Näsholm, Quentin Brissaud, Tina Kaschwich, Enoc Martinez, Aljaz Maslo, Volker Röhling, Olivier Frezot, Javier Quinteros, Kuvvet Atakan, and Wolfgang zu Castell

In the last decade, the scientific community has witnessed a growing emphasis on data integration. The primary objective is to harness multidisciplinary data and resources to drive novel methodological approaches and scientific breakthroughs. Among the projects that have emerged in response to this trend is the Geosphere INfrastructures for QUestions into Integrated REsearch (Geo-INQUIRE, https://www.geo-inquire.eu/).

Geo-INQUIRE was launched in October 2022 and comprises a unique consortium of 51 partners, including prominent national research institutes, universities, national geological surveys, and European consortia. Geo-INQUIRE is dedicated to surmounting cross-domain challenges, particularly those pertaining to land-sea-atmosphere environments. To accomplish this mission, Geo-INQUIRE is committed to consolidating the resources and capabilities of key research infrastructures (RIs) specializing in geosphere observations. These RIs include EPOS, EMSO, ARISE, ECCSEL, and ChEESE.

By providing access to its expanded collection of data, products, and services, Geo-INQUIRE empowers the upcoming generation of scientists to conduct cutting-edge research that addresses complex societal challenges from a multidisciplinary viewpoint. This encourages the utilization of these resources to foster curiosity-driven research endeavors.

To harmonize and prepare the data produced by these different RIs for integration, substantial efforts have been undertaken: cataloguing all installations provided by the data providers, analysing them, and assessing their maturity with respect to FAIR (Findable, Accessible, Interoperable, and Reusable) data integration. In addition, dedicated seminars focused on data integration were carried out to boost the FAIR data provision process. Technical activities have been carried out to achieve cross-RI integration. In this contribution, we demonstrate and exemplify one such integration, between EMSO (https://emso.eu/) and EPOS (https://www.epos-eu.org/), which has been achieved on multiple fronts, including metadata and services.

The successful integration of metadata and services was made possible by adopting the EPOS-DCAT Application Profile (https://epos-eu.github.io/EPOS-DCAT-AP/v3/), allowing an intelligent system like the EPOS platform (https://www.ics-c.epos-eu.org/) to access the EMSO services seamlessly. Work is currently underway to develop software that will enable the visualization of heterogeneous time series data from EMSO within the integrated framework, a crucial step to achieve full data integration.
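
To give a flavour of the metadata side of such an integration, the sketch below builds a minimal DCAT-style dataset record with rdflib. It is only an illustration: the EPOS-DCAT-AP profile prescribes many additional mandatory elements, and all identifiers shown here are invented.

```python
# Minimal sketch of a DCAT-style dataset record with rdflib; EPOS-DCAT-AP
# requires many more elements, so this only illustrates the general idea.
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DCAT, DCTERMS, RDF

g = Graph()
ds = URIRef("https://example.org/dataset/emso-temperature")   # hypothetical identifier

g.add((ds, RDF.type, DCAT.Dataset))
g.add((ds, DCTERMS.title, Literal("EMSO seafloor temperature time series (example)")))
g.add((ds, DCTERMS.description, Literal("Illustrative record only.")))

dist = URIRef("https://example.org/dataset/emso-temperature/api")
g.add((dist, RDF.type, DCAT.Distribution))
g.add((dist, DCAT.accessURL, URIRef("https://example.org/api/timeseries")))
g.add((ds, DCAT.distribution, dist))

print(g.serialize(format="turtle"))
```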

How to cite: Giuliacci, K., Bailo, D., Michalek, J., Paciello, R., Vinciarelli, V., Goffi, C., Strollo, A., Cotton, F., Nedrebø, H., Näsholm, S. P., Brissaud, Q., Kaschwich, T., Martinez, E., Maslo, A., Röhling, V., Frezot, O., Quinteros, J., Atakan, K., and zu Castell, W.: Multidisciplinary integration of FAIR Research Infrastructures in the Geo-INQUIRE initiative: the EPOS – EMSO case, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-11000, https://doi.org/10.5194/egusphere-egu24-11000, 2024.

EGU24-11659 | Posters virtual | ESSI3.5

Standardization of geospatial analysis ready data via OGC and ISO  

Liping Di, Eugene Yu, Liying Guo, Patrick Quinn, and Joshua Lieberman

Geospatial data are data with location information. They are very diverse and widely used in various socioeconomic applications and decision making. Typically, geospatial data obtained from data providers need to go through a long chain of pre-processes and quality measures before they can be analyzed for a specific application. For a specific type of geospatial data, many of these pre-processes and quality measures are common to different data users regardless of the application. It is possible to pre-apply those common pre-processes and quality measures to the geospatial data so that repetitive preprocessing can be avoided, the pre-processing chain on the user side can be significantly shortened, and the data are more ready for analysis. Geospatial data to which a set of pre-processes has been applied, so that they meet certain quality specifications and are ready for analysis in applications, are called geospatial analysis ready data (ARD).

In the satellite remote sensing domain, the Committee on Earth Observation Satellites (CEOS) has defined CEOS Analysis Ready Data (CEOS-ARD) as satellite remote sensing data that have been processed to a minimum set of requirements and organized into a form that allows immediate analysis with a minimum of additional user effort and interoperability both through time and with other datasets. CEOS has set a number of ARD product family specifications (PFS) and encouraged its member space agencies to produce CEOS ARD PFS-compliant products. However, CEOS ARD PFS are limited to satellite remote sensing data and are not recognized international standards, which prevents them from being widely accepted and adopted by the broad geospatial community. Other geospatial communities, such as ARD.Zone, are also developing their own ARD concepts. Formal ARD standardization through authoritative international standard bodies is necessary to achieve broad uptake, particularly by the commercial sector, to promote wide acceptance of the standardized concept, and to help avoid the divergence that can be caused by various groups working towards different interpretations of the concept.

Therefore, a joint effort between ISO TC 211 and the Open Geospatial Consortium (OGC) was officially formed in May 2023 to set international ARD standards by forming the broadest possible consensus within the geospatial community. ISO has designated the geospatial ARD standards as ISO 19176, and the first one to be developed is ISO 19176-1: Geographic information — Analysis Ready Data — Part 1: Framework and Fundamentals. In addition, OGC, through its testbed and pilot initiatives, has been evaluating the applicability, advantages, and gaps of using existing geospatial ARD products from various sources in different applications. The findings and lessons learned from these evaluations are reinforcing the development of ISO 19176. This presentation will report the progress so far on the development of ISO 19176-1 and recap the findings from ARD activities in OGC Testbed 19. It will discuss the joint ISO/OGC ARD standard development process, the ISO 19176-1 development timeline, the ARD framework and UML models defined in ISO 19176-1, the findings from OGC Testbed 19 on ARDs, and the future workplan.

How to cite: Di, L., Yu, E., Guo, L., Quinn, P., and Lieberman, J.: Standardization of geospatial analysis ready data via OGC and ISO , EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-11659, https://doi.org/10.5194/egusphere-egu24-11659, 2024.

EGU24-11901 | ESSI3.5

An Open Data Standard for Cloud Particle Images and Reference Software to Produce and Validate Compliant Files

G. Nott and D. Sproson

The use of airborne cloud imaging probes has resulted in decades of in situ particle-by-particle data taken across the gamut of pristine and anthropogenically-modified cloud types around the globe. Image data from such probes is recorded in proprietary and instrument- or system-specific formats. Binary formats have evolved to minimise the stress on, now possibly outdated, hardware and communication systems that must operate in the difficult aircraft environment. This means that there is a significant knowledge and technical barrier to new users, particularly for those that are not from fields that have traditionally used such cloud data. Processed image data is generally available; however, this precludes the application of more advanced or specialised processing of the raw data. For example, historical cloud campaigns of the 1970s and 80s used imaging probes for cloud microphysical measurements at a time when satellite measurements of those regions were sparse or nonexistent. Fields such as atmospheric process modelling, climate modelling, and remote sensing may well benefit from being able to ingest raw cloud particle data into their processing streams to use in new analyses and to address issues from a perspective not normally used by those in the cloud measurement community.

The Single Particle Image Format (SPIF) data standard has been designed to store decoded raw binary data in netCDF4 with a standardised vocabulary in accordance with the FAIR Guiding Principles. This improves access to this data for users from a wide range of fields and facilitates the sharing, refinement, and standardisation of data processing routines. An example is the National Research Council of Canada (NRC) SPIF conversion utility, which converts binary data into SPIF files. In a similar fashion to the Climate and Forecast (CF) Conventions, SPIF defines a minimum vocabulary (groups, variables, and attributes) that must be included for compliance, while also allowing extra, non-conflicting data to be included.

The ability to easily check files for compliance with a data standard or convention is an important component of building a sustainable and community-supported data standard. We have developed a Python package called vocal as a tool for managing netCDF data product standard vocabularies and associated data product specifications. Vocal projects define standards for netCDF data and consist of model definitions and associated validators. Vocal then provides a mapping from netCDF data to these models, with the Python package pydantic used for compliance checking of files against the standard definition.
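
To illustrate the general pattern of pydantic-based attribute checking, the sketch below validates a netCDF file's global attributes against a small required set. The attribute names are invented for the example and are not the actual SPIF vocabulary, and this is not the vocal package's real API.

```python
# Sketch of attribute-level compliance checking in the spirit described above;
# the required attribute names are invented and this is not vocal's real API.
from netCDF4 import Dataset
from pydantic import BaseModel, ValidationError

class RequiredGlobalAttrs(BaseModel):
    Conventions: str
    instrument_name: str
    resolution_um: float

def check_file(path: str) -> bool:
    """Return True if the file's global attributes satisfy the model."""
    with Dataset(path) as nc:
        attrs = {name: nc.getncattr(name) for name in nc.ncattrs()}
    try:
        RequiredGlobalAttrs(**attrs)
        return True
    except ValidationError as err:
        print(err)
        return False

check_file("example_spif_file.nc")   # hypothetical file
```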

We will present the vocal package and the SPIF data standard to illustrate its use in building standard compliant files and compliance-checking of SPIF netCDF files.

How to cite: Nott, G. and Sproson, D.: An Open Data Standard for Cloud Particle Images and Reference Software to Produce and Validate Compliant Files, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-11901, https://doi.org/10.5194/egusphere-egu24-11901, 2024.

EGU24-12102 | Posters virtual | ESSI3.5 | Highlight

A preliminary analysis of a crowdsourcing platform for participatory assessment of urban landscapes by university students using GIS 

Nikos Mamassis, Romanos Ioannidis, Christos Daskalakis, Fotis Loukidis-Andreou, Margarita Zakynthinou-Xanthi, Lucas Gicquel, Lucile Samah--Ribeiro, Filio Iliopoulou, G.-Fivos Sargentis, and Kontantinos Moraitis

The fields of information technology and geoinformatics have experienced rapid growth and widespread public adoption, with technologies like crowdsourcing facilitating advances in how the public can communicate with scientific communities and even contribute valuable data.

However, there is still hesitation in actively engaging the public in environmental or landscape-related studies. The stark contrast between the availability of crowdsourcing technologies and their lack of use is particularly noticeable in university education, where the technological potential of smartphones, widely owned and used by students, remains largely untapped for educational and research purposes. This study is part of a larger exploration of the potential of engaging students in participatory georeferenced landscape assessment, aiming to advance relevant environmental research and also make education in landscape and architecture more interactive and synergistic.

Starting from an initial theoretical investigation, our work proceeded to the examination of the developed ideas in practice. A dedicated crowdsourcing mobile application was developed and tested as a pilot study with a small number of students, before proceeding to the inclusion of large numbers of students, which is the end goal of the ARCHIMAP crowdsourcing project. This initial “test” targeted potential practical challenges as well as software- and data-related challenges. To this aim, the Lycabettus hill and surrounding neighborhoods were investigated as a case study. Students were given the application and their interactions with it were recorded in detail, tracking their movement and location, recording their landscape and architecture assessments, and evaluating the technical performance of the application.

Beyond the observation of technical and functional challenges, the study also initiated a brief investigation of the potential utility of the results. This was carried out by implementing a conventional method of landscape analysis, the so-called ULQI (Urban Landscape Quality Index), and investigating its correlation and potential synergy with the results submitted by the students through the novel crowdsourcing app for georeferenced landscape assessment.

The results demonstrated that the developed app was both functional and useful and could therefore be shared with more students of NTUA, with expected benefits both for the educational process and for the institution's scientific research on landscape quality.

How to cite: Mamassis, N., Ioannidis, R., Daskalakis, C., Loukidis-Andreou, F., Zakynthinou-Xanthi, M., Gicquel, L., Samah--Ribeiro, L., Iliopoulou, F., Sargentis, G.-F., and Moraitis, K.: A preliminary analysis of a crowdsourcing platform for participatory assessment of urban landscapes by university students using GIS, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-12102, https://doi.org/10.5194/egusphere-egu24-12102, 2024.

EGU24-12230 | Posters on site | ESSI3.5

Semantic Interoperability Profiles as knowledge base for semantic solutions 

Barbara Magagna, Marek Suchánek, and Tobias Kuhn

Central to research is the capability to build on existing research outcomes and to aggregate data from different sources to create new research findings. This is particularly true for environmental research, which tries to face global challenges such as climate change and biodiversity loss by integrating diverse long-term monitoring and experimental data.

Interoperability is the ability of computer systems to exchange information, but semantic interoperability is required to reach a shared understanding of the meaning of that information. Shared understanding between all parties involved can be achieved using common standards such as vocabularies, metadata and semantic models.

But how can researchers find out which standards are used and by whom? FAIR Implementation Profiles (FIPs), co-developed by the GO FAIR Foundation and ENVRI-FAIR in 2020 (https://doi.org/10.1007/978-3-030-65847-2_13) and used by more than 120 communities so far, such as the ENVRIs and WorldFAIR (see also https://fairdo.org/wg/fdo-fipp/), might be a good source of knowledge. This socio-technical approach drives explicit and systematic community agreements on the use of FAIR implementations, including domain-relevant community standards, called FAIR-Enabling Resources (FERs). The FIP Wizard (https://fip-wizard.ds-wizard.org/) is implemented on top of the open-source DSW tool as a user interface in which the researcher is asked to answer questions related to each of the Principles by selecting FERs expressed as nanopublications. A nanopublication (https://nanopub.net/) is represented as a machine-interpretable knowledge graph and includes three elements: assertion, provenance, and publication info, where, in the context of FIPs, the assertion contains essential metadata about a FER.
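
To make the three-part structure concrete, the sketch below models an assertion, a provenance graph and a publication-info graph as rdflib named graphs. It is a toy illustration only: the URIs are invented, and no real nanopublication tooling, trusty URIs or signing is involved.

```python
# Toy illustration of the assertion / provenance / publication-info structure
# of a nanopublication using rdflib named graphs; all URIs are invented.
from rdflib import Dataset, Literal, Namespace, URIRef
from rdflib.namespace import PROV, RDF

EX = Namespace("https://example.org/np/")
ds = Dataset()

assertion = ds.graph(EX.assertion)
assertion.add((EX.myResource, RDF.type,
               URIRef("https://example.org/FAIREnablingResource")))

provenance = ds.graph(EX.provenance)
provenance.add((EX.assertion, PROV.wasAttributedTo, EX.someCommunity))

pubinfo = ds.graph(EX.pubinfo)
pubinfo.add((EX.nanopub, PROV.generatedAtTime, Literal("2024-01-01")))

print(ds.serialize(format="trig"))
```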

Using the same approach and technology, but focusing on semantic interoperability aspects, the Semantic Interoperability Profile (SIP) was developed in the context of the EOSC Semantic Interoperability Task Force to interview semantic or data management experts involved in research projects or infrastructures, so that they collectively contribute to a knowledge base of interoperability solutions (https://doi.org/10.5281/zenodo.8102786). The SIP focuses on the standards used to implement Principle F2 (metadata) and the Interoperability Principles (I1, I2, I3, related to semantic artefacts), but also queries the services used to generate, edit, publish, and transform them, altogether called FAIR Supporting Resources (FSRs). The survey is an ongoing effort and everybody can contribute to it via the SIP Wizard (https://sip-wizard.ds-wizard.org/). In summary, a SIP is a machine-interpretable collection of resources chosen by a community, whereby the collection can be made specific to a data type and a semantic interoperability case study.

FAIR Connect (https://fairconnect.pro/) is being developed to provide a user-friendly, graphics-rich dashboard and search engine for nanopublications of type FSR. It will enable users to find FSRs based on their type or label and will, at the same time, show by which communities they are used. In a future iteration it will also enable filters on data types and case studies.

How to cite: Magagna, B., Suchánek, M., and Kuhn, T.: Semantic Interoperability Profiles as knowledge base for semantic solutions, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-12230, https://doi.org/10.5194/egusphere-egu24-12230, 2024.

EGU24-12844 | Orals | ESSI3.5

EPOS Seismology: Connecting Communities, Advancing Research, and Paving the Way Forward 

Margarita Segou, Kiratzi Anastasia, Carlo Cauzzi, Susana Custódio, Rémy Bossu, Florian Haslinger, Laurentiu Danciu, Fatemeh Jalayer, Roberto Basili, Irene Molinari, and Adrien Oth

We present the dynamic landscape of EPOS Seismology, a Thematic Core Service consortium at the foundation of the European Plate Observing System (EPOS) infrastructure. Cultivated over the past decade through partnerships with prominent pan-European seismological entities, namely ORFEUS (Observatories and Research Facilities for European Seismology), EMSC (Euro-Mediterranean Seismological Center), and EFEHR (European Facilities for Earthquake Hazard and Risk), EPOS Seismology stands out as a collaborative governance framework. Facilitating the harmonized interaction between seismological community services, EPOS, and its associated bodies, it endeavours to widen the collaboration to include data management, product provision, and the evolution of new seismological services.

Within the EPOS Delivery Framework, EPOS Seismology pioneers a diverse array of services, fostering open access to a wealth of seismological data and products while unwaveringly adhering to the FAIR principles and promoting open data and science. These services encompass the archival and dissemination of seismic waveforms from 24,000 seismic stations, access to pertinent station and data quality information, parametric earthquake data spanning recent and historical events, as well as advanced event-specific products such as moment tensors and source models, together with reference seismic hazard and risk products for the Euro-Mediterranean region.
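
Waveform services of this kind are typically reachable through standard FDSN web services, for example via ORFEUS/EIDA nodes; the sketch below shows a minimal ObsPy request, with the network, station and time window chosen purely as placeholders for the example.

```python
# Illustrative waveform request through standard FDSN web services (e.g. an
# ORFEUS/EIDA node); station and time window are placeholders for the example.
from obspy import UTCDateTime
from obspy.clients.fdsn import Client

client = Client("ORFEUS")
st = client.get_waveforms(
    network="NL", station="HGN", location="*", channel="BHZ",
    starttime=UTCDateTime("2024-01-01T00:00:00"),
    endtime=UTCDateTime("2024-01-01T00:10:00"),
)
print(st)
```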

The seismological services are seamlessly integrated into the interoperable centralized EPOS data infrastructure and are openly accessible through established domain-specific platforms and websites. Collaboratively orchestrated by EPOS Seismology and its participating organizations, this integration provides a cohesive framework for the ongoing and future development of these services within the extensive EPOS network. The products and services support the transformative role of seismological research infrastructures, showcasing their pivotal contributions to the evolving narrative of solid Earth science within the broader context of EPOS.

How to cite: Segou, M., Anastasia, K., Cauzzi, C., Custódio, S., Bossu, R., Haslinger, F., Danciu, L., Jalayer, F., Basili, R., Molinari, I., and Oth, A.: EPOS Seismology: Connecting Communities, Advancing Research, and Paving the Way Forward, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-12844, https://doi.org/10.5194/egusphere-egu24-12844, 2024.

EGU24-12891 | Posters on site | ESSI3.5

Multidisciplinary analysis of near fault observatory data: example from the Alto Tiberina fault (Northern Apennines, Italy) 

Enrico Serpelloni, Lucia Zaccarelli, Licia Faenza, Antonio Caracausi, Carlos Almagro Vidal, Francesco Pintori, Eugenio Mandler, and Lauro Chiaraluce

Earthquakes, intricate natural events spanning multiple spatio-temporal scales, necessitate a comprehensive understanding of the physical and chemical processes driving a broad spectrum of fault slip modes. To achieve this, the acquisition of multidisciplinary and dense datasets is imperative. Near Fault Observatories (NFOs) play a pivotal role by offering spatially and temporally dense, high-precision near-fault data, fostering the generation of novel observations and innovative scientific insights. However, the integration and interpretation of diverse datasets from various disciplines (geophysics, geochemistry, hydrology, etc.) present challenges. These datasets often consist of time-series depicting the temporal evolution of different parameters, sampling diverse temporal and spatial scales, depths, and the distinct or cumulative effects of various multiscale processes. In this presentation, we share outcomes from the INGV multidisciplinary project MUSE: Multiparametric and mUltiscale Study of Earthquake preparatory phase in the central and northern Apennines. Our emphasis lies in showcasing the approaches developed to analyze, integrate, and extract new knowledge from the EPOS Near Fault Observatory TABOO. This state-of-the-art observatory, managed by the Istituto Nazionale di Geofisica e Vulcanologia (INGV), boasts a dense network with an average inter-distance of approximately 5 km between multidisciplinary sensors. These sensors, deployed at the surface and within shallow boreholes, include seismometric, geodetic, geochemical, hydrological, and strain stations. The project's core objective is to unravel the interconnections between different observables and explore the causal relationships among them. We will present the datasets and the methods employed, and discuss the significance of considering the interaction between fluid and solid geophysical processes in comprehending earthquake phenomena. Additionally, we will articulate the potential innovative scientific products that can arise from this research, contributing to a deeper understanding of earthquake processes.

How to cite: Serpelloni, E., Zaccarelli, L., Faenza, L., Caracausi, A., Almagro Vidal, C., Pintori, F., Mandler, E., and Chiaraluce, L.: Multidisciplinary analysis of near fault observatory data: example from the Alto Tiberina fault (Northern Apennines, Italy), EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-12891, https://doi.org/10.5194/egusphere-egu24-12891, 2024.

EGU24-12925 | ECS | Orals | ESSI3.5

SESAME: Software tools for integrating Human - Earth System data 

Abdullah Al Faisal, Maxwell Kaye, and Eric Galbraith

Human activities have extensively modified over 70% of Earth’s land surface and two-thirds of marine environments through practices such as agriculture, industrialization, and urbanization. These activities have resulted in a wide range of environmental problems, including biodiversity loss, water pollution, soil erosion, and climate change. However, human data is often available only in tabular form, is difficult to integrate with natural Earth variables, and can pose significant challenges when trying to understand the complex interactions between human activities and natural Earth systems. On the other hand, scientific datasets are spread across websites, come in different formats, may require preprocessing, use different map projections, spatial resolutions, and non-standard units, and are difficult for both beginner and experienced researchers to access and use due to their heterogeneity. This discrepancy hinders our understanding of complex interactions between human activities and the environment.

To bridge this gap, we have created the Surface Earth System Analysis and Modelling Environment (SESAME) software and dataset package, which aims to solve the problem of fragmented and difficult-to-use human-Earth data. It can handle various data formats and generate a standardized gridded dataset with minimal user effort. SESAME is a software infrastructure that automatically transforms five input data types (raster, point, line, polygon, and tabular) into desired, standardized spatial grids and stores them in a netCDF file. The ability of a netCDF file to store multidimensional time-series data makes it an ideal platform for storing complex global datasets. SESAME utilizes the dasymmetric mapping technique to transform jurisdiction-level tabular data into a gridded layer proportional to a corresponding surrogate variable, while considering changes in country boundaries over time. It maintains consistency between input and output data by calculating the global sum and mean.
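
The core dasymmetric idea can be shown in a few lines: a national (tabular) total is spread over that country's grid cells in proportion to a surrogate variable so that the grid sums back to the reported total. The sketch below uses synthetic numpy arrays, not SESAME's actual code or data.

```python
# Minimal numpy sketch of dasymmetric disaggregation of a country total onto a
# grid, weighted by a surrogate variable; all arrays and values are synthetic.
import numpy as np

national_total = 1000.0                        # e.g. a country-level statistic
country_mask = np.array([[1, 1, 0],
                         [1, 0, 0],
                         [0, 0, 0]], dtype=bool)
surrogate = np.array([[2.0, 6.0, 1.0],         # e.g. population per grid cell
                      [2.0, 4.0, 1.0],
                      [1.0, 1.0, 1.0]])

weights = np.where(country_mask, surrogate, 0.0)
weights /= weights.sum()
gridded = national_total * weights             # grid sums back to the national total

assert np.isclose(gridded.sum(), national_total)
print(gridded)
```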

By converting human tabular data into a gridded format, we can facilitate comprehensive and spatially explicit analyses, advancing our understanding of human-Earth systems and their complex interactions. These gridded datasets are intended to be used as inputs to a range of different Earth system models, potentially improving the simulation and evaluation of scenarios and leading to more informed and strategic future policy decisions.

How to cite: Faisal, A. A., Kaye, M., and Galbraith, E.: SESAME: Software tools for integrating Human - Earth System data, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-12925, https://doi.org/10.5194/egusphere-egu24-12925, 2024.

EGU24-12962 | Posters on site | ESSI3.5

The Marble climate informatics platform: data discovery and data access 

Deepak Chandan, Misha Schwartz, and Steve Easterbrook

Advances in remote sensing and computing infrastructure, and the demands of modern climate research, are driving the production of new climate datasets at a breathtaking pace. Researchers increasingly find that the growing volume of climate datasets is challenging to store, analyze, or generally "shepherd" through their analysis pipelines. Quite often, the ability to do this is limited to those with access to government or institutional facilities in wealthier nations, raising important questions around equitable access to climate data.

The Data Analytics for Canadian Climate Services (DACCS) project has built a cloud-based network of federated nodes, called Marble, that allows anyone seeking to extract insights from large volumes of climate data to undertake their study without concerning themselves with the logistics of acquiring, cleaning and storing data. The aspiration for building this network is to provide a low-barrier entry point not only for those working in core climate change research, but also for those involved in climate mitigation, resilience and adaptation work, and for policymakers, non-profits, educators and students. Marble is one of the platforms selected to contribute to the 'Open Science Platform' component of the OGC’s OSPD initiative.

The user-facing aspect of the platform comprises two components: (i) the Jupyter compute environment and (ii) the data server and catalogue. Here, we focus on the latter and present details of the infrastructure, developed on top of proven open-source software and standards (e.g. STAC), that allows climate datasets stored anywhere on the network to be discovered and accessed by anyone on the network. We will also discuss the publication capability of the platform, which allows a user to host their own data on the network and make it quickly available to others.
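
As a sketch of how STAC-based discovery typically looks from the user side, the snippet below searches a STAC API with pystac-client; the endpoint URL and collection name are hypothetical and do not correspond to a real Marble node.

```python
# Sketch of STAC-based data discovery; the endpoint URL and collection name
# are hypothetical placeholders, not a real Marble node address.
from pystac_client import Client

catalog = Client.open("https://stac.example.org")       # hypothetical node
search = catalog.search(
    collections=["cmip6-example"],                       # hypothetical collection
    bbox=[-95.0, 41.0, -74.0, 57.0],
    datetime="2015-01-01/2100-12-31",
)
for item in search.items():
    print(item.id, list(item.assets.keys()))
```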

How to cite: Chandan, D., Schwartz, M., and Easterbrook, S.: The Marble climate informatics platform: data discovery and data access, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-12962, https://doi.org/10.5194/egusphere-egu24-12962, 2024.

EGU24-13367 | Orals | ESSI3.5

EPOS-GNSS – Operational Advancements in EPOS GNSS Data and Product Services 

Rui Fernandes, Carine Bruyninx, Luis Carvalho, Paul Crocker, Gael Janex, Juliette Legrand, Jean-Luc Menut, Anne Socquet, and Mathilde Vergnolle and the EPOS-GNSS Contributors

As the European Plate Observing System (EPOS) transitions into its Operational Phase, the EPOS-GNSS Thematic Core Service continues to play a pivotal role in managing and disseminating Global Navigation Satellite Systems (GNSS) data and products across Europe. As EPOS-GNSS advances into its operational stage, the commitment to organizational effectiveness and technical innovation has been reinforced. This ensures that EPOS-GNSS continues to provide valuable services and products tailored for Solid Earth research applications.

In this presentation, we highlight key developments achieved during the pre-operational phase and the ongoing operational stage, where evolution continues to be a central component for the EPOS-GNSS community. The four critical pillars of EPOS-GNSS are discussed: (a) Governance – we have intensified efforts to ensure the representation and recognition of the entire community, as well as deepening collaboration with data providers, end-users, and pan-European infrastructures, notably EUREF; (b) Metadata and Data – the dissemination of quality-controlled GNSS data and associated metadata has been integrated into the operational framework; (c) Products – internally consistent GNSS solutions for dedicated products (time series, velocities, and strain rates) using state-of-the-art methodologies; and (d) Software – GLASS, the dedicated software package that facilitates the dissemination of GNSS data and products following FAIR principles while maintaining rigorous quality control procedures, through four different GNSS-dedicated web portals and the EPOS Integrated Core Services Data Portal.

Finally, we also present some examples of the usage of EPOS-GNSS data products in multi-, inter-, and trans-disciplinary studies, which exhibit the importance of geodetic information for Solid Earth studies, particularly in an integrated environment as promoted by EPOS.

How to cite: Fernandes, R., Bruyninx, C., Carvalho, L., Crocker, P., Janex, G., Legrand, J., Menut, J.-L., Socquet, A., and Vergnolle, M. and the EPOS-GNSS Contributors: EPOS-GNSS – Operational Advancements in EPOS GNSS Data and Product Services, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-13367, https://doi.org/10.5194/egusphere-egu24-13367, 2024.

EGU24-13654 | ECS | Posters on site | ESSI3.5

Federated Climate Research Software: improving data and workflow management for climate researchers 

Misha Schwartz, Deepak Chandan, and Steve Easterbrook

Climate researchers have access to astronomical amounts of data, but finding that data and downloading it so that it can be useful for research can be burdensome and expensive.

The team at Data Analytics for Canadian Climate Services (DACCS) is solving that problem by creating a new system for conducting climate research and providing the software to support it. The system works by providing researchers the tools to analyze the data where it’s hosted, eliminating the need to download the data at all.

In order to accomplish this, the DACCS team has developed a software stack that includes the following services:

- data hosting
- data serving (using OPeNDAP protocols)
- data search and cataloging
- interactive computational environments preloaded with climate analysis tools
- remote analysis tools (WPS and OGC API Features)

Partner organizations can deploy this software stack and choose to host any data that they wish. This data then becomes available to every other participating organization, allowing seamless access to each other's data without having to move it for analysis.
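
Because the stack serves data through OPeNDAP, remote subsetting is possible without bulk downloads; the sketch below shows the general pattern with xarray, where the URL and variable name are placeholders rather than a real Marble/DACCS endpoint.

```python
# Sketch of server-side subsetting via OPeNDAP with xarray; the URL and the
# variable name ("tas") are placeholders, not a real DACCS/Marble endpoint.
import xarray as xr

url = "https://opendap.example.org/thredds/dodsC/climate/tas_example.nc"
ds = xr.open_dataset(url)                    # only metadata is read at this point

# Lazily select a small space-time subset; only that slice is transferred
subset = ds["tas"].sel(lat=slice(40, 60), lon=slice(-10, 30)).isel(time=slice(0, 12))
print(subset.mean().values)
```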

This system will allow researchers to more easily:

- discover available data hosted all over the world
- develop analysis workflows that can be run anywhere
- share their work with collaborators without having to directly share data

The DACCS team is currently participating in the Open Science Persistent Demonstrator (OSPD) initiative and we hope that this software will contribute to the ecosystem of earth science software platforms available today.

How to cite: Schwartz, M., Chandan, D., and Easterbrook, S.: Federated Climate Research Software: improving data and workflow management for climate researchers, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-13654, https://doi.org/10.5194/egusphere-egu24-13654, 2024.

EGU24-14052 | Orals | ESSI3.5

Who has got what where? FAIR-ly coordinating multiple levels of geophysical data products over distributed Research Infrastructures (RIs) to meet diverse computational needs and capabilities of users. 

Lesley Wyborn, Nigel Rees, Jo Croucher, Hannes Hollmann, Rebecca Farrington, Benjamin Evans, Stephan Thiel, Mark Duffett, and Tim Rawling

Modern research data processing pipelines/workflows can have quite complex lineages. Today, it is more than likely that a scientific workflow will rely on multiple Research Infrastructures (RIs), numerous funding agencies and geographically separate organisations to collect, produce, process, analyse and reanalyse primary and derivative datasets. Workflow components can include:

  • Shared instruments to acquire the data;
  • Separate research groups processing/calibrating field data and developing additional derived products;
  • Multiple repository infrastructures to steward, preserve and provide access to the primary data and resultant products sustainably and persistently; and
  • Different types of software and compute infrastructures that enable multiple ways to access and process the data and products, including in-situ access, distributed web services and simple file downloads.

In these complex workflows, individual research products can be generated through multiple levels of processing (L0-L4), as raw instrument data is collected by remote instruments (satellites, drones, airborne instruments, shared laboratory and field infrastructures) and is converted into more useful parameters and formats to meet multiple use cases. Each individual level of processing can be undertaken by different research groups using a variety of funding sources and RIs, whilst derivative products could be stored in different repositories around the globe.

An additional complexity is that the volumes and resolution of modern earth and environmental datasets are growing exponentially, and many RIs can no longer store and process the volumes of primary data acquired. Specialised hybrid HPC/Cloud infrastructures with co-located datasets that allow for virtual in situ high-volume data access are emerging. But these petascale/exascale infrastructures are not required for all use cases, and traditional small-volume file downloads of evolved data products and images for local processing are all that many users need.

At the core of many of these complex workflows are the primary, often high-resolution, observational datasets, which can be in the order of terabytes and petabytes. Hence, for transparent Open Science and to enable attribution to the funders, collectors and repositories that preserve these valuable data assets, all levels of derivative data products need to be able to trace their provenance back to these source datasets.

Using examples from the recently completed 2030 Geophysics Data Collection project in Australia (co-funded by AuScope, NCI and ARDC), this paper will show how original primary field-acquired datasets and their derivative products can be accessible from multiple distributed RIs and government websites. They are connected using the FAIR principles, ensuring that, at a minimum, lineage and prehistory are recorded in provenance statements and linked using metadata elements such as ‘isDerivedFrom’ and DOIs. Judicious use of identifiers such as ORCIDs, RORs and DOIs links data at each level of processing with the relevant researchers, research infrastructure, funders, software developers, software etc. Integrating HPC centers that are co-located with large-volume, high-resolution data infrastructures within complex and configurable research workflows is providing a key input to supporting next-generation earth and environmental research and enabling new and exciting scientific discoveries.
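
To illustrate the kind of machine-readable lineage link described above, the sketch below uses the DataCite relatedIdentifier pattern in simplified form; the DOIs, name and ORCID shown are invented placeholders.

```python
# Simplified, DataCite-style lineage record linking a derived product back to
# its source dataset; all identifiers are invented placeholders.
lineage_record = {
    "doi": "10.1234/derived-product",                        # hypothetical product DOI
    "relatedIdentifiers": [
        {
            "relatedIdentifier": "10.1234/primary-survey",   # hypothetical source-data DOI
            "relatedIdentifierType": "DOI",
            "relationType": "IsDerivedFrom",
        }
    ],
    "contributors": [
        {"name": "Example Researcher",
         "nameIdentifier": "0000-0000-0000-0000",            # placeholder ORCID
         "nameIdentifierScheme": "ORCID"},
    ],
}
print(lineage_record["relatedIdentifiers"][0]["relationType"])
```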

How to cite: Wyborn, L., Rees, N., Croucher, J., Hollmann, H., Farrington, R., Evans, B., Thiel, S., Duffett, M., and Rawling, T.: Who has got what where? FAIR-ly coordinating multiple levels of geophysical data products over distributed Research Infrastructures (RIs) to meet diverse computational needs and capabilities of users., EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-14052, https://doi.org/10.5194/egusphere-egu24-14052, 2024.

EGU24-14521 | Orals | ESSI3.5

Towards a Digital Twin for the Alps to simulate water-related processes and geohazards for climate change adaptation strategies. 

Jean-Philippe Malet, Maxime Lamare, Lucia Guardamino, Jonas Viehweger, Stefania Camici, Luca Brocca, Silvia Barbetta, Bianca Bonaccorsi, Sara Modanesi, Angelica Tarpanelli, Matteo Dall’Amico, Federico Di Paolo, Nicolo Franceschetti, Clément Michoud, Thierry Oppikoffer, David Michéa, Floriane Provost, Aline Déprez, Michaelis Foumelis, and Philippe Bally

The Alps are the most densely populated mountain range in Europe, and water resources play a central role in the socio-economic development of the area (agriculture, tourism, hydropower production...). Furthermore, the Alps are particularly sensitive to the impacts of climate change and thus to hydro-meteorological hazards such as landslides, floods, droughts and glacier-related processes, which are expected to increase in the near future and constitute a major threat to human activity. Indeed, over the last century, temperatures have risen twice as fast as the northern-hemisphere average, whereas precipitation has increased non-linearly and has become more discontinuous.

Because of the increasing pressure on human settlements and infrastructure, there is a strong priority for policy-makers to implement climate change adaptation strategies from the local to the regional scale. To support and improve the decision-making process, numerical decision support systems may provide valuable information derived from multi-parametric (in-situ sensors, satellite data) observations and models, linked to computing environments, in order to better manage increasing threats and weaknesses.

The main objective of the Digital Twin of the Alps (DTA) platform is to provide a roadmap for the implementation of future Digital Twin Components, with a focus on the Alpine chain. In this context, a demonstrator has been developed that enables a holistic representation of some of the major physical processes specific to the Alpine context, powered by a unique combination of Earth Observation data analytics, machine learning algorithms, and state-of-the-art hydrology and geohazard process-based models. Advanced visualization tools have been specifically implemented to favor easy exploration of the products by several categories of stakeholders.

The resulting Digital Twin Earth precursor will provide an advanced decision support system for actors involved in the observation and mitigation of natural hazards and environmental risks, including their impacts in the Alps, as well as in the management of water resources. For instance, through the demonstrator users can investigate the availability of water resources in terms of snow, soil moisture, river discharge and precipitation. Furthermore, it is possible to stress the system with scenario-based options to see the impacts on the various hydrological drivers in terms of drought and flood probability. Finally, the user can assess flood hazard, forecast (with a daily lead time) the occurrence of shallow landslides (slope failure probability and material propagation), and predict the activity (e.g. velocity) of large deep-seated and continuously active landslides under extreme rain events through the use of a combination of physics- and AI-based simulation tools. Use cases in northern Italy, southern Switzerland and southern France are provided.

The user can also visualise maps and time series of terrain motion products over several Alpine regions generated with advanced Earth Observation processing chains and services (GDM-OPT, Snapping) available on the Geohazards Exploitation Platform and the eo4alps-landslides App, providing a consistent description of Earth surface deformation (unstable slopes, large deep-seated landslides, ice glaciers) for the period 2016-2022. The data, services and technologies used and developed for the platform will be presented.

How to cite: Malet, J.-P., Lamare, M., Guardamino, L., Viehweger, J., Camici, S., Brocca, L., Barbetta, S., Bonaccorsi, B., Modanesi, S., Tarpanelli, A., Dall’Amico, M., Di Paolo, F., Franceschetti, N., Michoud, C., Oppikoffer, T., Michéa, D., Provost, F., Déprez, A., Foumelis, M., and Bally, P.: Towards a Digital Twin for the Alps to simulate water-related processes and geohazards for climate change adaptation strategies., EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-14521, https://doi.org/10.5194/egusphere-egu24-14521, 2024.

EGU24-15585 | Orals | ESSI3.5

On the exploitation of the Sentinel-1 P-SBAS service within the EarthConsole® platform for unsupervised on-demand DInSAR processing  

Claudio De Luca, Massimo Orlandi, Manuela Bonano, Francesco Casu, Maddalena Iesuè, Michele Manunta, Giovanni Onorato, Mario Fernando Monterroso Tobar, Giancarlo Rivolta, and Riccardo Lanari

The remote sensing scenario is nowadays characterized by an extensive exploitation of spaceborne Synthetic Aperture Radar (SAR) data to investigate Earth surface dynamics. Such a demand is rather well satisfied by the huge archives collected over the last ten years by the Copernicus Sentinel-1 (S1) SAR mission, which is distinguished by a “free and open” data access policy and a nearly global coverage acquisition strategy. In this regard, the most widely used space-borne geodetic technique for investigating ground deformation is Differential Synthetic Aperture Radar Interferometry (DInSAR), which has largely demonstrated its effectiveness in measuring surface displacements in different scenarios. In particular, the advanced DInSAR method referred to as the Parallel Small BAseline Subset (P-SBAS) approach has emerged as particularly effective for examining the temporal evolution of detected surface displacements in both natural and anthropogenic hazard contexts, such as volcanoes, earthquakes, landslides and human-induced deformation due to mining activities, fluid exploitation, and the construction of large infrastructures.

In this context, the availability to the scientific community of algorithms and tools suitable for effectively exploiting such huge SAR data archives to generate value-added products is becoming crucial. To this aim, the P-SBAS algorithm has been released as an on-demand web-based tool by integrating it within the EarthConsole® platform, and it currently contributes to the on-demand remote sensing component of the EPOSAR service. In more detail, EarthConsole® is a cloud-based platform supporting the scientific community with the development, testing, and hosting of their processing applications to enable Earth Observation (EO) data exploitation and processing services. EPOSAR, in turn, is a service available in the framework of the European Plate Observing System (EPOS) Satellite community, which provides systematic ground displacement products relevant to various areas on Earth.

In this work we present the deployment of the P-SBAS tool within the EarthConsole® platform, in order to extend the EPOSAR service portfolio to the on-demand generation of DInSAR displacement maps and time series exploiting C-band satellite data. In particular, the developed service builds on the already available capability to carry out multi-temporal DInSAR processing of ENVISAT data and allows scientific users to also process Sentinel-1 SAR images in a fully autonomous manner, through a user-friendly web graphical interface that permits them to follow the progress of the processing tasks and avoids the need to download the SAR data to their own processing and archiving facilities. The availability to the EPOS community of such an on-demand P-SBAS-based DInSAR processing service, which allows scientific users to retrieve ground displacement maps and time series over large areas in an unsupervised way and in a very short time, may open new and intriguing perspectives on the comprehension of Earth surface deformation dynamics.

How to cite: De Luca, C., Orlandi, M., Bonano, M., Casu, F., Iesuè, M., Manunta, M., Onorato, G., Monterroso Tobar, M. F., Rivolta, G., and Lanari, R.: On the exploitation of the Sentinel-1 P-SBAS service within the EarthConsole® platform for unsupervised on-demand DInSAR processing , EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-15585, https://doi.org/10.5194/egusphere-egu24-15585, 2024.

EGU24-15898 | Posters on site | ESSI3.5

Open Polar: A Comprehensive Database for Advancing Arctic and Antarctic Research 

Tamer Abu-Alam, Katie A. Smart, Per Pippin Aspaas, Leif Longva, Noortje Haugstvedt, and Karl Magnus Nilsen

In the realm of environmental and climate science, addressing the multifaceted challenges our planet faces necessitates a comprehensive approach. Holistic solutions are crucially dependent on the integration and interoperability of data. The polar regions, especially the Arctic, are particularly vulnerable to climate change, experiencing a rate of temperature increase that is four times faster than the global average [1]. Accelerated polar warming is frequently marked by sea ice loss, but also includes shrinking habitats for polar biospheres that in turn drastically affect Arctic peoples. Though enhanced at the poles, the effects of warming are wide-ranging across the oceans and continents of our planet, affecting weather patterns, ecosystems and human activities. Polar research is thus invaluable for researchers and policymakers and should be widely and freely available. However, in 2019 a significant findability gap was discovered for open access polar records, indicating the need for a cross-disciplinary research service to provide efficient and seamless access to open polar research [2].

The Open Polar database [3] was launched in cooperation between the University Library at UiT The Arctic University of Norway and the Norwegian Polar Institute in 2021. Open Polar promotes Findable and Accessible polar research, such that researchers, policymakers, and society have equal and unfettered access to polar region publications and data. Open Polar harvests metadata from over 4600 open access providers, filters for polar research using over 11000 keywords, and enriches the resulting records by defining geolocations and applying correct DOIs, before finally building the Open Polar database, which is searchable by standard text or geolocation. Currently, the database includes nearly 2.5 million open access records, consisting of approximately 75% publications and 25% datasets. Nearly two years after its launch, Open Polar maintains robust user engagement, and we aim to improve the service by incorporating new sources, reducing redundancies and considering integration with data archiving and open education services.
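As a rough illustration of the filtering and enrichment steps described above, the following Python sketch screens harvested metadata records against polar keywords and tags them with a geolocation; the keyword list, gazetteer and record structure are toy stand-ins, not the Open Polar implementation.

from dataclasses import dataclass
from typing import Dict, Optional, Tuple

POLAR_KEYWORDS = {"arctic", "antarctic", "svalbard", "permafrost", "sea ice"}  # tiny subset of the >11000 terms

@dataclass
class Record:
    title: str
    abstract: str
    doi: Optional[str] = None
    geolocation: Optional[Tuple[float, float]] = None  # (lat, lon), filled during enrichment

def is_polar(record: Record) -> bool:
    """Keep a record if any polar keyword appears in its title or abstract."""
    text = f"{record.title} {record.abstract}".lower()
    return any(keyword in text for keyword in POLAR_KEYWORDS)

def enrich(record: Record, gazetteer: Dict[str, Tuple[float, float]]) -> Record:
    """Attach a representative geolocation when a known place name is mentioned."""
    text = f"{record.title} {record.abstract}".lower()
    for place, coords in gazetteer.items():
        if place in text:
            record.geolocation = coords
            break
    return record

harvested = [
    Record("Permafrost thaw near Longyearbyen", "Field observations on Svalbard ..."),
    Record("Tropical reef bleaching", "Warm-water coral stress ..."),
]
gazetteer = {"svalbard": (78.2, 15.6)}  # hypothetical lookup table
polar_records = [enrich(r, gazetteer) for r in harvested if is_polar(r)]
print(len(polar_records), polar_records[0].geolocation)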

 [1] Rantanen, M., Karpechko, A.Y., Lipponen, A. et al. (2022). The Arctic has warmed nearly four times faster than the globe since 1979. Commun Earth Environ 3, 168. https://doi.org/10.1038/s43247-022-00498-3 

 [2] Abu-Alam, T. S. (2019). Open Arctic Research Index: Final report and recommendations. Septentrio Reports, (3). https://doi.org/10.7557/7.4682 

 [3] https://openpolar.no/ 

How to cite: Abu-Alam, T., Smart, K. A., Aspaas, P. P., Longva, L., Haugstvedt, N., and Nilsen, K. M.: Open Polar: A Comprehensive Database for Advancing Arctic and Antarctic Research, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-15898, https://doi.org/10.5194/egusphere-egu24-15898, 2024.

EGU24-15957 | ECS | Orals | ESSI3.5

A Swedish National Infrastructure for Interdisciplinary Environmental Research Integrating Archaeological and Quaternary Geological Data 

Ershad Gholamrezaie, Philip Buckland, Roger Mähler, Johan von Boer, Rebecka Weegar, Mattias Sjölander, and Carl-Erik Engqvist

The newly formed Swedish National Infrastructure for Digital Archaeology (SweDigArch) and the Strategic Environmental Archaeology Database (SEAD) are positioned at the intersection of environmental research, data science and humanities. They represent a considerable upscaling of archaeological and Quaternary geological databases, combining meticulous data management, collaborative stewardship, advanced online interfaces, and visualization.

SweDigArch seeks to enhance the open accessibility of archaeological data from Swedish institutions, unlocking the knowledge embedded in cultural heritage and environmental repositories to facilitate interdisciplinary and international research. At its core, SweDigArch aims to enable data-driven analyses across diverse archaeological, palaeoecological, and related materials, including links to biodiversity and other external data sources. This initiative advances research on the intricate relationships between human societies and their environments over long timescales, empowering scholars to formulate inquiries that contribute not only to historical comprehension but also hold contemporary relevance and prospective implications.

In the pursuit of data-driven analyses, SweDigArch focuses on facilitating research which examines past human-environment interactions. Through the analysis of archaeological and recent geological datasets, the project endeavors to stimulate research providing insights into the functioning of socio-ecological systems, identifying historical vulnerabilities and resilience-building factors. This knowledge, in turn, will inform contemporary design, planning, and policy frameworks across various institutional and infrastructural domains, from environmental and cultural impact assessments to assessing risks from future climate change.

SweDigArch aims to optimize the utility of Swedish archaeological and palaeoecological data through linked data, open formats, shared vocabularies, and the semantic web. This approach enriches national and international research initiatives and facilitates cross-cultural comparative research, contributing to a broader understanding of global human history.

Integral to the collaborative framework is SEAD, an Open Access repository for proxy environmental data, including various archaeological and palaeoecological datasets. Incorporating datasets such as BugsCEP fossil insect data and Swedish data on plant macrofossils, pollen, dendrochronology, geochemistry, and ceramic thin sections, SEAD's evolving functionality now extends to accommodate osteological and isotope analyses, underscoring its role as a dynamic platform for data visualization and semantic networking.

Together, SweDigArch and SEAD aim to bridge the divide between academic and contract archaeology, offering a pivotal resource for cultural and environmental historical research, urban planning, and sustainability analyses. These initiatives aspire to become the standard primary data infrastructure for all users of Swedish archaeological information, transcending scholarly circles to encompass fields such as cultural heritage preservation and urban planning. This collaborative endeavor invites active engagement from a diverse user base, fostering a scholarly ethos of openness, data-driven inquiry, and interdisciplinary collaboration to deepen our comprehension of the past and contribute to the sustainable shaping of the future.

This presentation will describe the infrastructure and provide examples of its use in the analysis and visualization of interdisciplinary data, including fossil insects, past climate change and human impact on biodiversity and the environment.

How to cite: Gholamrezaie, E., Buckland, P., Mähler, R., von Boer, J., Weegar, R., Sjölander, M., and Engqvist, C.-E.: A Swedish National Infrastructure for Interdisciplinary Environmental Research Integrating Archaeological and Quaternary Geological Data, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-15957, https://doi.org/10.5194/egusphere-egu24-15957, 2024.

The Horizon Europe interTwin project is developing a highly generic yet powerful Digital Twin Engine (DTE) to support interdisciplinary Digital Twins (DT). Comprising thirty-one high-profile scientific partner institutions, the project brings together infrastructure providers, technology providers, and DT use cases from Climate Research and Environmental Monitoring, High Energy and AstroParticle Physics, and Radio Astronomy. This group of experts enables the co-design of the DTE Blueprint Architecture and the prototype platform, benefiting end users such as scientists and policymakers as well as DT developers. It achieves this by significantly simplifying the process of creating and managing complex Digital Twin workflows.

In the context of the project, among others, Digital Twin (DT) applications for extreme events (such as tropical cyclones and wildfires) in climate projections are being implemented. Understanding how climate change affects extreme events is crucial, since such events can have a significant impact on ecosystems and cause economic losses and casualties. In particular, the DT applications are based on Machine Learning (ML) approaches for the detection and prediction of such events exploiting climate/environmental variables. The interTwin DTE is aimed at providing the software and computing infrastructure for handling these complex applications in terms of AI models, data processing and workflow management.

 

The contribution will cover the use cases concerning extreme weather events, supported by project partner CMCC. 

interTwin is funded by the European Union (Horizon Europe) under grant agreement No 101058386.

How to cite: Franck, G. and Elia, D.: The interTwin DTE: supporting the development of extreme weather events applications, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-16311, https://doi.org/10.5194/egusphere-egu24-16311, 2024.

EGU24-16403 | Posters on site | ESSI3.5

Implementation of a hydrogeochemical monitoring network following a multi-risk vision: the Strait of Messina (Italy) case. 

Marianna Cangemi, Carlo Doglioni, Paolo Madonia, Mario Mattia, and Giulio Selvaggi

The Strait of Messina, separating Sicily from continental Italy, is an area prone to different, high-grade geological hazards. Many of the most devastating earthquakes of Italy have occurred here, including the M 7.1 Messina-Reggio Calabria earthquake of 28 December 1908, the most intense event recorded in southern Europe in the instrumental era. On both sides, the strait is overlooked by a mountain chain that descends directly to a narrow, densely urbanized coastal belt. Its steep slopes, composed of geological terrains with poor geotechnical characteristics, are affected by diffuse mass movements, such as the 1 October 2009 landslide, triggered by intense rainfall, which destroyed several small villages immediately south of Messina and caused 37 casualties. The Peloro Cape area, the north-eastern termination of Sicily, hosts a lagoonal environmental system, protected by the Ramsar Convention but also of economic interest because it is exploited for shellfish farming; these lagoons are extremely sensitive to changes in sea level and temperature, which can pose serious threats to their ecological stability. This complex scenario exhibits a further criticality: the planned bridge linking Sicily and continental Italy, which, if realized, will be the longest single-span bridge in the world.

This complex natural-built environment needs a multidisciplinary monitoring network for mitigating the multiple risks that affect both its natural and anthropic components. Its implementation is the aim of the Work Package 5 “NEMESI” of the Italian PNRR project MEET, the post-Covid 19 pandemic national plan for recovery and resilience, financed in the framework of the European Next Generation EU initiative.

Part of this multidisciplinary monitoring system will consist of a hydrogeochemical network, composed of 11 stations measuring, acquiring in a local logger and transmitting to the INGV data centre, data of temperature, level, electric conductivity, turbidity and dissolved O2 and CO2.

The main challenge in the implementation of the Strait of Messina hydrogeochemical network is the correct selection of the monitoring sites, which will be located in underground and surface water bodies whose physico-chemical characteristics must simultaneously act as indicators of very different processes: changes in electrical conductivity due to sea level rise; variations of temperature and piezometric levels induced by permeability changes driven by seismic and aseismic deformations; and changes in oxygenation, turbidity and dissolved CO2, which can be controlled by both eutrophication and mixing with deep volatiles, whose flux is driven by neotectonic activity.

To accomplish this mission, and to produce open access data of interest to the different stakeholders, spanning from the scientific community to the shellfish food industry, a truly multidisciplinary approach will be mandatory, embracing geological, geophysical, geodetic, geochemical, eco-hydrological and socio-economic data.

 

How to cite: Cangemi, M., Doglioni, C., Madonia, P., Mattia, M., and Selvaggi, G.: Implementation of a hydrogeochemical monitoring network following a multi-risk vision: the Strait of Messina (Italy) case., EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-16403, https://doi.org/10.5194/egusphere-egu24-16403, 2024.

In-situ Earth observation data play a key role in environmental and climate related domains. However, in-situ data is often missing or hardly accessible for users due to technical barriers, for example, unstructured metadata information, missing provenance, lack of links to standard vocabularies or units of measure definitions. This communication presents a well-defined, formalized methodology for identifying and documenting requirements for in-situ data from a user’s point of view initially tested within the Group on Earth Observations. This is materialized into a comprehensive Geospatial In-situ Requirements Database and a related tool called G-reqs.

The G-reqs facilitates the requirements gathering process via a web form that acts as the user interface. It encompasses a variety of needs: Calibration/Validation of remote sensing products, Calibration/Validation of other in-situ data, input assessment for numerical modelling, creation of an Essential Variable product, etc. Depending on the type of need, there will be requirements for in-situ data that can be formally expressed in the main components of geospatial information: spatial, thematic, and temporal (e.g. area of scope, variable needed, thematic uncertainty, positional accuracy, temporal coverage and frequency, representative radius, coordinate measurements, etc.). The G-reqs is the first in-situ data requirements repository at the service of the evolution of the GEO Work Programme, but it is not limited to it. In fact, the entire Earth observation community of users is invited to provide entries to G-reqs. The requirements collected are technology-agnostic and take into account neither the specific characteristics of any dedicated instrument nor the sensors acquiring the data. The web-form-based tool and the list of all validated requirements are FAIRly accessible on the G-reqs website at https://www.g-reqs.grumets.cat/.
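To make the structure of such a requirement concrete, the sketch below models one entry with its spatial, thematic and temporal components in Python; all field names and values are illustrative assumptions, not the actual G-reqs schema.

from dataclasses import dataclass

@dataclass
class SpatialRequirement:
    area_of_scope: str            # e.g. a named region or a bounding box
    representative_radius_m: float
    positional_accuracy_m: float

@dataclass
class ThematicRequirement:
    variable: str                 # e.g. "soil moisture"
    thematic_uncertainty: str     # e.g. "<= 0.04 m3/m3"

@dataclass
class TemporalRequirement:
    coverage: str                 # e.g. "2015-present"
    frequency: str                # e.g. "daily"

@dataclass
class InSituRequirement:
    need_type: str                # e.g. "Cal/Val of remote sensing product"
    spatial: SpatialRequirement
    thematic: ThematicRequirement
    temporal: TemporalRequirement

req = InSituRequirement(
    need_type="Cal/Val of remote sensing product",
    spatial=SpatialRequirement("Iberian Peninsula", 100.0, 10.0),
    thematic=ThematicRequirement("soil moisture", "<= 0.04 m3/m3"),
    temporal=TemporalRequirement("2015-present", "daily"),
)
print(req.thematic.variable)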

After a process of requirements gathering, the presented approach aims to discover where similar requirements are shared across different scientific domains, fostering in-situ data reusability and guiding the priorities for the creation of new datasets by key in-situ data providers. For example, in-situ networks of observation facilities (ENVRI, e.g. ELTER, GEOBON, among others) are invited to direct their users to provide requirements to the G-reqs, participate in the analysis of the requirements, detect gaps in current data collection and formulate recommendations for the creation of new products or the refinement of existing ones. The final aim is to improve the interoperability, accessibility and reuse of actionable in-situ Earth observation data and services.

This work is inspired by the OSAAP (formerly NOSA) from NOAA, the WMO/OSCAR requirements database and the Copernicus In-Situ Component Information System (CIS2) and developed under the InCASE project, funded by the European Environment Agency (EEA) in contribution to GEO and EuroGEO.

How to cite: Brobia, A., Maso, J., Serral, I., and Voidrot, M.-F.: G-reqs as a framework for defining precise, technology-agnostic, user-driven geospatial in-situ requirements. Towards a FAIR Global Earth Observation System of Systems without data gaps., EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-16756, https://doi.org/10.5194/egusphere-egu24-16756, 2024.

EGU24-17229 | Orals | ESSI3.5

Active structures and thermal state of the Piton de la Fournaise summit revealed by multi-methods high resolution imaging 

Lydie Gailler, Philippe Labazuy, Romain Guillard, Solène Buvat, Clément Grace, Erwan Thébault, and Edouard Régis and the ERT3D Scan4Volc

Our understanding of dynamic volcanic processes (fluid transfers at depth and eruptions, collapses and sliding, etc.) relies directly on our knowledge of the geometries of magmatic and hydrothermal systems, mechanical heterogeneities and how these structures evolve in time. Imaging the internal structure and temporal dynamics of volcanoes still represents a real challenge for unambiguously identifying the processes that govern their evolution, including eruptive precursors, instability phenomena, surface manifestations and their repercussions. It is therefore necessary to more rigorously constrain the geometry and the spatio-temporal dynamics of these structures, and their activation at different depths.

The behaviour of these structural volcanic features strongly depends on physical parameters such as temperature and fluid composition, which can be assessed using a range of complementary ground and remote observations. Among these, geophysical methods provide images of the internal structure, which can subsequently be translated in terms of geological structure and evolution. Such constraints are also necessary to provide more realistic numerical models. Recent improvements to the available suite of instrumentation for volcanological studies, including field geophysics (ground and airborne Unmanned Aerial Vehicles, UAVs), remote sensing methods and numerical capabilities, allow us to build even more comprehensive analyses of such terrestrial phenomena. In addition, combining several spatial scales (local and more regional) and temporal scales (one-off studies, time lapse through reiterations, time series) helps to better follow the dynamics of the edifices and anticipate eruptive crises and associated hazards.

Here we focus on the highly active and well monitored Piton de la Fournaise laboratory volcano, which is an excellent case study to develop and apply new methodologies in order to address both scientific and societal issues. Amongst the most significant parameters, recent studies have evidenced the potential of magnetic field measurements in imaging thermal anomalies (strong influence of temperature on magnetic measurements) and mechanical heterogeneities (fracturing-alteration at depth). Electrical resistivity is also a powerful tool in volcanic contexts, being very sensitive to fluid contents and particularly well suited to image the shallow structure of a volcanic edifice through, for example, innovative 3D surveys, or more in-depth using magnetotellurics measurements. Based on the analysis of combined recent reiterations of ground magnetic measurements, UAV magnetic and thermal infrared acquisitions, as well as high resolution electrical resistivity measurements, we focus on the 3D structure and recent evolution of the summit activity at Piton de la Fournaise, using additional constraints such as seismicity and deformation (InSAR inverse modelling).

This study confirms that detecting resistivity and magnetization anomalies, and quantifying their spatiotemporal evolution, can provide powerful tools for imaging volcanic systems at various scales and for providing warning of associated hazards. It also highlights the necessity for 4D monitoring of volcanic edifices using this method to provide greater precision, an important issue that is now made possible using UAV and near real time analyses.

These observational datasets aim to be integrated in open databases distributed through French and European research structures and infrastructures, namely the National Volcanology Observation Service (CNRS-INSU), Epos-France and Data Terra Research Infrastructures, as well as the EPOS VOLC-TCS.

How to cite: Gailler, L., Labazuy, P., Guillard, R., Buvat, S., Grace, C., Thébault, E., and Régis, E. and the ERT3D Scan4Volc: Active structures and thermal state of the Piton de la Fournaise summit revealed by multi-methods high resolution imaging, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-17229, https://doi.org/10.5194/egusphere-egu24-17229, 2024.

EGU24-17596 | Orals | ESSI3.5

Challenges and opportunities from an in-house cross collaboration between three research infrastructure data repositories 

Claudio D'Onofrio, Ute Karstens, Alex Vermeulen, Oleg Mirzov, and Zois Zogopoulos

The ICOS Carbon Portal is the main data repository for the Integrated Carbon Observation System Research Infrastructure (ICOS RI), covering the domains Atmosphere, Ocean, and Ecosystems. Data from ICOS is available and accessible for humans and machines with a rich set of metadata under a CC BY 4.0 licence. The core services for the data portal (https://data.icos-cp.eu/portal/) are open-source software and are available on GitHub (https://github.com/ICOS-Carbon-Portal). The main goal of the development was to make the European greenhouse gas measurements accessible in as FAIR a manner as possible. This led to a mature and stable data portal, which was subsequently adapted to be applied by another Research Infrastructure, namely SITES, a national Swedish Infrastructure for Ecosystem Science, and by the European Horizon 2020 project PAUL, pilot applications in urban landscapes (ICOS Cities). Although all three data portals use the same software core and are hosted at the ICOS Carbon Portal, they are independent of each other and are based on slightly different ontologies. Hence, we have a unique opportunity to explore the challenges and opportunities of accessing and combining data from three or more different data sources and to compare FAIR aspects of the datasets. How do we deal with attribution of the used data using correct citations? Do we have access to the licence for each data set, are the licences different, and what are the implications? How do we combine the data for further analysis while keeping track of provenance and origin?

Furthermore, we will try to step back from the implementation of a service on specific data sets (a hands-on, bottom-up approach) and look at scalability to include other (environmental/ENVRI) data portals, thinking more about a top-down approach such as the European Open Science Cloud (EOSC). Can we offer a generalised service level for automated machine-to-machine data processing? What do we need to process cross-domain data sets?

How to cite: D'Onofrio, C., Karstens, U., Vermeulen, A., Mirzov, O., and Zogopoulos, Z.: Challenges and opportunities from an in-house cross collaboration between three research infrastructure data repositories, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-17596, https://doi.org/10.5194/egusphere-egu24-17596, 2024.

EGU24-17978 | Posters on site | ESSI3.5

Notebook-as-a-VRE (NaaVRE): From private notebooks to a collaborative cloud virtual research environment 

Gabriel Pelouze, Spiros Koulouzis, and Zhiming Zhao

Studying many scientific problems, such as environmental challenges or cancer diagnosis, requires extensive data, advanced models, and distributed computing resources. Researchers often reuse assets (e.g. data, AI models, workflows, and services) from different parties to tackle these issues. This requires effective collaborative environments that enable advanced data science research: discovery, access, interoperation and reuse of research assets, and integration of all resources into cohesive observational, experimental, and simulation investigations with replicable workflows. Such use cases can be effectively supported by Virtual Research Environments (VREs). Existing VRE solutions are often built with preconfigured data sources, software tools, and functional components for managing research activities. While such integrated solutions can effectively serve a specific scientific community, they often lack flexibility and require significant time investment to use external assets, build new tools, or integrate with other services. In contrast, many researchers and data scientists are familiar with notebook environments such as Jupyter.

We propose a VRE solution for Jupyter to bridge this gap: Notebook-as-a-VRE (NaaVRE). At its core, NaaVRE allows users to build functional blocks by containerizing cells of notebooks, composing them into workflows, and managing the lifecycle of experiments and resulting data. The functional blocks, workflows, and resulting datasets can be shared to a common marketplace, enabling the creation of communities of users and customized VREs. Furthermore, NaaVRE integrates with external sources, allowing users to search, select, and reuse assets such as data, software, and algorithms. Finally, NaaVRE natively works with modern cloud technologies, making it possible to use compute resources flexibly and cost-effectively.
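The following Python sketch illustrates the general idea of turning a notebook cell into a containerized workflow step and chaining such steps; it is a purely hypothetical example and does not reflect NaaVRE's actual API or generated artefacts.

def containerize_cell(cell_code: str, name: str, requirements: list) -> dict:
    """Wrap one notebook cell into a container image spec plus a workflow step."""
    dockerfile = "\n".join(
        [
            "FROM python:3.11-slim",
            "RUN pip install " + " ".join(requirements) if requirements else "# no extra deps",
            "COPY cell.py /app/cell.py",
            'CMD ["python", "/app/cell.py"]',
        ]
    )
    return {"name": name, "dockerfile": dockerfile, "entry_script": cell_code}

def compose_workflow(steps: list) -> dict:
    """Chain containerized cells into a simple linear workflow description."""
    return {"steps": [s["name"] for s in steps]}

lidar_step = containerize_cell("print('normalize LiDAR point cloud')",
                               "normalize-lidar", ["laspy"])
metrics_step = containerize_cell("print('derive canopy height metrics')",
                                 "canopy-metrics", ["numpy"])
print(compose_workflow([lidar_step, metrics_step]))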

We demonstrate the versatility of NaaVRE by building several customized VREs that support legacy scientific workflows from different communities. This includes the derivation of ecosystem structure from Light Detection and Ranging (LiDAR) data, the tracking of bird migrations from radar observations, and the characterization of phytoplankton species. The NaaVRE is also being used to build Digital Twins of ecosystems in the Dutch NWO LTER-LIFE project.

How to cite: Pelouze, G., Koulouzis, S., and Zhao, Z.: Notebook-as-a-VRE (NaaVRE): From private notebooks to a collaborative cloud virtual research environment, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-17978, https://doi.org/10.5194/egusphere-egu24-17978, 2024.

EGU24-18535 | Orals | ESSI3.5

Using IGSN IDs in Geosciences Sample Management with RSpace: Use Case & Workflows 

Vaida Plankytė, Rory Macneil, Rorie Edmunds, and Noortje Haugstvedt

Overview

The International Generic Sample Number (IGSN ID), functionally a DataCite DOI, enables material samples from any discipline to be identified with a globally unique and persistent ID. 

This scalable FAIRification of samples enables transparent and traceable connections between a sample and other research entities, including (sub)samples, collections, instruments, grants, data, publications, people, and organizations. In 2023, support for the registration, metadata input, and publication of IGSN IDs was incorporated into the RSpace sample management system.

After introducing IGSN IDs, we overview the use case developed in collaboration with UiT The Arctic University of Norway regarding research workflows involved in geosciences field studies, and the corresponding IGSN ID and sample management functionality required to support these research workflows.

We then present our incorporation of IGSN IDs into RSpace as part of an institutional deployment solution for FAIR samples, detailing features and their various design considerations based on researcher needs.

Geosciences Use Case – UiT The Arctic University of Norway

A research group within the Department of Geosciences plans to assign IGSN IDs to samples collected during their 2024 field campaign in a remote Arctic area. The group needs to record basic structured sample information offline, while in the field. The institutional research data managers wish to increase sample visibility within the greater research community, ensure metadata format standardization, and facilitate metadata management by using an ELN with IGSN ID capabilities.

An offline field data collection tool, FieldMark, can be used to design powerful templates for metadata capture, and links IGSN IDs scanned from physical labels with rich metadata, including geolocation capture. Once back from the field, the sample metadata and templates, and their associated IGSN IDs, can be imported into RSpace, preserving format.

What is more, by assigning IGSN IDs to samples as well as features-of-interest, using instrument PIDs, and linking related entities, researchers model a rich PID graph that accurately portrays these relationships.

The samples are then utilized in active research: metadata editing as well as underlying template editing, linking experimental records and materials with samples, and inclusion of optional metadata fields are supported in RSpace.

Finally, the samples can be published alongside other materials, with RSpace generating a public metadata landing page for each sample containing both IGSN ID and domain-specific metadata. The IGSN ID metadata also becomes findable in DataCite’s records and API.
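As an illustration of that findability, the sketch below retrieves the public metadata of an IGSN ID through the DataCite REST API; the identifier shown is a placeholder, not a real sample DOI.

import requests

def fetch_igsn_metadata(doi: str) -> dict:
    """Return the DataCite attributes for a registered IGSN ID / DOI."""
    response = requests.get(f"https://api.datacite.org/dois/{doi}", timeout=30)
    response.raise_for_status()
    return response.json()["data"]["attributes"]

# Substitute a real registered IGSN ID; this placeholder will return a 404.
attributes = fetch_igsn_metadata("10.82433/EXAMPLE-SAMPLE")
print(attributes.get("titles"))
print(attributes.get("geoLocations"))  # point/box/polygon metadata, if registered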

RSpace IGSN ID Features

We present the IGSN ID implementation in RSpace, including recent functionality:

  • Assigning ROR (Research Organization Registry) IDs to an RSpace instance, automatically populating IGSN metadata with affiliation information
  • Geolocation support for dynamic point, box, and polygon map previews alongside the coordinates on the public landing page
  • Ability to display domain-specific sample fields on the landing page to enable comprehensive metadata sharing

As well as upcoming work:

  • Integrating with other DataCite Service Providers to facilitate deposit of sample metadata into domain-specific repositories, analogous to ELN document export to repositories
  • Facilitating the use of singleton samples alongside batches of subsamples, while retaining the system’s ease of navigation and conceptual clarity

How to cite: Plankytė, V., Macneil, R., Edmunds, R., and Haugstvedt, N.: Using IGSN IDs in Geosciences Sample Management with RSpace: Use Case & Workflows, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-18535, https://doi.org/10.5194/egusphere-egu24-18535, 2024.

EGU24-19358 | ECS | Posters on site | ESSI3.5

Research Notebook Retrieval with Explainable Query Reformulation 

Na Li, Peide Zhu, Gabriel Pelouze, Spiros Koulouzis, and Zhiming Zhao

Data science and Machine learning techniques are increasingly important in tackling societal challenges and complex problems in environmental and earth sciences. Effectively sharing and (re)using research assets, including data sets, models, software tools, documents, and computing notebooks, are crucial for such data-centric research activities. Researchers can reproduce others’ experiments with the available research assets to verify the results and to further develop new experiments. Computational notebooks, as an important asset, comprise free-form textual descriptions and code snippets. The notebook runtime environments, e.g., Jupyter, provide scientists with an interactive environment to construct and share the experiment descriptions and code as notebooks. 

 

To enable effective research asset discovery, research infrastructures not only need to FAIRify the assets with rich meta-information and unique identifiers but also to provide search functionality and tools that facilitate the construction of scientific workflows for data science experiments with research assets from multiple sources. General-purpose search engines are helpful for an initial coarse-grained search but often fail to find the multiple types of research assets, such as data sets and notebooks, needed by the research. Community-specific catalogues, e.g., in ICOS and LifeWatch, provide search capabilities for precisely discovering data sets, but they typically cover only a specific type of asset. A researcher therefore has to spend a lot of time searching across multiple catalogues to discover all the types of assets needed.

In the search process, user queries tend to be short and composed of several key phrases, which demands great effort to understand users’ information needs. Given the complexity of computational notebook contents and the mismatch between the form of user queries and the computational notebooks, it is critical to understand queries through augmentation and to make explainable relevance judgments. To address these challenges, we developed a research asset search system for a Jupyter notebook-based Virtual Research Environment (called Notebook as a VRE) that supports scientific query understanding with query reformulation and explainable computational notebook relevance judgments via computational notebook summarization.

The proposed system includes four major components: the query reformulation module, the notebook indexer and retriever, the summarization component, and the user interface. The query reformulation module performs active query understanding: we extract scientific entities from user queries, search related entities from external knowledge graphs and resources as expansions, and rank the reformulated queries for users to choose from. The system has been validated with a small user group and will be further developed in the coming ENVRI-Hub NEXT project to support conversational search and recommendation.
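The toy Python sketch below illustrates the entity-based expansion and ranking idea behind the query reformulation module; the vocabulary and scoring are simplified stand-ins for the knowledge graphs and ranking model used in the actual system.

RELATED_ENTITIES = {
    "lidar": ["point cloud", "canopy height"],
    "phytoplankton": ["chlorophyll-a", "ocean colour"],
}

def reformulate(query: str, max_expansions: int = 2) -> list:
    """Expand recognised entities with related terms and rank the variants."""
    terms = query.lower().split()
    candidates = [query]
    for term in terms:
        for related in RELATED_ENTITIES.get(term, [])[:max_expansions]:
            candidates.append(f"{query} {related}")
    # Rank longer (more specific) reformulations first as a naive relevance proxy.
    return sorted(candidates, key=len, reverse=True)

for q in reformulate("lidar ecosystem structure"):
    print(q)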

How to cite: Li, N., Zhu, P., Pelouze, G., Koulouzis, S., and Zhao, Z.: Research Notebook Retrieval with Explainable Query Reformulation, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-19358, https://doi.org/10.5194/egusphere-egu24-19358, 2024.

EGU24-19379 | Posters on site | ESSI3.5

Status and Planning Update on Big Data Standardization in OGC and ISO 

Peter Baumann

Earth data are an archetypical case of Big Data, in all their Volume, Velocity, Variety, and Veracity challenges. Standardization in ISO, OGC, and further bodies has therefore long been concerned with developing and advancing specifications for structures and services suitable for Big Data. Questions to keep in mind include:
- How can data wrangling be simplified, for example through better suited concepts and elimination of unnecessary technicalities?
- How can specifications support scalable implementations?
- What is necessary to make data ready for analysis, more generally: for the various types of consumption?

Specifically, the commonly accepted concept of multi-dimensional coverages - corresponding to the notion of spatio-temporal "fields" in physics - addresses Big Data; in practice this means regular and irregular grids, point clouds, and general meshes. Recently, ISO has adopted two "abstract" standards with normative definitions of coverage concepts and terminology: 19123-1 addresses the coverage data model, whereas 19123-3 is about coverage processing fundamentals, utilizing the OGC Web Coverage Processing Service (WCPS) model. OGC has adopted both as an update to its Abstract Topic 6.
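As an illustration of the WCPS model, the sketch below sends a simple WCPS query (averaging a hypothetical temperature coverage over a spatial window) to a coverage server from Python; the endpoint URL, coverage name and request parameters are assumptions about a typical WCS/WCPS deployment and will differ between servers.

import requests

ENDPOINT = "https://example.org/rasdaman/ows"  # hypothetical service URL

# WCPS query: average a (hypothetical) temperature coverage over a lat/long window.
wcps_query = """
for $c in (AverageTemperature)
return avg($c[Lat(40:50), Long(10:20)])
"""

response = requests.get(
    ENDPOINT,
    params={"service": "WCS", "version": "2.0.1",
            "request": "ProcessCoverages", "query": wcps_query},
    timeout=60,
)
print(response.status_code, response.text[:200])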

On the level of "concrete" specifications directed towards implementation and conformance testing there is the joint OGC/ISO Coverage Implementation Schema (CIS). In OGC the current version is 1.1, which introduces the General Grid Coverage as a more powerful, yet simplified, structure for regular and irregular grids. ISO has commenced work on updating its 19123-2 (which is still based on OGC CIS 1.0) to CIS 1.1. On the processing side, there are various activities in addition to the proven, mature WCS and WCPS, such as drafts for OAPI-Coverages and GeoDataCube.

We present the current status of ISO and OGC standardization work on coverages. The author has been active as an editor of adopted and in-progress standards in OGC and ISO for over 15 years, and is intimately familiar with standardization work there. By sharing the status and plans of standardization, the talk provides an opportunity for the community to comment on these plans and share suggestions.

 

How to cite: Baumann, P.: Status and Planning Update on Big Data Standardization in OGC and ISO, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-19379, https://doi.org/10.5194/egusphere-egu24-19379, 2024.

EGU24-19603 | Posters on site | ESSI3.5

ITINERIS HUB: the unified access point to Italian environmental facilities, FAIR data and related services 

Claudio Dema, Francesco Izzi, Lucia Mona, Vito Salvia, Carmela Cornacchia, Ermann Ripepi, Michele Volini, and Gelsomina Pappalardo

The Italian Integrated Environmental Research Infrastructures System (ITINERIS) Project coordinates a network of national nodes from 22 RIs.

ITINERIS has been designed looking at synergy with the European RI framework, and it will support the participation of Italian scientists in pan-European initiatives (like ENVRI-Hub NEXT and EOSC). ITINERIS will have significant impact on national environmental research, providing scientific support to the design of actionable environmental strategies.

To this end, ITINERIS will build the Italian HUB of Research Infrastructures in the environmental scientific domain providing a single and unified access point to facilities, FAIR data and related services available within the Italian RIs network through an easy-to-navigate interface.

Through the ITINERIS HUB all users will have access to data and services, with a complete catalogue of data and services and a proper access management system. ITINERIS will not duplicate existing data provided by European RIs, but will make them discoverable in a unique place together with other National RI resources. Additionally, ITINERIS HUB will make accessible data resulting from specific ITINERIS project activities like campaign data and novel data products.

ITINERIS will also offer a system of Virtual Research Environments (VRE), that will provide new services to address scientifically and societally relevant issues starting from an ensemble of cross-disciplinary actions on the data, information and knowledge generated by the different RIs in the different environmental subdomains.

State-of-the-art applications and custom-made tools will be integrated in the HUB to respond to the needs of: collecting and preparing information on products, resources and services to be advertised in a way that users can easily discover and access them and the RIs; and facilitating the management of user access requests through automated workflows and specific features typical of online submission and management systems.

The main concept includes a GeoServer offering the possibility of discovering, visualizing and plotting the available RI data. To be more precise, a scalable infrastructure will be implemented for the provision of mapping services compliant with the Open Geospatial Consortium (OGC) standards WMS (Web Map Service), WFS (Web Feature Service) and WCS (Web Coverage Service). A Metadata Service will be responsible for providing OGC CSW services. It will therefore offer a data discovery service through metadata consultation, corresponding to the most common metadata profiles. A stack of support services (REST/SOAP interfaces) for queries on geographic databases will be provided.
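As a sketch of how a client might consume such OGC services once deployed, the following Python example uses the OWSLib library to list WMS layers and request a map image; the endpoint URL and layer choice are hypothetical placeholders.

from owslib.wms import WebMapService

# Hypothetical GeoServer OWS endpoint; replace with the actual ITINERIS HUB URL.
wms = WebMapService("https://example.org/itineris/geoserver/ows", version="1.3.0")
print(list(wms.contents))  # discover the advertised layers

layer = list(wms.contents)[0]
img = wms.getmap(
    layers=[layer],
    srs="EPSG:4326",
    bbox=(6.0, 35.0, 19.0, 48.0),   # rough bounding box over Italy
    size=(800, 600),
    format="image/png",
)
with open("preview.png", "wb") as f:
    f.write(img.read())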

The ITINERIS HUB is a candidate to become the integrated system of the Italian environmental RIs to be connected to European initiatives like ENVRI-Hub NEXT and EOSC, fostering the ability to address current and expected environmental challenges at the national level and beyond.

Acknowledgement

IR0000032 – ITINERIS, Italian Integrated Environmental Research Infrastructures System (D.D. n. 130/2022 - CUP B53C22002150006) Funded by EU - Next Generation EU PNRR- Mission 4 “Education and Research” - Component 2: “From research to business” - Investment 3.1: “Fund for the realisation of an integrated system of research and innovation infrastructures”

How to cite: Dema, C., Izzi, F., Mona, L., Salvia, V., Cornacchia, C., Ripepi, E., Volini, M., and Pappalardo, G.: ITINERIS HUB: the unified access point to Italian environmental facilities, FAIR data and related services, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-19603, https://doi.org/10.5194/egusphere-egu24-19603, 2024.

EGU24-20005 | Posters on site | ESSI3.5

Cloudnet – an ACTRIS data repository for cloud remote sensing observations 

Simo Tukiainen, Tuomas Siipola, Niko Leskinen, and Ewan O'Connor

Clouds strongly regulate the Earth’s water cycle and the planetary radiation balance. Clouds are one of the largest contributors to the overall uncertainty in climate feedbacks, propagating into global temperature projections (Arias et al., 2021). Cloudnet data repository provides long-term datasets of cloud property profiles with a high temporal and vertical resolution, derived from synergetic ground-based measurements and numerical weather prediction model data.

These datasets can be used, for example, to validate satellite-based products and to improve the accuracy of climate and weather forecast models. Cloudnet is part of the Aerosol, Clouds and Trace Gases Research Infrastructure (ACTRIS) which is now in the implementation phase and plans to be fully operational in 2025 (Häme et al., 2018).

Cloudnet receives data regularly from around 20 stationary observational facilities. Each facility is equipped with instruments that meet the requirements of the ACTRIS Centre for Cloud Remote Sensing (CCRES). These instruments include Doppler cloud radars, Doppler lidars, ceilometers, microwave radiometers, disdrometers, and weather stations. We also host and process data from mobile and campaign platforms.

Cloudnet processes raw instrument data into cloud property products such as target classification of the scatterers, liquid and ice water content, and drizzle drop size distribution (Illingworth et al., 2007) using the open-source Python package CloudnetPy (Tukiainen et al., 2020). Processed data products are provided in near-real time, typically within one hour from the measurement. In the future, Cloudnet will also provide wind and boundary layer height products derived from Doppler lidar data.
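A minimal sketch of this processing chain, following the workflow described in the CloudnetPy documentation, is shown below; file names and site metadata are placeholders, and function names may differ between package versions.

from cloudnetpy.instruments import mira2nc, ceilo2nc
from cloudnetpy.categorize import generate_categorize
from cloudnetpy.products import generate_classification

site = {"name": "Example site", "altitude": 10}

# 1. Convert raw radar and lidar files into harmonised netCDF.
mira2nc("raw_radar.mmclx", "radar.nc", site)
ceilo2nc("raw_lidar.nc", "lidar.nc", site)

# 2. Combine radar, lidar, microwave radiometer and NWP model data.
generate_categorize(
    {"radar": "radar.nc", "lidar": "lidar.nc", "mwr": "mwr.nc", "model": "model.nc"},
    "categorize.nc",
)

# 3. Derive a cloud product, here the target classification.
generate_classification("categorize.nc", "classification.nc")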

All the raw and processed data are freely available according to the FAIR principles (Wilkinson et al., 2016) via cloudnet.fmi.fi. Furthermore, our software is freely and openly available from https://github.com/actris-cloudnet/.

ACKNOWLEDGEMENTS

We thank the Academy of Finland Flagship (grant no. 337552), the European Union’s Horizon 2020 research and innovation programme (grant no. 654109), and ACTRIS (project no. 739530, grant no. 871115) for funding this project.

REFERENCES

Arias et al. (2021) Technical Summary. Climate Change 2021: The Physical Science Basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change. (Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA)

Häme et al. (2018). ACTRIS stakeholder handbook 2018. (Painotalo Trinket Oy).

Illingworth et al. (2007). Cloudnet: Continuous Evaluation of Cloud Profiles in Seven Operational Models Using Ground-Based Observations. Bulletin of the American Meteorological Society, 6, 88.

Tukiainen, S., O’Connor, E., and Korpinen, A. (2020). CloudnetPy: A Python package for processing cloud remote sensing data. Journal of Open Source Software, 5(53), 2123.

Wilkinson et al. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3(1), 160018.

How to cite: Tukiainen, S., Siipola, T., Leskinen, N., and O'Connor, E.: Cloudnet – an ACTRIS data repository for cloud remote sensing observations, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-20005, https://doi.org/10.5194/egusphere-egu24-20005, 2024.

This study employs spatial and semantic modeling to formally specify the intersection of environmental concerns and social justice, focusing on the unequal impact of environmental hazards on the economically disadvantaged people living in the Proctor Creek watershed within the Atlanta Metropolitan area. Our 'Public Health-Urban-Socio-economic-Environmental Justice' (PUSH-EJ) ontology formally integrates environmental justice indices and all concepts from the EPA’s EJScreen, such as Environmental Indicator, Demographic Indicator, EJ Index, Particulate Matter, Risk, and Proximity to hazardous sites. PUSH-EJ also formalizes the National Air Toxics Assessment (NATA)’s Air Toxics Cancer Risk, Air Toxics Respiratory Hazard Index, and Diesel Particulate Matter.

The modeled environmental indicators include levels of ozone, particulate matter 2.5 (PM2.5, particles 2.5 micrometers or smaller), and lead paint (for houses built before 1960) in the air, count of underground leaking storage tanks, and count and proximity to wastewater discharge areas. The ontology also models proximity of housing units to waste and hazardous chemical facilities or sites related to the National Priorities List (NPL) Superfund Program, Risk Management Plan (RMP) sites, Treatment, Storage, and Disposal Facilities (TSDFs), and traffic volume. Environmental, demographic, and socioeconomic indicators are mapped to the objectives of UN SDGs 1, 3, 4, 5, 8, and 10, bridging the gap between environmental justice, public health, urban dynamics, and socio-economic aspects.

Our analysis of Proctor Creek's socioeconomic indicators reveals a combined Demographic Index of 73%, driven by Low Income (61%) and People of Color (90%). These findings indicate that Proctor Creek exhibits the lowest scores across all categories when compared to other regions in Georgia, EPA Region 4, and the nation. Our results call for minimizing contamination in the Proctor Creek area and uplifting socioeconomic conditions by the authorities responsible for the watershed. Our spatial analysis highlights urgent priorities in the Proctor Creek area for the management of air toxics sources and emissions, and for addressing proximity issues linked to environmental pollutants from hazardous waste sites, lead paint, and traffic.
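As an illustration of how a few of the described concepts could be expressed as an ontology, the following Python sketch uses rdflib to declare classes and link an indicator to an SDG target; the namespace, class names and mapping are illustrative assumptions, not the actual PUSH-EJ ontology files.

from rdflib import Graph, Literal, Namespace, OWL, RDF, RDFS

PUSH = Namespace("https://example.org/push-ej#")  # hypothetical namespace

g = Graph()
g.bind("push", PUSH)

for cls in ("EnvironmentalIndicator", "DemographicIndicator", "EJIndex", "SDGTarget"):
    g.add((PUSH[cls], RDF.type, OWL.Class))

# Diesel particulate matter as an environmental indicator mapped to an SDG 3 target.
g.add((PUSH.DieselParticulateMatter, RDF.type, PUSH.EnvironmentalIndicator))
g.add((PUSH.DieselParticulateMatter, RDFS.label, Literal("Diesel Particulate Matter")))
g.add((PUSH.SDG_3_4, RDF.type, PUSH.SDGTarget))
g.add((PUSH.contributesTo, RDF.type, OWL.ObjectProperty))
g.add((PUSH.DieselParticulateMatter, PUSH.contributesTo, PUSH.SDG_3_4))

print(g.serialize(format="turtle"))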

How to cite: Davarpanah, A., Shafiei, F., and Jelks, N.: Semantic and spatial Proximity Modeling of Equitable Sustainability in Proctor Creek, Atlanta: Merging Environmental Justice and Sustainable Development, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-20704, https://doi.org/10.5194/egusphere-egu24-20704, 2024.

ESSI4 – Advanced Technologies and Informatics Enabling Transdisciplinary Science

EGU24-1074 | ECS | Posters on site | ITS3.3/ESSI4.1

Pan-India analysis of relationship between Spatial Attributes of Urban Area and Population 

Ravi Verma and Pradeep Kumar Garg

Urban structures in any city need to be analyzed in conjunction with Urban Green Spaces (UGS). The relations between the spatial attributes of built-up and UGS Land Use/Land Cover (LULC) can help in analyzing various ecosystem services, such as micro-climate problems arising from increasing Land Surface Temperature (LST) patterns that cause the Urban Heat Island (UHI) effect inside a city. These relations between the two LULC classes can also improve the aesthetic structure of a city. India, a vast country comprising 36 administrative units (states and union territories), shows a wide diversity in the population and culture of its dwellers. These large population centres have different settlement characteristics at the different administrative levels of India (States/Union Territories, Districts, Sub-Districts, Villages/Towns and Wards/Blocks, etc.). These settlements can affect the climate and development of the country over the longer term. As such, a spatio-temporal analysis of urban population dynamics over the different constituent land use/land cover (LU/LC) classes is performed in this study using open source data and software programs only.

The study derives a pattern of Landscape Metrics (LSM) of built-up LULC over a period of 30 years in 7 zones of India comprising 694 districts in total across 28 states and 8 UTs. Landscape metrics are one of the most efficient ways to analyze the patterns of LULC in a study area. Publicly available data, such as the Pan India Decadal LULC by ORNL DAAC for the year 2005 and the Copernicus Global Land Service LULC for the year 2015 at 100m resolution, have been used as classified maps in the study. These decadal LULC maps are predominantly classified using multi-temporal Landsat series data for Pan India coverage, giving annual LULC classification maps consisting of 19 classes with an overall classification accuracy of 0.94 for all 3 years of data. The built-up class present in both classified maps is used for analysis as urban patches. Landscape metric analysis is done with the landscapemetrics library in RStudio®, and 34 class-level landscape metrics were calculated for the urban area using multi-patch analysis on the multi-year data. The significance of the metrics was determined through the calculation of the coefficient of determination and the establishment of variable importance between all 34 landscape metrics for urban patches and population, averaged over the states and UTs containing the 694 district units of India.

"Number of Patches (NP)", "Total Class Area (CA)", "Total Core Area (TCA)" and "Total Edge (TE)" stood out as the most viable metrics, showing relations as high as an R2 of 0.82 between the spatial attributes of urban patches and population in the Indian administrative units. The spatial relation across the zones of India is much more pronounced than the temporal (year-to-year) variation in the relation between urban patches and population. The North, West and North East zones of India show the most consistent and highest correlation values, whereas the South zone and the UTs show the lowest, with the Central zone being the most inconsistent. Such strong relations between the spatial patterns of urban patches and population suggest a significant need to prioritize the configuration and optimization of population in cities, which can not only affect the urbanization pattern inside the city boundary but also help achieve the sustainability goals of ecosystem services within the city boundary.
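For illustration, the following Python sketch shows how a coefficient of determination between one landscape metric (e.g. Number of Patches) and population could be computed; the numbers are invented and do not come from the study, which used the R landscapemetrics library.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Hypothetical per-state averages (not the study's data).
number_of_patches = np.array([[120], [340], [210], [560], [90], [430]])
population_millions = np.array([4.1, 11.8, 7.2, 19.5, 2.9, 14.7])

model = LinearRegression().fit(number_of_patches, population_millions)
r2 = r2_score(population_millions, model.predict(number_of_patches))
print(f"R^2 between NP and population: {r2:.2f}")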

How to cite: Verma, R. and Garg, P. K.: Pan-India analysis of relationship between Spatial Attributes of Urban Area and Population, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-1074, https://doi.org/10.5194/egusphere-egu24-1074, 2024.

This study scrutinizes the impact of an anomalous early summer Land Surface Temperature (LST) surge on food security, energy dynamics, and human health in India's National Capital Region (NCR), and its implications for Sustainable Development Goals (SDGs). By analyzing MODIS images and employing Standard Anomaly (StA), monthly diurnal LST ranges were assessed. Results reveal March temperatures peaking from 23.11 to 41.57 °C, 3.5 °C above the average 21.78 to 39.41 °C range. Notably, contrary to conventional patterns, prolonged rain deficits drive this early summer warming rather than Sea Surface Temperature (SST). This warming adversely affects SDGs, significantly reducing crop yields, jeopardizing SDG-2's Zero Hunger target, impeding indicator SDG-2.4.1, and disrupting target 3.4.1 for health. Moreover, heightened energy consumption due to early summer warming disrupts SDG-7 on affordable and clean energy, directly impacting target 7.1 for electricity access. The findings underscore the urgency of addressing early summer warming's impact to progress toward achieving SDGs in India's NCR. Understanding and mitigating these effects are imperative for sustainable development initiatives in the region.

How to cite: Mahato, S. and Joshi, P. K.: Rising Temperatures, Rising Concerns: Early Summer and Sustainable Development in National Capital Regions of India, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-1207, https://doi.org/10.5194/egusphere-egu24-1207, 2024.

EGU24-4493 | ECS | Posters on site | ITS3.3/ESSI4.1

Building Height Estimation at 10m across multiple countries using Sentinel Time-Series 

Ritu Yadav, Andrea Nascetti, and Yifang Ban

With the rapid shift of population to cities, urbanization monitoring has become essential to ensure sustainable development. In the last decade, 2D urban monitoring, such as building footprint extraction, has received considerable attention, resulting in multiple high- and low-resolution products. But despite being an essential component of urbanization, the vertical dimension (height) has not been studied at a large scale. Accurate estimation of building height plays an important role in urban planning, as it is correlated with energy consumption, population, transportation, city planning, urban climate and many other monitoring and planning aspects required for sustainable development.

Airborne LiDAR or high-resolution orthophotos can be used for accurate building height estimation, but for large-scale monitoring applications the data collection itself is extremely expensive. With a compromise in resolution, Earth observation data, especially free-of-cost data, can be used for large-scale monitoring. Existing large-scale building height estimation methods operate at low resolution (1km to 100m). A few recent studies improved the resolution to 10m while operating in a few cities to a few states of a country. In this study, we estimate building heights across four countries. We propose a DL model that operates on a time series of Sentinel-1 SAR and Sentinel-2 MSI data and estimates building height at 10m spatial resolution. Our model estimates building height with an RMSE (Root Mean Square Error) of 1.89m, surpassing the best score of 3.73m reported in previous studies.
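As a generic illustration of such a model (not the authors' architecture), the PyTorch sketch below fuses Sentinel-1 and Sentinel-2 time-series stacks into a per-pixel height map and evaluates it with an RMSE metric; the shapes and band counts are assumptions.

import torch
import torch.nn as nn

class HeightRegressor(nn.Module):
    def __init__(self, s1_bands=2, s2_bands=10, timesteps=12, hidden=32):
        super().__init__()
        # Collapse the temporal dimension by treating time x bands as channels.
        self.s1_branch = nn.Sequential(
            nn.Conv2d(s1_bands * timesteps, hidden, 3, padding=1), nn.ReLU(),
        )
        self.s2_branch = nn.Sequential(
            nn.Conv2d(s2_bands * timesteps, hidden, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Conv2d(2 * hidden, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, 1, 1),  # one height value per 10 m pixel
        )

    def forward(self, s1, s2):
        # s1: (B, T*s1_bands, H, W), s2: (B, T*s2_bands, H, W)
        fused = torch.cat([self.s1_branch(s1), self.s2_branch(s2)], dim=1)
        return self.head(fused).squeeze(1)

model = HeightRegressor()
s1 = torch.randn(1, 2 * 12, 64, 64)
s2 = torch.randn(1, 10 * 12, 64, 64)
pred = model(s1, s2)
rmse = torch.sqrt(nn.functional.mse_loss(pred, torch.zeros_like(pred)))
print(pred.shape, rmse.item())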

To demonstrate the effectiveness of our approach, we tested it on data from four countries and compared it with a baseline and four recent DL networks. We evaluate the impact of the time series input and of the individual input modalities, i.e., SAR and optical data, on the performance of the proposed model. The model is also tested for generalizability. Furthermore, the predicted building heights are downsampled and compared with GHSL-Built-H R2023A, a state-of-the-art product at 100m spatial resolution. The results show an improvement of 0.3m RMSE.

References

[1] Yadav, R., Nascetti, A., & Ban, Y. (2022). BUILDING CHANGE DETECTION USING MULTI-TEMPORAL AIRBORNE LIDAR DATA. The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, XLIII-B3-2022, 1377–1383. https://doi.org/10.5194/isprs-archives-xliii-b3-2022-1377-2022

[2] Yang, C., & Zhao, S. (2022). A building height dataset across China in 2017 estimated by the spatially-informed approach. Scientific Data, 9(1). https://doi.org/10.1038/s41597-022-01192-x

[3] Cai, B., Shao, Z., Huang, X., Zhou, X., & Fang, S. (2023). Deep learning-based building height mapping using Sentinel-1 and Sentinel-2 data. International Journal of Applied Earth Observation and Geoinformation, 122, 103399. https://doi.org/10.1016/j.jag.2023.103399

[4] Yadav, R., Nascetti, A., & Ban, Y. (2023). A CNN regression model to estimate buildings height maps using Sentinel-1 SAR and Sentinel-2 MSI time series. arXiv preprint arXiv:2307.01378.

[5] Pesaresi, M., & Politis, P. (2023). GHS-BUILT-H R2023A – GHS building height, derived from AW3D30, SRTM30, and Sentinel-2 composite (2018). European Commission, Joint Research Centre (JRC).


How to cite: Yadav, R., Nascetti, A., and Ban, Y.: Building Height Estimation at 10m across multiple countries using Sentinel Time-Series, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-4493, https://doi.org/10.5194/egusphere-egu24-4493, 2024.

EGU24-4584 | Orals | ITS3.3/ESSI4.1

Solar Park Detection Based On Machine Learning 

Shivam Basant, Jayaluxmi Indu, and Biplab Banerjee

Solar energy will be an indispensable part of India's clean energy transition. As renewable energy requires large amounts of land, policy makers often question land-based targets for deploying solar parks. Robust geospatial information on existing solar parks is therefore crucial for both governments and policy makers.

This study presents a novel method to detect solar parks using a synergy of Sentinel-2 satellite imagery and convolutional neural networks (CNNs). For this work, a total of nearly 2000 Sentinel-2 satellite images were chosen over ten solar parks situated in India. Case study results are presented for two of these parks: Bhadla Solar Park, Rajasthan, and Pavagada Solar Park, Karnataka. The resulting dataset measures the solar footprint over India and supports examination of the environmental impacts of solar parks on nearby ecosystems.
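A minimal sketch of a CNN that labels Sentinel-2 image patches as "solar park" versus "background"; the patch size, band selection, and architecture are illustrative assumptions, not the authors' network:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_solar_park_cnn(patch_size=64, n_bands=4):
    """Small binary classifier for Sentinel-2 patches (e.g. B2, B3, B4, B8 at 10 m)."""
    model = models.Sequential([
        layers.Input(shape=(patch_size, patch_size, n_bands)),
        layers.Conv2D(32, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, activation="relu", padding="same"),
        layers.GlobalAveragePooling2D(),
        layers.Dense(64, activation="relu"),
        layers.Dense(1, activation="sigmoid"),   # probability of "solar park"
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

model = build_solar_park_cnn()
model.summary()
# model.fit(train_patches, train_labels, validation_split=0.2, epochs=20)  # hypothetical arrays
```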

How to cite: Basant, S., Indu, J., and Banerjee, B.: Solar Park Detection Based On Machine Learning, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-4584, https://doi.org/10.5194/egusphere-egu24-4584, 2024.

EGU24-8575 | ECS | Posters on site | ITS3.3/ESSI4.1

Optimizing UAV seaweed mapping through algorithm comparison across RGB, multispectral, and combined datasets 

Damir Akhmetshin, Owen Naughton, Leon Cavanagh, and Dean Callaghan

The use of unmanned aerial vehicles (UAVs) with off-the-shelf RGB and multispectral sensors has expanded for environmental monitoring. While multispectral data enables analysis impossible with RGB, visible-range cameras have benefits for large-scale habitat mapping. This research compared RGB, multispectral, and fused RGB-multispectral data from UAVs for seaweed mapping along the Irish coast. Three classification algorithms – Random Forest, Maximum Likelihood Classifier and Support Vector Machines – were tested on the three datasets to compare accuracies for seaweed species delineation and percent cover estimation. The RGB sensor effectively classified broad intertidal classes but struggled to differentiate some seaweed species. Multispectral data significantly improved species-level classification accuracy but tended to overestimate the presence of red and green algae. Fusing the RGB and multispectral data improved species classification accuracy over either multispectral or RGB imagery alone. The results demonstrate the benefits of RGB sensors for broad habitat mapping and cover estimation, and of multispectral sensors for detailed species delineation. Fusion of the two sensor types combines the strengths of both. This highlights the potential for UAVs paired with off-the-shelf visible-range and multispectral cameras to provide detailed, accurate, and affordable change monitoring of intertidal seaweed habitats.
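A minimal sketch of one of the per-pixel classifiers compared in the study, here scikit-learn's Random Forest applied to a simple band stack of RGB and multispectral reflectances. Band counts, class labels, and the synthetic data are assumptions for illustration only:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n_pixels = 5000
rgb = rng.random((n_pixels, 3))            # 3 visible bands
multispec = rng.random((n_pixels, 5))      # e.g. 5 narrow multispectral bands
fused = np.hstack([rgb, multispec])        # simple band stacking ("fusion")
labels = rng.integers(0, 4, n_pixels)      # hypothetical classes: fucoids, kelp, green algae, substrate

X_train, X_test, y_train, y_test = train_test_split(fused, labels, test_size=0.3, random_state=0)
rf = RandomForestClassifier(n_estimators=300, random_state=0)
rf.fit(X_train, y_train)
print("fused-band accuracy:", accuracy_score(y_test, rf.predict(X_test)))
```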

How to cite: Akhmetshin, D., Naughton, O., Cavanagh, L., and Callaghan, D.: Optimizing UAV seaweed mapping through algorithm comparison across RGB, multispectral, and combined datasets, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-8575, https://doi.org/10.5194/egusphere-egu24-8575, 2024.

EGU24-12274 | ECS | Posters on site | ITS3.3/ESSI4.1

MOSMIN: Multiscale observation services for mining related deposits 

Sandra Lorenz, Moritz Kirsch, René Booysen, and Richard Gloaguen

The transition towards a green economy has led to an increased demand for raw materials, which are mainly sourced by mining. Mining activities generate residues such as rock waste, tailings and stockpiles. These materials are associated with environmental and safety risks that need to be carefully managed throughout their life cycle, with an emphasis on stability and the prevention of water and soil pollution. Earth observation (EO)-based techniques are seldom used for monitoring these deposits, and multi-sensor field data are commonly not integrated despite recent technological advances. We will develop holistic, full-site services for the geotechnical and environmental monitoring, as well as the valorisation, of mining-related deposits based on a combination of EO and in situ geophysical data. The work will be carried out under the “Multiscale Observation Services for Mining related deposits” project (MOSMIN for short), funded by the European Union Agency for the Space Programme (EUSPA) under project number 101131740. MOSMIN services will use Copernicus EO data for time-resolved, spatially extensive, remote monitoring of ground deformation and surface composition. Innovative change detection algorithms will highlight displacements and identify environmental hazards. Satellite data will be integrated with real-time, high-resolution data obtained from unoccupied aerial vehicles and sensors installed at the site, leveraging the power of machine learning for fusion and resolution enhancement of multi-scale, multi-source data. Novel, non-invasive geophysical techniques such as distributed fibre-optic sensing will provide subsurface information to identify potential risks such as internal deformation and seepage. In collaboration with international mining companies, MOSMIN will use pilot sites in the EU, Chile and Zambia to develop and trial comprehensive monitoring services, which are estimated to have a Total Available Market of €1.2bn and are expected to be commercialised shortly after project completion by three industry partners. The MOSMIN integrative services and tools will improve the efficiency and reliability of monitoring, maximise resource utilisation and help mitigate environmental risks and the impact of mining operations. - On behalf of the MOSMIN consortium.

How to cite: Lorenz, S., Kirsch, M., Booysen, R., and Gloaguen, R.: MOSMIN: Multiscale observation services for mining related deposits, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-12274, https://doi.org/10.5194/egusphere-egu24-12274, 2024.

EGU24-12810 | ECS | Posters on site | ITS3.3/ESSI4.1

Urban deprivation and enhanced inequality in sub-Saharan Africa 

Chengxiu Li and Le Yu

Globally, 1.2 billion urban dwellers live in slums, facing essential service deficiencies and heightened vulnerability, thereby challenging the United Nations' commitment to "Leave no one behind" in achieving the Sustainable Development Goals (SDGs). We investigated the availability of the key urban services (water, sanitation, housing, living space) that define slums, revealing from household surveys that 58.9% of households in 27 African countries lack access to at least one of these services and are therefore categorized as slum households. While the proportion of slum households has decreased over the past two decades, inequality has risen in countries with a high prevalence of slums.

Through the integration of household surveys, geospatial data, and machine learning algorithms, we estimated wealth levels and the availability of key services across sub-Saharan Africa. This approach revealed that 53.4% of the urban population resides in slums, surpassing the UN's estimate of 44.9%. The study shows that poor urban services in slums exacerbate inequality; however, current aggregated statistics underestimate the extent of under-serviced urban slums, leading to ineffective efforts in building prosperity for all.

How to cite: Li, C. and Yu, L.: Urban deprivation and enhanced inequality in sub-Saharan Africa, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-12810, https://doi.org/10.5194/egusphere-egu24-12810, 2024.

Accounting for the hydrologic process of evapotranspiration (ET), or consumptive water use, is important for water resources allocation, irrigation management, drought early warning, climate change impact assessment, and agro-water-climate nexus modeling. In fact, monitoring the United Nations' Sustainable Development Goals (SDGs) that emphasize improved food security, access to clean water, promotion of sustainable habitats, and mitigation of natural disasters (droughts) hinges upon access to better-quality ET data. Though numerous studies have targeted accurate estimation of potential evapotranspiration (PET) using earth observation (EO) data, hydrologists are yet to reach consensus on the best set of predictor variables that can be used irrespective of spatio-temporal scale. This can be attributed to the nonlinear and complex nature of the ET process. For the estimation of actual ET (AET), studies employing Eddy Covariance (EC) towers have been successful in different regions of the world. However, developing countries often lack access to EC observations, requiring viable, economical methods for accurate ET estimation, even via reliable estimates of PET. The proposed study explores the fusion of regional climate reanalysis data, EO data, and machine learning techniques for high-resolution PET estimation. Owing to the documented success of data-driven models in hydrological studies, the performance of two machine learning models, the tree-based Random Forest (RF) and the Multivariate Adaptive Regression Splines (MARS) regressor, is evaluated for estimating monthly PET. A suite of input predictors is chosen to describe three model categories: meteorological-, EO-, and hybrid-predictor models, yielding about 10 input combinations for PET model development for an agriculture-dominated study region, Dhenkanal district, located in Odisha in the eastern part of India. Reanalysis-based (meteorological) inputs at a grid resolution of 0.12° and Sentinel-2A (EO) products at a spatial resolution of 20 m are used. Results indicate that solar radiation is the most important meteorological variable controlling PET estimation. Among the vegetation indices obtained from remote sensing data, the Normalized Difference Water Index (NDWI), which represents the availability of water in plants and soil, is particularly useful. The best PET estimation model, using only solar radiation and a few vegetation indices (NDVI, NDWI), gave a coefficient of determination (R2) of 0.88 and a root mean square error (RMSE) of 0.14 during the validation stage, whereas the hybrid predictor model that utilizes temperature and vegetation index information further reduced the error and increased the prediction accuracy (by 6.86%). When only the meteorological inputs precipitation and wind speed were used, the model did not perform well. Mapping ET using the proposed models can facilitate reporting of progress on the SDGs with regard to water use, crop water stress, adaptation to agricultural droughts, and food security. In this context, the Evaporative Demand Drought Index (EDDI) is computed across the study region to understand drought patterns in the region.
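A minimal sketch of the EO-based predictor model described above: a Random Forest regressor estimating monthly PET from solar radiation, NDVI, and NDWI, scored with R2 and RMSE. The synthetic data, coefficients, and variable names are illustrative assumptions only:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
n = 2000
solar_radiation = rng.uniform(10, 30, n)      # MJ m-2 day-1 (illustrative range)
ndvi = rng.uniform(0.1, 0.8, n)
ndwi = rng.uniform(-0.3, 0.5, n)
# invented relationship, only so the example has a signal to learn
pet = 0.4 * solar_radiation + 2.0 * ndvi - 1.5 * ndwi + rng.normal(0, 0.5, n)

X = np.column_stack([solar_radiation, ndvi, ndwi])
X_train, X_test, y_train, y_test = train_test_split(X, pet, test_size=0.3, random_state=0)

rf = RandomForestRegressor(n_estimators=500, random_state=0)
rf.fit(X_train, y_train)
pred = rf.predict(X_test)
print("R2:  ", round(r2_score(y_test, pred), 2))
print("RMSE:", round(float(np.sqrt(mean_squared_error(y_test, pred))), 2))
print("importances (radiation, NDVI, NDWI):", rf.feature_importances_.round(2))
```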

Keywords: Potential Evapotranspiration, Agricultural Drought, Food Security, EO Data, Random Forest, Machine Learning, Vegetation Indices

How to cite: Tripathy, S. S. and Ramadas, M.: Data Fusion of Regional Reanalysis- and Sentinel (Earth Observation)-based Products with Machine Learning Tools for Monitoring Evapotranspiration and Drought, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-15184, https://doi.org/10.5194/egusphere-egu24-15184, 2024.

EGU24-15509 | ECS | Orals | ITS3.3/ESSI4.1 | Highlight

Advancing Forest Cover and Forest Cover Change Mapping for SDG 15: A Novel Approach Using Copernicus Data Products 

Chiara Aquino, Manuela Balzarolo, Maria Vincenza Chiriacò, and Monia Santini

Forests are a major component of the terrestrial ecosystem and provide an essential source of livelihood to local communities. Nevertheless, forests worldwide are increasingly threatened by natural and human-driven pressures, such as extensive logging for the extractive industries, severe weather, pests, and wildfires. Responsible forest management substantially contributes to the protection and conservation of forest ecosystems and services. The United Nations' Sustainable Development Goal (UN SDG) 15 “Life on land” – and specifically indicator 15.1.1 “Forest area as a proportion of total land area” – is concerned with mapping and protecting forest ecosystems. At the European Union (EU) level, the UN indicator 15.1.1 is translated into the EUROSTAT indicator “Share of forest area”. Monitoring of this indicator enhances compliance with EU land use and land cover policies, supporting the EU forest strategy for 2030 and helping to implement the regulation on deforestation-free products.

The SDGs-EYES project is a major EU-wide initiative aiming to exploit data and information from the European Copernicus Programme to develop, implement and deploy a new service for monitoring SDG targets. It will provide novel and robust workflows to consistently assess SDG indicators across EU countries, with potential for global upscaling. In recent years, the release of frequent and high-resolution satellite data from the Copernicus Sentinel missions has opened new frontiers for consistently mapping global forest cover. Nevertheless, detecting small-scale forest disturbance - also known as forest degradation - remains a challenging task. Studies aiming to quantify the carbon emissions and extent of forest degradation show that it affects areas similar to, or even larger than, those affected by deforestation. It is clear that accurate forest cover maps are urgently needed to avoid underestimating the loss of forest habitats, thereby preventing further carbon emissions, land degradation and biodiversity decline.

In this study, we apply a cumulative sum change detection algorithm to Sentinel-1 and Sentinel-2 time-series data to estimate forest cover and forest cover change in the Olt River basin, Romania, for the 2020-2022 period. Romania hosts the largest share (218,000 ha) of the EU's temperate primary and old-growth forests, many of which have been logged, both legally and illegally, despite being officially protected within national parks or Natura 2000 sites. Through the integration of multi-sensor information (e.g. Sentinel-1 and 2, ESA CCI WorldCover), the resulting maps are able to detect hotspots of forest cover change at 20 m resolution, while also providing the exact timing of the disturbance events. The suggested approach, hosted on the SDGs-EYES platform, provides a scalable methodology that can be systematically applied in other geographical areas and for selected periods of interest. In this way, we enhance the monitoring and evaluation of indicator 15.1.1, in agreement with the UN and EU indicators, while addressing the current weaknesses of the two frameworks.
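For readers unfamiliar with cumulative-sum change detection, a minimal single-pixel sketch in the spirit of the approach described above: a one-sided CUSUM flags the first date at which a vegetation signal falls persistently below its baseline. The slack, threshold, and synthetic NDVI series are illustrative assumptions, not the study's parameters:

```python
import numpy as np

def cusum_decrease(series, baseline_len=12, k=0.05, h=0.3):
    """One-sided CUSUM for a downward shift: accumulate how far each value falls
    below the baseline mean (minus a slack k) and return the first index at which
    the cumulative sum exceeds the decision threshold h (None if never)."""
    baseline = series[:baseline_len].mean()
    s, change_at = 0.0, None
    for t, x in enumerate(series):
        s = max(0.0, s + (baseline - x) - k)
        if s > h and change_at is None:
            change_at = t
    return change_at

# Synthetic monthly NDVI: stable forest, then a clearing after month 30
rng = np.random.default_rng(3)
ndvi = np.concatenate([np.full(30, 0.8), np.full(6, 0.3)]) + rng.normal(0, 0.02, 36)
print("disturbance flagged at month:", cusum_decrease(ndvi))
```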


How to cite: Aquino, C., Balzarolo, M., Chiriacò, M. V., and Santini, M.: Advancing Forest Cover and Forest Cover Change Mapping for SDG 15: A Novel Approach Using Copernicus Data Products, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-15509, https://doi.org/10.5194/egusphere-egu24-15509, 2024.

EGU24-16115 | ECS | Orals | ITS3.3/ESSI4.1

The Potential of Remote Sensing for Enhancing a Sustainable Agricultural Intensification under a Changing Climate in West Africa 

Jonas Meier, Frank Thonfeld, Verena Huber Garcia, Kenneth Aidoo, Niklas Heiss, and Ursula Gessner

The challenges of climate change in West Africa are closely linked to food security in the region. Rising temperatures and increasingly variable precipitation threaten traditional rain-fed agriculture relying on the rainy season. Climate change is affecting the rainy season in West Africa in multiple ways, e.g., by shifting its onset, shortening its duration and increasingly interrupting the growing period with dry spells. An increase in extreme weather events such as heavy precipitation or storms adds another risk to agriculture. The risk of crop failures hits an already vulnerable system. Since a large portion of food is imported, West African countries are vulnerable to external economic shocks. Furthermore, West Africa has one of the highest population growth rates in the world; its population will increase to 1.2 billion people by 2050. To guarantee sufficient food supply and to achieve the Sustainable Development Goals (SDGs), a sustainable intensification of agriculture is needed (i.e., increasing yields without additional land consumption and without adverse effects on climate change), and mitigation and adaptation strategies against the negative effects of climate change are required. Remote sensing has proven to be a suitable instrument to measure and evaluate both mitigation and adaptation actions in a reliable and cost-effective way. Depending on the method of cultivation, agriculture causes different amounts of greenhouse gas (GHG) emissions. Remote sensing can provide information about biophysical development as input and reference data for land surface models to assess GHG emissions under different cultivation practices. Since the negative impact of climate change on agriculture is already measurable and visible, adaptation measures are highly important. They differ in terms of their complexity, their technical feasibility and their costs. Adaptation measures can be, for example, a change in land management, the choice of crop variety, or technical innovations such as weather forecasts or irrigation systems. In various interdisciplinary research projects (CONCERT, COINS, AgRAIN), we selected adaptation measures of varying complexity and monitor and evaluate them using remote sensing-based analysis, mainly of Sentinel-1, Sentinel-2 and Planet data. The analyses range from land cover and land use mapping to crop classification, crop suitability modeling, field boundary delineation, identification of management events, and site-specific productivity measurements. We employ a range of methodologies, including random forest regression, convolutional neural networks (CNN), fuzzy logic approaches, and time series analysis. The results serve as a basis for local stakeholders and decision-makers, enabling the implementation of proven adaptation measures to enhance resilience against climate change and promote sustainable agricultural intensification.

How to cite: Meier, J., Thonfeld, F., Huber Garcia, V., Aidoo, K., Heiss, N., and Gessner, U.: The Potential of Remote Sensing for Enhancing a Sustainable Agricultural Intensification under a Changing Climate in West Africa, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-16115, https://doi.org/10.5194/egusphere-egu24-16115, 2024.

Natural ecosystems, especially primary forests, are impacted by the rapid expansion of human land use and global climate change, putting the most biodiverse areas of our planet under threat. Large amounts of Earth observation and analysis-ready data sets are made available (almost) for free. Yet, the usage of such data in conservation finance and policy making does not currently live up to its full potential. It is a complex endeavor to access relevant portions of big geospatial datasets efficiently due to the high number of different data providers, formats and interfaces. Even more importantly, we need to generate information in an open and reproducible way to take informed decisions, allocate funds responsibly and maximize public goods and benefits.

MAps for Planning, Monitoring and Evaluation (MAPME) is a collaborative initiative based on Open Science principles to leverage the potential of geospatial data for relevant actors in the development cooperation sector. The initiative is driven by Free and Open Source Software (FOSS) enthusiasts within German (KfW, GIZ) and French (AFD, IRD) development institutions. Together with our partner countries, we are key decision makers in the allocation of so-called Official Development Assistance (ODA). To bridge the “last-mile” gap between vast amounts of openly available geospatial data sets and productive monitoring applications, we have developed open-source software used within our institutions.

The software is written in R and relies on the Geospatial Data Abstraction Library (GDAL) bindings provided by the `sf` and `terra` packages. It allows efficient analysis of large data collections on deforestation and greenhouse gas emissions, such as Global Forest Watch (GFW). With a focus on extensibility, anyone can include new in-house or open data sets and custom analysis routines. Thus, the functionality can be extended to other sectors beyond forest monitoring. It opens the way to deliver crucial information on the state of ecosystems around the globe in a timely and reproducible way, allowing our institutions to make better allocation decisions.

We will present the MAPME Initiative and shed light on our approach to developing applications based on FOSS. We will showcase the first data solutions built by our partners on top of the framework, such as a geospatial impact evaluation of preventing deforestation and a dashboard for continuous monitoring of protected areas in the German development cooperation portfolio.

How to cite: Görgen, D. and Schielein, J.: MAPME – Versatile analysis tool for big geospatial data in the context of sustainable development, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-18584, https://doi.org/10.5194/egusphere-egu24-18584, 2024.

EGU24-18869 | Orals | ITS3.3/ESSI4.1

Wildfire as an interplay between water deficiency, manipulated tree species composition and bark beetle. A remote sensing approach 

Jana Müllerová, Jan Pacina, Martin Adámek, Dominik Brétt, and Bobek Přemysl

During 2022, Bohemian Switzerland NP was affected by the largest wildfire in the modern history of the Czech Republic. The NP is formed by sandstone towers, deep narrow valleys and dense forests. From the 19th century onwards, Norway spruce and non-native Weymouth pine were massively planted here. A series of weather extremes in recent years caused an exceptional drought and a consequent massive bark beetle outbreak and spruce die-off, followed by the catastrophic event. Wildfires of such a dimension are rather uncommon in Central Europe, and this event therefore serves as a perfect model situation to study the role of species composition, bark beetle and water availability in the fire dynamics, as well as the changes in biodiversity and natural succession after the disaster. Before the fire, the area was dominated by conifers, mostly standing dry after the bark beetle attack except along the water courses, and further comprised clear-cuts, healthy deciduous beech forests and rocky outcrops.

Pre-fire vegetation state, fire severity and post-fire regeneration were assessed using a combination of remote sensing sources. In particular, we used pre- and post-fire series of Sentinel-2 satellite MSS imagery, together with dedicated acquisitions of multispectral (MSS) and LIDAR data. The whole area was sampled from a small TL232 Condor aircraft with three sensors - a photogrammetric camera Hasselblad A6D-100c (ground sampling distance - GSD - 5 cm), the MSS sensor MicaSense Altum (GSD 32 cm) and a LIDAR RIEGL VUX 1-LR (13 points/m) - and detailed sites were sampled using drone-mounted sensors: MSS (MicaSense Altum, GSD 5 cm) and LIDAR (DJI L1). Forest composition and changes in health status were derived using a range of spectral indices and supervised classification. Fire severity and forest structure were derived using a combination of Lidar and optical point clouds, a fisheye camera, ground sampling, and analysis of optical data (supervised classification, vegetation indices).

Our research revealed that fire disturbance was low or absent in native deciduous tree stands and at waterlogged sites. By contrast, it was more severe in dry bark-beetle clearings covered by a thick layer of litter, compared to standing dead spruce. We can infer that in places where many stems were only partly burned or the trees postponed their die-off, the fire moved faster and the severity of disturbance was lower. In some cases, we could see patterns formed by ground fire, such as burned circles around trees or tree stools surrounded by unburned areas. Post-fire regeneration is very fast, and even after one year, vegetation growth can be detected using LIDAR and photogrammetric point clouds. The derived information on fire severity, detailed 3D stand structure and health status will be used as a proxy of the fire disturbance impact on biodiversity and of the patterns of regeneration.
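The abstract derives fire severity from vegetation indices; a widely used option for such mapping is the differenced Normalized Burn Ratio (dNBR) from pre- and post-fire NIR and SWIR bands. The sketch below illustrates that generic index, not necessarily the authors' exact severity metric, and the reflectance arrays and class breakpoints are assumptions for illustration:

```python
import numpy as np

def nbr(nir, swir):
    """Normalized Burn Ratio from NIR and SWIR reflectance bands."""
    return (nir - swir) / (nir + swir + 1e-9)

def dnbr_severity(pre_nir, pre_swir, post_nir, post_swir):
    """Differenced NBR (pre minus post); larger values indicate higher burn severity."""
    dnbr = nbr(pre_nir, pre_swir) - nbr(post_nir, post_swir)
    # USGS-style breakpoints, used here only for illustration
    classes = np.digitize(dnbr, [0.1, 0.27, 0.44, 0.66])   # 0 = unburned ... 4 = high severity
    return dnbr, classes

rng = np.random.default_rng(5)
shape = (100, 100)
pre_nir, pre_swir = rng.uniform(0.4, 0.6, shape), rng.uniform(0.1, 0.2, shape)
post_nir, post_swir = rng.uniform(0.1, 0.3, shape), rng.uniform(0.2, 0.4, shape)
dnbr, severity = dnbr_severity(pre_nir, pre_swir, post_nir, post_swir)
print("mean dNBR:", round(float(dnbr.mean()), 2),
      "| severity class counts:", np.bincount(severity.ravel()))
```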

How to cite: Müllerová, J., Pacina, J., Adámek, M., Brétt, D., and Přemysl, B.: Wildfire as an interplay between water deficiency, manipulated tree species composition and bark beetle. A remote sensing approach, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-18869, https://doi.org/10.5194/egusphere-egu24-18869, 2024.

EGU24-19568 | ECS | Orals | ITS3.3/ESSI4.1

Ocean Acidification: Weaves to be tied on European and global scale 

Elisabeth Kubin, Marina Lipizer, Maria Eugenia Molina Jack, Megan Anne French, and Alessandra Giorgetti

Oceanic uptake of anthropogenic CO2 is altering the seawater chemistry of the oceans, leading to a decrease in pH and thus to ocean acidification (OA). This has multiple consequences not only for marine biogeochemistry, but also for marine biota and ecosystems. Therefore, Sustainable Development Goal (SDG) target 14.3 addresses OA, and indicator 14.3.1 calls for measurements of average marine acidity (pH) and for guidance on monitoring and reporting OA data.

Here we present the international collaboration between the European Marine Observation and Data Network (EMODnet Chemistry), NOAA and UNESCO on how to observe and report OA data, following the FAIR (Findable, Accessible, Interoperable and Reusable) principles. The final aim is to enable global comparisons of the changes in ocean chemistry and to provide a unified, globally coordinated, sustained, long-term observation network and database. Detailed vocabularies and the corresponding metadata will guarantee the correct description of the carbonate system and thus also the long-term usability of the data, including reliable trend calculations.

This global collaboration will provide more accurate and detailed OA data and will help policy and decision makers to communicate more clearly and precisely about the impacts of climate change on marine ecosystems and resources, enabling holistic approaches.

How to cite: Kubin, E., Lipizer, M., Molina Jack, M. E., French, M. A., and Giorgetti, A.: Ocean Acidification: Weaves to be tied on European and global scale, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-19568, https://doi.org/10.5194/egusphere-egu24-19568, 2024.

EGU24-19821 | Orals | ITS3.3/ESSI4.1

Remote-sensing based tools to monitor grassland ecosystem services 

Maria P. González-Dugo, Maria José Muñoz-Gómez, Cristina Gálvez, Ángel Blázquez-Carrasco, M. Dolores Carbonero, Francisco S. Tortosa, Juan Carlos Castro, José Guerrero-Casado, Juan Castro, Sergio Colombo, Manuel Arriaza, and Anastasio Villanueva

The provision of ecosystem services (ES) by agricultural systems is a shared objective of agricultural policies in most developed countries in response to an increasing demand from society. Sustainably managed grassland ecosystems enhance soil fertility, help ensure food security, act as natural filters and purifiers of water, and function as carbon sinks, sequestering carbon dioxide and mitigating climate change. All of these goals are deeply interconnected with several SDGs. The Common Agricultural Policy (CAP) of the European Union is environmentally oriented. However, a broad consensus indicates that the current policy instruments are not effectively promoting the provision of ES. Thus, it is essential to develop efficient and innovative policy instruments to enhance the agricultural provision of ES. One of the challenges for applying new policy instruments, such as results-based payments (OECD, 2015), is the quantification of ES supply, which usually involves intensive and specialized field data. Therefore, there is a need to create quantitative indicators for ES based on reliable and affordable data. Remote sensing data can be an effective tool, especially if the data are easily accessible, available at an appropriate scale, and provided free of cost.

Olive groves and Mediterranean oak savanna were used in this work as case studies to examine the herbaceous layer's contribution to the provision of ecosystem services. In both ecosystems, grasslands play a relevant role in supplying provisioning (such as forage, freshwater or genetic library), regulating (carbon sequestration, soil conservation, climate, and air quality regulation) and cultural services (aesthetic appreciation, cultural identity). The biomass or above-ground net primary production (ANPP) and biodiversity are essential integrators of ecosystem functioning. Biomass is responsible for the input level of various ecosystem services, and it is directly connected to carbon sequestration and soil conservation. Biodiversity, on the other hand, contributes to the processes that underpin other ecosystem services and constitutes an ecosystem good that humans directly value. This work describes the general scheme to measure several grassland ES (GES) in olive groves and oak savannas, including ANPP, biodiversity, carbon sequestration, and aesthetic appreciation, and preliminary results about the ANPP and biodiversity are presented. 

How to cite: González-Dugo, M. P., Muñoz-Gómez, M. J., Gálvez, C., Blázquez-Carrasco, Á., Carbonero, M. D., Tortosa, F. S., Castro, J. C., Guerrero-Casado, J., Castro, J., Colombo, S., Arriaza, M., and Villanueva, A.: Remote-sensing based tools to monitor grassland ecosystem services, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-19821, https://doi.org/10.5194/egusphere-egu24-19821, 2024.

EGU24-20511 | Orals | ITS3.3/ESSI4.1

Using Copernicus High Resolution Layer Imperviousness Density to monitor soil sealing in agricultural areas (SDG 2: Zero Hunger) 

Wendy Fjellstad, Svein Olav Krøgli, Jonathan Rizzi, and Agata Hościło

Many countries have goals and strategies to reduce soil sealing of agricultural land to preserve food production capacity. This is essential in relation to Sustainable Development Goal 2: Zero Hunger. To monitor progress, reliable data are needed to quantify soil sealing and changes over time. We examined the potential of the Copernicus High Resolution Layer Imperviousness Density (HRL IMD) to assess soil sealing in agricultural areas in Poland and Norway.

We quantified the accuracy and reliability of the products Imperviousness Classified Change (IMCC) for the period 2015-2018 and Imperviousness degree (IMD) for the reference year 2018. We found a very high overall accuracy of IMCC 2015-2018 in both Poland and Norway. However, this was mainly due to the dominance of area with no change.  When we focused on the small areas where change does occur, we found low user accuracy, with an overestimation of soil sealing. The producer accuracy was generally much higher, meaning that real cases of soil sealing were captured. This is a much better result than if IMCC had under-estimated soil sealing. It suggests that IMCC can play a valuable role in detecting soil sealing, by highlighting areas where soil sealing may have occurred, allowing the user to carry out a further control of this much smaller area, without having to assess the great expanse of unchanged area.
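For readers unfamiliar with the accuracy terms used here, a minimal sketch of how overall, producer's, and user's accuracy follow from a change/no-change confusion matrix. The counts are invented purely to illustrate the "rare change class" effect described above and do not reflect the study's results:

```python
import numpy as np

# rows = reference (truth), columns = map (IMCC); classes: [no change, new soil sealing]
confusion = np.array([
    [9600, 150],   # reference "no change"
    [  20, 230],   # reference "new soil sealing"
])

overall = np.trace(confusion) / confusion.sum()
producer = np.diag(confusion) / confusion.sum(axis=1)   # omission view: how much real change is captured
user = np.diag(confusion) / confusion.sum(axis=0)       # commission view: how reliable mapped change is

print(f"overall accuracy:              {overall:.3f}")   # high, driven by the unchanged area
print(f"producer's accuracy (sealing): {producer[1]:.3f}")  # most real sealing is detected
print(f"user's accuracy (sealing):     {user[1]:.3f}")      # but many mapped changes are false alarms
```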

We conclude that the datasets provide useful information for Europe. They are standardised and comparable across countries, which can enable comparison of the effects of policies intended to prevent soil sealing of agricultural land. We advise caution in using older versions of the change data. In particular, it is advisable to merge the closely related classes “1: new cover” and “11: increased cover” and the same for “2: loss of cover” and “12: decreased cover”. These distinctions are not reliable, but the general information about increase or decrease is much better. The transition to finer resolution (10 x 10 m) in the newer datasets represents a great improvement and will make the change data more reliable and useful in future versions.

How to cite: Fjellstad, W., Krøgli, S. O., Rizzi, J., and Hościło, A.: Using Copernicus High Resolution Layer Imperviousness Density to monitor soil sealing in agricultural areas (SDG 2: Zero Hunger), EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-20511, https://doi.org/10.5194/egusphere-egu24-20511, 2024.

Satellites strive to strike a delicate balance between temporal and spatial resolution, rendering the achievement of high resolution in both aspects challenging. In particular, Earth observations at sub-daily intervals are more difficult. Spatiotemporal fusion algorithms have emerged as a promising solution to tackle this challenge. However, current spatiotemporal fusion methods still face a critical challenge: accurately and efficiently predicting fine images in large-scale applications while ensuring robustness. To address this challenge, this study proposes a multiscale Attention-Guided deep optimization network for Spatiotemporal Data Fusion (AGSDF). An optimization strategy is employed to directly predict the high-resolution image at multiple scales based on the coarse-resolution image. Specifically, a variation attention module is proposed to focus on the edges and textures of abrupt land cover changes. A spatiotemporal fusion kernel is developed to provide essential spatial details for spatiotemporal fusion. Furthermore, the implementation of spatiotemporal fusion at multiple scales improves the reliability of prediction. The performance and robustness of AGSDF were evaluated and compared to nine methods at six sites worldwide. The experimental results indicate that AGSDF achieves better overall performance in quantitative accuracy assessment, transfer robustness, predictive stability, and efficiency. Consequently, AGSDF holds high potential to produce accurate remote sensing products with high temporal and spatial resolution across extensive regions.

How to cite: Jin, D.: AGSDF: A Multiscale Attention Guided Deep Optimization Network for Spatiotemporal Fusion, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-2675, https://doi.org/10.5194/egusphere-egu24-2675, 2024.

EGU24-3127 | ECS | Posters on site | ESSI4.2

Cracking Ground Truth Barriers: Harnessing the Power of Transfer Learning for Crop Mapping 

Miloš Pandžić, Dejan Pavlović, Sanja Brdar, Milan Kilibarda, and Oskar Marko

Transfer learning (TL) has rapidly gained popularity in recent years across research disciplines due to its practicality, moderate resource requirements, and often quite promising results. The same principle applies to Earth observation, especially for tasks such as crop mapping, where TL has already shown potential. Our focus in this research was on temporal transfer learning for a single agricultural region. In our study, we built an initial CNN-1D crop mapping model for Vojvodina province, Serbia, using SAR satellite imagery and ground truth (GT) data collected for 2017-2020. We did this using a leave-one-year-out approach in which each year served only once as a validation dataset. The top-performing model was further employed for transfer learning analysis, utilising a limited set of target-season ground truth data. The aim was to diminish reliance on labour-intensive and time-consuming large-scale ground truth data collection, typically carried out through hands-on field inspections. Instead of collecting it all over Vojvodina for the 2021 season, we focused on a limited area around a departure point. Three options were analysed, i.e., radii of approximately 20, 25, and 30 km around the departure point, for which the province capital Novi Sad was taken. From a practical standpoint, labels of these parcels are easier to record than those of more distant (distributed) parcels, so it seems reasonable to visit only these locations to reduce the costs of ground truth collection. Visited parcels that fell within these radii served for retraining the model, and the remaining parcels (those outside the 30 km radius) served for testing and accuracy assessment. For each parcel, 50 randomly selected pixels were used for the analysis. After 5 retraining cycles, the average F1 score of the transfer learning approach for the CNN-1D model for the 20, 25 and 30 km buffer zones was 74%, 79% and 83%, respectively. Training the same CNN-1D model from scratch reached 69%, 73% and 78%, respectively, i.e., an approximately 5% lower score on average. Inference using the pre-trained model as such (without adaptation) achieved an F1 score of 78%, which made TL irrational for the 20 km radius case, while its use for the other two buffer areas was justified as they achieved comparably better results. Also, the three buffer cases achieved between 3% and 9% lower F1 scores than their respective counterparts in which the same number of retraining parcels was randomly distributed over the entire test area. This was likely related to the characteristics of the restricted sampling region (uniform soil type, management practice, weather conditions) and the distribution of classes in that region, which may not have properly represented the entire test area. In addition, the comparison of these two approaches showed that adding more samples for retraining reduced the difference. The viability of the presented approach was confirmed within the experiment and, therefore, practitioners are encouraged to weigh the trade-off between practicality and accuracy in their future work.
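A minimal sketch of the temporal transfer-learning step: re-training a pre-trained 1-D CNN crop classifier on a small set of target-season samples from the buffer area. The architecture, input shapes, learning rate, weights path, and synthetic samples are all assumptions for illustration, not the authors' model:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

n_timesteps, n_features, n_classes = 30, 2, 8   # e.g. a season of VV/VH backscatter (assumed)

def build_cnn1d():
    return models.Sequential([
        layers.Input(shape=(n_timesteps, n_features)),
        layers.Conv1D(64, 5, activation="relu", padding="same"),
        layers.MaxPooling1D(2),
        layers.Conv1D(128, 5, activation="relu", padding="same"),
        layers.GlobalAveragePooling1D(),
        layers.Dense(n_classes, activation="softmax"),
    ])

# Source model trained on the 2017-2020 seasons (weights file is hypothetical)
model = build_cnn1d()
# model.load_weights("cnn1d_vojvodina_2017_2020.weights.h5")

# Transfer learning: continue training with a small learning rate on the limited
# target-season ground truth collected within the buffer around the departure point.
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
rng = np.random.default_rng(0)
buffer_x = rng.random((2000, n_timesteps, n_features)).astype("float32")  # placeholder pixels
buffer_y = rng.integers(0, n_classes, 2000)                               # placeholder labels
model.fit(buffer_x, buffer_y, epochs=5, batch_size=64, verbose=0)
print("retrained on", len(buffer_x), "target-season pixel samples")
```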

How to cite: Pandžić, M., Pavlović, D., Brdar, S., Kilibarda, M., and Marko, O.: Cracking Ground Truth Barriers: Harnessing the Power of Transfer Learning for Crop Mapping, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-3127, https://doi.org/10.5194/egusphere-egu24-3127, 2024.

Airborne hyperspectral remote sensing data provide rapid, non-destructive and near-laboratory-quality reflectance spectra for mineral mapping and lithological discrimination, ushering in an innovative era of remote sensing. In this study, NEO HySpex cameras, which comprise 504 spectral channels in the spectral ranges of 0.4–1.0 μm and 1.0–2.5 μm, were mounted on a delta-wing XT-912 aircraft. The designed flexibility and modular nature of the HySpex airborne hyperspectral imaging system made it relatively easy to test, transport, install, and remove the system multiple times before the acquisition flights. Following the designed flight plan, including the route distance, length, height, and flight speed, we acquired airborne hyperspectral images of high spectral and spatial resolution over the Yudai porphyry Cu (Au, Mo) mineralization in the Kalatag District, Eastern Tianshan terrane, Northwest China.

Using hyperspectral images from our own HySpex airborne flight, we extracted and identified alteration mineral assemblages of the Yudai porphyry Cu (Au, Mo) mineralization (Kalatag District, northwest China). The main objectives of this study were to (1) acquire HySpex airborne hyperspectral images of the Yudai porphyry Cu (Au, Mo) mineralization, (2) determine a workflow for processing HySpex images, and (3) identify alteration minerals using a random forest (RF) algorithm and a comprehensive field survey.

By comparing the features of the HySpex hyperspectral data with standard spectra from the United States Geological Survey database, endmember pixels with spectral signatures for most alteration mineral assemblages (goethite, hematite, jarosite, kaolinite, calcite, epidote, and chlorite) were extracted. After the HySpex data processing workflow, the distribution of alteration mineral assemblages (iron oxide/hydroxide, clay, and propylitic alterations) was mapped using the random forest (RF) algorithm. The experiments demonstrated that the data processing workflow and RF algorithm are feasible and effective, showing good classification accuracy. The overall classification accuracy and Kappa coefficient of the alteration mineral identification were 73.08% and 65.73%, respectively. The main alteration mineral assemblages were primarily distributed around pits and grooves, consistent with field-measured data. Our results confirm that HySpex airborne hyperspectral data have potential applications in basic geological surveys and mineral exploration, providing a viable alternative for mineral mapping and identifying lithological units at high spatial resolution over large areas and inaccessible terrains.
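A minimal sketch of the two accuracy metrics quoted above (overall accuracy and Cohen's kappa) computed from predicted versus field-checked alteration classes; the label arrays and error rate are invented for illustration only:

```python
import numpy as np
from sklearn.metrics import accuracy_score, cohen_kappa_score

rng = np.random.default_rng(11)
classes = ["goethite", "hematite", "jarosite", "kaolinite", "calcite", "epidote", "chlorite"]
y_true = rng.integers(0, len(classes), 1000)
# simulate a classifier that is right roughly 73% of the time
y_pred = np.where(rng.random(1000) < 0.73, y_true, rng.integers(0, len(classes), 1000))

print("overall accuracy: ", round(accuracy_score(y_true, y_pred), 3))
print("kappa coefficient:", round(cohen_kappa_score(y_true, y_pred), 3))
```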

How to cite: Shanshan, W.: Identifying and Mapping Alteration Minerals Using HySpex Airborne Hyperspectral Data in the Yudai Porphyry Cu (Au, Mo) Mineralization, Kalatag District, NW China, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-5528, https://doi.org/10.5194/egusphere-egu24-5528, 2024.

Semi-arid terrestrial ecosystems exhibit sparse vegetation, characterised by herbaceous non-woody plants (e.g., forbs or grass) and woody plants (e.g., trees or shrubs). These ecosystems encounter challenges from global climate change, including shifts in rainfall, temperature variations, and elevated atmospheric carbon dioxide (CO2) levels. Effective monitoring is essential for informed decision-making and sustainable natural resource management in the context of rapid environmental changes.

Fractional Vegetation Cover (FVC) is a key biophysical parameter for monitoring ecosystems, indicating their balance and resilience. The assessment of FVC is important for evaluating vegetation biomass and carbon stocks, pivotal components of ecosystem health. The precise mapping of FVC across various scales involves three key cover types: photosynthetic vegetation (PV), representing ground covered by green leaves facilitating photosynthesis; non-photosynthetic vegetation (NPV), encompassing branches, woody stems, standing litter, and dry leaves with reduced or no chlorophyll content; and bare earth (BE), representing the uncovered ground surface without vegetation. FVC offers a quantitative measure of the relative contribution of each cover type to the total ground surface, aiding in characterising vegetation composition.

Efficient and accurate remote sensing techniques are essential to complement conventional field-based methods for performing FVC measurements.  Drone remote sensing technologies provide opportunities to capture fine-scale spatial variability in vegetation, enabling the derivation of ecological (e.g., FVC), biophysical (e.g., aboveground biomass), and biochemical variables (e.g., leaf chlorophyll content). Local calibration and validation of drone products enhance upscaling to coarser spatial scales defined by satellite observations, improving the understanding of vegetation dynamics at the national scale for subsequent change detection analyses.

The research project applies deep learning methods in remote sensing to enhance understanding of ecosystem composition, structure, and function features, with a specific focus on diverse terrestrial ecosystems in Australia. Leveraging drone technologies and advanced deep learning algorithms, the project develops automated workflows for systematic ecosystem health assessments, thereby making a significant contribution to the validation of satellite observations. The research framework emphasises the potential of Deep Learning methods in generating FVC products from RGB and multispectral imagery obtained through drone data. The conclusion highlights the benefits of integrating LiDAR data with Deep Learning approaches for analysing denser vegetation structure scenarios, offering a holistic approach for a comprehensive understanding of ecosystem health and dynamics. This approach provides valuable insights for environmental monitoring and management.

How to cite: Sotomayor, L., Kattenborn, T., Poux, F., Turner, D., and Lucieer, A.: Investigating Deep Learning Techniques to Estimate Fractional Vegetation Cover in the Australian Semi-arid Ecosystems combining Drone-based RGB imagery, multispectral Imagery and LiDAR data., EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-5766, https://doi.org/10.5194/egusphere-egu24-5766, 2024.

The region of Amazônia Legal in Brazil is in constant change due to deforestation, forest degradation, and conversion into arable land for farming or cattle ranching. It is important to monitor these changes with respect to global climate change and to aid political decision makers. These changes are best captured and analyzed through openly accessible satellite data, such as the products of ESA's Sentinel missions. Land use and land cover (LULC) classification is often performed on remotely sensed data through supervised learning algorithms that rely on precise labels to produce accurate results. However, this kind of data is often not available, and creating it at the required accuracy level through image interpretation is a time-consuming task. This can be alleviated by using existing LULC maps from other sources, such as the classification maps produced by the MapBiomas Brasil project used in our project. These maps are created using Landsat time series data and multiple machine and deep learning models to classify the whole of Brazil into five macro and multiple micro classes. These data have their own bias and are not correct in all places, or are highly inaccurate, especially compared to data of higher spatial resolution – such as the aforementioned Sentinel data – which reveals more detail in the land coverage. Thus, investigating the noise in the label data is a critical step. There are multiple approaches in the relevant literature to tackle learning with noisy labels; most rely on robust loss functions or learned models to identify the noise. We present a novel approach in which the satellite imagery is split pixel-wise into the five prior macro class labels. For each class, a self-organizing map (SOM) is learned to cluster the data in the spectral domain and thus identify representative prototypes of each class. Each class is represented by the same number of prototypes, which overcomes the problem of imbalanced classes. The labels are then checked through neighborhood rules to determine whether they belong to their given class; otherwise they are marked as unsure or even switched to another class.
In our study, approx. 79.5% of pixels keep their given class, while the rest are reassigned or even discarded. To validate the approach, the results are compared to a manually created validation set and inspected visually for qualitative correctness. The MapBiomas LULC maps reach an overall accuracy of 62.6% in the created validation areas. After relabeling the data with the presented approach, the overall accuracy reached 81.3%, a significant increase. This approach is independent of a specifically learned model and only leverages the relationship between the training data and the given label data – the Sentinel-2 imagery and the MapBiomas LULC map, respectively.
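A minimal sketch of the per-class SOM prototype idea described above, using the MiniSom package: one small SOM is trained per macro class, and a pixel's label is kept only if its nearest prototype comes from its own class. Grid size, distance rule, and the synthetic spectra are illustrative assumptions, not the authors' setup (which additionally uses neighborhood rules):

```python
import numpy as np
from minisom import MiniSom

rng = np.random.default_rng(0)
n_bands, grid = 10, (4, 4)                       # 16 prototypes per class (assumed)

# Synthetic spectra for two macro classes (e.g. forest vs. farming)
pixels = {0: rng.normal(0.3, 0.05, (3000, n_bands)),
          1: rng.normal(0.6, 0.05, (3000, n_bands))}

# Train one SOM per class and collect its prototype vectors
prototypes = {}
for cls, data in pixels.items():
    som = MiniSom(grid[0], grid[1], n_bands, sigma=1.0, learning_rate=0.5, random_seed=1)
    som.train_random(data, 2000)
    prototypes[cls] = som.get_weights().reshape(-1, n_bands)

def check_label(spectrum, given_class):
    """Return whether the given label is kept and the class of the nearest prototype."""
    dists = {c: np.min(np.linalg.norm(p - spectrum, axis=1)) for c, p in prototypes.items()}
    best = min(dists, key=dists.get)
    return best == given_class, best

# Example: a pixel labelled 0 but spectrally resembling class 1 gets flagged
kept, suggested = check_label(rng.normal(0.6, 0.05, n_bands), given_class=0)
print("label kept:", kept, "| suggested class:", suggested)
```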

How to cite: Hell, M. and Brandmeier, M.: Detection of noise in supervised label data: a practical approach in the Amazonas region of Brazil using land use and land cover maps, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-5791, https://doi.org/10.5194/egusphere-egu24-5791, 2024.

As fine-resolution remotely sensed data rapidly evolves, individual trees are increasingly becoming a prevailing unit of analysis in many scientific disciplines such as forestry, ecology, and urban planning. Fusion of airborne LiDAR and aerial photography is a promising means for improving the accuracy of individual tree mapping. However, local misalignments between these two datasets are frequently ignored. Solving this problem using traditional pixel-based image registration methods requires extensive computation and is extremely challenging on large scales. In our earlier research, we proposed an approach that involved determining the optimal offset vector for a local area and using it to rectify the spatial positions of all individual trees in that area. Although the approach is effective in addressing mismatch issues, it still exhibits large errors for some trees and is susceptible to changes in scale. Here, we propose an enhanced algorithm by constructing a data structure called a k-dimensional tree (also known as K-D Tree) to efficiently search for each tree’s unique offset vector and assigning the closest determined offset vector to candidate trees that lack corresponding counterparts in the reference data. The enhanced algorithm significantly improves the matching accuracy of individual trees, elevating it from 0.861 ± 0.152 to 0.911 ± 0.126 (p < 0.01, t-test). Moreover, it substantially reduces the computational time by approximately 70% and successfully overcomes limitations associated with scale changes. The example data, source code, and instructions for the enhanced algorithm are publicly available on GitHub*.

*https://github.com/XUYIRS/Individual_trees_matching
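A minimal sketch of the K-D tree step described above: for each candidate tree without a counterpart in the reference data, find the nearest tree with a determined offset vector and borrow that offset. The coordinates and offsets are invented; this is not the authors' implementation, which is linked above:

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(2)
matched_xy = rng.uniform(0, 1000, (500, 2))       # trees with a determined offset vector
matched_offsets = rng.normal(0, 0.8, (500, 2))    # per-tree (dx, dy) offsets in metres

unmatched_xy = rng.uniform(0, 1000, (120, 2))     # trees lacking a counterpart in the reference

tree = cKDTree(matched_xy)                        # spatial index over matched trees
_, nearest = tree.query(unmatched_xy, k=1)        # index of the closest matched tree
assigned_offsets = matched_offsets[nearest]       # borrow its offset vector

corrected_xy = unmatched_xy + assigned_offsets    # rectified positions
print("corrected positions for", len(corrected_xy), "unmatched trees")
```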

How to cite: Xu, Y., Wang, T., and Skidmore, A.: An enhanced algorithm for co-registering individual trees extracted from airborne LiDAR and aerial photographs, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-6725, https://doi.org/10.5194/egusphere-egu24-6725, 2024.

EGU24-7821 | Posters on site | ESSI4.2

AI-aided Assessment of Mass Movement Potentials Along the Coast of Mecklenburg-Western Pomerania – Project Introduction and Outlook 

Jewgenij Torizin, Nick Schüßler, Michael Fuchs, Dirk Kuhn, Dirk Balzer, Kai Hahne, Steffen Prüfer, Claudia Gunkel, Karsten Schütze, and Lars Tiepolt

Coastal areas are dynamic zones where geological, marine, and atmospheric processes interact. The coastal shapes constantly evolve due to both natural factors and human activity. Gravitational mass movements, commonly called landslides, are prominent indicators of coastal dynamics. With the current climate projections indicating increasing stormy weather and extreme water levels, coastal communities face an escalating hazard of more frequent and severe landslides on steep coastlines. Mecklenburg-Western Pomerania exhibits a cliff coast of approximately 140 km, which is assessed to be actively receding in most parts.

The project, titled “AI-aided Assessment of Mass Movement Potentials Along the Coast of Mecklenburg-Western Pomerania,” focuses on developing advanced methods for quantitatively evaluating the hazard potential of mass movements in these ever-changing environments. This approach should enhance the efficiency and effectiveness of hazard assessment routines. The project covers five small study areas exhibiting different cliff types composed of chalk, glacial till sediments, and sand.

As the coast retreats, the exposure of the complex geological conditions may change. Therefore, one of the most significant challenges is the accurate mapping of the current geological conditions controlling, among other factors, the occurrence of landslides. In some parts, the average coastal retreat is about 0.5 m annually. At the same time, detailed geological mappings conducted years or even decades ago do not adequately represent the current geological conditions that could be fed into models for landslide susceptibility assessment, since some mapped features no longer exist.

Because traditional detailed field mapping by experts is time-consuming and costly, we seek options to enhance the mapping by employing uncrewed aerial vehicles (UAVs) equipped with multispectral sensors. These UAVs, through repetitive surveying missions, gather detailed data that enable precise change detection in photogrammetric point clouds. This data is essential for accurately calculating coastal retreat, mass balancing, and structural analysis. Employed AI algorithms interpret the UAV imagery, performing semantic segmentation to classify the surface into meaningful categories for further modeling. Given the need for extensive labeled datasets to train AI algorithms, we also explore data augmentation strategies. These strategies aim to generate extensive artificial datasets based on real-world data, which are crucial for effectively training the desired models.

Overall, we aim to design a workflow that streamlines the analysis steps, starting with UAV flight campaigns and classical photogrammetric processing paired with AI components to derive geological information. The derived parameters provide input to data-driven landslide susceptibility models. Furthermore, the generated spatio-temporal time series will be used for pre-failure pattern analysis with advanced AI for the long-term outlook.

How to cite: Torizin, J., Schüßler, N., Fuchs, M., Kuhn, D., Balzer, D., Hahne, K., Prüfer, S., Gunkel, C., Schütze, K., and Tiepolt, L.: AI-aided Assessment of Mass Movement Potentials Along the Coast of Mecklenburg-Western Pomerania – Project Introduction and Outlook, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-7821, https://doi.org/10.5194/egusphere-egu24-7821, 2024.

EGU24-8334 | ECS | Orals | ESSI4.2

Rapid geological mapping based on UAV imagery and deep learning texture classification and segmentation 

Nick Schüßler, Jewgenij Torizin, Michael Fuchs, Dirk Kuhn, Dirk Balzer, Claudia Gunkel, Steffen Prüfer, Kai Hahne, and Karsten Schütze

Geological mapping in dynamic coastal areas is crucial, but traditional methods are laborious and expensive. Employing uncrewed aerial vehicle (UAV)-based mapping for high-resolution imagery, combined with deep learning for texture classification and segmentation, offers a promising improvement.

In the “AI-aided Assessment of Mass Movement Potentials Along the Steep Coast of Mecklenburg Western Pomerania” project, we explore the use of deep learning for geological mapping. We conduct repetitive UAV surveys across five distinct coastal areas, documenting various cliff types under different lighting and seasonal conditions. The imagery yields texture patterns for categories such as vegetation, chalk, glacial till, sand, water and cobble.

We apply two strategies: classification and semantic segmentation. Classification predicts one label per texture patch, while semantic segmentation labels each pixel. Classification requires distinct files with pre-labeled textures, whereas segmentation needs a training dataset with label masks, assigning class values to each texture pixel.

We employ Convolutional Neural Networks (CNN) for classification tasks, designing custom nets with convolutional blocks and attention layers, and testing existing architectures like ResNet50. We evaluate classification performance using accuracy measures and run sensitivity analysis to identify the smallest effective patch size for texture recognition. The effective patch size determines the final mapped class resolution. Classification is less detailed than segmentation but potentially more generalizable.

For semantic segmentation, we employ UNet architectures with encoder-decoder structures and attention gates for improved image context interpretation. We evaluate segmentation using the intersection over union (IoU) index. Due to the need for extensive, accurate training data, we employ data augmentation to create artificial datasets blending real-world textures, inspired by the Prague texture dataset.
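A minimal sketch of the per-class IoU evaluation mentioned above, computed from a predicted and a reference label mask; the mask values are invented for illustration:

```python
import numpy as np

def class_iou(pred, ref, n_classes):
    """Return IoU per class and the mean IoU over classes present in the reference."""
    ious = []
    for c in range(n_classes):
        inter = np.logical_and(pred == c, ref == c).sum()
        union = np.logical_or(pred == c, ref == c).sum()
        ious.append(inter / union if union else np.nan)
    return ious, np.nanmean(ious)

rng = np.random.default_rng(8)
ref = rng.integers(0, 6, (256, 256))          # 6 texture classes, e.g. vegetation, chalk, till, sand, water, cobble
pred = np.where(rng.random((256, 256)) < 0.9, ref, rng.integers(0, 6, (256, 256)))
per_class, mean_iou = class_iou(pred, ref, 6)
print("mean IoU:", round(float(mean_iou), 3))
```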

Classification results show about 95% accuracy across target classes using RGB image input. Notably, the pre-trained ResNet50 exhibits moderate performance in texture recognition and is outperformed by simpler net designs trained from scratch; however, it shows adequate performance when the pre-trained weights are discarded. For overall classification improvement, we anticipate that adding a Near Infrared (NIR) band will enhance classification, particularly for vegetation and glacial till, which are currently prone to misclassification.

Semantic segmentation yields IoUs of around 0.94 on artificial datasets. However, when applied to real-world imagery, the models show a noisy performance, yielding significant misclassifications. Thus, better generalization requires further fine-tuning and possibly integrating real-world data along with artificial datasets. Also, further experiments with data augmentation by extending the dataset and introducing different complexity levels could provide better generalization to real-world data.

In summary, combining UAV mapping with AI techniques holds significant potential for improving geological mapping efficiency and accuracy in dynamic coastal environments, providing reliable parametrization for data-driven models that require up-to-date geological information in high resolution.

How to cite: Schüßler, N., Torizin, J., Fuchs, M., Kuhn, D., Balzer, D., Gunkel, C., Prüfer, S., Hahne, K., and Schütze, K.: Rapid geological mapping based on UAV imagery and deep learning texture classification and segmentation, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-8334, https://doi.org/10.5194/egusphere-egu24-8334, 2024.

ISO's Technical Committee 211's Working Group 6 (WG6) standardizes geographic information, focusing on imagery, gridded data, and coverage data, along with their associated metadata. With an emphasis on remote sensing and Earth observation, WG6 provides standards for geopositioning, calibration, and validation. These combined efforts are foundational in creating structured, multidimensional data for use in data cubes and other gridded-data endpoints, and upstream structured grid data in turn provides consistency for downstream AI analytics. WG6's standards foster interoperability across diverse systems, enabling machines to process and interpret data over spatial, temporal, and spectral dimensions. Such work is critical in advancing open standards for interoperable, multi-dimensional analysis-ready data for future geospatial and Earth observation data analysis. We will present some of the fundamental standards that exist or are under development to support multi-dimensional analysis-ready data.

How to cite: Wilkes, G., Haq, A., and Khokhriakova, A.: Advancing Geospatial and Earth Observation Data Analysis: The Role of ISO's Technical Committee 211 in Standardizing Imagery and Gridded Data, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-10351, https://doi.org/10.5194/egusphere-egu24-10351, 2024.

As the volume of satellite observation data experiences exponential growth, our ability to process this data and extract meaningful insights is struggling to keep pace. This challenge is particularly pronounced when dealing with dynamic and variable phenomena across diverse spatiotemporal scales. Achieving accurate representation of these nuances necessitates data generation at high spatial and temporal resolutions, resulting in significant redundancy in computation and storage.

This issue is notably evident in the case of products that monitor plant phenology over time, which are crucial for assessing the impacts of climate change and monitoring agriculture. Computational complexities often limit these products to coarse resolutions (500 m-1 km) or short time frames, distorting our understanding of phenology across scales. In contrast, various approaches in hydrology and land surface modeling have utilized tiled grids and meshes to capture spatial heterogeneity and reduce dimensionality for complex modeling. This is accomplished by decomposing or aggregating modeling surfaces into response units representative of system drivers, an approach shown to improve computational capability while still maintaining accurate approximations. We believe that similar modeling techniques can be leveraged to enable phenological modeling at higher resolutions.

Building on these advancements, we develop a variable-resolution scheme to represent land surface heterogeneity for modeling Land Surface Phenology (LSP) and decompose Landsat and Sentinel-2 Enhanced Vegetation Index (EVI) data into adaptive areal units. Through this method we operationalize the Bayesian Land Surface Phenology (BLSP) model, a hierarchical Bayesian algorithm capable of constructing LSP data for the complete Landsat archive. While BLSP produces highly valuable results, it faces computational challenges for large-scale applications because its current time-series approach requires each pixel to be computed individually. Our approach reduces the dimensionality of modeling LSP by an order of magnitude to improve computational efficiency and enable the production of a 30 m BLSP product. These improvements are key to providing a region-wide, long-term phenometrics product at 30 m resolution, necessary to support studies of long-term change at fine scales.



How to cite: Smith, O., Gao, X., and Gray, J.: Overcoming Big Data Challenges in Satellite Observation: A Variable Resolution Scheme for Modeling Land Surface Phenology, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-12119, https://doi.org/10.5194/egusphere-egu24-12119, 2024.

Fractional Vegetation Cover (FVC) is an important vegetation structure factor for agriculture, forestry, and ecology. Due to its simplicity and reasonable precision, the vegetation index-based (VI-based) mixture model is commonly used to estimate vegetation cover from remotely sensed data. Improving the accuracy and computational efficiency of FVC estimation requires rapid and precise determination of the model's two most important parameters, namely the pure vegetation index values of fully-vegetated and bare-soil pixels. However, no map of pure normalized difference vegetation index (NDVI) values has yet been produced, and where pure pixels are lacking, as in many ecosystems, traditional empirical statistical approaches for obtaining pure vegetation index values are unreliable and challenging. In this study, pure NDVI values are mapped over China by combining the traditional empirical statistical method with the multi-angle remotely sensed inversion method (MultiVI); the resulting maps can be adapted to various application scenarios for vegetation cover estimation using vegetation indices with different spatial and temporal resolutions. Pure NDVI values extracted from 19 GF-2 images in various parts of China compared well with those obtained in this study. Furthermore, in semi-arid areas where fully-vegetated pixels are lacking and in evergreen areas where bare-soil pixels are lacking, this study compensates for the inability of empirical statistical methods to obtain accurate pure NDVI values and provides reasonable endmember NDVI values for vegetation cover estimation with the VI-based mixture model.
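
For context, the VI-based (dimidiate pixel) mixture model referred to above is commonly written as FVC = (NDVI − NDVI_soil) / (NDVI_veg − NDVI_soil); a minimal sketch of that relation follows, with illustrative endmember values rather than the study's mapped values.

```python
import numpy as np

def fvc_mixture_model(ndvi, ndvi_soil, ndvi_veg):
    """VI-based (dimidiate pixel) mixture model for fractional vegetation cover.

    ndvi      : observed NDVI (scalar or array)
    ndvi_soil : pure bare-soil NDVI endmember for the pixel/region
    ndvi_veg  : pure fully-vegetated NDVI endmember for the pixel/region
    """
    fvc = (ndvi - ndvi_soil) / (ndvi_veg - ndvi_soil)
    return np.clip(fvc, 0.0, 1.0)  # constrain to the physically meaningful range

# Example with illustrative endmember values
print(fvc_mixture_model(np.array([0.15, 0.45, 0.80]), ndvi_soil=0.10, ndvi_veg=0.85))
```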

How to cite: Zhao, T., Song, W., and Mu, X.: Mapping pure vegetation index values based on multisource remote sensing data over China for estimation of fractional vegetation cover, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-14124, https://doi.org/10.5194/egusphere-egu24-14124, 2024.

EGU24-14941 | Posters on site | ESSI4.2

Advancing Earth Monitoring: China's hyperspectral Operational Satellite Constellation 

Chenchao Xiao, Hongzhao Tang, and Kun Shang

China's first hyperspectral operational satellite constellation, launched in 2023, has significantly enhanced comprehensive Earth observation capabilities by integrating extensive quantitative data from space and ground sources. The constellation comprises the GF5-02, GF5-01A, ZY1-02D and ZY1-02E satellites, which operate in sun-synchronous orbits and together constitute a medium-resolution Earth observation system. Each satellite is equipped with visible and near-infrared as well as hyperspectral imagers, enabling wide-swath observations and the acquisition of detailed spectral data. In addition, ZY1-02E carries a thermal infrared camera, broadening its detection scope. The satellite team, collaborating with specialists across various fields, conducted 32 operational tests in areas such as land resources, geology, mapping, and marine monitoring, adhering to standards for natural resources survey and monitoring. After a year of operation, the constellation has shown robust functionality, stability, and data quality, meeting requirements for diverse applications such as resource enforcement, geological surveys, ecological restoration, geospatial updates, coastal surveillance, and industrial capacity reduction. The success of quantitative application tests of the hyperspectral and thermal infrared payloads demonstrates the constellation's potential to provide critical insights for global users in the hyperspectral domain.

How to cite: Xiao, C., Tang, H., and Shang, K.: Advancing Earth Monitoring: China's hyperspectral Operational Satellite Constellation, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-14941, https://doi.org/10.5194/egusphere-egu24-14941, 2024.

Water quality is crucial for human health and the sustainable development of the ecological environment. Traditional water quality monitoring methods rely on discrete in-situ measurements, limiting our understanding of water quality variations at large temporal and spatial scales. While remote sensing technology offers efficient water quality observation, it is mostly confined to monitoring optically active substances, making it challenging to assess water quality changes caused by chemical indicators. This study proposes the concept that changes in water quality status caused by chemical indicators are, within a certain range, reflected in the water-leaving reflectance. To validate this hypothesis, a water quality index (WQI) was initially calculated using data from water quality monitoring stations, resulting in water quality status data (ranging from excellent to severe pollution). Following this, an information extraction and classification inversion approach was proposed to establish a connection between ZY1-02D hyperspectral imagery and water quality status, leading to the development of a robust water quality status identification model. Validation results showed an average model accuracy of up to 82%, confirming the hypothesis of this study. Subsequently, this model was used to assess, for the first time, the water quality status of 180 large lakes and reservoirs (hereafter referred to as lakes) within China from 2019 to 2023. The results indicated that 76.1% of the lakes exhibited excellent to good water quality conditions, with a spatial distribution pattern of "better in the west, worse in the east". Over the 4-year period, 33.33% of the lakes showed improvement, while 50% remained stable, with the western and eastern regions primarily exhibiting stability and improvement, respectively. The long-term changes in water quality status are influenced by various interacting factors, with different patterns of influence in different periods and regions. In the early years, natural factors (average elevation) played a dominant role; over time, the influence of meteorological factors (precipitation and wind speed) and anthropogenic factors (gross domestic product) gradually increased. These influences can be attributed to significant climate changes and effective management measures over the past two decades. The findings support rapid assessment of environmental conditions and sustainable resource management, highlighting the potential of remote sensing technology in water quality monitoring.

How to cite: Tang, H., Xiao, C., Shang, K., and Wu, T.: A new method for remote sensing assessment of water quality status based on ZY1-02D hyperspectral imagery—A case study of large lakes and reservoirs in China, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-15052, https://doi.org/10.5194/egusphere-egu24-15052, 2024.

EGU24-18868 | Posters on site | ESSI4.2

A comparison of point and bounding box annotation methods to detect wild animals using remote sensing and deep learning 

Zeyu Xu, Tiejun Wang, Andrew Skimore, and Richard Lampery

The point and bounding box are the two widely used annotation techniques for deep learning-based wild animal detection using remote sensing. However, the impact of these two annotation methods on the performance of deep learning models is still unknown. Here, using the publicly available Aerial Elephant Dataset, we evaluate the effect of the two annotation methods on model accuracy in two commonly used neural networks (YOLO and U-Net). The results show that when using YOLO, there is no statistically significant difference between the point- and bounding box-based annotation methods, with overall F1-scores of 82.7% and 82.8%, respectively (df = 4, P = 0.683, t-test). In contrast, when using U-Net, the accuracy based on bounding boxes (overall F1-score of 82.7%) is significantly higher than that of the point-based annotation method (overall F1-score of 80.0%; df = 4, P < 0.001, t-test). Our study demonstrates that the effectiveness of the two annotation methods depends on the choice of deep learning model. This suggests that the deep learning method should be taken into account when deciding on annotation techniques for animal detection in remote sensing images.
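
A hedged sketch of the statistical comparison described above follows; the per-run F1 values are invented placeholders, and the reading that df = 4 corresponds to three runs per method in an independent two-sample t-test is our assumption, not the authors' statement.

```python
import numpy as np
from scipy import stats

# Placeholder per-run F1 scores for three training runs per annotation method.
f1_point = np.array([0.800, 0.799, 0.801])
f1_bbox  = np.array([0.826, 0.827, 0.828])

# Independent two-sample t-test; df = n1 + n2 - 2 = 4 for three runs per group.
t_stat, p_value = stats.ttest_ind(f1_point, f1_bbox)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}, df = {len(f1_point) + len(f1_bbox) - 2}")
```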

How to cite: Xu, Z., Wang, T., Skimore, A., and Lampery, R.: A comparison of point and bounding box annotation methods to detect wild animals using remote sensing and deep learning, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-18868, https://doi.org/10.5194/egusphere-egu24-18868, 2024.

EGU24-22091 | ECS | Orals | ESSI4.2

Automatic analysis of shoreline dynamics on Sentinel-2 datasets using CoastSat software toolkit 

Andrei Toma, Ionut Sandric, Bogdan Mihai, and Albert Scrieciu

Shorelines, as interfaces between land and water, are subject to continuous transformation due to various natural phenomena and human-induced activities. Natural processes, such as erosion and sedimentation, play an important role in shaping coastal areas, while human activities, such as urban expansion, also exert significant stress on coastal areas. An illustrative instance of human-induced change is the Mamaia beach enlargement project, initiated along the Romanian Black Sea coast at the end of 2020 and executed throughout 2021. The analysis of this coastal transformation started in 2020, preceding the actual implementation of the beach enlargement, and extended until late 2023; this timeframe was selected to capture the entirety of the dynamic changes observable in the study region. Utilizing the advanced multi-temporal CoastSat toolkit, the analysis involved a detailed examination of 130 high-resolution images acquired by the Copernicus Sentinel-2 satellites. Implemented within a Jupyter notebook environment using Python, CoastSat proved effective in extracting shorelines from the multi-temporal dataset, enabling a thorough understanding of the coastal dynamics associated with the Mamaia beach enlargement project. The analysis reveals an expansion of over 200 m on the southern part of Mamaia beach. This transformation underscores the significant impact of human activities and emphasizes the need for sustainable coastal management practices in the face of evolving environmental challenges.
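
A minimal sketch of the CoastSat workflow used here is shown below; the module and function names follow the public CoastSat repository, while the polygon, dates, and settings are illustrative assumptions and several required settings are omitted for brevity.

```python
# Sketch of a CoastSat shoreline-extraction run (illustrative inputs only).
from coastsat import SDS_download, SDS_shoreline

inputs = {
    # Area of interest as [lon, lat] vertices (placeholder box near Mamaia beach)
    "polygon": [[[28.59, 44.27], [28.68, 44.27], [28.68, 44.20],
                 [28.59, 44.20], [28.59, 44.27]]],
    "dates": ["2020-01-01", "2023-12-31"],
    "sat_list": ["S2"],          # Sentinel-2 only
    "sitename": "MAMAIA",
    "filepath": "./data",
}

# Download/compile the image metadata for the site
metadata = SDS_download.retrieve_images(inputs)

settings = {
    "cloud_thresh": 0.2,
    "output_epsg": 32635,        # UTM zone covering the Romanian Black Sea coast
    "check_detection": False,
    "inputs": inputs,
    # ... further CoastSat settings (beach area, reference shoreline, etc.) omitted
}

# Extract shorelines from the multi-temporal dataset
output = SDS_shoreline.extract_shorelines(metadata, settings)
```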

How to cite: Toma, A., Sandric, I., Mihai, B., and Scrieciu, A.: Automatic analysis of shoreline dynamics on Sentinel-2 datasets using CoastSat software toolkit, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-22091, https://doi.org/10.5194/egusphere-egu24-22091, 2024.

EGU24-22442 | Orals | ESSI4.2

Digital Terrain Model retrieval within a Coastal Dune Systems by integrating Unmanned Aerial Vehicles’ optical and LiDAR sensors by using a FOSS workflow 

Alfonso Valerio Ragazzo, Alessandro Mei, Valeria Tomaselli, Francesca Carruggio, Andrea Berton, Giuliano Fontinovo, Fabio Michele Rana, and Maria Adamo

Biodiversity monitoring through remote sensing is of crucial importance for assessing ecosystem integrity and resilience. This study integrates optical and LiDAR data from Unmanned Aerial Vehicles (UAVs) to retrieve Digital Elevation Models (DEMs) of the Lesina (Puglia, Italy) Coastal Dune System (CDS), supporting ecosystem monitoring for habitat type 2250* "Coastal dunes with Juniperus spp.". The work provides a Free and Open-Source Software (FOSS) workflow able to extract and calculate a Digital Surface Model (DSM), a Digital Terrain Model (DTM), and a Digital Difference Model (DDM) from LiDAR and optical data in a very densely vegetated environment. Using RStudio, CloudCompare, and QGIS, we developed a methodology for DDM extraction to compute Juniperus spp. canopy architecture (areas and volumes), which can reflect habitat reduction and fragmentation when compared across timescales. To this end, the point clouds from the two datasets (optical and LiDAR) were integrated, and an orthophoto and a DSM were generated; these were needed to extract a vegetation mask using spectral indices (e.g., Excess Green) and to choose a pixel threshold, both of which isolate, as much as possible, the contribution of vegetation in the DSM. Scripts in RStudio simplified and sped up the processing, with additional code further separating the vegetation matrix from the terrain. Areas covered by vegetation were assigned "NoData" values, and to fill them with elevation values representative of their surroundings, a linear interpolation based on the Inverse Distance Weighting (IDW) interpolator was applied, yielding a "raw" DTM affected by less signal noise from wind disturbance and shading. Subsequent processing removed persistent noise from the point cloud extracted from the "raw" DTM: using the segmentation tool in CloudCompare, spurious points were removed, eliminating altimetric errors in the elevation model. A final DTM was then extracted from the point cloud, representing the terrain altimetry of the study area more accurately. Finally, to obtain canopy heights, the DDM was computed via the expression "DSM - DTM = DDM" in the QGIS Raster Calculator. The canopies were treated as 2.5D geometries, so the resulting heights represent only the contribution of above-ground biomass. Vegetation areas and volumes were derived from the DDM, with the total canopy volume obtained by summing the results for each pixel of interest. This methodology allowed us to monitor biomass parameters (areas and volumes) with a FOSS approach in a CDS context with very dense Juniperus spp. vegetation.
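
The final differencing and volume step could equally be scripted; the sketch below uses rasterio/numpy in Python rather than the QGIS Raster Calculator of the original workflow, and the file names and vegetation-height threshold are assumptions.

```python
import numpy as np
import rasterio

# Load co-registered DSM and DTM (same grid, same CRS); file names are placeholders.
with rasterio.open("dsm.tif") as dsm_src, rasterio.open("dtm.tif") as dtm_src:
    dsm = dsm_src.read(1).astype("float64")
    dtm = dtm_src.read(1).astype("float64")
    pixel_area = abs(dsm_src.transform.a * dsm_src.transform.e)  # m^2 per pixel
    profile = dsm_src.profile

ddm = dsm - dtm                      # canopy height model ("DSM - DTM = DDM")
canopy = ddm > 0.5                   # assumed height threshold isolating vegetation
area_m2 = canopy.sum() * pixel_area  # vegetated area
volume_m3 = ddm[canopy].sum() * pixel_area  # 2.5D volume: per-pixel height x pixel area

print(f"Canopy area: {area_m2:.1f} m^2, canopy volume: {volume_m3:.1f} m^3")

# Write the DDM out as a single-band raster
profile.update(dtype="float32", count=1)
with rasterio.open("ddm.tif", "w", **profile) as dst:
    dst.write(ddm.astype("float32"), 1)
```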

How to cite: Ragazzo, A. V., Mei, A., Tomaselli, V., Carruggio, F., Berton, A., Fontinovo, G., Rana, F. M., and Adamo, M.: Digital Terrain Model retrieval within a Coastal Dune Systems by integrating Unmanned Aerial Vehicles’ optical and LiDAR sensors by using a FOSS workflow, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-22442, https://doi.org/10.5194/egusphere-egu24-22442, 2024.

EGU24-1634 | ECS | PICO | ESSI4.3

Uncertainties in deforestation emission baseline methodologies and implications for carbon markets 

Hoong Chen Teo, Nicole Hui Li Tan, Qiming Zheng, Annabel Jia Yi Lim, Rachakonda Sreekar, Xiao Chen, Yuchuan Zhou, Tasya Vadya Sarira, Jose Don T. De Alban, Hao Tang, Daniel A. Friess, and Lian Pin Koh

Carbon credits generated through jurisdictional-scale avoided deforestation projects require accurate estimates of deforestation emission baselines, but there are serious challenges to their robustness. We assessed the variability, accuracy, and uncertainty of baselining methods by applying sensitivity and variable importance analysis on a range of typically-used methods and parameters for 2,794 jurisdictions worldwide. The median jurisdiction’s deforestation emission baseline varied by 171% (90% range: 87%-440%) of its mean, with a median forecast error of 0.778 times (90% range: 0.548-3.56) the actual deforestation rate. Moreover, variable importance analysis emphasised the strong influence of the deforestation projection approach. For the median jurisdiction, 68.0% of possible methods (90% range: 61.1%-85.6%) exceeded 15% uncertainty. Tropical and polar biomes exhibited larger uncertainties in carbon estimations. The use of sensitivity analyses, multi-model, and multi-source ensemble approaches could reduce variabilities and biases. These findings provide a roadmap for improving baseline estimations to enhance carbon market integrity and trust.

How to cite: Teo, H. C., Tan, N. H. L., Zheng, Q., Lim, A. J. Y., Sreekar, R., Chen, X., Zhou, Y., Sarira, T. V., De Alban, J. D. T., Tang, H., Friess, D. A., and Koh, L. P.: Uncertainties in deforestation emission baseline methodologies and implications for carbon markets, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-1634, https://doi.org/10.5194/egusphere-egu24-1634, 2024.

EGU24-2657 | ECS | PICO | ESSI4.3

Patterns of disturbance dynamics within the Cerrado-Amazon Transition using time series data and Residual Neural Networks 

Chuanze Li, Angela Harris, Polyanna da Conceição Bispo, and Matthew Dennis

The Cerrado-Amazon Transition (CAT) is the world's largest tropical ecotone and separates the Cerrado savannah from the Amazon rainforest. Deforestation and degradation of large swathes of the dense Amazon rainforest and Brazilian savannah are leading to irreversible transformation and a critical loss of biodiversity. An increase in wildfire and agriculture-led deforestation makes the CAT a dynamic ecological border within the internationally known 'Arc of Deforestation'. Yet our understanding of the impacts of deforestation and degradation in the CAT is hampered by a lack of knowledge of where and when these disturbances occur. Here we combine time-series segmentation and deep learning algorithms to identify and quantify disturbances in the CAT over a 35-year period. Using a combination of the LandTrendr algorithm, Landsat time series data and a Residual Neural Network (ResNet), we identified four forest disturbance types (forest clearance, savannah clearance, forest wildfire, savannah wildfire) occurring within the CAT, based on their temporal spectral trajectories. Using our approach, we identified more than 384,000 km² of disturbance between 1985 and 2020, with forest clearance accounting for the largest proportion (35%) of the identified change. The accuracy of disturbance detection ranged from 88% to 93%, while the accuracy of disturbance type classification reached ~95%, although disturbance events occurring within savannahs are more difficult to identify, often due to lower initial vegetation cover. The greatest period of disturbance occurred between 1995 and 1998, due to increased agricultural activity.
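
As an illustration of the ResNet component, a minimal one-dimensional residual network for classifying per-pixel temporal spectral trajectories is sketched below; the input length, network depth, and training setup are assumptions and do not reproduce the study's actual model.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def residual_block(x, filters, kernel_size=3):
    """1D residual block: two convolutions plus an identity/projection shortcut."""
    shortcut = x
    y = layers.Conv1D(filters, kernel_size, padding="same", activation="relu")(x)
    y = layers.Conv1D(filters, kernel_size, padding="same")(y)
    if shortcut.shape[-1] != filters:                     # match channel count
        shortcut = layers.Conv1D(filters, 1, padding="same")(shortcut)
    y = layers.Add()([y, shortcut])
    return layers.Activation("relu")(y)

# Input: one spectral-index trajectory per pixel, e.g. 35 annual time steps.
inputs = tf.keras.Input(shape=(35, 1))
x = residual_block(inputs, 32)
x = residual_block(x, 64)
x = layers.GlobalAveragePooling1D()(x)
outputs = layers.Dense(4, activation="softmax")(x)        # four disturbance classes

model = models.Model(inputs, outputs)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```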

How to cite: Li, C., Harris, A., da Conceição Bispo, P., and Dennis, M.: Patterns of disturbance dynamics within the Cerrado-Amazon Transition using time series data and Residual Neural Networks, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-2657, https://doi.org/10.5194/egusphere-egu24-2657, 2024.

EGU24-3447 | ECS | PICO | ESSI4.3

Flood susceptibility mapping using neural network based models in Morocco: Case of Souss Watershed 

Mohamed Aghenda, Adnane Labbaci, Mohamed Hssaisoune, and Lhoussaine Bouchaou

The global climate situation is becoming increasingly critical due to the impacts of climate change, especially flood hazards, which cause major human and economic losses every year. In Morocco, the Souss watershed is one of the most vulnerable regions in terms of flooding and land degradation, and climatic conditions and population growth further affect land use. The present work introduces a novel approach to assess flood risk in the Souss watershed using four neural network-based models implemented in Google Colab: an Artificial Neural Network (ANN), a Recurrent Neural Network (RNN), and one-dimensional (1DCNN) and two-dimensional (2DCNN) Convolutional Neural Networks. The model inputs were constructed from 4 features selected among 17 of the most relevant flood-triggering factors describing the characteristics of the watershed, including topography, vegetation and soil. Pearson's correlation coefficient was used to evaluate the correlation between the features, variance inflation factor (VIF) analysis was applied to diagnose collinearity, and Shapley Additive Explanations (SHAP) were used to evaluate the importance of each factor in the prediction model. For evaluation and validation, the Mean Absolute Error (MAE) and loss were used to assess prediction accuracy, together with the ROC (Receiver Operating Characteristic) curve and the AUC (Area Under the ROC Curve) to compare the four models. The results show that the RNN has the highest performance, with an accuracy of 96%, a validation loss of 0.0984 and a validation MAE of 0.2553, followed by the ANN with a slightly lower accuracy of 95%; the 2DCNN and 1DCNN achieved lower accuracies of 87% and 81%. These findings demonstrate that, in the flood susceptibility mapping context, applying more complex neural networks such as 1DCNN and 2DCNN calls for more tuning and optimization to overcome over-fitting, whereas simpler networks such as RNN and ANN can be more effective in achieving accurate predictions.
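
A hedged sketch of the collinearity screening step (Pearson correlation and VIF) is given below; the factor names and data are placeholders rather than the study's 17 candidate factors.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Placeholder conditioning-factor table (in practice: 17 candidate flood factors).
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "slope": rng.normal(size=200),
    "elevation": rng.normal(size=200),
    "ndvi": rng.normal(size=200),
    "distance_to_river": rng.normal(size=200),
})

# Pearson correlation between candidate factors
print(X.corr(method="pearson").round(2))

# Variance inflation factor for each factor (large VIF flags strong collinearity)
vif = pd.Series(
    [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
    index=X.columns,
)
print(vif.round(2))
```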

How to cite: Aghenda, M., Labbaci, A., Hssaisoune, M., and Bouchaou, L.: Flood susceptibility mapping using neural network based models in Morocco: Case of Souss Watershed, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-3447, https://doi.org/10.5194/egusphere-egu24-3447, 2024.

EGU24-4541 | PICO | ESSI4.3

Time-series analysis on multi-modal data for precisionviticulture to support adaptation to changing climate patternsin Franconia 

Melanie Brandmeier, Daniel Heßdörfer, Adrian Meyer-Spelbrink, Philipp Siebenlist, and Anja Kraus

The cultivation of vine is an important economic sector in agriculture as well as a cultural legacy in many regions. Lower Franconia is one of Germany's largest wine-producing areas, with more than 6,000 ha of vineyards and a production of 450,000 hectolitres of wine. In the context of climate change, strategies to adapt to changing precipitation and temperature patterns and to mitigate risks from drought and grape diseases are of utmost importance for sustainable viticulture. Water deficit produces diverse effects, such as reduced berry size or the failure of fruit maturation, depending on the plant's growth stage [1]. Severe water stress triggers partial or complete stomatal closure, resulting in a reduction of photosynthetic activity [3]. Thus, ideal soil moisture is key to sustainable viticulture, and monitoring plant development, soil moisture and climate variables is crucial for precision viticulture [2]. Due to the typical trellis systems, satellite remote sensing at deca-metre resolutions (such as the Landsat or Sentinel-2 series) is not well suited for plant monitoring, as pixel information at this resolution mixes vines with the ground between plant rows, which might be grass, other plants or soil. We therefore investigate time series of very-high-resolution multispectral data derived from a UAV-based system, hyperspectral data from in-situ measurements, and sensor data for soil moisture, and evaluate the results with respect to different irrigation patterns. Such multi-sensor and multi-temporal approaches contribute to a better understanding of the vineyard as a dynamic system, lead to better monitoring and management options, and allow a GIS-based Digital Twin of the vineyard to be built. We will present first results from daily to weekly measurements, evaluate different vegetation indices, and highlight temporal patterns and reaction times between soil moisture data and spectral measurements.


[1] W. J. Hardie and J. A. Considine, "Response of grapes to water-deficit stress in particular stages of development", American Journal of Enology and Viticulture 27.2 (1976), pp. 55-61.
[2] Margareth A. Oliver, "An overview of geostatistics and precision agriculture", in: Geostatistical Applications for Precision Agriculture (2010), pp. 1-34.
[3] Maria Romero et al., "Vineyard water status estimation using multispectral imagery from an UAV platform and machine learning algorithms for irrigation scheduling management", Computers and Electronics in Agriculture 147 (Apr. 2018), pp. 109-117.



How to cite: Brandmeier, M., Heßdörfer, D., Meyer-Spelbrink, A., Siebenlist, P., and Kraus, A.: Time-series analysis on multi-modal data for precisionviticulture to support adaptation to changing climate patternsin Franconia, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-4541, https://doi.org/10.5194/egusphere-egu24-4541, 2024.

Traditional imagery workflows in many cases still follow a dataset --> process --> result dataset scheme. In today's world of high spatial and temporal imagery resolution and analytical cadence, this is becoming increasingly impractical, a trend reinforced by today's cloud-based mass data storage and constant change-detection requirements. Accessing imagery as dynamic compilations and analyzing stacks, time series and blocks on demand, largely without producing permanent intermediate data, has therefore become a paradigm. Added to this is the demand to analyze the imagery sources together with in-situ sources, sensors, base geometries and other geo-related data sources.

This presentation will outline the capabilities and strategies of the ArcGIS Platform to fulfil these demands, covering cloud-based data and processing as well as dynamic server analytics and traditional desktop approaches - both UI- and Python/Notebook-driven. Time-series analysis, deep learning/AI and combined raster/vector analysis will be used as examples.

The target audience is anyone interested in geospatial analysis combining imagery and other geospatial data sources.

How to cite: Doerffel, G.: Image Management and Analytics integrated in geospatial workflows, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-6098, https://doi.org/10.5194/egusphere-egu24-6098, 2024.

EGU24-10788 | ECS | PICO | ESSI4.3

Improved Mapping of Perennial Crop Types Based on Patterns of Intra-Annual Variation in Land Surface Components 

Xiang Gao, Qiyuan Hu, Fei Lun, and Danfeng Sun

Perennial crops hold significant importance in global agricultural markets, local agricultural economies, poverty reduction, and biodiversity conservation. Accurate spatial distribution data of different perennial crops are crucial for local agricultural management, crop yield prediction, and sustainable agricultural development. However, precisely obtaining the spatial distribution of different perennial crops using remote sensing data, especially in regions with complex cropping patterns, remains a challenge. This study, focusing on Yantai City, China, and three counties in California, USA, aims to develop a method suitable for both smallholder and intensive production systems. On the Google Earth Engine (GEE) platform, we applied linear spectral mixture analysis (LSMA) to transform Sentinel-2 time-series data (2020-2022) from the original spectral space into a unified endmember space, including photosynthetic vegetation, non-photosynthetic vegetation, soil, and shadow, thereby characterizing the time-series land surface component information. Subsequently, based on the time-series endmembers data, we quantified seven sets of harmonic features and five phenological features. Relative phases were employed as phase indicators for the harmonic features. These seven harmonic features represent the intra-annual patterns of land surface component changes and inter-component interactions, while the phenological features quantify the timing of phenological events. Our findings reveal significant variations in the intra-annual patterns of land surface component changes among different perennial crops, attributable to differences in phenology and phenology-associated human management. Building on this, the study achieved precise mapping of different perennial crops, even in areas with complex cropping patterns. The overall classification accuracy for perennial crops in Yantai City and the three counties in California was 90.3% and 94.8%, respectively, with Kappa coefficients of 89.2% and 93.9%. Utilizing intra-annual time-series land surface component information for extracting the spatial distribution of perennial crops demonstrated advantages over traditional optical indices. This work provides a method that is applicable to both smallholder and intensive production systems, enabling precise mapping of perennial crop types. It represents an important step towards achieving large-scale mapping of perennial crop types.
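
A hedged sketch of the unmixing step on the GEE Python API is shown below; the endmember spectra and band selection are placeholders, the unmixing is applied to a single image rather than the full time series, and the fully constrained options are chosen for illustration only.

```python
import ee
ee.Initialize()

# One Sentinel-2 surface-reflectance image; bands and date are illustrative.
img = (ee.ImageCollection("COPERNICUS/S2_SR_HARMONIZED")
       .filterDate("2021-06-01", "2021-07-01")
       .first()
       .select(["B2", "B3", "B4", "B8", "B11", "B12"])
       .multiply(0.0001))

# Placeholder endmember spectra (reflectance) for PV, NPV, soil and shadow;
# the actual endmembers would be derived for the study regions.
pv     = [0.05, 0.08, 0.04, 0.45, 0.20, 0.10]
npv    = [0.15, 0.20, 0.25, 0.35, 0.40, 0.35]
soil   = [0.12, 0.16, 0.21, 0.28, 0.35, 0.30]
shadow = [0.01, 0.01, 0.01, 0.02, 0.02, 0.01]

# Linear spectral mixture analysis (fully constrained here for illustration)
fractions = (img.unmix([pv, npv, soil, shadow], sumToOne=True, nonNegative=True)
             .rename(["PV", "NPV", "soil", "shadow"]))
```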

How to cite: Gao, X., Hu, Q., Lun, F., and Sun, D.: Improved Mapping of Perennial Crop Types Based on Patterns of Intra-Annual Variation in Land Surface Components, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-10788, https://doi.org/10.5194/egusphere-egu24-10788, 2024.

EGU24-10831 | ECS | PICO | ESSI4.3

Sub-field detection of cereal yield losses and its causes using Sentinel-2 time series and weather data 

Keke Duan, Anton Vrieling, Michael Schlund, Uday Bhaskar Nidumolu, Christina Ratcliff, Simon Collings, and Andrew Nelson

Weather extremes severely affect agricultural production and threaten food security worldwide. Throughout the growing season, crops can experience various degrees of weather stress whereby multiple stressors could occur simultaneously or intermittently. For large spatial extents, it is difficult to estimate actual crop damage merely through field experiments or crop models. Remote sensing can help to detect crop damage and estimate lost yield due to weather extremes over large spatial extents, but current RS-based studies usually focus on a single stress or event. We propose a novel scalable method to predict in-season yield losses at the sub-field level and attribute these losses to different weather extremes. To assess our method’s potential, we conducted a proof-of-concept case study on winter cereal paddocks in South Australia using data from 2017 to 2022. To detect crop growth anomalies throughout the growing season, we aligned a two-band Enhanced Vegetation Index (EVI2) time series from Sentinel-2 with thermal time derived from gridded meteorological data. The deviation between the expected and observed EVI2 time series was defined as the Crop Damage Index (CDI). We assessed the performance of the CDI within specific phenological windows to predict yield loss. Finally, by comparing instances of substantial increase in CDI with different extreme weather indicators, we explored which (combinations of) extreme weather events were likely responsible for the experienced yield reduction. We found that the use of thermal time diminished the temporal deviation of EVI2 time series between years, resulting in the effective construction of typical stress-free crop growth curves. Thermal-time-based EVI2 time series resulted in better prediction of yield reduction than those based on calendar dates. Yield reduction could be predicted before grain-filling (approximately two months before harvest) with an R2 of 0.83 for wheat and 0.91 for barley. The combined analysis of CDI curves and extreme weather indices allowed for timely detection of weather-related causes of crop damage, which also captured the spatial variations of crop damage attribution at sub-field level. Our approach can help to improve early assessment of crop damage and understand weather causes of such damage, thus informing strategies for crop protection.
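
A hedged numerical sketch of the Crop Damage Index idea is given below; the base temperature, the expected EVI2 curve, and all values are invented placeholders, and the study's exact definitions may differ.

```python
import numpy as np
import pandas as pd

# Placeholder daily mean temperature and bi-weekly EVI2 observations for one field.
dates = pd.date_range("2021-05-01", "2021-11-30", freq="D")
tmean = 10 + 12 * np.sin(np.linspace(0, np.pi, dates.size))  # synthetic temperatures

BASE_T = 0.0  # assumed base temperature for winter cereals
thermal_time = np.cumsum(np.clip(tmean - BASE_T, 0, None))   # growing degree days

# Expected (stress-free) EVI2 as a function of thermal time, e.g. fitted on past years.
def expected_evi2(tt):
    return 0.15 + 0.55 * np.exp(-((tt - 1200.0) / 600.0) ** 2)  # placeholder curve

# Observed EVI2 sampled every 14 days (placeholder values with a mid-season dip).
obs_idx = np.arange(0, dates.size, 14)
observed = expected_evi2(thermal_time[obs_idx]) - np.where(
    (obs_idx > 60) & (obs_idx < 120), 0.12, 0.0)  # simulate a stress period

# Crop Damage Index: deviation of observed from expected EVI2 along thermal time.
cdi = expected_evi2(thermal_time[obs_idx]) - observed
print(pd.DataFrame({"date": dates[obs_idx], "CDI": cdi.round(3)}).head())
```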

How to cite: Duan, K., Vrieling, A., Schlund, M., Nidumolu, U. B., Ratcliff, C., Collings, S., and Nelson, A.: Sub-field detection of cereal yield losses and its causes using Sentinel-2 time series and weather data, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-10831, https://doi.org/10.5194/egusphere-egu24-10831, 2024.

EGU24-13195 | ECS | PICO | ESSI4.3

Global pastures and grasslands productivity time series mapped at 30-m spatial resolution using Light Use Efficiency Model 

Leandro Parente, Julia Hackländer, Davide Consoli, Tomislav Hengl, Vinícius Mesquita, Laerte Ferreira, Lindsey Sloat, Nathalia Teles, Yu-Feng Ho, Ichsani Wheeler, and Fred Stolle

Pastures and grasslands are the largest land cover type on Earth's surface, comprising landscapes fundamental to water and nutrient cycling, food production, biodiversity conservation and land management on the planet. Monitoring the condition and productivity of these lands can contribute substantially to mitigating land degradation, in line with the sustainable development goals defined by the United Nations (UN) 2030 agenda. Nevertheless, an operational approach able to monitor the productivity of pastures and grasslands at global scale and high spatial resolution (e.g. 30 m) remains a challenging research problem. To contribute to this topic, the current work presents a methodology to derive 30-m bi-monthly time series of Gross Primary Productivity (GPP) for the pastures and grasslands of the world based on GLAD Landsat ARD (collection-2) and a customized Light Use Efficiency (LUE) model. The Landsat imagery was aggregated to two-month composites and gapfilled by a temporal interpolation based on the Fast Fourier Transform (FFT). The complete, consistent and gapfilled Landsat time series was used to estimate the Fraction of Photosynthetically Active Radiation (FPAR) and the Land Surface Water Index (LSWI), which, combined with 1-km MODIS temperature (MOD11A2) and 1° CERES Photosynthetically Active Radiation (SYN1deg v4.1 - PAR) images, resulted in a 30-m global GPP time-series product from 2000 onwards. Our preliminary validation, based on FLUXNET2015 data, indicated an R² of 0.67 and an RMSE of 2.06 for in-situ stations located in Europe. We are working to release the first version of the product as open data (CC-BY license) in the context of the Global Pasture Watch project and the World Resources Institute's Land & Carbon Lab, establishing partnerships with local organizations and research institutes to collect feedback and additional validation data to improve future versions of the product.
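
A hedged sketch of a light-use-efficiency GPP computation of this general kind is shown below; it follows a generic VPM-style formulation with illustrative parameters and does not reproduce the customized model or calibration used for the product.

```python
import numpy as np

def lue_gpp(fpar, par, tmean, lswi, lswi_max,
            eps_max=2.0, t_min=0.0, t_opt=20.0, t_max=40.0):
    """Generic light-use-efficiency GPP estimate: eps_max * Ts * Ws * FPAR * PAR.

    All default parameter values are illustrative, not the product's calibration.
    """
    # Temperature scalar (ramp between t_min, t_opt and t_max)
    t_scalar = ((tmean - t_min) * (tmean - t_max)) / (
        (tmean - t_min) * (tmean - t_max) - (tmean - t_opt) ** 2)
    t_scalar = np.clip(t_scalar, 0.0, 1.0)

    # Water scalar from the Land Surface Water Index
    w_scalar = np.clip((1.0 + lswi) / (1.0 + lswi_max), 0.0, 1.0)

    return eps_max * t_scalar * w_scalar * fpar * par

# Example with illustrative values (FPAR unitless, PAR per time step)
print(lue_gpp(fpar=0.6, par=8.0, tmean=18.0, lswi=0.1, lswi_max=0.35))
```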

How to cite: Parente, L., Hackländer, J., Consoli, D., Hengl, T., Mesquita, V., Ferreira, L., Sloat, L., Teles, N., Ho, Y.-F., Wheeler, I., and Stolle, F.: Global pastures and grasslands productivity time series mapped at 30-m spatial resolution using Light Use Efficiency Model, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-13195, https://doi.org/10.5194/egusphere-egu24-13195, 2024.

EGU24-14221 | PICO | ESSI4.3

The Observational Products for End-Users from Remote Sensing Analysis (OPERA) Project: Status, Access, Applications, Tools, and More 

Alexander L. Handwerger, Steven Chan, and David Bekaert and the OPERA Team

Remote sensing satellites provide key data that can be used to better understand the Earth, respond to hazardous events, and to make decisions related to climate change, population growth, and more. For decades, many space agencies have provided high quality remote sensing data free of charge for their end-users. Although these data have been accessible, and widely used, the raw remote sensing measurements can be challenging to analyze for non-specialists. Furthermore, the large quantity of data available today makes it nearly impossible to perform large scale analysis on personal computers. To overcome these barriers, the Observational Products for End-Users from Remote Sensing Analysis (OPERA) project, led by the Jet Propulsion Laboratory, with project partners from NASA, USGS, and academia, are producing three analysis ready data products derived from satellite-based synthetic aperture radar (SAR) and optical data. These products were designed to meet the needs of U.S. federal agencies that were identified by the Satellite Needs Working Group (an initiative of the U.S. Group on Earth Observations). The OPERA analysis ready data products are derived from the NASA/USGS Landsat 8/9 sensors, ESA’s Sentinel-1 and -2, and NASA-ISRO SAR Mission (NISAR). Specific products include: (1) a near-global Dynamic Surface Water eXtent (DSWx) product suite from optical and SAR data, (2) a near-global Surface Disturbance (DIST) product suite from optical and SAR data, and (3) a North America Displacement (DISP) product suite from SAR data. In addition, OPERA is producing intermediate-level products including: (1) a North America Land Coregistered Single-Look Complex (CSLC) product from SAR data, and (2) a near-global land surface Radiometric Terrain Corrected (RTC) SAR backscatter product from SAR data. These two intermediate SAR products allow for user-customized product generation. All OPERA products are freely available and all OPERA software are open-access (https://github.com/opera-adt). More information on OPERA can be found at https://www.jpl.nasa.gov/go/opera. 

Here, we present the latest OPERA project updates. We provide an overview of OPERA’s in-production products, which include DSWx, DIST, RTC, and CSLC, and information about OPERA’s future DISP product. We will showcase product use-cases, with a focus on detection and monitoring of hazards such as floods, wildfires, earthquakes, and landslides. We also discuss how the free and open OPERA data can be accessed through the NASA Distributed Active Archive Centers. Finally, we demonstrate our open OPERA application tools (https://github.com/OPERA-Cal-Val/OPERA_Applications) that are designed to increase data use and discoverability.

How to cite: Handwerger, A. L., Chan, S., and Bekaert, D. and the OPERA Team: The Observational Products for End-Users from Remote Sensing Analysis (OPERA) Project: Status, Access, Applications, Tools, and More, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-14221, https://doi.org/10.5194/egusphere-egu24-14221, 2024.

This talk will discuss techniques used to map aboveground biomass in the region of Cantabria, Spain, and examine tree cover change over time. Following the biomass workflow outlined by Esri's Hong Xu, we leverage machine learning tools in ArcGIS Pro, altimeter sensor data, optical satellite imagery, and DEM data to gain foundational insights into land cover and forest density. By combining trajectory point data from GEDI, which contains aboveground biomass information, with surface reflectance bands from Landsat Collection 2 and DEM data, we examine forest density and coverage in the Cantabria region. These results are compared with data from previous decades analyzed using deep learning techniques to assess forest change in the region. Using ArcGIS Pro and AWS, we integrate remote sensing datasets from multiple satellite sensors and utilize machine learning tools to train and run a regression model that provides an estimate of aboveground biomass for the region of Cantabria, Spain, in the form of a raster layer. These data can then be used in conjunction with the results of deep learning-based tree cover analysis over time and displayed in an interactive web application. This multifaceted analysis can provide researchers, policymakers, and stakeholders with key insights into progress and prioritization and aid in addressing local challenges related to forest health.
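
Outside ArcGIS Pro, the regression step could be sketched conceptually with scikit-learn as below; the synthetic training table and band names are placeholders, and the actual workflow relies on ArcGIS Pro's machine learning tools rather than this code.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Placeholder table standing in for GEDI footprint AGB paired with
# Landsat Collection 2 reflectance bands and DEM-derived terrain variables.
rng = np.random.default_rng(42)
n = 500
df = pd.DataFrame({
    "B4": rng.uniform(0, 0.3, n),        # red reflectance (illustrative)
    "B5": rng.uniform(0, 0.6, n),        # NIR reflectance (illustrative)
    "elevation": rng.uniform(0, 2000, n),
    "slope": rng.uniform(0, 45, n),
})
df["agbd"] = 50 + 400 * df["B5"] - 200 * df["B4"] + rng.normal(0, 20, n)  # synthetic AGB

predictors = ["B4", "B5", "elevation", "slope"]
X_train, X_test, y_train, y_test = train_test_split(
    df[predictors], df["agbd"], test_size=0.3, random_state=42)

model = RandomForestRegressor(n_estimators=300, random_state=42)
model.fit(X_train, y_train)
pred = model.predict(X_test)

rmse = np.sqrt(np.mean((pred - y_test.to_numpy()) ** 2))
print(f"R2 = {r2_score(y_test, pred):.2f}, RMSE = {rmse:.1f} Mg/ha")
```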

How to cite: Graham, L., Alouta, R., Tanh, L., and Xu, H.: Examining aboveground biomass and tree cover change in the region of Cantabria, Spain using multiple satellite sensors and machine learning tools in ArcGIS Pro. , EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-19566, https://doi.org/10.5194/egusphere-egu24-19566, 2024.

EGU24-20017 | ECS | PICO | ESSI4.3

Investigating the impact of urban 3D variables on satellite land surface temperature estimates 

Renato Cifuentes La Mura, Laure Roupioz, Françoise Nerry, and Xavier Briottet

Land surface temperature (LST) is of fundamental importance to many aspects of the geosciences, such as the net radiation budget and the evaluation and monitoring of forests and crops. LST is also a key parameter for deriving the Surface Urban Heat Island (SUHI) or health-related indices, for example the Discomfort Index (DI). It is therefore an essential prerequisite for understanding the urban climate and for supporting the definition of mitigation strategies, health risk management plans and public policies, among other initiatives to effectively address the adverse effects of heat. While LST is commonly retrieved from data acquired in the TIR (Thermal InfraRed) spectral domain by multispectral remote sensors with about 1 K accuracy for natural surfaces, its retrieval over urban areas is not trivial. Urban landscapes pose tremendous challenges for LST estimation due to their high heterogeneity of surfaces and materials and the three-dimensional (3D) configuration of the elements present in urban areas.

The Thermal InfraRed Imaging Satellite for High-resolution Natural resource Assessment (TRISHNA) is planned for launch in 2025 and features a TIR instrument that will image the Earth every three days at 57 m resolution, providing the research community with critical information to understand radiative interactions and impacts at the local level. This new satellite mission will offer an unprecedented opportunity to support urban microclimate studies. Based on extensive radiative transfer simulations using the Discrete Anisotropic Radiative Transfer Model (DART) and sensitivity analysis, this work investigates the impacts of 3D urban structures (e.g., road width, building height, building density) and materials with different optical properties on LST estimation at the TRISHNA spatial resolution. Ultimately, the idea is to develop a method to minimize these impacts on LST estimated from TRISHNA data.

First, a processing chain was set up to simulate TRISHNA LST with DART, using as inputs i) the configuration of the sensor and ii) 3D urban forms with different geometric and optical properties. Second, the radiative transfer modeling to simulate the TIR remote sensing signal is performed. Finally, by correlating the simulated TRISHNA LST with the surface characteristics of each scene, the main parameters impacting LST in urban environments were identified. From these results, a correction method at the satellite scale will be formulated to minimize the impacts of urban 3D variables on LST.

How to cite: Cifuentes La Mura, R., Roupioz, L., Nerry, F., and Briottet, X.: Investigating the impact of urban 3D variables on satellite land surface temperature estimates, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-20017, https://doi.org/10.5194/egusphere-egu24-20017, 2024.

EGU24-20629 | ECS | PICO | ESSI4.3

Unlocking time series analysis with data-fusion land surface temperature  

Jasper Dijkstra, Yoann Malbeteau, Maria Ghironi, Pierre Guillevic, and Richard de Jeu

Consistent and continuous Land Surface Temperature (LST) at high temporal resolution is essential for many applications, such as anomaly detection (e.g. agricultural droughts), urban heat island monitoring, or irrigation and crop water stress, among others. LST can be retrieved at high spatial resolution from spaceborne thermal infrared (TIR) instruments, like MODIS/VIIRS, Landsat, ECOSTRESS, and ASTER, but these data come with large temporal gaps due to cloud cover and orbit/sensor characteristics, which complicates time series analyses.

To overcome these limitations, we developed a daily 100m LST product based on the synergy between passive microwave brightness temperatures from the Advanced Microwave Scanning Radiometer 2 (AMSR2), and optical data from Sentinel 2 within a novel disaggregation method [1]. This results in a dataset that allows for monitoring environmental systems consistently and continuously in near-real time. The passive microwave observations offer a distinct advantage in LST estimation due to the ability to penetrate cloud cover and measure thermal emissions at the surface. On the other hand, Sentinel-2, with its high spatial resolution multispectral bands, provides rich information on land cover and land surface properties. By combining these complementary datasets, we aim to leverage the strengths of both sensors to improve the accuracy and spatial resolution of LST retrieval. The method uses the abundance of overlaps between passive microwave footprints in combination with higher spatial information from S2 NIR and SWIR for downscaling at 100m resolution since 2017 at 1:30am and at 1:30pm. 

To assess the accuracy of the 100 m LST, we compared the time series of microwave-based LST at over 100 locations against in situ measurements, MODIS and Landsat LST data. Currently, the temporal accuracy compared to these in-situ stations is ±3.1 K with a correlation of 0.91 (for MODIS, these figures were ±2.6 K and 0.94). We also performed a spatial comparison of our 100 m LST data over agricultural regions against Landsat LST. While the small number of clear-sky Landsat LST observations limits the comparison, the preliminary results show a spatial accuracy between ±1.5 K and ±4 K.
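
A minimal sketch of the kind of agreement metrics quoted above (bias, RMSE, unbiased RMSE, correlation) is given below, with placeholder input values.

```python
import numpy as np

def agreement_metrics(satellite_lst, reference_lst):
    """Bias, RMSE, unbiased RMSE and Pearson correlation between two LST series (K)."""
    sat = np.asarray(satellite_lst, dtype=float)
    ref = np.asarray(reference_lst, dtype=float)
    diff = sat - ref
    bias = diff.mean()
    rmse = np.sqrt((diff ** 2).mean())
    ubrmse = np.sqrt(((diff - bias) ** 2).mean())
    r = np.corrcoef(sat, ref)[0, 1]
    return {"bias": bias, "rmse": rmse, "ubrmse": ubrmse, "r": r}

# Placeholder example values (K)
print(agreement_metrics([300.2, 305.1, 298.7, 310.4], [299.0, 304.0, 300.1, 309.5]))
```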

Our results demonstrate that our LST data-fusion approach is a viable methodology to generate a temporally and spatially high resolution LST archive. We are aiming to bridge current and future missions for high-resolution LST by harnessing the complementary capabilities of multi-sensor data fusion. The proposed framework holds great potential for improving our understanding and monitoring Earth's complex environmental systems, such as local surface energy dynamics, climate processes or supporting various environmental applications requiring accurate and high-resolution LST information.

[1] de Jeu, R. A. M., de Nijs, A. H. A., & van Klink, M. H. W. (2017). Method and system for improving the resolution of sensor data. https://patents.google.com/patent/WO2017216186A1/en 

How to cite: Dijkstra, J., Malbeteau, Y., Ghironi, M., Guillevic, P., and de Jeu, R.: Unlocking time series analysis with data-fusion land surface temperature , EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-20629, https://doi.org/10.5194/egusphere-egu24-20629, 2024.

EGU24-7931 | Orals | ESSI4.4 | Highlight

Advancing Open Data Portals: Learnings from the EPOS Open-Source Solution 

Valerio Vinciarelli, Rossana Paciello, Daniele Bailo, Claudio Goffi, Daniel Warren, Janusz Lavrnja-Czapski, Christopher Card, Philip Atkinson, Wayne Shelley, Jean-Baptiste Roquencourt, Yann Retout, Helen Glaves, Kety Giuliacci, Jan Michalek, Jakob Molander, Harald Nedrebø, Otto Lange, Carmela Freda, Kauzar Saleh-Contell, and Manuela Sbarra

The European Plate Observing System (EPOS), established as a European Research Infrastructure Consortium (ERIC) in 2018, stands as a significant milestone in pan-European research infrastructures, focusing on solid Earth science. The EPOS Data Portal, officially launched in April 2023, is the place where FAIR principles and practices are implemented thanks to the adoption of a co-development approach and  harmonization of actions across communities of scientists, developers, data providers, and users. The EPOS Data Portal currently provides access to data and products from 10 different disciplines: Seismology, Near-Fault Observatories, GNSS Data and Products, Volcano Observations, Satellite Data, Geomagnetic Observations, Anthropogenic Hazards, Geological Information and Modeling, Multi-Scale Laboratories, and Tsunami.

The EPOS Data Portal is based on a user-friendly interface that provides intuitive visualization methods and interaction modes, significantly simplifying and facilitating the discovery of and access to the geoscientific community's assets. Through the portal, users can: i) perform data searches by combining a set of criteria; ii) navigate and visualize the retrieved search results in different ways; iii) fine-tune results using facets and advanced filters; iv) download selected results or store them in a favorites list.

The underlying system of the Data Portal has been crafted using a blend of open-source technologies, including Java, RabbitMQ, Python, and others. We implemented a modular architecture based on the microservices paradigm, facilitating seamless integration of new data and services through dedicated software interfaces. The source code, collaboratively developed by scientists and IT experts, is now available under a GPL license (https://epos-eu.github.io/epos-open-source/) along with a comprehensive developer’s guide.

In this contribution, we demonstrate the potential impact of our open-source solution in advancing visualizations, interfaces, and best practices within the context of multidisciplinary research. Furthermore, we present how other research infrastructures, projects and initiatives can benefit from the shared knowledge and expertise, accelerating the development of robust and advanced Earth science data portals.

How to cite: Vinciarelli, V., Paciello, R., Bailo, D., Goffi, C., Warren, D., Lavrnja-Czapski, J., Card, C., Atkinson, P., Shelley, W., Roquencourt, J.-B., Retout, Y., Glaves, H., Giuliacci, K., Michalek, J., Molander, J., Nedrebø, H., Lange, O., Freda, C., Saleh-Contell, K., and Sbarra, M.: Advancing Open Data Portals: Learnings from the EPOS Open-Source Solution, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-7931, https://doi.org/10.5194/egusphere-egu24-7931, 2024.

EGU24-11327 | ECS | Orals | ESSI4.4

EOmaps: An open-source python package for geographic data visualization and analysis. 

Raphael Quast and Wolfgang Wagner

EOmaps is a free and open-source python package specifically tailored for geographic data visualization and analysis.

The main goals of the package are twofold:

  • Speed up and simplify the daily struggle of geographic data visualization
  • Directly use the figures as fully customisable interactive data-analysis widgets

EOmaps is built on top of matplotlib and cartopy and integrates well with the scientific python infrastructure (numpy, pandas, xarray, geopandas, datashader, etc.). It provides a flexible and well-documented API to create publication-ready figures and it can be used to visualize (potentially large) structured (e.g. raster) or unstructured (e.g. unordered lists) datasets provided in arbitrary projections. 

In addition, EOmaps comes with many useful features to help with scientific geo-data analysis:

  • Maps can have multiple layers to interactively compare and (transparently) overlay datasets, web-maps etc.
  • Once a dataset is plotted, you can assign arbitrary callback functions to interactively run your analysis-workflow on selected datapoints (e.g. load data from a database, plot underlying timeseries, histograms etc.)

Figures created with EOmaps can be exported as images (png, jpeg, ...), vector-graphics (svg) or embedded in Jupyter Notebooks, web-pages (html) or in GUI frameworks such as Qt or tkinter.

In this presentation we will highlight the capabilities of EOmaps and show how it can be used in a variety of different situations to aid your scientific data analysis workflow.
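
As a quick impression of the basic workflow, a minimal sketch based on EOmaps' documented entry points is shown below; the random data are illustrative and the exact keyword arguments may differ slightly between EOmaps versions.

```python
import numpy as np
from eomaps import Maps

# Illustrative unstructured dataset: random points with a value to visualize.
lon = np.random.uniform(-25, 45, 5000)
lat = np.random.uniform(34, 72, 5000)
val = np.sin(np.radians(lat)) + 0.1 * np.random.randn(5000)

m = Maps(crs=4326)                      # map in plate-carree projection
m.add_feature.preset.coastline()        # add a coastline layer
m.set_data(data=val, x=lon, y=lat, crs=4326)
m.plot_map()                            # draw the dataset

# Attach an interactive callback: annotate picked datapoints on click.
m.cb.pick.attach.annotate()

m.savefig("eomaps_example.png", dpi=150)
```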

EOmaps source-code: https://github.com/raphaelquast/EOmaps  
EOmaps documentation: https://eomaps.readthedocs.io/

How to cite: Quast, R. and Wagner, W.: EOmaps: An open-source python package for geographic data visualization and analysis., EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-11327, https://doi.org/10.5194/egusphere-egu24-11327, 2024.

EGU24-11805 | Posters on site | ESSI4.4

Next-Gen Zarr Web Map Visualization 

Aimee Barciauskas, Max Jones, Kata Martin, Sean Harkins, and Vincent Sarago

Visualization of Earth science data is crucial for its exploration and understanding. Web browsers, as a universal platform, face the challenge of rendering complex geospatial data swiftly. This led to the creation of pre-generated static map tiles, allowing quick visualization but limiting user control over data representation and imposing storage and update burdens on providers.

While pregenerated map tiles make it possible to visualize data quickly, there are drawbacks. The most significant is that the data provider chooses how the data will appear: the user has no power to adjust the visualization, such as modifying the color scale or color map, or performing "band math", where multiple variables are combined to produce a new variable. Other drawbacks affect the data provider, such as storage costs and maintaining a pipeline to constantly update or reprocess the tile store with new and updated data. Next-generation approaches give that power to the user, while still giving providers control over the costs.

Recent years have seen the success of the dynamic tiling approach, which allows for on-demand map tile creation. This approach has traditionally relied on Cloud-Optimized GeoTIFFs (COGs). As Zarr gained popularity for large-scale n-dimensional data analysis, users began to call for browser-based visualization, but no tools existed to visualize Zarr in the browser.

Now there are two options: a dynamic tile server and a dynamic client approach. rio_tiler’s XarrayReader supports tile rendering from anything that is xarray-readable, which means a tile server can render tiles from Zarr stores as well as netCDF4/HDF5 and other formats. The first option, however, still requires running a server, whereas the second, a “dynamic client”, reads Zarr directly in the browser and uses WebGL to render map tiles.
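
As a rough illustration of the tile-server option (not the authors' implementation), the following sketch renders a single XYZ map tile with rio_tiler's XarrayReader; the Zarr store URL, variable name, and time dimension are placeholders, and a CRS-aware DataArray (via rioxarray) is assumed.

```python
# A rough sketch of the tile-server option; URLs and variable names are placeholders.
import xarray as xr
import rioxarray  # noqa: F401  (registers the .rio accessor)
from rio_tiler.io import XarrayReader

# open a hypothetical Zarr store and pick one 2D slice of one variable
ds = xr.open_zarr("s3://example-bucket/example-dataset.zarr")
da = ds["air_temperature"].isel(time=0).rio.write_crs("EPSG:4326")

# render web-mercator tile (x=1, y=1, z=2) and encode it as PNG bytes
with XarrayReader(da) as reader:
    img = reader.tile(1, 1, 2)
    png_bytes = img.render(img_format="PNG")

with open("tile_z2_x1_y1.png", "wb") as f:
    f.write(png_bytes)
```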

The authors have contributed to libraries and testing for both approaches and authored a “Zarr Visualization Report”. This report covers the trade-offs, the data preprocessing requirements, and performance test results obtained with and without those preprocessing steps. We hope that readers will be able to reuse the lessons learned and recommendations to deliver their Zarr data to users in web browsers and to contribute to the wider adoption of this format for large-scale environmental data understanding.

Looking ahead, the focus is on making NASA datasets more accessible through these innovative approaches. The use of Kerchunk reference files, or virtual Zarr datasets, will play a key role in indexing various archival file formats used by NASA, such as HDF5 and NetCDF4. With the capability of titiler-xarray to handle any xarray-readable data, a wide range of NASA datasets can be visualized without the need for duplicating data. Additionally, the creation of data pyramids will further enhance visualization speed at lower resolutions.
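
For illustration, a common Kerchunk pattern is sketched below, under the assumption of a single publicly readable HDF5/NetCDF4 granule; the bucket path is a placeholder and this is not a specific NASA pipeline.

```python
# A sketch of a common Kerchunk pattern: index one granule and open it as virtual Zarr.
import fsspec
import xarray as xr
from kerchunk.hdf import SingleHdf5ToZarr

# build a reference set (a "virtual Zarr" index) for one granule
url = "s3://example-bucket/granule.nc"
with fsspec.open(url, "rb", anon=True) as f:
    refs = SingleHdf5ToZarr(f, url).translate()

# open the references as a Zarr dataset without copying or reformatting the data
ds = xr.open_dataset(
    "reference://",
    engine="zarr",
    backend_kwargs={
        "consolidated": False,
        "storage_options": {
            "fo": refs,
            "remote_protocol": "s3",
            "remote_options": {"anon": True},
        },
    },
)
print(ds)
```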

How to cite: Barciauskas, A., Jones, M., Martin, K., Harkins, S., and Sarago, V.: Next-Gen Zarr Web Map Visualization, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-11805, https://doi.org/10.5194/egusphere-egu24-11805, 2024.

EGU24-14629 | ECS | Orals | ESSI4.4

Co-designing an interactive tool to communicate the uncertainty of urban air quality models: uncertAIR 

Cristina Carnerero, Jan Mateu Armengol, Alvaro Criado, Antonia Frangeskou, Diana Urquiza, Dragana Bojovic, and Albert Soret

According to the World Health Organization, air pollution is the main environmental threat to public health. Urban environments are particularly critical due to their high-density population centers, which often have poor air quality. To characterize the exposure of citizens, numerical models corrected with observational data have become a fundamental tool. Despite recent efforts, bias-corrected air quality models at the street scale exhibit significant uncertainty, partly due to the limited number of traffic and air quality observations.

Model uncertainty can critically increase far from measurement points and in regions with characteristics different from those used for calibration. In such locations, modeled data should be interpreted with caution. When the street-scale air quality models are intended to inform policy makers, estimating uncertainty is highly valuable to support decision-making protocols. A simpler air quality model with an estimation of the spatial uncertainty distribution may be preferred over a very sophisticated model that does not give any notion of uncertainty.

Within this context, we aimed to co-design and co-develop an interactive tool that reports the uncertainty of urban air quality simulations and disseminates the results tailored to users’ needs.

The methodology consists of a geostatistical post-processing of the raw NO2 concentration simulations of the CALIOPE-urban air quality model in the city of Barcelona, and it is replicable for other cities and pollutants. The uncertainty estimation is based on the error variance of the Universal Kriging technique, which can subsequently be used to produce hourly maps of the probability of exceeding a certain threshold. Additionally, relevant social-ecological-technological variables were identified to explore the interconnections among different types of data and to broaden the social impact of the project, for instance locations associated with vulnerable citizens (e.g., schools and nursing homes) or other variables potentially linked with air quality (e.g., public parks and green spaces).
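
As a simplified illustration of this step, the sketch below assumes Gaussian kriging errors; it is not the authors' exact post-processing code, and the NO2 values are illustrative.

```python
# A simplified sketch, assuming Gaussian kriging errors; values are illustrative.
import numpy as np
from scipy.stats import norm

def exceedance_probability(kriging_mean, kriging_variance, threshold):
    """P(concentration > threshold) per grid cell for Gaussian kriging errors."""
    sigma = np.sqrt(kriging_variance)
    return 1.0 - norm.cdf((threshold - kriging_mean) / sigma)

# illustrative hourly NO2 field (ug/m3) and its Universal Kriging error variance
no2_mean = np.array([[35.0, 42.0], [55.0, 60.0]])
no2_var = np.array([[25.0, 36.0], [16.0, 49.0]])

# probability of exceeding a 40 ug/m3 threshold in each grid cell
p_exceed = exceedance_probability(no2_mean, no2_var, threshold=40.0)
print(p_exceed)
```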

A user-centric approach was adopted, involving policymakers from local administrations, urban planners from private companies, environmental social agents, and scientific personnel from research institutions and universities. To gain a deep understanding of how uncertainty maps can add value to users’ objectives, we conducted a series of individual interviews and a co-design workshop based on design thinking, which allowed for the co-design of the interactive platform. The prototype of the interactive platform was presented in a second workshop, where the users tested the prototype and provided input for further development of the final tool.

The final product is the uncertAIR platform, an open-source interactive tool that integrates modeled NO2 concentrations, their uncertainty and the probability of exceeding legal thresholds, together with social-ecological-technological variables at different temporal and spatial resolutions. Data can be visualized and downloaded with a temporal resolution of annual or daily averages, and a spatial resolution of 20 m or aggregated over census areas. This integrated dataset serves as the foundational step for integrating uncertainty information into future air quality policy making in Barcelona, such as health impact assessments, official communications, campaign planning, and location optimization of new monitoring stations.

How to cite: Carnerero, C., Mateu Armengol, J., Criado, A., Frangeskou, A., Urquiza, D., Bojovic, D., and Soret, A.: Co-designing an interactive tool to communicate the uncertainty of urban air quality models: uncertAIR, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-14629, https://doi.org/10.5194/egusphere-egu24-14629, 2024.

EGU24-16316 | Posters on site | ESSI4.4

MicroWave Expertise center : a work environment for microwave data exploration 

Marie-Laure Frery, Mathilde Siméon, Roger Fjortoft, Sébastien Charlois, Mélanie Prugniaux, and Matthias Raynal

The MicroWave Expertise Center was first developed to provide a work environment supporting the calibration/validation activities and the high-resolution data of the Surface Water and Ocean Topography (SWOT) mission, launched on December 16th, 2022. The new onboard instrument, KaRIn, is a revolution for both the oceanography and hydrology communities, giving access to small-scale measurements over the ocean, worldwide river heights and flows, and lake heights.

With optimized storage and computation methods, the MicroWave Expertise Center is designed to ease the exploration and study of products amounting to 16 TB/day. The tools developed for SWOT are generic and can now be applied to any altimetry mission.

Experts are provided with simple and scriptable means to explore numerous data providers such as the Copernicus DIAS platforms, ECMWF, and hydroweb.next.

Some tutorials are already available along with visualisation tools, and the list will grow in the near future based on user requirements.

The expertise center is operational and supports SWOT cal/val activities. Prospects include SWOT ocean and hydrology studies, but could be extended to broader hydrological research and multi-sensor comparisons.

How to cite: Frery, M.-L., Siméon, M., Fjortoft, R., Charlois, S., Prugniaux, M., and Raynal, M.: MicroWave Expertise center : a work environment for microwave data exploration, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-16316, https://doi.org/10.5194/egusphere-egu24-16316, 2024.

EGU24-16475 | Orals | ESSI4.4

NASA's FIRMS: Enabling the Use of Earth System Science Data for Wildfire Management 

Otmar Olsina, Jennifer Hewson, Diane Davies, Asen Radov, Brad Quayle, Louis Giglio, and Joanne Hall

NASA’s Fire Information for Resource Management System (FIRMS) enables users to find and analyze a range of earth system science data and information relevant to the complex and evolving field of wildfire management, impacts, and mitigation. FIRMS facilitates the use of earth system science data to inform science-based decision making through a standardized, readily interpretable interface that supports operational users, researchers, and non-scientific stakeholders. This community-driven interface enables user-friendly exploration of data that are increasingly findable, accessible, interoperable, and reusable (FAIR), and the interface is regularly refined to support the diversity, equity, and inclusion of potential end-users. FIRMS offers fire-based maps through the Web Map Service (WMS) and Web Feature Service (WFS) standards, and makes available multiple APIs supporting area, country, and fire-footprint queries for stakeholders needing to ingest data into software such as QGIS, ArcGIS, etc. FIRMS developers are also creating a Fire Data Academy to build capacity around the use of Jupyter notebooks, Google Colab, and Python to perform data ingest, manipulation, and visualization. As the impacts of wildfires expand, affecting increasing swaths of population and biodiversity through immediate infrastructure and habitat destruction, and causing longer-term air quality impacts, a transdisciplinary approach to research and response is required. FIRMS supports such an approach through the range of data and information it makes available, ensuring that all users, including those in historically underrepresented communities, can access wildfire data.
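
As an illustration of this kind of programmatic ingest, the sketch below is an assumption-laden example: the endpoint pattern, data source name, and map key follow the publicly documented FIRMS area API but are not taken from the abstract.

```python
# A sketch only: endpoint pattern, source name and map key are assumptions
# based on the public FIRMS area API documentation.
import pandas as pd

MAP_KEY = "YOUR_FIRMS_MAP_KEY"   # personal key requested from the FIRMS website
SOURCE = "VIIRS_SNPP_NRT"        # near-real-time VIIRS active-fire detections
AREA = "-10,35,30,60"            # bounding box: west,south,east,north
DAYS = 3                         # last three days of detections

url = (
    "https://firms.modaps.eosdis.nasa.gov/api/area/csv/"
    f"{MAP_KEY}/{SOURCE}/{AREA}/{DAYS}"
)

# each row is one detection, with latitude, longitude, acquisition time, FRP, ...
fires = pd.read_csv(url)
print(fires.head())
```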

How to cite: Olsina, O., Hewson, J., Davies, D., Radov, A., Quayle, B., Giglio, L., and Hall, J.: NASA's FIRMS: Enabling the Use of Earth System Science Data for Wildfire Management, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-16475, https://doi.org/10.5194/egusphere-egu24-16475, 2024.

EGU24-17568 | Posters on site | ESSI4.4

Community based services providing Open Science in water management worldwide 

Frida Gyllensvärd and Berit Arheimer

Water is the basis for life and ultimately the reason why our society could develop the way it did; water security is therefore an indirect core component of all 17 UN Sustainable Development Goals. However, scientific water data and information are rarely accessible in an easy and understandable way for managers and policy makers. Moreover, the hydrological sciences are fragmented, with less of a tradition of sharing results, data, and tools among scientists than in many other disciplines. Numerous development projects have launched prototypes and demonstrators of web-based applications to overcome these issues, but without long-term maintenance most of them disappear at the end of the project. Here we will present experience from developing, maintaining, and using three non-commercial operational services that facilitate action on water security and promote scientific engagement with stakeholders.

 

https://hypeweb.smhi.se/ provides readily available modelled hydrological data at continental or global scale, at a sub-catchment resolution of on average 1000 km2 (Arheimer et al., 2020), along with documented open source code and data compilation/visualization/training tools. The visitor can explore data for the past, present or future, download the numerical model, or order data subscriptions. The service also provides tutorials, model documentation and training material for model setup. The website is linked to an annual open (free) training course in HYPE modelling for various societal needs.

 

https://climateinformation.org/ is co-designed with sectorial users in low- and middle-income countries, on behalf of the World Meteorological Organisation (WMO) and the Green Climate Fund (GCF). It offers guidance for non-climate experts and access to two different tools to explore climate-change impact on water resources: 1) instant summary reports of climate change for any site on the globe, and 2) easy access to many pre-calculated climate indicators. The main purpose of this new service is to provide scientific data to argue for climate mitigation and adaptation investments in vulnerable countries (Photiadou et al., 2021). Pre-calculated water variables are based on an extensive production chain using global model ensembles from global modelling communities (e.g. CMIP, CORDEX, WWH) and a rigorous quality assurance protocol.

 

https://dwg.smhi.se/dwg/ is co-designed with the community of the International Association of Hydrological Sciences (IAHS). It is a brand-new platform to search and find (based on keywords) where on Earth there are scientific results available from research projects (case studies), monitoring programs (data repositories), publications (in HSJ, PIAHS), and researchers (personal profiles). The aim is to stimulate and facilitate engagement, interactions, and dialogues among scientists and between scientists and stakeholders. The Digital Water Globe offers co-creation and re-examines the role of scientific outreach; it is a scientific community effort, completely dependent on content from its users, to explore networking and science communication in action.

 

The presentation will focus on the feedback obtained, and on the opportunities and challenges of running operational services with the aim of sharing scientific data and tools with a wide range of users.

 

References:

Arheimer et al., 2020: Global catchment modelling using World-Wide HYPE (WWH), open data and stepwise parameter estimation, HESS 24, 535–559, https://doi.org/10.5194/hess-24-535-2020   

Photiadou et al., 2021: Designing a climate service for planning climate actions in vulnerable countries, Atmosphere 12:121, https://doi.org/10.3390/atmos12010121

How to cite: Gyllensvärd, F. and Arheimer, B.: Community based services providing Open Science in water management worldwide, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-17568, https://doi.org/10.5194/egusphere-egu24-17568, 2024.

EGU24-18613 | Orals | ESSI4.4

How to benefit from multi-sensor synergy using open Ocean Virtual Laboratory tools 

Lucile Gaultier, Fabrice Collard, Craig Donlon, Ziad El Khoury Hanna, Sylvain Herlédan, and Guillaume Le Seach

In the past decade, the emergence of new satellites and sensors has facilitated the observation of a diverse range of oceanic physical variables across various scales. For instance, the Sentinel 1-2-3-6 programme encompasses sensors such as SAR, Ocean Color, brightness temperature, and altimeters, each with a long individual revisit time but a rapid revisit from a constellation perspective. Additionally, geostationary sensors such as SEVIRI contribute by providing infrared SST every hour, significantly enhancing coverage in cloudy areas. These variables contain crucial information about the ocean's state.

Despite the wealth of data, discovering, collocating, and analyzing a heterogeneous dataset can be challenging and act as a barrier for potential users wishing to leverage Earth Observation (EO) data. Accessing low-level data and preparing them for analysis requires a diverse set of skills. Addressing this challenge, the Ocean Virtual Laboratory Next Generation (OVL-NG) project has developed two tools, which will be introduced.

Firstly, online data visualization websites, such as https://ovl.oceandatalab.com, have been made publicly accessible. These platforms empower users to explore various satellite, in-situ, and model data with just a few clicks. Users can navigate through time and space, easily compare hundreds of products (some in Near Real-Time), and utilize drawing and annotation features. The OVL web portal also facilitates sharing interesting cases with fellow scientists and communicating about captivating oceanic structures.

Secondly, a complementary tool named SEAScope offers additional features for analyzing pre-processed data and user-generated data. SEAScope is a free and open-source standalone application compatible with Windows, Linux, and macOS. It allows users to collocate data in time and space, rendering them on a 3D globe. Users can adjust rendering settings on the fly, extract data over a specific area or transect, and interface with external applications like Jupyter notebooks. This functionality enables users to extract data on a shared grid, analyze them, and import the results back into SEAScope for visualization alongside the input data.

The OVL-NG tools will be showcased at the OceanDataLab booth.