Presentation type:
ESSI – Earth & Space Science Informatics

ESSI1 – Next-Generation Analytics for Scientific Discovery: Data Science, Machine Learning, AI

EGU23-2843 | ECS | PICO | ESSI1.1

Geography-Aware Masked Autoencoders for Change Detection in Remote Sensing 

Lukas Kondmann, Caglar Senaras, Yuki M. Asano, Akhil Singh Rana, Annett Wania, and Xiao Xiang Zhu

The increasing coverage of commercial and public satellites allows us to monitor the pulse of the Earth at ever-higher frequency (Zhu et al., 2017). Together with the rise of deep learning in artificial intelligence (AI) (LeCun et al., 2015), the field of AI for Earth Observation (AI4EO) is growing rapidly. However, many supervised deep learning techniques are data-hungry: annotated data in large quantities are necessary for these algorithms to reach their full potential. In many Earth Observation applications such as change detection, this is often infeasible because high-quality annotations require manual labeling, which is time-consuming and costly.

Self-supervised learning (SSL) can help tackle the issue of limited label availability in AI4EO. In SSL, an algorithm is pretrained with tasks that only require the input data, without annotations. Notably, Masked Autoencoders (MAE) have recently shown promising performance: a Vision Transformer learns to reconstruct a full image from only 25% of it as input. We hypothesize that the success of MAEs also extends to satellite imagery and evaluate this with a change detection downstream task. In addition, we provide a multitemporal baseline using DINO, another widely successful SSL method. Further, we test a second version of MAEs, which we call GeoMAE. GeoMAE incorporates the location and date of the satellite image as auxiliary information in self-supervised pretraining. The coordinates and date information are passed as additional tokens to the MAE model, similarly to the positional encoding.
The pretraining dataset used is the RapidAI4EO corpus, which contains multi-temporal Planet Fusion imagery for a variety of locations across Europe. The dataset for the downstream task also uses Planet Fusion imagery in pairs as input data. These are provided at a 600 m × 600 m patch level three months apart, together with a label indicating whether the respective patch has changed in this period. Self-supervised pretraining is done for up to 150 epochs, and we take the model with the best validation performance on the downstream task as the starting point for the test set.
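As an illustration of the token mechanism described above, the sketch below shows one way coordinates and acquisition date could be embedded and prepended to the patch tokens of a ViT-style encoder. This is a minimal PyTorch sketch under our own naming (GeoTokens and all shapes are our assumptions), not the authors' GeoMAE implementation:

```python
import torch
import torch.nn as nn

class GeoTokens(nn.Module):
    """Embed (lat, lon) and day-of-year as two extra tokens prepended to
    the patch-token sequence of a ViT/MAE encoder."""
    def __init__(self, dim):
        super().__init__()
        self.loc = nn.Linear(2, dim)    # latitude/longitude -> one token
        self.date = nn.Linear(2, dim)   # (sin, cos) of day-of-year -> one token

    def forward(self, patch_tokens, latlon, doy):
        angle = 2 * torch.pi * doy / 365.25
        date_feat = torch.stack([angle.sin(), angle.cos()], dim=-1)
        extra = torch.stack([self.loc(latlon), self.date(date_feat)], dim=1)
        return torch.cat([extra, patch_tokens], dim=1)   # (B, 2 + N, dim)

patch_tokens = torch.randn(8, 196, 768)        # tokens of the visible patches
latlon = torch.rand(8, 2)                      # normalised coordinates
doy = torch.randint(1, 366, (8,)).float()      # acquisition day of year
augmented = GeoTokens(768)(patch_tokens, latlon, doy)  # (8, 198, 768)
```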

We find that the regular MAE model scores best on the test set with an accuracy of 81.54%, followed by DINO with 80.63% and GeoMAE with 80.02%. Pretraining MAE with ImageNet data instead of satellite images results in a notable performance loss, down to 71.36%. Overall, our current pretraining experiments cannot yet confirm our hypothesis that GeoMAE is advantageous compared to regular MAE. However, in a similar spirit, Cong et al. (2022) recently introduced SatMAE, which shows that for other remote sensing applications the combination of auxiliary information and novel masking strategies is a key factor. Therefore, a combination of location and time inputs together with adapted masking may also hold the most potential for change detection. There is ample potential for future research in geo-specific applications of MAEs, and we provide a starting point for this with our experimental results for change detection.

How to cite: Kondmann, L., Senaras, C., Asano, Y. M., Rana, A. S., Wania, A., and Zhu, X. X.: Geography-Aware Masked Autoencoders for Change Detection in Remote Sensing, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-2843, https://doi.org/10.5194/egusphere-egu23-2843, 2023.

EGU23-3267 | ECS | PICO | ESSI1.1

Decomposition learning based on spatial heterogeneity: A case study of COVID-19 infection forecasting in Germany 

Ximeng Cheng, Jost Arndt, Emilia Marquez, and Jackie Ma

New models are emerging from Artificial Intelligence (AI) and its sub-fields, in particular Machine Learning and Deep Learning, and are being applied in different areas including geography (e.g., land cover identification and traffic volume forecasting based on spatial data). Unlike the well-known datasets often used to develop AI models (e.g., ImageNet for image classification), spatial data has an intrinsic feature, spatial heterogeneity, which leads to relationships between the independent variables (the model input X) and the dependent variable (the model output Y) that vary across regions. This makes it difficult to conduct large-scale studies with a single robust AI model. In this study, we draw on the idea of modular learning, i.e., we decompose large-scale tasks into sub-tasks for specific sub-regions and use multiple AI models to solve these sub-tasks. The decomposition is based on spatial characteristics, ensuring that the relationship between independent and dependent variables is similar within each sub-region. We explore this approach for forecasting COVID-19 cases in Germany using spatiotemporal data (e.g., weather data and human mobility data) as an example, and compare forecasting with a single model to the proposed decomposition learning procedure in terms of accuracy and efficiency. This study is part of the project DAKI-FWS, which is funded by the Federal Ministry of Economic Affairs and Climate Action in Germany to develop an early warning system to stabilize the German economy.
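The decomposition idea can be illustrated with a minimal sketch: districts are first grouped by their spatial characteristics, and a separate regressor is then fitted per sub-region. All data and model choices here are hypothetical placeholders, not the study's actual pipeline:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 12))                 # per-district weather/mobility features
y = rng.poisson(50, size=400).astype(float)    # per-district case counts
spatial_chars = rng.normal(size=(400, 4))      # spatial characteristics per district

# decompose: group districts whose X-y relationship is expected to be similar
sub_region = KMeans(n_clusters=5, random_state=0).fit_predict(spatial_chars)

# one model per sub-region instead of a single country-wide model
models = {r: GradientBoostingRegressor().fit(X[sub_region == r], y[sub_region == r])
          for r in np.unique(sub_region)}
forecast = models[sub_region[0]].predict(X[:1])  # predict with the local model
```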

How to cite: Cheng, X., Arndt, J., Marquez, E., and Ma, J.: Decomposition learning based on spatial heterogeneity: A case study of COVID-19 infection forecasting in Germany, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-3267, https://doi.org/10.5194/egusphere-egu23-3267, 2023.

EGU23-4929 | PICO | ESSI1.1

Using AI and ML to support marine science research 

Ilaria Fava, Peter Thijsse, Gergely Sipos, and Dick Schaap

The iMagine project is devoted to developing and delivering imaging data and services for aquatic science. Started in September 2022, the project will provide a portfolio of image data collections, high-performance image analysis tools empowered with Artificial Intelligence, and best practice documents for scientific image analysis. These services and documents will enable better and more efficient processing and analysis of imaging data in marine and freshwater research, accelerating our scientific insights into processes and measures relevant to healthy oceans, seas, and coastal and inland waters. By building on the European Open Science Cloud compute platform, iMagine delivers a generic framework for AI model development, training, and deployment, which researchers can adopt for refining their AI-based applications for water pollution mitigation, biodiversity and ecosystem studies, climate change analysis and beach monitoring, but also for developing and optimising other AI-based applications in this field. The iMagine AI development and testing framework offers neural networks, parallel post-processing of extensive data, and analysis of massive online data streams in distributed environments. The synergies among the eight aquatic use cases in the project will lead to common solutions in data management, quality control, performance, integration, provenance, and FAIRness, and will contribute to harmonisation across research infrastructures (RIs). The resulting iMagine AI development and testing platform and the iMagine use case applications will add another component to the European marine data management landscape, of value for the Digital Twin of the Ocean, EMODnet, Copernicus, and international initiatives.

How to cite: Fava, I., Thijsse, P., Sipos, G., and Schaap, D.: Using AI and ML to support marine science research, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-4929, https://doi.org/10.5194/egusphere-egu23-4929, 2023.

EGU23-6818 | ECS | PICO | ESSI1.1

Eddy identification from along-track altimeter data with multi-modal deep learning 

Adili Abulaitijiang, Eike Bolmer, Ribana Roscher, Jürgen Kusche, and Luciana Fenoglio-Marc

Eddies are circular rotating water masses, usually generated near large ocean currents such as the Gulf Stream. Monitoring eddies and gaining knowledge of eddy statistics over a large region are important for fisheries, marine biology studies, and the testing of ocean models.

At the mesoscale, eddies are observed in radar altimetry, and methods have been developed to identify, track, and classify them in gridded maps of sea surface height derived from multi-mission data sets. However, this procedure has drawbacks, since much information is lost in the gridded maps: inevitably, the spatial and temporal resolution of the original altimetry data degrades during the gridding process. Moreover, identifying eddies has so far been a post-analysis step on the gridded dataset, which is not suitable for near-real-time applications or forecasts. In the EDDY project at the University of Bonn, we aim to develop methods for identifying eddies directly from along-track altimetry data via a machine (deep) learning approach.

Since eddy signatures (the eddy boundary and highs and lows in the sea level anomaly, SLA) cannot be extracted directly from along-track altimetry data, gridded altimetry maps from AVISO are used to detect eddies. These serve as the reference data for machine learning. The eddy detection on 2D grid maps is produced by an open-source geometry-based approach (e.g., py-eddy-tracker; Mason et al., 2014) with additional constraints like the Okubo-Weiss parameter. Sea Surface Temperature (SST) maps of the same region and date (also available from AVISO) are then used for manually cleaning the reference data. Since altimetry grid maps and SST maps have different temporal and spatial resolutions, we also use a high-resolution (~6 km) ocean model simulation dataset (e.g., FESOM, the Finite Element Sea ice Ocean Model). The FESOM dataset provides coherent, high-resolution SLA, SST, and salinity maps for the study area and is a potential test basis for developing the deep learning network.
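For reference, the Okubo-Weiss parameter used as an additional constraint combines strain and vorticity of the velocity field; below is a generic NumPy sketch (placeholder velocities, not the EDDY project's code):

```python
import numpy as np

def okubo_weiss(u, v, dx, dy):
    """Okubo-Weiss parameter W = s_n^2 + s_s^2 - omega^2 for a 2D
    velocity field u, v indexed as [y, x], with grid spacing dx, dy (m)."""
    dudy, dudx = np.gradient(u, dy, dx)
    dvdy, dvdx = np.gradient(v, dy, dx)
    s_n = dudx - dvdy        # normal strain
    s_s = dvdx + dudy        # shear strain
    omega = dvdx - dudy      # relative vorticity
    return s_n**2 + s_s**2 - omega**2

u, v = np.random.randn(2, 100, 100)   # placeholder geostrophic velocities
W = okubo_weiss(u, v, dx=25e3, dy=25e3)
cores = W < -0.2 * W.std()            # common eddy-core criterion (threshold varies)
```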

The single-modal training via a Convolutional Neural Network (CNN) on the 2D altimetry grid maps produced an excellent Dice score of 86%, meaning the network detects almost all eddies in the Gulf Stream consistently with the reference data. For the multi-modal training, two different networks are developed, for the 1D along-track altimetry data and for the 2D grid maps from SLA and SST, respectively, and then combined to give the final classification output. A transformer model is deemed efficient for encoding the spatiotemporal information in the 1D along-track altimetry data, while a CNN is sufficient for the 2D grid maps from multiple sensors.
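The Dice score reported above measures the overlap between predicted and reference eddy masks. A minimal sketch of the metric itself (placeholder masks, not the project's evaluation code):

```python
import numpy as np

def dice_score(pred, ref):
    """Dice coefficient between two binary masks: 2|A∩B| / (|A| + |B|)."""
    pred, ref = pred.astype(bool), ref.astype(bool)
    intersection = np.logical_and(pred, ref).sum()
    return 2.0 * intersection / (pred.sum() + ref.sum())

pred = np.random.rand(256, 256) > 0.5   # predicted eddy mask (placeholder)
ref = np.random.rand(256, 256) > 0.5    # reference mask, e.g. from py-eddy-tracker
print(dice_score(pred, ref))
```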

In this presentation, we show the eddy classification results from the multi-modal deep learning approach based on along-track and gridded multi-source datasets for the Gulf Stream area for the period between 2017 and 2019. Results show that multi-modal deep learning improves the classification by more than 20% compared to training a transformer model on along-track data alone.

How to cite: Abulaitijiang, A., Bolmer, E., Roscher, R., Kusche, J., and Fenoglio-Marc, L.: Eddy identification from along-track altimeter data with multi-modal deep learning, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-6818, https://doi.org/10.5194/egusphere-egu23-6818, 2023.

EGU23-8479 | ECS | PICO | ESSI1.1

Model evaluation strategy impacts the interpretation and performance of machine learning models 

Lily-belle Sweet, Christoph Müller, Mohit Anand, and Jakob Zscheischler

Machine learning models are able to capture highly complex, nonlinear relationships, and have been used in recent years to accurately predict crop yields at regional and national scales. This success suggests that the use of ‘interpretable’ or ‘explainable’ machine learning (XAI) methods may facilitate improved scientific understanding of the compounding interactions between climate, crop physiology and yields. However, studies have identified implausible, contradictory or ambiguous results from the use of these methods. At the same time, researchers in fields such as ecology and remote sensing have called attention to issues with robust model evaluation on spatiotemporal datasets. This suggests that XAI methods may produce misleading results when applied to spatiotemporal datasets, but the impact of the model evaluation strategy on the results of such methods has not yet been examined.

In this study, machine learning models are trained to predict simulated crop yield, and the impact of model evaluation strategy on the interpretation and performance of the resulting models is assessed. Using data from a process-based crop model allows us to then comment on the plausibility of the explanations provided by common XAI methods. Our results show that the choice of evaluation strategy has an impact on (i) the interpretations of the model using common XAI methods such as permutation feature importance and (ii) the resulting model skill on unseen years and regions. We find that use of a novel cross-validation strategy based on clustering in feature-space results in the most plausible interpretations. Additionally, we find that the use of this strategy during hyperparameter tuning and feature selection results in improved model performance on unseen years and regions. Our results provide a first step towards the establishment of best practices for model evaluation strategy in similar future studies.
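A minimal scikit-learn sketch of cross-validation grouped by clusters in feature space, the kind of strategy described above (placeholder data; the study's exact configuration may differ):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GroupKFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))   # climate features (placeholder)
y = rng.normal(size=1000)        # simulated crop yield (placeholder)

# clusters in feature space define the CV groups, so each fold holds out
# a distinct region of feature space instead of randomly drawn samples
groups = KMeans(n_clusters=5, random_state=0).fit_predict(X)
scores = cross_val_score(RandomForestRegressor(n_estimators=100), X, y,
                         groups=groups, cv=GroupKFold(n_splits=5))
print(scores.mean())
```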

How to cite: Sweet, L., Müller, C., Anand, M., and Zscheischler, J.: Model evaluation strategy impacts the interpretation and performance of machine learning models, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-8479, https://doi.org/10.5194/egusphere-egu23-8479, 2023.

EGU23-9437 | PICO | ESSI1.1

On Unsupervised Learning from Environmental Data 

Mikhail Kanevski

Predictive learning from data is usually formulated as the problem of finding the best connection between input and output spaces by optimizing well-defined cost or risk functions.

In geo-environmental studies, the input space is usually constructed from geographical coordinates and features generated from different sources of available information (feature engineering), by applying expert knowledge, using deep learning technologies, and taking into account the objectives of the study. Often it is not known in advance whether the input space is complete or contains redundant features. Therefore, unsupervised learning (UL) is essential in environmental data analysis, modelling, prediction and visualization. UL also helps to better understand the data and the phenomena they describe, as well as to interpret and communicate modelling strategies and results in the decision-making process.

The main objective of the present investigation is to review some important topics in unsupervised learning from environmental data: 1) quantitative description of the input space (“monitoring network”) structure using global and local topological and fractal measures, 2) dimensionality reduction, and 3) unsupervised feature selection and clustering, applying a variety of machine learning algorithms (kernel-based methods, ensemble learning, self-organizing maps) and visualization tools. An example of point 1) is sketched below.
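As an illustration of the fractal measures in point 1), the correlation (fractal) dimension of a monitoring network can be estimated with the Grassberger-Procaccia correlation sum. This is a generic sketch with a hypothetical network, not the study's code:

```python
import numpy as np
from scipy.spatial.distance import pdist

def correlation_dimension(points, radii):
    """Grassberger-Procaccia correlation sum C(r); the slope of
    log C(r) versus log r estimates the fractal (correlation) dimension."""
    d = pdist(points)
    c = np.array([(d < r).mean() for r in radii])
    slope, _ = np.polyfit(np.log(radii), np.log(c), 1)
    return slope

stations = np.random.default_rng(0).random((500, 2))   # hypothetical network
print(correlation_dimension(stations, np.logspace(-2, -0.5, 10)))  # ~2 if space-filling
```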

Major attention is paid to simulated and real spatial data (pollution, permafrost, geomorphological and wind field data). The case studies considered differ in input space dimensionality/topology and in the number of measurements. It is confirmed that UL should be considered an integral part of a generic methodology for environmental data analysis. Comprehensive comparisons and discussions of the results conclude the research.

How to cite: Kanevski, M.: On Unsupervised Learning from Environmental Data, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-9437, https://doi.org/10.5194/egusphere-egu23-9437, 2023.

EGU23-11601 | PICO | ESSI1.1

Clustering Geodata Cubes (CGC) and Its Application to Phenological Datasets 

Francesco Nattino, Ou Ku, Meiert W. Grootes, Emma Izquierdo-Verdiguier, Serkan Girgin, and Raúl Zurita-Milla

Unsupervised classification techniques are becoming essential for extracting information from the wealth of data that Earth observation satellites and other sensors currently provide. These datasets are inherently complex to analyze due to their extent across multiple dimensions (spatial, temporal, and often a spectral or band dimension), their size, and the high resolution of current sensors. Traditional one-dimensional cluster analysis approaches, which are designed to find groups of similar elements in datasets such as rasters or time series, may fall short of identifying patterns in these higher-dimensional datasets, often referred to as data cubes. In this context, we present our Clustering Geodata Cubes (CGC) software, an open-source Python package that implements a set of co- and tri-clustering algorithms to simultaneously group elements across two and three dimensions, respectively. The package includes different implementations to efficiently tackle both datasets that fit into the memory of a single machine and very large datasets that require cluster computing. A refining strategy to facilitate the identification of data patterns is also provided. We apply CGC to investigate gridded datasets representing the predicted day of the year when spring onset events (first leaf, first bloom) occur according to a well-established phenological model. Specifically, we consider spring indices computed at high spatial resolution (1 km) and continental scale (conterminous United States) for the last 40+ years and extract the main spatiotemporal patterns present in the data via CGC's co-clustering functionality.
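CGC ships its own co- and tri-clustering implementations; purely to illustrate the co-clustering idea on a flattened space-by-time matrix, here is a sketch using scikit-learn's spectral co-clustering (explicitly not the CGC API, and with placeholder data):

```python
import numpy as np
from sklearn.cluster import SpectralCoclustering

# data cube flattened to a (space x time) matrix: rows = grid cells,
# columns = years; entries = day of year of a spring onset event
rng = np.random.default_rng(0)
Z = rng.normal(loc=120, scale=10, size=(1000, 40))

model = SpectralCoclustering(n_clusters=5, random_state=0).fit(Z)
row_labels = model.row_labels_       # spatial cluster of each grid cell
col_labels = model.column_labels_    # temporal cluster of each year
```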

How to cite: Nattino, F., Ku, O., Grootes, M. W., Izquierdo-Verdiguier, E., Girgin, S., and Zurita-Milla, R.: Clustering Geodata Cubes (CGC) and Its Application to Phenological Datasets, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-11601, https://doi.org/10.5194/egusphere-egu23-11601, 2023.

EGU23-12773 | PICO | ESSI1.1

Industrial Atmospheric Pollution Estimation Using Gaussian Process Regression 

Anton Sokolov, Hervé Delbarre, Daniil Boldyriev, Tetiana Bulana, Bohdan Molodets, and Dmytro Grabovets

Industrial pollution remains a major challenge in spite of recent technological developments and purification procedures. To effectively monitor atmospheric contamination, data from air quality networks should be coupled with advanced spatiotemporal statistical methods.

Our previous studies showed that standard interpolation techniques (like inverse distance weighting, linear or spline interpolation, and kernel-based Gaussian Process Regression, GPR) are quite limited for simulating smoke-like, narrowly directed industrial pollution in the vicinity of the source (a few tens of kilometres). In this work, we apply GPR based on statistically estimated covariances. These covariances are calculated using the CALPUFF atmospheric pollution dispersion model for a one-year simulation in the Kryvyi Rih region. The application of GPR permits taking into account the high correlations between pollution values at neighboring points revealed by the modelling. The result of the covariance-based GPR technique is compared with other interpolation techniques. It can then be used in the assessment and optimization of air quality networks.
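For reference, plain kernel-based GPR interpolation of station data looks as follows; the study's key change is to replace the generic kernel below with covariances estimated from the CALPUFF simulations (station coordinates and values are placeholders):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# station coordinates (km) and measured concentrations (placeholder values)
X = np.array([[0.0, 0.0], [5.0, 2.0], [3.0, 8.0], [9.0, 4.0]])
y = np.array([42.0, 30.0, 12.0, 8.0])

kernel = 1.0 * RBF(length_scale=3.0) + WhiteKernel(noise_level=1.0)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

grid = np.mgrid[0:10:50j, 0:10:50j].reshape(2, -1).T     # evaluation grid
mean, std = gpr.predict(grid, return_std=True)           # field + uncertainty
```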

How to cite: Sokolov, A., Delbarre, H., Boldyriev, D., Bulana, T., Molodets, B., and Grabovets, D.: Industrial Atmospheric Pollution Estimation Using Gaussian Process Regression, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-12773, https://doi.org/10.5194/egusphere-egu23-12773, 2023.

EGU23-12933 | ECS | PICO | ESSI1.1

Estimating vegetation carbon stock components by linking ground databases with Earth observations 

Daniel Kinalczyk, Christine Wessollek, and Matthias Forkel

Land ecosystems dampen the increase of atmospheric CO2 by storing carbon in soils and vegetation. In order to estimate how long carbon stays in land ecosystems, detailed knowledge about the distribution of carbon in different vegetation components is needed. Current Earth observation products provide estimates of total above-ground biomass but do not further separate the carbon stored in trees, understory vegetation, shrubs, grass, litter or woody debris. Here we present an approach in which we link several Earth observation products with a ground-based database to estimate biomass in various vegetation components. To do so, we use information about the statistical distribution of biomass components provided by the North American Wildland Fuels Database (NAWFD), which is, however, not available as geocoded data. We use ESA CCI AGB version 3 data from 2010 as a proxy to link the NAWFD data to the spatial information from Earth observation products. The biomass and corresponding uncertainty from ESA CCI AGB and a map of vegetation types are used to select the likely distribution of vegetation biomass components from the set of in-situ measurements of tree biomass. We then apply Isolation Forest outlier detection and bootstrapping for a robust comparison of both datasets and for uncertainty estimation. We use Random Forest and Gaussian Process regression to predict the biomass of trees, shrubs, snags, herbaceous vegetation, coarse and fine woody debris, duff and litter from ESA CCI AGB and land cover, GEDI canopy height, Sentinel-3 LAI and bioclimatic data. The regression models reach high predictive power and also allow extrapolation to other regions. Our derived estimates of vegetation carbon stock components provide a more detailed view of land carbon storage and contribute to an improved estimate of potential carbon emissions from respiration, disturbances and fires.
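Schematically, the outlier screening and multi-output regression steps could look as follows (placeholder predictors and targets, not the actual NAWFD/ESA CCI pipeline):

```python
import numpy as np
from sklearn.ensemble import IsolationForest, RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 6))   # e.g. AGB, land cover, canopy height, LAI, bioclim
Y = rng.normal(size=(5000, 4))   # e.g. biomass of trees, shrubs, woody debris, litter

keep = IsolationForest(random_state=0).fit_predict(X) == 1   # drop outliers (-1)
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[keep], Y[keep])
pred = rf.predict(X[:5])         # per-component biomass estimates
```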

How to cite: Kinalczyk, D., Wessollek, C., and Forkel, M.: Estimating vegetation carbon stock components by linking ground databases with Earth observations, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-12933, https://doi.org/10.5194/egusphere-egu23-12933, 2023.

EGU23-13196 | ECS | PICO | ESSI1.1

From Super-Resolution to Downscaling - An Image-Inpainting Deep Neural Network for High Resolution Weather and Climate Models 

Maximilian Witte, Danai Filippou, Étienne Plésiat, Johannes Meuer, Hannes Thiemann, David Hall, Thomas Ludwig, and Christopher Kadow

High resolution has always been a common and ongoing goal of the weather and climate community. In this regard, machine learning techniques have accompanied numerical and statistical methods in recent years. Here we demonstrate that artificial intelligence can skilfully downscale low-resolution climate model data when combined with numerical climate model data. We show that a recently developed image-inpainting technique performs accurate super-resolution via transfer learning using the HighResMIP experiments of CMIP6 (Coupled Model Intercomparison Project Phase 6). Its huge database offers a unique training opportunity for machine learning approaches. Transfer learning also allows downscaling other CMIP6 experiments and models, as well as observational data like HadCRUT5. Combined with the technology of Kadow et al. (2020) for infilling missing climate data, we obtain a neural network which reconstructs and downscales the important observational data set (IPCC AR6) at the same time. We further investigate the application of our method to downscaling quantities predicted by a numerical ocean model (ICON-O) to improve computation times. In this process we focus on the ability of the model to predict eddies from low-resolution data.
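The authors build on the inpainting network of Kadow et al. (2020); purely as an illustration of CNN-based super-resolution of a gridded field, here is a minimal PixelShuffle upscaling sketch (hypothetical shapes and architecture, not the study's network):

```python
import torch
import torch.nn as nn

class SuperResCNN(nn.Module):
    """Minimal CNN that maps a coarse field to a 4x finer grid."""
    def __init__(self, scale=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),  # rearranges channels into spatial detail
        )

    def forward(self, x):
        return self.net(x)

coarse = torch.randn(8, 1, 45, 90)    # e.g. low-resolution temperature maps
fine = SuperResCNN()(coarse)          # -> (8, 1, 180, 360)
```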

An extension to:

Kadow, C., Hall, D.M. & Ulbrich, U. Artificial intelligence reconstructs missing climate information. Nature Geoscience 13, 408–413 (2020). https://doi.org/10.1038/s41561-020-0582-5

How to cite: Witte, M., Filippou, D., Plésiat, É., Meuer, J., Thiemann, H., Hall, D., Ludwig, T., and Kadow, C.: From Super-Resolution to Downscaling - An Image-Inpainting Deep Neural Network for High Resolution Weather and Climate Models, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-13196, https://doi.org/10.5194/egusphere-egu23-13196, 2023.

EGU23-14716 | ECS | PICO | ESSI1.1

Spatial-temporal transferability assessment of remote sensing data models for mapping agricultural land use 

Jayan Wijesingha, Ilze Dzene, and Michael Wachendorf

To assess the impact of anthropogenic and natural causes on land use and land cover change, mapping of spatial and temporal changes is increasingly applied. Due to the availability of satellite image archives, machine learning models based on remote sensing (RS) data are particularly suitable for mapping and analysing land use and land cover changes. Most often, models trained on current RS data are employed to estimate past land cover and land use from archived RS data, under the assumption that the trained model predicts past data with an accuracy similar to that for present data. However, machine learning models trained on RS data from particular locations and times may not transfer well to new locations and time periods, for various reasons. This study aims to assess the spatial-temporal transferability of RS data models in the context of agricultural land use mapping. The study was designed to map agricultural land use (5 classes: maize, grasslands, summer crops, winter crops, and mixed crops) in two regions in Germany (North Hesse and Weser Ems) between the years 2010 and 2018 using Landsat archive data (i.e., Landsat 5, 7, and 8). Three model transferability scenarios were evaluated: a) temporal (S1), b) spatial (S2), and c) spatial-temporal (S3). Two machine learning models (random forest, RF, and a Convolutional Neural Network, CNN) were trained. For each transferability scenario, class-level F1 and macro F1 values were compared between the reference and the targeted transferability settings. Moreover, to explain the results of the transferability scenarios, they were further explored using the dissimilarity index and area of applicability (AOA) concepts. The average macro F1 value of the trained model for the reference scenario (no transferability) was 0.75. For the assessed transferability scenarios, the average macro F1 values were 0.70, 0.65 and 0.60 for S1, S2, and S3, respectively. This shows that model performance decreases when predicting data from different spatial-temporal contexts. In contrast, the average proportion of the data inside the AOA did not show a clear pattern across scenarios. In the context of model building with RS data, spatial-temporal transferability is essential because of the limited availability of labelled data. Thus, the results from this case study provide an understanding of how model performance changes when a model is transferred to new settings with data from different temporal and spatial domains.
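A simplified sketch of the dissimilarity index underlying the AOA concept (in the spirit of Meyer & Pebesma, 2021; this is our simplification, not the exact formulation used in the study):

```python
import numpy as np
from scipy.spatial.distance import cdist

def dissimilarity_index(X_train, X_new):
    """Distance of each new sample to its nearest training sample,
    normalised by the mean pairwise distance within the training set."""
    mu, sd = X_train.mean(axis=0), X_train.std(axis=0)
    a, b = (X_train - mu) / sd, (X_new - mu) / sd
    d_new = cdist(b, a).min(axis=1)
    d_bar = cdist(a, a)[np.triu_indices(len(a), k=1)].mean()
    return d_new / d_bar

rng = np.random.default_rng(0)
di = dissimilarity_index(rng.normal(size=(500, 8)),       # training features
                         rng.normal(2.0, size=(100, 8)))  # shifted target domain
# samples whose DI exceeds a threshold derived from the training CV folds
# are considered outside the area of applicability (AOA)
```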

How to cite: Wijesingha, J., Dzene, I., and Wachendorf, M.: Spatial-temporal transferability assessment of remote sensing data models for mapping agricultural land use, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-14716, https://doi.org/10.5194/egusphere-egu23-14716, 2023.

EGU23-16096 | ECS | PICO | ESSI1.1

Limitations of machine learning in a spatial context 

Jens Heinke, Christoph Müller, and Dieter Gerten

Machine learning algorithms have become popular tools for the analysis of spatial data. However, a number of studies have demonstrated that the application of machine learning algorithms in a spatial context has limitations. New geographic locations may lie outside of the data range for which the model was trained, and estimates of model performance may be too optimistic when spatial autocorrelation of geographic data is not properly accounted for in cross-validation. We here use artificially created spatial data fields to conduct a series of experiments that further investigate the potential pitfalls of random forest regression applied to spatial data. We provide new insights into previously reported limitations and identify further ones. We demonstrate that the same mechanism that leads to overoptimistic estimates of model performance (when based on ordinary random k-fold cross-validation) can also lead to a deterioration of model performance. When covariates contain sufficient information to deduce spatial coordinates, the model can reproduce any spatial pattern in the training data, even if it is entirely or partly unrelated to the covariates. The presence of spatially correlated residuals in the training data changes how the model utilizes the information in the covariates and impedes the identification of the actual relationship between covariates and response. This reduces model performance when the model is applied to data with a different spatial structure. Under such conditions, machine learning methods that are flexible enough to fit autocorrelated residuals (such as random forest) may not be an optimal choice. Better models may be obtained using less flexible but more transparent approaches such as generalized linear models or additive models.
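A toy experiment illustrating the mechanism (our construction, not the study's setup): when covariates encode the coordinates, a random forest can "predict" a purely spatial signal, and random k-fold cross-validation rewards it while spatially blocked cross-validation reveals the failure:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GroupKFold, KFold, cross_val_score

rng = np.random.default_rng(0)
n = 2000
xy = rng.uniform(0, 1, size=(n, 2))                  # sample locations
y = np.sin(3 * xy[:, 0]) + np.cos(5 * xy[:, 1])      # purely spatial signal
X = np.c_[xy, rng.normal(size=(n, 3))]               # covariates leak the coordinates

rf = RandomForestRegressor(n_estimators=100, random_state=0)
random_cv = cross_val_score(rf, X, y, cv=KFold(5, shuffle=True, random_state=0))
blocks = (xy[:, 0] * 5).astype(int)                  # spatial blocks along x
spatial_cv = cross_val_score(rf, X, y, groups=blocks, cv=GroupKFold(5))
print(random_cv.mean(), spatial_cv.mean())           # optimistic vs. degraded skill
```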

How to cite: Heinke, J., Müller, C., and Gerten, D.: Limitations of machine learning in a spatial context, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-16096, https://doi.org/10.5194/egusphere-egu23-16096, 2023.

EGU23-16768 | PICO | ESSI1.1

Knowledge Representation of Levee Systems - an Environmental Justice Perspective 

Armita Davarpanah, Anthony L. Nguy Robertson, Monica Lipscomb, Jacob W. McCord, and Amy Morris

Levee systems are designed to reduce the risk of water-related natural hazards (e.g., flooding) in the areas behind levees. Most levees in the U.S. are designed to protect people and facilities against the impacts of 100-year floods. However, climate change is increasing the probability of 500-year flood events, which in turn increases the likelihood of economic loss, environmental damage, and fatalities that disproportionately impact communities of color and low-income groups facing socio-economic inequities in leveed areas. The increased frequency and intensity of flooding is putting extra pressure on emergency responders, who often require diverse, multi-dimensional data originating from different sources to make sound decisions. Currently, the integration of these heterogeneous data, acquired by diverse sensors and emergency agencies, about environmental, hydrological, and demographic indicators requires costly and complex programming and analysis that hinders rapid disaster management efforts. Our domain ontology, the Levee System Ontology (LSO), resolves these data integration and software interoperability issues by semantically modeling the static aspects, dynamic processes, and information content of levee systems, extending the well-structured, top-level Basic Formal Ontology (BFO) and mid-level Common Core Ontologies (CCO). LSO's class and property names follow the terminology of the National Levee Database (NLD), allowing data scientists using NLD data to constrain their classifications based on the knowledge represented in LSO. In addition to modeling the information related to the characteristics and status of the structural components of the levee system, LSO represents the residual risk in leveed areas, economic and environmental losses, and damage to facilities in case of breaching and/or overtopping of levees. LSO enables reasoning to infer components and places along levees and floodwalls where the system requires inspection, maintenance, and repair based on the status of system components. The ontology also represents the impact of flood management activities on different groups of people from an environmental justice perspective, based on the principles of DEI (diversity, equity, inclusion) as defined by the U.N. Sustainable Development Goals.

How to cite: Davarpanah, A., Nguy Robertson, A. L., Lipscomb, M., McCord, J. W., and Morris, A.: Knowledge Representation of Levee Systems - an Environmental Justice Perspective, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-16768, https://doi.org/10.5194/egusphere-egu23-16768, 2023.

EGU23-102 | ITS1.14/CL5.8

Inferring Causal Structures to Model and Predict ENSO and Its Effect on Asian Summer Monsoon

S. He, S. Yang, and D. Chen

Large-scale climate variability is analysed, modelled, and predicted mainly based on general circulation models and low-dimensional association analysis. The models' equational basis makes it difficult to produce mathematical analysis results and clear interpretations, whereas association analysis cannot establish causation sufficiently to make invariant predictions. However, the macroscale causal structures of the climate system may accomplish the tasks of analysis, modelling, and prediction, according to the concepts of causal emergence and the invariance of causal predictions.

Under the assumptions of no unobserved confounders and linear Gaussian models, we examine whether the macroscale causal structures of the climate system can be inferred not only to model but also to predict large-scale climate variability. Specifically, first, we obtain the causal structures of the macroscale air-sea interactions of the El Niño–Southern Oscillation (ENSO), which are interpretable in terms of physics. The structural causal models constructed accordingly can model the ENSO diversity realistically and predict the ENSO variability. Second, this study identifies the joint effect of ENSO and three other winter climate phenomena on the interannual variability of the East Asian summer monsoon. Using regression, these causal precursors can predict the monsoon one season ahead, outperforming association-based empirical models and several climate models. Third, we introduce a framework that infers ENSO's air-sea interactions from high-dimensional data sets. The framework is based on aggregating the causal discovery results of bootstrap samples to improve high-dimensional variable selection, and on spatial dimension reduction to allow clear interpretations at the macroscale.

While further integration with nonlinear non-Gaussian models will be necessary to establish the full benefits of inferring causal structures as a standard practice in research and operational predictions, our study may offer a route to providing concise explanations of the climate system and reaching accurate invariant predictions.
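As a simplified stand-in for the bootstrap-aggregation step (using regularised-regression variable selection rather than the authors' causal discovery algorithm), the idea of stabilising high-dimensional precursor selection looks like this:

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.normal(size=(n, p))                       # candidate precursor indices
y = X[:, 0] - 0.8 * X[:, 3] + rng.normal(scale=0.5, size=n)  # true drivers: 0 and 3

# aggregate variable selection over bootstrap samples for robustness
counts = np.zeros(p)
for _ in range(50):
    idx = rng.integers(0, n, n)                   # bootstrap resample
    counts += LassoCV(cv=5).fit(X[idx], y[idx]).coef_ != 0
stable = np.flatnonzero(counts / 50 > 0.8)        # consistently selected precursors
print(stable)                                     # expected: [0 3]
```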

How to cite: He, S., Yang, S., and Chen, D.: Inferring Causal Structures to Model and Predict ENSO and Its Effect on Asian Summer Monsoon, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-102, https://doi.org/10.5194/egusphere-egu23-102, 2023.

EGU23-239 | ECS | Orals | ITS1.14/CL5.8

Toward a hybrid tropical cyclone global model 

Roberto Ingrosso and Mathieu Boudreault

The future evolution of tropical cyclones (TCs) in a warming world is an important issue, considering their potential socio-economic impacts on the areas hit by these phenomena. Previous studies provide robust answers about the future increase in intensity and in the global proportion of major TCs (Category 4–5). On the other hand, high uncertainty is associated with the projected future decrease in global TC frequency and with potential changes in TC tracks and translation speed.

Risk management and regulatory actions require a more robust quantification of how climate change affects TC dynamics. A probabilistic hybrid TC model based upon statistical and climate models, physically coherent with TC dynamics, is being built to investigate the potential impacts of climate change. Here, we provide preliminary results, in terms of present-climate reconstruction (1980-2021) and future projections (2022-2060) of cyclogenesis locations and TC tracks, based on different statistical models, such as logistic and multiple linear regressions and random forests. Physical predictors associated with TC formation and motion, produced by reanalysis (ERA5) and the Community Earth System Model (CESM) ensemble, are considered in this study.

How to cite: Ingrosso, R. and Boudreault, M.: Toward a hybrid tropical cyclone global model, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-239, https://doi.org/10.5194/egusphere-egu23-239, 2023.

EGU23-492 | ECS | Posters on site | ITS1.14/CL5.8

Separation of climate models and observations based on daily output using two machine learning classifiers 

Lukas Brunner, Sebastian Sippel, and Aiko Voigt

Climate models are primary tools to investigate processes in the climate system, to project future changes, and to inform decision makers. The latest generation of models provides increasingly complex and realistic representations of the real climate system while there is also growing awareness that not all models produce equally plausible or independent simulations. Therefore, many recent studies have investigated how models differ from observed climate and how model dependence affects model output similarity, typically drawing on climatological averages over several decades.

Here, we show that temperature maps from individual days from climate models in the CMIP6 archive can be robustly identified as “observation” or “model”, even after removing the global mean. An important exception is a prototype high-resolution simulation from the ICON model family, which cannot be so unambiguously classified into one category. These results highlight that persistent differences between observed and simulated climate emerge already at very short time scales, but very high-resolution modelling efforts may be able to overcome some of these shortcomings.

We use two different machine learning classifiers: (1) logistic regression, which allows easy insights into the learned coefficients but has the limitation of being a linear method, and (2) a convolutional neural network (CNN), which represents the other end of the complexity spectrum, able to learn nonlinear spatial relations between features but lacking the easy interpretability of logistic regression. For CMIP6 both methods perform comparably, and the CNN is also able to recognize about 75% of samples from ICON as coming from a model, while logistic regression does not have any skill in this case.
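A minimal sketch of the linear classifier side of this setup (placeholder data; the real study uses labelled daily maps from CMIP6 models and observations):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
maps = rng.normal(size=(2000, 36 * 72))      # flattened daily temperature maps
labels = rng.integers(0, 2, size=2000)       # 1 = observation, 0 = model

maps -= maps.mean(axis=1, keepdims=True)     # remove each map's global mean
clf = LogisticRegression(max_iter=1000).fit(maps, labels)
coef_map = clf.coef_.reshape(36, 72)         # where obs-model differences matter most
```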

Overall, we demonstrate that the use of machine learning classifiers, once trained, can overcome the need for multiple decades of data to investigate a given model. This opens up novel avenues to test model performance on much shorter times scales.

How to cite: Brunner, L., Sippel, S., and Voigt, A.: Separation of climate models and observations based on daily output using two machine learning classifiers, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-492, https://doi.org/10.5194/egusphere-egu23-492, 2023.

EGU23-753 | ECS | Orals | ITS1.14/CL5.8 | Highlight

Finding regions of similar sea level variability with the help of a Gaussian Mixture Model 

Lea Poropat, Céline Heuzé, and Heather Reese

In climate research we often want to focus on a specific region and the most prominent processes affecting it, but how exactly do we select the borders of that region? We also often need to use long-term in situ observations to represent a larger area, but which area exactly are they representative of? In ocean sciences we usually consider basins as separate regions or, even simpler, just select a rectangle of the ocean, but that does not always correspond to the real, physically relevant borders. As an alternative, we use an unsupervised classification model, the Gaussian Mixture Model (GMM), to separate the northwestern European seas into regions based on the sea level variability observed by altimetry satellites.

After performing a principal component (PC) analysis on 24 years of monthly sea level data, we use the stacked PC maps as input for the GMM. Because the GMM requires the number of classes to be selected a priori, we used the Bayesian Information Criterion to determine into how many regions our area should be split. Depending on the number of PCs used, the optimal number of classes was between 12 and 18, with more PCs typically allowing separation into more regions. Due to the complexity of the data and the dependence of the results on the randomly chosen starting weights, the classification can differ to a degree with every new run of the model, even if we use the exact same data and parameters. To tackle that, instead of using one model, we use an ensemble of models and determine which class each grid point belongs to by soft voting, i.e., each of the models provides a probability that the point belongs to a particular class, and the class with the maximal sum of probabilities wins. As a result, we obtain both the classification and the likelihood of each grid point belonging to its class.
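A sketch of how such a GMM ensemble with BIC-based class selection and soft voting could look in scikit-learn (placeholder data; note that component labels are arbitrary across ensemble members and must be matched before their probabilities can be summed, here via a nearest-means assignment):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 4))      # stacked PC values: one row per grid point

# the GMM needs the number of classes a priori: pick it by minimum BIC
bic = {k: GaussianMixture(k, random_state=0).fit(X).bic(X) for k in range(8, 19)}
k = min(bic, key=bic.get)

# ensemble of GMMs; align each member's components to a reference model
# (by nearest means) before soft voting
members = [GaussianMixture(k, random_state=s).fit(X) for s in range(10)]
total = np.zeros((len(X), k))
for m in members:
    _, perm = linear_sum_assignment(cdist(members[0].means_, m.means_))
    total += m.predict_proba(X)[:, perm]

classes = total.argmax(axis=1)                   # winning class per grid point
likelihood = total.max(axis=1) / len(members)    # mean winning probability
```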

Despite not using the coordinates of the data points in the model at all, the obtained classes are clearly location dependent, with grid points belonging to the same class always being close to each other. While many classes are defined by bathymetry changes, e.g., the continental shelf break and slope, sometimes other factors come into play, such as for the split of the Norwegian coast into two classes or for the division in the Barents Sea, which is probably based on the circulation. The North Sea is also split into three distinct regions, possibly based on sea level changes caused by dominant wind patterns.

This method can be applied to almost any atmospheric or oceanic variable and used for larger or smaller areas. It is quick and practical, allowing us to delimit areas based on information we cannot always clearly see in the data, which can facilitate a better selection of the regions that need further research.

How to cite: Poropat, L., Heuzé, C., and Reese, H.: Finding regions of similar sea level variability with the help of a Gaussian Mixture Model, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-753, https://doi.org/10.5194/egusphere-egu23-753, 2023.

EGU23-849 | ECS | Orals | ITS1.14/CL5.8

Drivers of sea level variability using neural networks 

Linn Carlstedt, Lea Poropat, and Céline Heuzé

Understanding the forcing of regional sea level variability is crucial, as many people all over the world live along the coasts and are endangered by sea level rise. The addition of fresh water to the oceans due to the melting of the Earth's land ice, together with thermosteric changes, has led to a rise of the global mean sea level (GMSL) at an accelerating rate during the twentieth century; it has now reached a mean rate of 3.7 mm per year according to the IPCC's latest report. However, this change varies spatially, and the dynamics forcing sea level variability on regional to local scales are still less well known, making it hard for decision makers to mitigate and adapt with appropriate strategies.

Here we present a novel approach using machine learning (ML) to identify the dynamics and determine the most prominent drivers forcing coastal sea level variability. We use a recurrent neural network called Long Short-Term Memory (LSTM), which has the advantage of learning data in sequences and can thus store some memory from previous time steps, which is beneficial when dealing with time series. To train the model we use hourly ERA5 10-m wind, mean sea level pressure (MSLP), sea surface temperature (SST), evaporation and precipitation data from 2009 to 2017 in the North Sea region. To reduce the dimensionality of the data while preserving maximal information, we conduct a principal component analysis (PCA) after removing the climatology, calculated as hourly means over the years. Depending on the explained variance of the PCs for each driver, 2-4 PCs are chosen and cross-correlated to eliminate collinearity, which could affect the model results. Before being used in the ML model, the final preprocessed data are normalized by min-max scaling to optimize the learning. The target data for the model are hourly in-situ sea level observations from West-Terschelling in the Netherlands. Using in-situ observations rather than altimeter data enhances the ability to make good predictions in coastal zones, as altimeter data tend to degrade along the coasts. The sea level time series is preprocessed by removing tides and de-seasoned by subtracting the hourly means. To determine which drivers are most prominent for sea surface variability at our location, we mute one driver at a time in the training of the network and evaluate the resulting improvement or deterioration of the predictions.
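A minimal PyTorch sketch of such an LSTM regressor on driver PCs (all shapes and names are illustrative assumptions, not the study's configuration):

```python
import torch
import torch.nn as nn

class SeaLevelLSTM(nn.Module):
    """LSTM regressor: sequences of driver PCs -> sea level anomaly."""
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):              # x: (batch, time, features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])   # predict from the last time step

x = torch.rand(32, 48, 10)              # 48-hour windows of min-max-scaled driver PCs
y_hat = SeaLevelLSTM(n_features=10)(x)  # (32, 1) sea level predictions
# "muting" a driver = zeroing its PC columns in training, then comparing skill
```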

Our results show that the zonal wind is the most prominent forcing of sea level variability at our location, followed by the meridional wind and MSLP. While SST greatly affects the GMSL, it seems to have little to no effect on local sea level variability compared to the other drivers. This approach shows great potential, can easily be applied to any coastal zone, and is thus very useful for a broad body of decision makers all over the world. Identifying the causes of local sea level variability will also enable better models for future predictions, which is of great importance and interest.

How to cite: Carlstedt, L., Poropat, L., and Heuzé, C.: Drivers of sea level variability using neural networks, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-849, https://doi.org/10.5194/egusphere-egu23-849, 2023.

EGU23-984 | ECS | Orals | ITS1.14/CL5.8

Data-driven Attributing of Climate Events with Climate Index Collection based on Model Data (CICMoD) 

Marco Landt-Hayen, Willi Rath, Sebastian Wahl, Nils Niebaum, Martin Claus, and Peer Kröger

Machine learning (ML) and in particular artificial neural networks (ANNs) push state-of-the-art solutions for many hard problems, e.g., image classification, speech recognition or time series forecasting. In the domain of climate science, ANNs have good prospects for identifying causally linked modes of climate variability as a key to understanding the climate system and improving the predictive skill of forecast systems. To attribute climate events in a data-driven way with ANNs, we need sufficient training data, which is often limited for real-world measurements. The data science community provides standard data sets for many applications. As a new data set, we introduce a collection of climate indices typically used to describe Earth system dynamics. This collection is consistent and comprehensive, as we derive the climate indices from 1,000-year control simulations of Earth System Models (ESMs). The data set is provided as an open-source framework that can be extended and customized to individual needs. It allows developing new ML methodologies and comparing results to existing methods and models as a benchmark. As examples, we use the data set to predict rainfall in the African Sahel region and the El Niño–Southern Oscillation with various ML models. We argue that this new data set allows a thorough exploration of techniques from the domain of explainable artificial intelligence, towards trustworthy models that are accepted by domain scientists. Our aim is to build a bridge between the data science community and researchers and practitioners from the domain of climate science to jointly improve our understanding of the climate system.
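As an example of the kind of index such a collection contains, a Niño 3.4 index can be derived from a control run's SST field along these lines (xarray sketch; file and variable names are hypothetical, and area weighting is omitted for the near-equatorial box):

```python
import xarray as xr

# hypothetical file/variable names for a monthly SST field from an ESM control run
sst = xr.open_dataset("esm_control_sst.nc")["tos"]
box = sst.sel(lat=slice(-5, 5), lon=slice(190, 240))   # Niño 3.4 region (170°W-120°W)

clim = box.groupby("time.month").mean("time")          # monthly climatology
anom = box.groupby("time.month") - clim                # monthly anomalies
nino34 = anom.mean(["lat", "lon"])                     # area-mean index
nino34_3m = nino34.rolling(time=3, center=True).mean() # 3-month running mean
```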

How to cite: Landt-Hayen, M., Rath, W., Wahl, S., Niebaum, N., Claus, M., and Kröger, P.: Data-driven Attributing of Climate Events with Climate Index Collection based on Model Data (CICMoD), EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-984, https://doi.org/10.5194/egusphere-egu23-984, 2023.

EGU23-1135 | ECS | Posters on site | ITS1.14/CL5.8

Curation of High-level Molecular Atmospheric Data for Machine Learning Purposes 

Vitus Besel, Milica Todorović, Theo Kurtén, Patrick Rinke, and Hanna Vehkamäki

As cloud and aerosol interactions remain large uncertainties in current climate models (IPCC), they are of special interest for atmospheric science. It is estimated that more than 70% of all cloud condensation nuclei originate from so-called New Particle Formation, the process of gaseous precursors clustering together in the atmosphere and subsequently growing into particles and aerosols. After initial clustering, this growth is driven strongly by condensation of low-volatility organic compounds (LVOC), i.e., molecules with saturation vapor pressures (pSat) below 10⁻⁶ mbar [1]. These originate from organic molecules emitted by vegetation that are subsequently rapidly oxidized in the air, so-called biogenic LVOC (BLVOC).

We have created a big data set of BLVOC using high-throughput computing and Density Functional Theory (DFT), and use it to train machine learning models to predict the pSat of previously unseen BLVOC. Figure 1 illustrates some sample molecules from the data.

[Figure 1: Sample molecules of small, medium and large sizes. Figure 2: Histogram of the calculated saturation vapor pressures.]

Initially, the chemical mechanism GECKO-A provides possible BLVOC molecules in the form of SMILES strings. In a first step, the COSMOconf program finds and optimizes the structures of possible conformers and provides their energies in the liquid phase at a DFT level of theory. After an additional calculation of the gas-phase energies with Turbomole, COSMOtherm calculates thermodynamic properties, such as the pSat, using the COSMO-RS [2] model. We combined all these computations into a highly parallelised high-throughput workflow to process 32k BLVOC, comprising over 7 million molecular conformers. A histogram of the calculated pSat is shown in Figure 2.

We use the calculated pSat to train a Gaussian Process Regression (GPR) machine learning model with the topological fingerprint as the descriptor for molecular structures. The GPR incorporates noise and outputs uncertainties for its pSat predictions. These uncertainties, together with data clustering techniques, allow for actively choosing the molecules to include in the training data, so-called active learning. Further, we explore SLISEMAP [3] explainable AI methods to correlate machine learning predictions, the high-dimensional descriptors, and human-readable properties such as functional groups.
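Schematically, fingerprint-based GPR with predictive uncertainties could look as follows (a minimal sketch with placeholder molecules and values, not the study's model; RDKit's topological fingerprint stands in for the descriptor):

```python
import numpy as np
from rdkit import Chem
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

smiles = ["CC(=O)O", "OCC(O)CO", "CC(C)=CCO"]          # placeholder molecules
log_psat = np.array([-2.1, -6.3, -3.0])                # placeholder log10(pSat)

# topological (RDKit) fingerprints as fixed-length bit-vector descriptors
X = np.array([list(Chem.RDKFingerprint(Chem.MolFromSmiles(s))) for s in smiles],
             dtype=float)

# the WhiteKernel term models noise; predictions come with uncertainties
gpr = GaussianProcessRegressor(RBF(10.0) + WhiteKernel(0.1)).fit(X, log_psat)
mean, std = gpr.predict(X, return_std=True)
# active learning: preferentially compute/label molecules with the largest std
```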

[1] Metzger, A. et al.: Evidence for the role of organics in aerosol particle formation under atmospheric conditions, Proc. Natl. Acad. Sci., 107, 6646–6651, https://doi.org/10.1073/pnas.0911330107, 2010.
[2] Klamt, A. and Schüürmann, G.: COSMO: a new approach to dielectric screening in solvents with explicit expressions for the screening energy and its gradient, J. Chem. Soc., Perkin Trans. 2, 799–805, https://doi.org/10.1039/P29930000799, 1993.
[3] Björklund, A., Mäkelä, J., and Puolamäki, K.: SLISEMAP: supervised dimensionality reduction through local explanations, Mach. Learn., https://doi.org/10.1007/s10994-022-06261-1, 2022.

How to cite: Besel, V., Todorović, M., Kurtén, T., Rinke, P., and Vehkamäki, H.: Curation of High-level Molecular Atmospheric Data for Machine Learning Purposes, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-1135, https://doi.org/10.5194/egusphere-egu23-1135, 2023.

EGU23-1244 | Posters on site | ITS1.14/CL5.8

Machine learning for non-orographic gravity waves in a climate model 

Steven Hardiman, Adam Scaife, Annelize van Niekerk, Rachel Prudden, Aled Owen, Samantha Adams, Tom Dunstan, Nick Dunstone, and Melissa Seabrook

There is growing use of machine learning algorithms to replicate sub-grid parametrisation schemes in global climate models. Parametrisations rely on approximations, and thus there is potential for machine learning to aid improvements. In this study, a neural network is used to mimic the behaviour of the non-orographic gravity wave scheme used in the Met Office climate model, which is important for stratospheric climate and variability. The neural network is found to require only two of the six inputs used by the parametrisation scheme, suggesting the potential for greater efficiency in this scheme. Use of a one-dimensional mechanistic model is advocated, allowing neural network hyperparameters to be trained based on emergent features of the coupled system with minimal computational cost, and providing a test bed prior to coupling to a climate model. A climate model simulation, using the neural network in place of the existing parametrisation scheme, is found to accurately generate a quasi-biennial oscillation of the tropical stratospheric winds, and correctly simulate the non-orographic gravity wave variability associated with the El Niño Southern Oscillation and stratospheric polar vortex variability. These internal sources of variability are essential for providing seasonal forecast skill, and the gravity wave forcing associated with them is reproduced without explicit training for these patterns.
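Schematically, such an emulator maps input columns to gravity wave drag profiles; a toy sketch with placeholder data (the actual scheme, inputs, and network architecture differ):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
# inputs: two atmospheric profiles per column (the scheme's six inputs were
# found to reduce to two); outputs: a gravity wave drag profile per column
X = rng.normal(size=(5000, 2 * 70))    # two variables on 70 model levels
Y = rng.normal(size=(5000, 70))        # drag tendency on 70 levels

emulator = MLPRegressor(hidden_layer_sizes=(128, 128), max_iter=50).fit(X, Y)
drag = emulator.predict(X[:1])         # emulated tendencies for one column
```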

How to cite: Hardiman, S., Scaife, A., van Niekerk, A., Prudden, R., Owen, A., Adams, S., Dunstan, T., Dunstone, N., and Seabrook, M.: Machine learning for non-orographic gravity waves in a climate model, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-1244, https://doi.org/10.5194/egusphere-egu23-1244, 2023.

EGU23-1502 | ECS | Orals | ITS1.14/CL5.8

Adapting Transfer Learning for Multiple Channels in Satellite Data Applications 

Naomi Simumba and Michiaki Tatsubori

Transfer learning is a technique wherein information learned by previously trained models is applied to new learning tasks. Typically, weights learned by a network pretrained on other datasets are copied or transferred to new networks. These new networks, or downstream models, are then used for assorted tasks. Foundation models extend this concept by training models on large datasets. Such models gain a contextual understanding which can then be used to improve the performance of downstream tasks in different domains. Common examples include GPT-3 in the field of natural language processing and ImageNet-trained models in the field of computer vision.

Beyond its high rate of data collection, satellite data also has a wide range of meaningful applications, including climate impact modelling and sustainable energy. This makes foundation models trained on satellite data very beneficial, as they would reduce the time, data, and computational resources required to obtain useful downstream models for these applications.

However, satellite data models differ from typical computer vision models in a crucial way. Because several types of satellite data exist, each with its own benefits, a typical use case for satellite data involves combining multiple data inputs in configurations that are not readily apparent during pretraining of the foundation model. Essentially, this means that the downstream application may have a different number of input channels than the pretrained model, which raises the question of how to successfully transfer information learned by the pretrained model to the downstream application.

This research proposes and examines several architectures for the downstream model that allow pretrained weights to be incorporated when a different number of input channels is required. For evaluation, models pretrained with self-supervised learning on precipitation data are applied to a downstream model which conducts temporal interpolation of precipitation data and requires two inputs. The effect of including a perceptual loss to enhance model performance is also evaluated. These findings can be used to guide adaptation for applications ranging from flood modeling to land use detection and more.
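One common strategy for such a channel-count mismatch (our illustrative choice, not necessarily among the specific architectures this abstract examines) is to rebuild the first convolution and initialise the new filters from the mean of the pretrained filters:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

def adapt_first_conv(model, in_channels):
    """Replace the pretrained 3-channel first conv so it accepts `in_channels`,
    initialising every new filter from the mean of the pretrained RGB filters."""
    old = model.conv1
    new = nn.Conv2d(in_channels, old.out_channels, old.kernel_size,
                    old.stride, old.padding, bias=False)
    with torch.no_grad():
        mean_w = old.weight.mean(dim=1, keepdim=True)          # (out, 1, k, k)
        new.weight.copy_(mean_w.repeat(1, in_channels, 1, 1))
    model.conv1 = new
    return model

model = adapt_first_conv(resnet18(weights="IMAGENET1K_V1"), in_channels=2)
x = torch.randn(4, 2, 224, 224)   # e.g. the two precipitation fields to interpolate
out = model(x)
```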

How to cite: Simumba, N. and Tatsubori, M.: Adapting Transfer Learning for Multiple Channels in Satellite Data Applications, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-1502, https://doi.org/10.5194/egusphere-egu23-1502, 2023.

EGU23-1855 | ITS1.14/CL5.8

Multi-Temporal Downscaling of Streamflow for Ungauged Stations/Sub-Basins from Daily to Sub-Daily Interval Using Hybrid Framework – A Case Study on Flash Flood Watershed

V. Budamala, A. Wadhwa, and R. D. Bhowmik

Unprecedented flash floods (FF) in urban regions are increasing due to heavy rainfall intensity and magnitude as a result of human-induced climate and land-use changes. The changes in weather patterns and various anthropogenic activities increase the complexity of modelling FF at different spatiotemporal scales, which indicates the importance of multi-resolution forcing information. Hence, developing new methods for processing coarser-resolution spatiotemporal datasets is essential for the efficient modelling of FF. While a wide range of methods is available for spatial and temporal downscaling of climate data, multi-temporal downscaling strategies have not been investigated for ungauged streamflow stations. The current study proposes a multi-temporal downscaling (MTD) methodology for gauged and ungauged stations using adaptive emulator modelling concepts for daily to sub-daily streamflows. The proposed MTD framework for ungauged stations comprises a hybrid framework with conceptual and machine-learning-based approaches to analyze the catchment behavior and downscale the model outputs from daily to sub-daily scales. The study area, the Peachtree Creek watershed (USA), frequently experiences flash floods and was hence selected to validate the proposed framework. Further, the study addresses the critical issues of model development, seasonality, and diurnal variation of MTD data. The study obtained MTD data with minimal uncertainty in capturing the hydrological signatures and nearly 95% accuracy in predicting the flow attributes over ungauged stations. The proposed framework can be highly useful for short- and long-range planning, management, and mitigation measures where the absence of fine-resolution data prohibits flash flood modeling.

How to cite: Budamala, V., Wadhwa, A., and Bhowmik, R. D.: Multi-Temporal Downscaling of Streamflow for Ungauged Stations/ Sub-Basins from Daily to Sub-Daily Interval Using Hybrid Framework – A Case Study on Flash Flood Watershed, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-1855, https://doi.org/10.5194/egusphere-egu23-1855, 2023.

EGU23-2289 | ECS | Posters on site | ITS1.14/CL5.8

Towards understanding the effect of parametric aerosol uncertainty on climate using a chemical transport model perturbed parameter ensemble. 

Meryem Bouchahmoud, Tommi Bergman, and Christina Williamson

Aerosols in the climate system have a direct link to the Earth’s energy balance. Aerosols interact directly with solar radiation through scattering and absorption, and indirectly by changing cloud properties. The effect aerosols have on climate is one of the major causes of radiative forcing (RF) uncertainty in global climate model simulations. Thus, reducing aerosol RF uncertainty is key to improving climate prediction. The objective of this work is to understand the magnitude and causes of aerosol uncertainty in the chemical transport model TM5.

Perturbed Parameter Ensembles (PPEs) are sets of model runs created by perturbing an ensemble of parameters. Parameters are model inputs; in this study, we focus on parameters describing aerosol emissions, properties and processes, such as dry deposition, aging rate, emissions, and aerosol microphysics. A PPE varies these parameters over their uncertainty ranges all at once to study their combined effect on TM5.

Varying these parameters through their value ranges is reflected in the TM5 outputs. The TM5 output variables we use in our sensitivity study are the cloud droplet number concentration and the ambient aerosol absorption optical thickness at 550 nm.

Here we discuss the design of the PPE and the one-at-a-time sensitivity studies used in this process. The PPE samples the parameter space in a way that enables emulation. Emulation is a machine learning technique that uses a statistical surrogate model in place of the chemical transport model, with the aim of providing output data with denser sampling throughout the parameter space. We will be using a Gaussian process emulator, which has been shown to be an efficient technique for quantifying parameter sensitivity in complex global atmospheric models.
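As an illustration of the emulation step, a Gaussian process emulator of a scalar TM5 output can be fitted to the PPE design and then queried densely across the parameter space; a scikit-learn sketch with placeholder data:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, Matern

# Placeholder PPE design and response; real PPEs typically use a space-filling
# design (e.g. Latin hypercube) over the parameters' uncertainty ranges.
rng = np.random.default_rng(0)
X = rng.random((60, 8))            # 60 TM5 runs, 8 perturbed parameters
y = np.sin(3 * X[:, 0]) + X[:, 1]  # stand-in scalar output

kernel = ConstantKernel(1.0) * Matern(length_scale=np.ones(8), nu=2.5)
emulator = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

# Dense resampling of the parameter space, with predictive uncertainty.
X_dense = rng.random((100_000, 8))
mean, std = emulator.predict(X_dense, return_std=True)
```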

We also describe plans to extend this work to emulate an aerosol PPE for EC-Earth. The PPE for EC-Earth will also contain cloud parameters that will vary over their uncertainty range together with the aerosol parameters to examine the influence of aerosol parametric uncertainty on RF.

How to cite: Bouchahmoud, M., Bergman, T., and Williamson, C.: Towards understanding the effect of parametric aerosol uncertainty on climate using a chemical transport model perturbed parameter ensemble., EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-2289, https://doi.org/10.5194/egusphere-egu23-2289, 2023.

EGU23-2541 | ECS | Posters on site | ITS1.14/CL5.8

Machine learning based automated parameter tuning of ICON-A using satellite data 

Pauline Bonnet, Fernando Iglesias-Suarez, Pierre Gentine, Marco Giorgetta, and Veronika Eyring

Global climate models use parameterizations to represent the effect of subgrid scale processes on the resolved state. Parameterizations in the atmosphere component usually include radiation, convection, cloud microphysics, cloud cover, gravity wave drag, vertical turbulence in the boundary layer and other processes. Parameterizations are semi-empirical functions that include a number of tunable parameters. Because these parameters are only loosely constrained by experimental data, a range of values is typically explored by evaluating model runs against observations and/or high-resolution runs. Fine-tuning a climate model is a complex inverse problem due to the number of tunable parameters and observed climate properties to fit. Moreover, parameterizations are a source of uncertainty for climate projections, so fine-tuning is a crucial step in model development.

Traditionally, tuning is a time-consuming task done manually by iteratively updating parameter values to explore the parameter space with experience-driven choices. To overcome this limitation and search the parameter space efficiently, one can implement automatic techniques. Typical steps in automatic tuning are: (i) constraining the scope of the study (model, simulation setup, parameters, metrics to fit and corresponding reference values); (ii) conducting a sensitivity analysis to reduce the parameter space and/or building an emulator for the climate model; and (iii) conducting a sophisticated grid search to determine the optimal parameter set or its distribution (e.g., rejection sampling and history matching). The ICOsahedral Non-hydrostatic (ICON) model is a modelling framework for numerical weather prediction and climate projections. We implement an ML-based automatic tuning technique to tune a recent version of ICON-A at a spatial resolution typically used for climate projections. We evaluate the tuned ICON-A model against satellite observations using the Earth System Model Evaluation Tool (ESMValTool). Although automatic tuning techniques reach optimal parameter values in fewer steps than manual tuning, they still require some experience-driven choices throughout the tuning process. Moreover, the performance of the tuned model is limited by the structural errors of the model, inherent to the mathematical description of its parameterizations.
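As an illustration of step (iii), history matching scores each candidate parameter set by an implausibility measure and rules out those exceeding a threshold (3 is customary); a minimal sketch with placeholder emulator output:

```python
import numpy as np

def implausibility(em_mean, em_std, obs, obs_err, discrepancy):
    # Distance between emulator prediction and observation, in units of the
    # combined (emulator + observation + structural) uncertainty.
    return np.abs(em_mean - obs) / np.sqrt(em_std**2 + obs_err**2 + discrepancy**2)

# Placeholder emulator output over a dense parameter sample.
em_mean = np.random.rand(10_000)
em_std = np.full(10_000, 0.05)
keep = implausibility(em_mean, em_std, obs=0.5, obs_err=0.02, discrepancy=0.05) < 3.0
```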

How to cite: Bonnet, P., Iglesias-Suarez, F., Gentine, P., Giorgetta, M., and Eyring, V.: Machine learning based automated parameter tuning of ICON-A using satellite data, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-2541, https://doi.org/10.5194/egusphere-egu23-2541, 2023.

EGU23-3404 | ECS | Posters on site | ITS1.14/CL5.8 | Highlight

Deep learning-based generation of 3D cloud structures from geostationary satellite data 

Sarah Brüning, Stefan Niebler, and Holger Tost

Clouds and their interdependent feedback mechanisms remain a source of uncertainty in climate science. Overcoming the related obstacles, especially in the context of a changing climate, makes the need for a reliable database more pressing than ever. While passive remote sensing sensors provide continuous observations of the cloud top, they lack vital information on the levels below. Here, active instruments can deliver valuable insights to fill this gap in knowledge.

This study sets out to combine the benefits of both instrument types. It aims (1) to reconstruct the vertical distribution of volumetric radar data along the cloud column and (2) to interpolate the resulting 3D cloud structure to the satellite’s full disk by applying a contemporary deep learning approach. Input data were derived by automated spatio-temporal matching between high-resolution satellite channels and the radar overflights. These samples provide the physical predictors fed into the network to reconstruct the vertical cloud distribution on each of the radar’s height levels across the whole domain. Data from the entire year 2017 were used to integrate seasonal variations into the modeling routine.

The results demonstrate not only the network’s ability to reconstruct the cloud column along the radar track but also to interpolate coherent structures into a large-scale perspective. While the model performs equally well over land and water bodies, its applicable time frame is limited to daytime predictions only. Finally, the generated data can be leveraged to build a comprehensive database of 3D cloud structures that is to be exploited in proceeding applications.

How to cite: Brüning, S., Niebler, S., and Tost, H.: Deep learning-based generation of 3D cloud structures from geostationary satellite data, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-3404, https://doi.org/10.5194/egusphere-egu23-3404, 2023.

EGU23-3418 | ECS | Posters on site | ITS1.14/CL5.8

Building a physics-constrained, fast and stable machine learning-based radiation emulator 

Guillaume Bertoli, Sebastian Schemm, Firat Ozdemir, Fernando Perez Cruz, and Eniko Szekely

Modelling the transfer of radiation through the atmosphere is a key component of weather and climate models. The operational radiation scheme in the Icosahedral Nonhydrostatic Weather and Climate Model (ICON) is ecRad. The ecRad scheme is accurate but computationally expensive: it is operationally run in ICON on a grid coarser than the dynamical grid, and the time step between two calls is significantly longer, which is known to reduce the quality of the climate prediction. A possible approach to accelerating the computation of the radiation fluxes is to use machine learning. Machine learning methods can significantly speed up the computation of radiation, but they may cause climate drift if they do not respect essential physical laws. In this work, we study random forest and neural network emulations of ecRad and compare different strategies for assessing the stability of the emulations. For the neural network, we compare loss functions with an additional energy penalty term and observe that modifying the loss function is essential for predicting the heating rates accurately. The random forest emulator, which is significantly faster to train than the neural network, is used as a reference model that the neural network must outperform. The random forest emulator can become extremely accurate, but its memory requirements quickly become prohibitive. Various numerical experiments are performed to illustrate the properties of the machine learning emulators.
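The abstract leaves the form of the energy penalty unspecified; a hedged sketch of how such a term might enter a PyTorch loss (the consistency relation below, column-integrated heating versus net flux difference, is an assumption for illustration):

```python
import torch

def radiation_loss(pred_flux, true_flux, pred_heat, true_heat, lam=0.1):
    # Fit fluxes and heating rates; last dimension is assumed to be the vertical.
    mse = torch.mean((pred_flux - true_flux) ** 2) \
        + torch.mean((pred_heat - true_heat) ** 2)
    # Hypothetical consistency term: column-integrated heating should match the
    # net flux difference between top of atmosphere and surface.
    gap = pred_heat.sum(dim=-1) - (pred_flux[..., 0] - pred_flux[..., -1])
    return mse + lam * torch.mean(gap ** 2)
```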

How to cite: Bertoli, G., Schemm, S., Ozdemir, F., Perez Cruz, F., and Szekely, E.: Building a physics-constrained, fast and stable machine learning-based radiation emulator, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-3418, https://doi.org/10.5194/egusphere-egu23-3418, 2023.

EGU23-3457 | Orals | ITS1.14/CL5.8

Evaluating Vegetation Modelling in Earth System Models with Machine Learning Approaches 

Ranjini Swaminathan, Tristan Quaife, and Richard Allan

The presence and amount of vegetation in any given region control Gross Primary Production (GPP), the flux of carbon into the land driven by photosynthesis. Earth System Models (ESMs) give us the ability to simulate GPP by modelling the various interactions between the atmosphere and biosphere, including under future climate change scenarios. GPP is the largest flux of the global carbon cycle and plays an important role in, for example, carbon budget calculations. However, GPP estimates from ESMs not only vary widely, but the drivers underpinning this variability are also highly uncertain.

We use data from pre-industrial Control (pi-Control) simulations, both to take advantage of the longer time period available for sampling and to exclude the influence of anthropogenic forcing on GPP estimation, thereby leaving GPP largely attributable to two factors: (a) the input atmospheric forcings and (b) the processes using those input climate variables to diagnose GPP.

We explore the processes determining GPP with a physically-guided Machine Learning framework applied to a set of Earth System Models (ESMs) from the Sixth Coupled Model Intercomparison Project (CMIP6). We use this framework to examine whether differences in GPP across models are caused by differences in atmospheric state or process representations. 

Results from our analysis show that models with similar regional atmospheric forcing do not always have similar GPP distributions. While climate models largely agree on which atmospheric variables are most relevant for GPP in some regions, in others, such as the tropics, there is more uncertainty. Our analysis highlights the potential of ML to identify differences in atmospheric forcing and in carbon cycle process modelling across current state-of-the-art ESMs. It also allows us to extend the analysis with observational estimates of forcings and GPP for model improvement.

How to cite: Swaminathan, R., Quaife, T., and Allan, R.: Evaluating Vegetation Modelling in Earth System Models with Machine Learning Approaches, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-3457, https://doi.org/10.5194/egusphere-egu23-3457, 2023.

EGU23-3619 | ECS | Posters on site | ITS1.14/CL5.8

TCDetect: A new method of Detecting the Presence of Tropical Cyclones using Deep Learning 

Daniel Galea, Julian Kunkel, and Bryan Lawrence

Tropical cyclones are high-impact weather events which have large human and economic effects, so it is important to be able to understand how their location, frequency and structure might change in a future climate.

Here, a lightweight deep learning model is presented which is intended for detecting the presence of tropical cyclones during the execution of numerical simulations for use in an online data reduction method. This will help to avoid saving vast amounts of data for analysis after the simulation is complete. With run-time detection, it might be possible to reduce the need for some of the high-frequency high-resolution output which would otherwise be required.

The model was trained on ERA-Interim reanalysis data from 1979 to 2017 and the training concentrated on delivering the highest possible recall rate (successful detection of cyclones) while rejecting enough data to make a difference in outputs.

When tested using data from the two subsequent years, the recall (probability of detection) was 92% and the precision (success ratio) was 36%. For the intended data reduction application, where the target includes all tropical cyclone events, even those which did not reach hurricane strength, the effective precision was 85%.

The recall rate and the Area Under Curve for the Precision/Recall (AUC-PR) compare favourably with other methods of cyclone identification while using the smallest number of parameters for both training and inference. 

Work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. LLNL-ABS-843612

How to cite: Galea, D., Kunkel, J., and Lawrence, B.: TCDetect: A new method of Detecting the Presence of Tropical Cyclones using Deep Learning, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-3619, https://doi.org/10.5194/egusphere-egu23-3619, 2023.

EGU23-3875 | ECS | Posters on site | ITS1.14/CL5.8

Explainable AI for oceanic carbon cycle analysis of CMIP6 

Paul Heubel, Lydia Keppler, and Tatiana Iliyna

The Southern Ocean acts as one of Earth's major carbon sinks, taking up anthropogenic carbon from the atmosphere. Earth System Models (ESMs) are used to project its future evolution. However, the ESMs in the Coupled Model Intercomparison Project version 6 (CMIP6) disagree on the biogeochemical representation of the Southern Ocean carbon cycle, both with respect to the phasing and the magnitude of the seasonal cycle of dissolved inorganic carbon (DIC), and they compare poorly with observations.

We develop a framework to investigate model biases in the historical runs of 10 CMIP6 ESMs, incorporating explainable artificial intelligence (xAI) methodologies. Using both a Random Forest feature relevance approach and a nonlinear self-organizing map plus feed-forward neural network (SOM-FFN) framework, we relate five potential drivers of the seasonal cycle of DIC in the Southern Ocean (temperature, salinity, silicate, nitrate, and dissolved oxygen) to its representation in the different CMIP6 models. This analysis allows us to determine the dominant statistical drivers of the seasonal cycle of DIC in each model and how they compare to observations. Our findings inform future model development to better constrain the seasonal cycle of DIC.
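The Random Forest feature relevance step can be sketched as follows, with placeholder data standing in for the model fields:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

drivers = ["temperature", "salinity", "silicate", "nitrate", "oxygen"]
X = np.random.rand(5000, 5)   # placeholder driver fields, flattened to samples
y = np.random.rand(5000)      # placeholder seasonal DIC anomaly

rf = RandomForestRegressor(n_estimators=300, min_samples_leaf=5, n_jobs=-1).fit(X, y)
imp = permutation_importance(rf, X, y, n_repeats=10, random_state=0)
for name, score in zip(drivers, imp.importances_mean):
    print(f"{name}: {score:.3f}")
```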

How to cite: Heubel, P., Keppler, L., and Iliyna, T.: Explainable AI for oceanic carbon cycle analysis of CMIP6, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-3875, https://doi.org/10.5194/egusphere-egu23-3875, 2023.

EGU23-4044 | ECS | Orals | ITS1.14/CL5.8

DailyMelt: Diffusion-based Models for Spatiotemporal Downscaling of (Ant-)arctic Surface Meltwater Maps 

Björn Lütjens, Patrick Alexander, Raf Antwerpen, Guido Cervone, Matthew Kearney, Bingkun Luo, Dava Newman, and Marco Tedesco

Motivation. Ice melt in Greenland and Antarctica has increasingly contributed to rising sea levels. Yet the exact speed of melting, the existence of abrupt tipping points, and the detailed links to climate change remain uncertain. Ice shelves essentially prevent the ice sheet from slipping into the ocean, and better prediction of their collapse is needed. Meltwater at the surface of ice shelves is an indicator of collapse, destabilizing the shelves via fracturing and flexural processes (Banwell et al., 2013), and it is likely impacted by a warming climate (Kingslake et al., 2017). Maps of meltwater have been created from in-situ and remote observations, but their low and irregular spatiotemporal resolution severely limits studies (Kingslake et al., 2019).

Research Gap. In particular, daily high-resolution (<500 m) maps of surface meltwater do not yet exist. We propose the first daily high-resolution surface meltwater maps, developed with a deep learning-based downscaling method, called DailyMelt, that fuses observations and simulations of varying spatiotemporal resolution, as illustrated in Fig. 1. The created maps will improve understanding of the origin, transport, and controlling physical processes of surface meltwater. Moreover, they will act as a unified source to improve sea level rise and meltwater predictions in climate models.

Data. To synthesize surface meltwater maps, we leverage observations from satellites (MODIS, Sen-1 SAR) which are high-resolution (500m, 10m), but have substantial temporal gaps due to repeat time and cloud coverage. We fuse them with simulations (MAR) and passive microwave observations (MEaSURE) that are daily, but low-resolution (6km, 3.125km). In a significant remote sensing effort, we have downloaded, reprojected, and regridded all products into daily observations for our study area over Greenland’s Helheim glacier. 

Approach and expected results. Within deep generative vision models, diffusion-based models promise sharp and probabilistic predictions. We have implemented SRDiff (Li H. et al., 2022) and tested it on spatially downscaling external data. As a baseline model, we have implemented a statistical downscaling model that is a local hybrid physics-linear regression model (Noel et al., 2016). In our planned benchmark, we expect a baseline UNet architecture that minimizes RMSE to create blurry maps and a generative adversarial network that minimizes adversarial loss to create sharp but deterministic maps. We have started with spatial downscaling and will include temporal downscaling. 
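A minimal sketch of the training step of a conditional DDPM-style downscaler, assuming an eps-prediction network (`denoiser`, e.g. a conditional U-Net, is left undefined here) conditioned by channel-concatenating the upsampled coarse field; SRDiff additionally diffuses the residual between the high-resolution target and the upsampled input, which this sketch omits:

```python
import torch
import torch.nn.functional as F

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

def diffusion_loss(denoiser, x_hr, x_lr_up):
    # x_hr: high-res target; x_lr_up: coarse input upsampled to the target grid.
    b = x_hr.shape[0]
    t = torch.randint(0, T, (b,))
    a = alpha_bar[t].view(b, 1, 1, 1)
    eps = torch.randn_like(x_hr)
    x_t = a.sqrt() * x_hr + (1 - a).sqrt() * eps      # forward noising
    # Condition by channel-concatenating the coarse field.
    eps_hat = denoiser(torch.cat([x_t, x_lr_up], dim=1), t)
    return F.mse_loss(eps_hat, eps)
```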

In summary, we will create the first daily high-resolution (500m) surface meltwater maps, have introduced the first diffusion-based model for downscaling Earth sciences data, and have created the first benchmark dataset for downscaling surface meltwater maps.

References.

Banwell, A. F., et al. (2013), Breakup of the Larsen B Ice Shelf triggered by chain reaction drainage of supraglacial lakes, Geophys. Res. Lett., 40 

Kingslake J, et al. (2017), Widespread movement of meltwater onto and across Antarctic ice shelves, Nature, 544(7650)

Kingslake J., et al. (2019), Antarctic Surface Hydrology and Ice Shelf Stability Workshop report, US Antarctic Program Data Center

Li H., et al. (2022), SRDiff: Single image super-resolution with diffusion probabilistic models, Neurocomputing, 479

Noël, B., et al. (2016), A daily, 1 km resolution data set of downscaled Greenland ice sheet surface mass balance (1958–2015), The Cryosphere, 10

How to cite: Lütjens, B., Alexander, P., Antwerpen, R., Cervone, G., Kearney, M., Luo, B., Newman, D., and Tedesco, M.: DailyMelt: Diffusion-based Models for Spatiotemporal Downscaling of (Ant-)arctic Surface Meltwater Maps, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-4044, https://doi.org/10.5194/egusphere-egu23-4044, 2023.

EGU23-4350 | ECS | Orals | ITS1.14/CL5.8

Physics-Constrained Deep Learning for Downscaling 

Paula Harder, Venkatesh Ramesh, Alex Hernandez-Garcia, Qidong Yang, Prasanna Sattigeri, Daniela Szwarcman, Campbell Watson, and David Rolnick

The availability of reliable, high-resolution climate and weather data is important to inform long-term decisions on climate adaptation and mitigation and to guide rapid responses to extreme events. Forecasting models are limited by computational costs and, therefore, often generate coarse-resolution predictions. Statistical downscaling can provide an efficient method of upsampling low-resolution data. In this field, deep learning has been applied successfully, often using image super-resolution methods from computer vision. However, despite achieving visually compelling results in some cases, such models frequently violate conservation laws when predicting physical variables. In order to conserve physical quantities, we develop methods that guarantee physical constraints are satisfied by a deep learning downscaling model while also improving their performance according to traditional metrics. We compare different constraining approaches and demonstrate their applicability across different neural architectures as well as a variety of climate and weather data sets, including ERA5 and WRF data sets.
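One way to guarantee such a constraint, among the approaches compared in work of this kind, is a layer that renormalizes each high-resolution block so it averages back to the coarse cell; a PyTorch sketch (a multiplicative correction; the constraining approaches actually evaluated in the study may differ):

```python
import torch
import torch.nn.functional as F

def conserve_mean(y_hr, x_lr, scale):
    # Rescale each (scale x scale) high-res block so its mean equals the
    # corresponding low-res cell, hard-enforcing conservation of the quantity.
    block_mean = F.avg_pool2d(y_hr, scale)
    ratio = x_lr / (block_mean + 1e-8)
    return y_hr * F.interpolate(ratio, scale_factor=scale, mode="nearest")
```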

How to cite: Harder, P., Ramesh, V., Hernandez-Garcia, A., Yang, Q., Sattigeri, P., Szwarcman, D., Watson, C., and Rolnick, D.: Physics-Constrained Deep Learning for Downscaling, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-4350, https://doi.org/10.5194/egusphere-egu23-4350, 2023.

EGU23-5431 | ECS | Orals | ITS1.14/CL5.8

Towards Robust Parameterizations in Ecosystem-level Photosynthesis Models 

Shanning Bao, Nuno Carvalhais, Lazaro Alonso, Siyuan Wang, Johannes Gensheimer, Ranit De, and Jiancheng Shi

Photosynthesis model parameters represent vegetation properties or the sensitivities of photosynthesis processes. As one of the sources of model uncertainty, parameters affect the accuracy and generalizability of a model. Ideally, parameters of ecosystem-level photosynthesis models, i.e., gross primary productivity (GPP) models, can be measured or inverted from observations at the local scale. To extrapolate parameters to larger spatial scales, current photosynthesis models typically adopt fixed values or plant-functional-type (PFT)-specific values. However, fixed and PFT-based parameterization approaches cannot sufficiently capture the spatial variability of parameters and lead to significant estimation errors. Here, we propose a Simultaneous Parameter Inversion and Extrapolation approach (SPIE) to overcome these issues.

SPIE refers to predicting model parameters using an artificial neural network (NN) constrained by both the model loss and ecosystem features, including PFT, climate type, bioclimatic variables, vegetation features, atmospheric nitrogen and phosphorus deposition, and soil properties. Taking a light use efficiency (LUE) model as an example, we evaluated SPIE at 196 FLUXNET eddy covariance flux sites. The LUE model accounts for the effects of air temperature, vapor pressure deficit, soil water availability (SW), light saturation, diffuse radiation fraction and CO2 on GPP using five independent sensitivity functions. SW was represented using the water availability index and can be optimized based on evapotranspiration. We thus optimized the NN by minimizing a model loss consisting of GPP errors, evapotranspiration errors, and constraints on the sensitivity functions. Furthermore, we compared SPIE with 11 typical parameter extrapolation approaches, including PFT- and climate-specific parameterizations, global and PFT-based parameter optimization, site similarity, and regression methods, using the Nash-Sutcliffe model efficiency (NSE), coefficient of determination (R2) and normalized root mean squared error (NRMSE).
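The core of SPIE can be sketched as a network that maps ecosystem features to model parameters and is trained end-to-end through the process model; everything below (layer sizes, the stand-in lue_model) is a placeholder, not the authors' code:

```python
import torch
import torch.nn as nn

# Stand-in network: 40 ecosystem features -> 10 positive LUE-model parameters.
param_net = nn.Sequential(nn.Linear(40, 64), nn.ReLU(),
                          nn.Linear(64, 10), nn.Softplus())

def lue_model(params, climate):
    # Placeholder forward model; the real one multiplies a maximum LUE by
    # sensitivity functions of temperature, VPD, SW, light and CO2.
    return params[:, :1] * climate[:, :1]

def spie_loss(features, climate, gpp_obs):
    params = param_net(features)          # parameter prediction per site
    gpp_hat = lue_model(params, climate)  # run the process model forward
    # The full loss also includes evapotranspiration errors and constraints
    # on the sensitivity functions.
    return torch.mean((gpp_hat - gpp_obs) ** 2)
```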

In ten-fold cross-validation, SPIE showed the best performance across various temporal and spatial scales and across assessment metrics. None of the parameter extrapolation approaches reached the performance of on-site calibrated parameters (NSE=0.95), but SPIE was the only approach showing a positive NSE (0.68) in cross-validation across sites. Moreover, the site-level NSE, R2, and NRMSE of SPIE all significantly outperformed the per-biome and per-climate-type parameterizations. Parameter ranges were more tightly constrained by SPIE than by site calibrations.

Overall, SPIE is a robust parameter extrapolation approach that overcomes strong limitations observed in many standard model parameterization approaches. Our approach suggests that model parameterizations can be determined from observations of vegetation, climate and soil properties, expanding beyond customary clustering methods (e.g., PFT-specific parameterization). We argue that extending SPIE to other models overcomes current limits in parameterization and serves as an entry point to investigate the robustness and generalization of different models.

How to cite: Bao, S., Carvalhais, N., Alonso, L., Wang, S., Gensheimer, J., De, R., and Shi, J.: Towards Robust Parameterizations in Ecosystem-level Photosynthesis Models, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-5431, https://doi.org/10.5194/egusphere-egu23-5431, 2023.

EGU23-5487 * | ECS | Posters on site | ITS1.14/CL5.8 | Highlight

Harvesting historical spy imagery by evaluating deep learning models for state-wide mapping of land cover changes between 1965-1978 

Lucas Kugler, Christopher Marrs, Eric Kosczor, and Matthias Forkel

Remote sensing has played a fundamental role in land cover mapping and change detection at least since the launch of the Landsat satellite program in 1972. In 1995, the Central Intelligence Agency of the United States of America released previously classified spy imagery, taken from 1960 onwards with near-global coverage, from the Keyhole programme, which includes the CORONA satellite mission. CORONA imagery is a treasure because it contains information about land cover from 10 years before the beginning of civilian Earth observation and has a high spatial resolution of < 2 m. However, this imagery is only panchromatic and usually not georeferenced, which has so far prevented large-scale applications for land cover mapping or other geophysical and environmental purposes.

Here, we aim to harvest the valuable information about past land cover from CORONA imagery for a state-wide mapping of past land cover changes between 1965 and 1978 by training, testing and validating various deep learning models.

To the best of our knowledge, this is the first work to analyse land cover from CORONA data on a large scale, dividing land cover into six classes based on the CORINE classification scheme. The particular focus of the work is to test the transferability of the deep learning approaches to unknown CORONA data.

To investigate the transferability, we selected 27 spatially and temporally distributed study areas (each 23 km²) in the Free State of Saxony (Germany) and created semantic masks to train and test 10 different U-shaped neural network architectures for extracting land cover from CORONA data. As input, we use either the original panchromatic pixel values or different texture measures. From these input data, ten training and test datasets were derived for cross-validation.

The training results show that semantic segmentation of land cover from CORONA data with the tested architectures is possible. Strong differences in model performance (based on cross-validation and the intersection-over-union metric, IOU) were detected among the classes. Classes with many samples achieve significantly better IOU values than underrepresented classes. In general, a U-shaped architecture with a Transformer encoder (Transformer U-Net) achieved the best results. The best segmentation performance (IOU 83.29%) was obtained for forests, followed by agriculture (74.21%). For artificial surfaces, a mean IOU of 68.83% was achieved, and water surfaces reached a mean IOU of 66.49%. For the shrub vegetation and open area classes, IOU values were mostly below 25%. The deep learning models transferred successfully in space (between test areas) and time (between CORONA imagery from different years), especially for classes with many samples. Transferability was difficult for the mapping of water bodies, and it was limited for imagery of very poor quality despite the generally good model performance and successful transferability for most classes. Our approach enabled the state-wide mapping of land cover in Saxony between 1965 and 1978 with a spatial resolution of 2 m. We identify an increase in urban cover and a decrease in cropland cover.
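For reference, the per-class IOU reported above is the intersection of predicted and true class masks divided by their union; a minimal sketch:

```python
import numpy as np

def iou(pred, target, cls):
    # Intersection over union for one land-cover class in a segmentation map.
    p, t = (pred == cls), (target == cls)
    union = np.logical_or(p, t).sum()
    return np.logical_and(p, t).sum() / union if union else np.nan
```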

How to cite: Kugler, L., Marrs, C., Kosczor, E., and Forkel, M.: Harvesting historical spy imagery by evaluating deep learning models for state-wide mapping of land cover changes between 1965-1978, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-5487, https://doi.org/10.5194/egusphere-egu23-5487, 2023.

EGU23-5583 | ECS | Posters on site | ITS1.14/CL5.8

Identifying and Locating Volcanic Eruptions using Convolutional Neural Networks and Interpretability Techniques 

Johannes Meuer, Claudia Timmreck, Shih-Wei Fang, and Christopher Kadow

Accurately interpreting past climate variability can be a challenging task, particularly when it comes to distinguishing between forced and unforced changes. In the case of large volcanic eruptions, ice core records are a very valuable tool but are often still not sufficient to link reconstructed anomaly patterns to a volcanic eruption at all, or to its geographical location. In this study, we developed a convolutional neural network (CNN) that is able to classify whether a volcanic eruption occurred and its location (northern hemisphere extratropical, southern hemisphere extratropical, or tropics) with an accuracy of 92%.

To train the CNN, we used 100-member ensembles of the MPI-ESM-LR global climate model, generated using the Easy Volcanic Aerosol (EVA) model, which provides the radiative forcing of idealized volcanic eruptions of different strengths and locations. The model considered global sea surface temperature and precipitation patterns over a period of 3 months, 12 months after the eruption.

In addition to demonstrating the high accuracy of the CNN, we also applied layer-wise relevance propagation (LRP) to the model to understand its decision-making process and identify the input data that influenced its predictions. Our study demonstrates the potential of using CNNs and interpretability techniques for identifying and locating past volcanic eruptions as well as improving the accuracy and understanding of volcanic climate signals.
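LRP can be applied with, for example, captum's implementation, assuming the network is built from layers captum supports; the model below is a small stand-in, not the study's CNN:

```python
import torch
import torch.nn as nn
from captum.attr import LRP

# Stand-in CNN: (SST, precipitation) anomaly maps -> 4 classes
# (no eruption / NH extratropical / tropical / SH extratropical).
model = nn.Sequential(
    nn.Conv2d(2, 4, kernel_size=3, padding=1), nn.ReLU(),
    nn.Flatten(), nn.Linear(4 * 96 * 192, 4),
).eval()

x = torch.randn(1, 2, 96, 192)                     # one input sample
pred = model(x).argmax(dim=1).item()
relevance = LRP(model).attribute(x, target=pred)   # same shape as x
```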

How to cite: Meuer, J., Timmreck, C., Fang, S.-W., and Kadow, C.: Identifying and Locating Volcanic Eruptions using Convolutional Neural Networks and Interpretability Techniques, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-5583, https://doi.org/10.5194/egusphere-egu23-5583, 2023.

EGU23-5967 | ECS | Posters on site | ITS1.14/CL5.8

Potentials and challenges of using Explainable AI for understanding atmospheric circulation 

Sebastian Scher, Andreas Trügler, and Jakob Abermann

Machine Learning (ML) and AI techniques, especially methods based on Deep Learning, have long been considered black boxes that might be good at predicting, but not at explaining predictions. This has changed recently, with more techniques becoming available that explain the predictions of ML models, known as Explainable AI (XAI). These have also been adopted in climate science, because they could have the potential to help us understand the physics behind phenomena in the geosciences. It is, however, unclear how large that potential really is and how these methods can be incorporated into the scientific process. In our study, we use the exemplary research question of which aspects of the large-scale atmospheric circulation affect specific local conditions. We compare the different answers to this question obtained with a range of methods, from the traditional approach of targeted data analysis based on physical knowledge (such as dimensionality reduction based on physical reasoning) to purely data-driven, physics-unaware methods using Deep Learning with XAI techniques. Based on these insights, we discuss the usefulness and potential pitfalls of XAI for understanding and explaining phenomena in the geosciences.

How to cite: Scher, S., Trügler, A., and Abermann, J.: Potentials and challenges of using Explainable AI for understanding atmospheric circulation, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-5967, https://doi.org/10.5194/egusphere-egu23-5967, 2023.

EGU23-6061 | ECS | Orals | ITS1.14/CL5.8 | Highlight

Using reduced representations of atmospheric fields to quantify the causal drivers of air pollution 

Sebastian Hickman, Paul Griffiths, Peer Nowack, and Alex Archibald

Air pollution contributes to millions of deaths worldwide every year. The concentration of a particular air pollutant, such as ozone, is controlled by physical and chemical processes which act on varying temporal and spatial scales. Quantifying the strength of causal drivers of air pollution (e.g. temperature) from observational data, particularly at the extremes, is challenging due to the difficulty of disentangling correlation and causation, as many drivers are correlated. Furthermore, because air pollution is controlled in part by large-scale atmospheric phenomena, using local covariates (e.g. at the individual grid cell level) for analysis is insufficient to fully capture the effect of these phenomena on air pollution.

Access to large spatiotemporal datasets of air pollutant concentrations and atmospheric variables, coupled with recent advances in self-supervised learning, allow us to learn reduced representations of spatiotemporal atmospheric fields, and therefore account for non-local and non-instantaneous processes in downstream tasks.

We show that these learned reduced representations can be useful for tasks such as air pollution forecasting, and crucially to quantify the causal effect of varying atmospheric fields on air pollution. We make use of recent advances in bounding causal effects in the presence of unobserved confounding to estimate, with uncertainty, the causal effect of changing atmospheric fields on air pollution. Finally, we compare our quantification of the causal drivers of air pollution to results from other approaches, and explore implications for our methods and for the wider goal of improving the process-level treatment of air pollutants in chemistry-climate models.

How to cite: Hickman, S., Griffiths, P., Nowack, P., and Archibald, A.: Using reduced representations of atmospheric fields to quantify the causal drivers of air pollution, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-6061, https://doi.org/10.5194/egusphere-egu23-6061, 2023.

EGU23-6306 | ECS | Orals | ITS1.14/CL5.8 | Highlight

Data-Driven Cloud Cover Parameterizations 

Arthur Grundner, Tom Beucler, Pierre Gentine, Marco A. Giorgetta, Fernando Iglesias-Suarez, and Veronika Eyring

A promising approach to improve cloud parameterizations within climate models, and thus climate projections, is to train machine learning algorithms on storm-resolving model (SRM) output. The ICOsahedral Non-hydrostatic (ICON) modeling framework permits simulations ranging from numerical weather prediction to climate projections, making it an ideal target to develop data-driven parameterizations for sub-grid scale processes. Here, we systematically derive and evaluate the first data-driven cloud cover parameterizations with coarse-grained data based on ICON SRM simulations. These parameterizations range from simple analytic models and symbolic regression fits to neural networks (NNs), populating a performance x complexity plane. In most models, we enforce sparsity and discourage correlated features by sequentially selecting features based on the models' performance gains. Guided by a set of physical constraints, we use symbolic regression to find a novel equation to parameterize cloud cover. The equation represents a good compromise between performance and complexity, achieving the highest performance (R^2>0.9) for its complexity (13 trainable parameters). To model sub-grid scale cloud cover in its full complexity, we also develop three different types of NNs that differ in the degree of vertical locality they assume for diagnosing cloud cover from coarse-grained atmospheric state variables. Using the game-theory based interpretability library SHapley Additive exPlanations, we analyze our most non-local NN and identify an overemphasis on specific humidity and cloud ice as the reason why it cannot perfectly generalize from the global to the regional coarse-grained SRM data. The interpretability tool also helps visualize similarities and differences in feature importance between regionally and globally trained NNs, and reveals a local relationship between their cloud cover predictions and the thermodynamic environment. Our results show the potential of deep learning and symbolic regression to derive accurate yet interpretable cloud cover parameterizations from SRMs.
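The SHAP analysis step can be sketched as follows; the model and data are stand-ins (the actual networks ingest coarse-grained atmospheric state profiles):

```python
import torch
import torch.nn as nn
import shap

model = nn.Sequential(nn.Linear(6, 32), nn.Tanh(), nn.Linear(32, 1))  # stand-in NN
X_background = torch.randn(200, 6)   # background sample of coarse-grained states
X_explain = torch.randn(20, 6)       # cases to explain

explainer = shap.DeepExplainer(model, X_background)
shap_values = explainer.shap_values(X_explain)  # per-feature attributions
```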

How to cite: Grundner, A., Beucler, T., Gentine, P., Giorgetta, M. A., Iglesias-Suarez, F., and Eyring, V.: Data-Driven Cloud Cover Parameterizations, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-6306, https://doi.org/10.5194/egusphere-egu23-6306, 2023.

EGU23-6450 | ECS | Orals | ITS1.14/CL5.8

The key role of causal discovery to improve data-driven parameterizations in climate models 

Fernando Iglesias-Suarez, Veronika Eyring, Pierre Gentine, Tom Beucler, Michael Pritchard, Jakob Runge, and Breixo Solino-Fernandez

Earth system models are fundamental to understanding and projecting climate change, although there are considerable biases and uncertainties in their projections. A large contribution to this uncertainty stems from differences in the representation of clouds and convection occurring at scales smaller than the resolved model grid. These long-standing deficiencies in cloud parameterizations have motivated developments of computationally costly global high-resolution cloud resolving models, that can explicitly resolve clouds and convection. Deep learning can learn such explicitly resolved processes from cloud resolving models. While unconstrained neural networks often learn non-physical relationships that can lead to instabilities in climate simulations, causally-informed deep learning can mitigate this problem by identifying direct physical drivers of subgrid-scale processes. Both unconstrained and causally-informed neural networks are developed using a superparameterized climate model in which deep convection is explicitly resolved, and are coupled to the climate model. Prognostic climate simulations with causally-informed neural network parameterization are stable, accurately represent mean climate and variability of the original climate model, and clearly outperform its non-causal counterpart. Combining causal discovery and deep learning is a promising approach to improve data-driven parameterizations (informed by causally-consistent physical fields) for both their design and trustworthiness.
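Causal input selection of this kind is commonly done with PCMCI; a sketch using the tigramite package with placeholder data (whether PCMCI is the exact method used here is not stated; import paths follow tigramite 4.x and moved in later versions):

```python
import numpy as np
from tigramite import data_processing as pp
from tigramite.pcmci import PCMCI
from tigramite.independence_tests import ParCorr  # moved in tigramite >= 5

data = np.random.randn(1000, 6)   # columns: candidate drivers + subgrid target
dataframe = pp.DataFrame(data, var_names=["T", "q", "u", "v", "p", "target"])
pcmci = PCMCI(dataframe=dataframe, cond_ind_test=ParCorr())
results = pcmci.run_pcmci(tau_max=3, pc_alpha=0.05)
# Significant parents of "target" become the only inputs of the NN scheme.
```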

How to cite: Iglesias-Suarez, F., Eyring, V., Gentine, P., Beucler, T., Pritchard, M., Runge, J., and Solino-Fernandez, B.: The key role of causal discovery to improve data-driven parameterizations in climate models, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-6450, https://doi.org/10.5194/egusphere-egu23-6450, 2023.

EGU23-7457 | ECS | Posters on site | ITS1.14/CL5.8

Towards the effective autoencoder architecture to detect weather anomalies 

Dusan Fister, Jorge Pérez-Aracil, César Peláez-Rodríguez, Marie Drouard, Pablo G. Zaninelli, David Barriopedro Cepero, Ricardo García-Herrera, and Sancho Salcedo-Sanz

When weather data are organised as images, pixels represent coordinates and pixel magnitudes represent the state of the observed variable at a given time. Observed variables, such as air temperature, mean sea level pressure and wind components, may be collected into higher-dimensional images or even into a motion structure. Encoding the former as spatial data and the latter as spatio-temporal data allows them to be processed with deep learning methods, for instance autoencoders and autoencoder-like architectures. The objective of the original autoencoder is to reproduce the input image as closely as possible, effectively equalising input and output during training. The autoencoder can then be used to calculate the deviations between (1) the true states (effectively the inputs), which are produced by nature, and (2) the expected states, which are derived by statistical learning. These deviations can be interpreted to identify extreme events, such as heatwaves, hot days or other rare events (so-called anomalies). Additionally, by modelling the deviations with statistical distributions, geographical areas with higher probabilities of anomalies can be identified in the tails of the distribution. The capability to reproduce the original input images is hence crucial to avoid mistaking arbitrary noise for anomalies. We run experiments to identify effective architectures that give reasonable solutions, verify the benefits of a variational autoencoder, assess the effect of different statistical loss functions, and determine an effective architecture for the decoder part of the autoencoder.
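The anomaly-scoring step then reduces to comparing inputs with their reconstructions; a minimal sketch assuming a trained Keras-style autoencoder `ae` and a quantile threshold (both assumptions of this sketch):

```python
import numpy as np

# X: daily fields, shape (n_days, height, width, channels); ae: trained
# autoencoder exposing a Keras-style predict() (an assumption of this sketch).
recon = ae.predict(X)                             # expected states
score = np.abs(X - recon).mean(axis=(1, 2, 3))    # per-day deviation
threshold = np.quantile(score, 0.99)              # tail of the score distribution
anomalous_days = np.where(score > threshold)[0]
```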

How to cite: Fister, D., Pérez-Aracil, J., Peláez-Rodríguez, C., Drouard, M., G. Zaninelli, P., Barriopedro Cepero, D., García-Herrera, R., and Salcedo-Sanz, S.: Towards the effective autoencoder architecture to detect weather anomalies, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-7457, https://doi.org/10.5194/egusphere-egu23-7457, 2023.

EGU23-7465 | ECS | Posters on site | ITS1.14/CL5.8

Invertible neural networks for satellite retrievals of aerosol optical depth 

Paolo Pelucchi, Jorge Vicent, J. Emmanuel Johnson, Philip Stier, and Gustau Camps-Valls

The retrieval of atmospheric aerosol properties from satellite remote sensing is a complex and under-determined inverse problem. Traditional retrieval algorithms, based on radiative transfer models, must make approximations and assumptions to reach a unique solution or repeatedly use the expensive forward models to be able to quantify uncertainty. The recently introduced Invertible Neural Networks (INNs), a machine learning method based on Normalizing Flows, appear particularly suited for tackling inverse problems. They simultaneously model both the forward and the inverse branches of the problem, and their generative aspect allows them to efficiently provide non-parametric posterior distributions for the retrieved parameters, which can be used to quantify the retrieval uncertainty. So far INNs have successfully been applied to low-dimensional idealised inverse problems and even to some simpler scientific retrieval problems. Still, satellite aerosol retrievals present particular challenges, such as the high variability of the surface reflectance signal and the often comparatively small aerosol signal in the top-of-the-atmosphere (TOA) measurements.
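The building block that makes such networks analytically invertible is the coupling layer; a minimal RealNVP-style sketch in PyTorch (the retrieval INN, which conditions on TOA reflectances, is more involved):

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """One invertible affine coupling block: half the variables parameterize an
    affine transform of the other half, so the inverse is available in closed form."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.d = dim // 2
        self.net = nn.Sequential(nn.Linear(self.d, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2 * (dim - self.d)))

    def forward(self, x):
        x1, x2 = x[:, :self.d], x[:, self.d:]
        s, t = self.net(x1).chunk(2, dim=1)
        return torch.cat([x1, x2 * torch.exp(s) + t], dim=1)

    def inverse(self, y):
        y1, y2 = y[:, :self.d], y[:, self.d:]
        s, t = self.net(y1).chunk(2, dim=1)
        return torch.cat([y1, (y2 - t) * torch.exp(-s)], dim=1)
```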

In this study, we investigate the use of INNs for retrieving aerosol optical depth (AOD) and its uncertainty estimates at the pixel level from MODIS TOA reflectance measurements. The models are trained with custom synthetic datasets of TOA reflectance-AOD pairs made by combining the MODIS Dark Target algorithm’s atmospheric look-up tables and a MODIS surface reflectance product. The INNs are found to perform emulation and inversion of the look-up tables successfully. We initially train models adapted to different surface types by focusing our application on limited regional and seasonal contexts. The models are applied to real measurements from the MODIS sensor, and the generated AOD retrievals and posterior distributions are compared to the corresponding Dark Target and AERONET retrievals for evaluation and discussion.

How to cite: Pelucchi, P., Vicent, J., Johnson, J. E., Stier, P., and Camps-Valls, G.: Invertible neural networks for satellite retrievals of aerosol optical depth, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-7465, https://doi.org/10.5194/egusphere-egu23-7465, 2023.

The rapid development of deep learning has transformed many fields, and precipitation prediction is one of them. Precipitation modeling remains a challenge for numerical weather prediction and climate models, and parameterization is required for low-spatial-resolution models, such as those used in climate change impact studies. Machine learning models have been shown to be capable of learning the relationships between other meteorological variables and precipitation. Such models are much less computationally intensive than explicit modeling of precipitation processes and are becoming more accurate than parameterization schemes.

Most existing applications focus either on precipitation extremes aggregated over a domain of interest or on average precipitation fields. Here, we are interested in spatial extremes and focus on the prediction of heavy precipitation events (>95th percentile) and extreme events (>99th percentile) over the European domain. Meteorological variables from ERA5 are used as input, and E-OBS data as target. Different architectures from the literature are compared in terms of predictive skill for average precipitation fields as well as for the occurrence of heavy or extreme precipitation events (threshold exceedance). U-Net architectures show higher skills than other variants of convolutional neural networks (CNN). We also show that a shallower U-Net architecture performs as well as the original network for this application, thus reducing the model complexity and, consequently, the computational resources. In addition, we analyze the number of inputs based on the importance of the predictors provided by a layer-wise relevance propagation procedure.

How to cite: Horton, P. and Otero, N.: Predicting spatial precipitation extremes with deep learning models. A comparison of existing model architectures., EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-7862, https://doi.org/10.5194/egusphere-egu23-7862, 2023.

EGU23-8085 | ECS | Posters on site | ITS1.14/CL5.8

Improving the spatial accuracy of extreme tropical cyclone rainfall in ERA5 using deep learning 

Guido Ascenso, Andrea Ficchì, Leone Cavicchia, Enrico Scoccimarro, Matteo Giuliani, and Andrea Castelletti

Tropical cyclones (TCs) are one of the costliest and deadliest natural disasters due to the combination of their strong winds and induced storm surges and heavy precipitation, which can cause devastating floods. Unfortunately, due to its high spatio-temporal variability, complex underlying physical process, and lack of high-quality observations, precipitation is still one of the most challenging aspects of a TC to model. However, as precipitation is a key forcing variable for hydrological processes acting across multiple space-time scales, accurate precipitation input is crucial for reliable hydrological simulations and forecasts.

A popular source of precipitation data is the ERA5 reanalysis dataset, frequently used as input to hydrological models when studying floods. However, ERA5 systematically underestimates TC-induced precipitation compared to MSWEP, a multi-source observational dataset fusing gauge, satellite, and reanalysis-based data, currently one of the most accurate precipitation datasets. Moreover, the spatial distribution of TC-rainfall in ERA5 has large room for improvement.

Here, we present a precipitation correction scheme based on U-Net, a popular deep-learning architecture. Rather than only adjusting the per-pixel precipitation values at each timestep of a given TC, we explicitly design our model to also adjust the spatial distribution of the precipitation; to the best of our knowledge, we are the first to do so. The key novelty of our model is a custom-made loss function, based on the combination of the fractions skill score (FSS) and mean absolute error (MAE) metrics. We train and validate the model on 100k time steps (with an 80:20 train:test split) from global TC precipitation events. We show how a U-Net trained with our loss function can reduce the per-pixel MAE of ERA5 precipitation by nearly as much as other state-of-the-art methods, while surpassing them significantly in terms of improved spatial patterns of precipitation. Finally, we discuss how the outputs of our model can be used for future research.
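A sketch of how an FSS term can be combined with MAE in a differentiable PyTorch loss; the threshold, neighbourhood size, weighting and sigmoid softening below are illustrative assumptions, not the authors' exact formulation:

```python
import torch
import torch.nn.functional as F

def fss_mae_loss(pred, obs, thresh=1.0, window=9, alpha=0.5):
    # Soft "rain > thresh" masks keep the exceedance step differentiable.
    p = torch.sigmoid((pred - thresh) * 10.0)
    o = (obs > thresh).float()
    # Neighbourhood fractions via average pooling (inputs are (B, 1, H, W)).
    pf = F.avg_pool2d(p, window, stride=1, padding=window // 2)
    of = F.avg_pool2d(o, window, stride=1, padding=window // 2)
    # Fractions skill score: 1 is a perfect spatial match, 0 is no skill.
    fss = 1.0 - ((pf - of) ** 2).mean() / ((pf ** 2).mean() + (of ** 2).mean() + 1e-8)
    return alpha * F.l1_loss(pred, obs) + (1.0 - alpha) * (1.0 - fss)
```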

How to cite: Ascenso, G., Ficchì, A., Cavicchia, L., Scoccimarro, E., Giuliani, M., and Castelletti, A.: Improving the spatial accuracy of extreme tropical cyclone rainfall in ERA5 using deep learning, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-8085, https://doi.org/10.5194/egusphere-egu23-8085, 2023.

EGU23-8496 | ECS | Posters on site | ITS1.14/CL5.8

Utilizing AI emulators to Model Stratospheric Aerosol Injections and their Effect on Climate 

Eshaan Agrawal and Christian Schroder de Witt

With no end to anthropogenic greenhouse gas emissions in sight, policymakers are increasingly debating artificial mechanisms to cool the Earth's climate. One such solution is stratospheric aerosol injection (SAI), a method of solar geoengineering in which particles are injected into the stratosphere to reflect the sun's rays and lower global temperatures. Past volcanic events suggest that SAI can lead to fast, substantial surface temperature reductions, and it is projected to be economically feasible. Research in simulation, however, suggests that SAI can lead to catastrophic side effects. It is also controversial among politicians and environmentalists because of the numerous geopolitical, environmental, and human-health challenges it poses. Nevertheless, SAI is increasingly receiving attention from policymakers. In this research project, we use deep reinforcement learning to study whether, and by how much, carefully engineered temporally and spatially varying injection strategies can mitigate catastrophic side effects of SAI. To do this, we use the HadCM3 global circulation model to collect climate system data in response to artificial longitudinal aerosol injections. We then train a neural network emulator on these data and use it to learn optimal injection strategies under a variety of objectives by alternating model updates with reinforcement learning. We release our dataset and code as a benchmark to improve emulator creation for solar aerosol engineering modeling.

How to cite: Agrawal, E. and Schroder de Witt, C.: Utilizing AI emulators to Model Stratospheric Aerosol Injections and their Effect on Climate, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-8496, https://doi.org/10.5194/egusphere-egu23-8496, 2023.

Multiple studies have now demonstrated that machine learning (ML) can give improved skill for simulating fairly typical weather events in climate simulations, for tasks such as downscaling to higher resolution and emulating and speeding up expensive model parameterisations. Many of these used ML methods with very high numbers of parameters, such as neural networks, which are the focus of the discussion here. Not much attention has been given to the performance of these methods for the extreme event severities relevant to many critical weather and climate prediction applications, with return periods of more than a few years. This leaves a lot of uncertainty about the usefulness of these methods, particularly for general-purpose models that must perform reliably in extreme situations. ML models may be expected to struggle to predict extremes because there are usually few samples of such events.
 
This presentation will review the small number of studies that have examined the skill of machine learning methods in extreme weather situations. It will be shown using recent results that machine learning methods that perform reasonably for typical weather events can have very large errors in extreme situations, highlighting the necessity of testing the performance for these cases. Extrapolation to extremes is found to work well in some studies, however. 
 
It will be argued that more attention needs to be given to performance for extremes in work applying ML in climate science. Research gaps that seem particularly important are identified. These include investigating the behaviour of ML systems in events that are multiple standard deviations beyond observed records, which have occurred in the past, and evaluating performance of complex generative models in extreme events. Approaches to address these problems will be discussed.

How to cite: Watson, P.: Machine learning applications for weather and climate need greater focus on extremes, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-8615, https://doi.org/10.5194/egusphere-egu23-8615, 2023.

EGU23-8661 | Posters on site | ITS1.14/CL5.8

An urban climate neural network screening tool 

Robert von Tils and Sven Wiemers

Microscale RANS (Reynolds Averaged Navier Stokes) models are able to simulate the urban climate for entire large cities with a high spatial resolution of up to 5 m horizontally. They do this using data from geographic information systems (GIS) that must be specially processed to provide the models with information about the terrain, buildings, land use, and resolved vegetation. If high-performance computers, for example from research institutions, are not available for the simulations or are beyond the financial scope, the calculation on commercially available servers can take several weeks. The calculation of a reference initial state for a city is often followed by questions regarding adaptation measures due to climate change or the influence of smaller and larger future building developments on the urban climate. These changes lead locally to a change of the urban climate but are also influenced by the urban climate itself.

In order to save computational time and to comfortably give a quantitative fast initial assessment, we trained a neural network that predicts the simulation results of a RANS model (for example: air temperature at night and during the day, wind speed, cold air flow) and implemented this network in a GIS. The tool allows to calculate the impact of development projects on the urban climate in a fraction of the time required by a RANS simulation and comes close to the RANS model in terms of accuracy. It can also be used by people without in-depth knowledge of urban climate modeling and is therefore particularly suitable for use, for example, in specialized offices of administrative departments or by project developers.

How to cite: von Tils, R. and Wiemers, S.: An urban climate neural network screening tool, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-8661, https://doi.org/10.5194/egusphere-egu23-8661, 2023.

EGU23-8666 | ECS | Posters on site | ITS1.14/CL5.8

Drivers of Natural Gas Use in United States Buildings 

Rohith Teja Mittakola, Philippe Ciais, Jochen Schubert, David Makowski, Chuanlong Zhou, Hassan Bazzi, Taochun Sun, Zhu Liu, and Steven Davis

Natural gas is the primary fuel used in U.S. residences, especially during winter, when cold temperatures drive the heating demand. In this study, we use daily county-level gas consumption data to assess the spatial patterns of the relationships and sensitivities of gas consumption by U.S. households to outdoor temperature. Linear-plus-plateau functions are found to best fit gas consumption and are applied to derive two key coefficients for each county: the heating temperature threshold (Tcrit) below which residential heating starts, and the rate of increase in gas consumption when the outdoor temperature drops by one degree (Slope). We then use interpretable machine learning models to evaluate the key building properties and socioeconomic factors related to the spatial patterns of Tcrit and Slope, based on a large database of individual household properties and population census data. We find that building age, employment rates, and household size are the main predictors of Tcrit, whereas the share of gas as a heating fuel and household income are the main predictors of Slope. The latter result suggests inequalities across the U.S. with respect to gas consumption, with wealthy people living in well-insulated houses associated with low Tcrit and Slope values. Finally, we estimate potential reductions in gas use in U.S. residences due to improvements in household insulation or a hypothetical behavioral change toward reduced consumption, by adopting a 1°C lower Tcrit than the current value and a reduced slope. These two scenarios would result in 25% lower gas consumption at the national scale, avoiding 1.24 million MtCO2 of emissions per year. Most of these reductions occur in the Midwest and East Coast regions. The results from this study provide new quantitative information for targeting efforts to reduce household gas use and related CO2 emissions in the U.S.
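The county-level linear-plus-plateau fit can be sketched with scipy on synthetic data (the study fits daily county-level consumption against outdoor temperature):

```python
import numpy as np
from scipy.optimize import curve_fit

def gas_use(T, base, slope, Tcrit):
    # Plateau at `base` above Tcrit; linear increase of `slope` per degC below it.
    return base + slope * np.maximum(Tcrit - T, 0.0)

T = np.random.uniform(-10, 30, 365)   # placeholder daily mean temperatures
y = gas_use(T, 1.0, 0.3, 15.0) + 0.1 * np.random.randn(T.size)
(base, slope, Tcrit), _ = curve_fit(gas_use, T, y, p0=[1.0, 0.2, 15.0])
```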

How to cite: Mittakola, R. T., Ciais, P., Schubert, J., Makowski, D., Zhou, C., Bazzi, H., Sun, T., Liu, Z., and Davis, S.: Drivers of Natural Gas Use in United States Buildings, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-8666, https://doi.org/10.5194/egusphere-egu23-8666, 2023.

EGU23-8921 | ECS | Posters on site | ITS1.14/CL5.8

Identification of sensitive regions to climate change and anticipation of climate events in Brazil 

Angelica Caseri and Francisco A. Rodrigues

In Brazil, the water system is essential for the electrical system and agribusiness. Understanding climate change and predicting long-term hydrometeorological phenomena are vital for developing and maintaining these sectors in the country. This work uses data from the SIN (National Interconnected System) for the main hydrological basins of Brazil, together with historical rainfall data, in complex networks and deep learning algorithms, to identify possible climate changes in Brazil and to predict future hydrometeorological phenomena. The predictions generated with the methodology developed in this work showed satisfactory results, which allows identifying regions more sensitive to climate change and anticipating climate events. This work is expected to help the energy generation system and the agribusiness sector in Brazil, the main sectors driving the country's economy.

How to cite: Caseri, A. and A. Rodrigues, F.: Identification of sensitive regions to climate change and anticipation of climate events in Brazil, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-8921, https://doi.org/10.5194/egusphere-egu23-8921, 2023.

EGU23-9337 | ECS | Posters on site | ITS1.14/CL5.8

Modeling landscape-scale vegetation response to climate: Synthesis of the EarthNet challenge 

Vitus Benson, Christian Requena-Mesa, Claire Robin, Lazaro Alonso, Nuno Carvalhais, and Markus Reichstein

The biosphere displays high heterogeneity at the landscape scale. Vegetation modelers struggle to represent this variability in process-based models because global observations of micrometeorology and plant traits are not available at such fine granularity. However, remote sensing data are available: the Sentinel-2 satellites capture aspects of localized vegetation dynamics at 10 m resolution. The EarthNet challenge (EarthNet2021, [1]) aims at predicting satellite imagery conditioned on coarse-scale weather data. Multiple research groups have approached this challenge with deep learning [2,3,4]. Here, we evaluate how well these satellite image models simulate the vegetation response to climate, approximating the vegetation status by the normalized difference vegetation index (NDVI).

Achieving the new vegetation-centric evaluation requires three steps. First, we update the original EarthNet2021 dataset to be suitable for vegetation modeling: EarthNet2021x includes improved georeferencing, a land cover map, and a more effective cloud mask. Second, we introduce an interpretable evaluation metric, the VegetationScore: the Nash-Sutcliffe model efficiency (NSE) of NDVI predictions against clear-sky observations per vegetated pixel, aggregated to dataset level through normalization. The ground truth NDVI time series achieves a VegetationScore of 1; the target-period mean NDVI achieves a VegetationScore of 0. Third, we assess the skill of two deep neural networks with the VegetationScore: ConvLSTM [2,3], which combines convolutions and recurrence, and EarthFormer [4], a Transformer adaptation for Earth science problems.
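
A minimal sketch of the per-pixel NSE underlying the VegetationScore (before the dataset-level normalization described above); by construction the observations themselves score 1 and the observed mean scores 0.

    import numpy as np

    def nse(pred, obs):
        # Nash-Sutcliffe efficiency of one pixel's clear-sky NDVI series.
        pred, obs = np.asarray(pred), np.asarray(obs)
        return 1.0 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2)

    obs = np.array([0.42, 0.48, 0.55, 0.60, 0.58])    # toy NDVI observations
    print(nse(obs, obs))                              # ground truth -> 1.0
    print(nse(np.full(obs.size, obs.mean()), obs))    # mean predictor -> 0.0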

Both models significantly outperform the persistence baseline. They do not display systematic biases and generally capture spatial patterns. Yet, both neural networks achieve a negative VegetationScore: in only about 20% of vegetated pixels do the deep learning models beat a hypothetical model predicting the true target-period mean NDVI. This is partly because the models largely underestimate the temporal variability. However, the target variability may partially be inflated by the noisy nature of the observed NDVI. Additionally, increasing uncertainty at longer lead times decreases scores: the mean RMSE in the first 25 days is 50% lower than between 75 and 100 days lead time. In general, consistent with the EarthNet2021 leaderboard, EarthFormer outperforms the ConvLSTM. With EarthNet2021x, a narrower perspective on the EarthNet challenge is introduced. Modeling localized vegetation response is a task that requires careful adjustment of off-the-shelf computer vision architectures for them to excel. The resulting specialized approaches can then be used to advance our understanding of the complex interactions between vegetation and climate.



[1] Requena-Mesa, Benson, Reichstein, Runge and Denzler. EarthNet2021: A large-scale dataset and challenge for Earth surface forecasting as a guided video prediction task. CVPR Workshops, 2021.

[2] Diaconu, Saha, Günnemann and Zhu. Understanding the Role of Weather Data for Earth Surface Forecasting Using a ConvLSTM-Based Model. CVPR Workshops, 2022.

[3] Kladny, Milanta, Mraz, Hufkens and Stocker. Deep learning for satellite image forecasting of vegetation greenness. bioRxiv, 2022.

[4] Gao, Shi, Wang, Zhu, Wang, Li and Yeung. Earthformer: Exploring Space-Time Transformers for Earth System Forecasting. NeurIPS, 2022.

How to cite: Benson, V., Requena-Mesa, C., Robin, C., Alonso, L., Carvalhais, N., and Reichstein, M.: Modeling landscape-scale vegetation response to climate: Synthesis of the EarthNet challenge, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-9337, https://doi.org/10.5194/egusphere-egu23-9337, 2023.

EGU23-9434 | ECS | Posters on site | ITS1.14/CL5.8

Enhancing environmental sensor data quality control with graph neural networks 

Elżbieta Lasota, Julius Polz, Christian Chwala, Lennart Schmidt, Peter Lünenschloß, David Schäfer, and Jan Bumberger

The rapidly growing number of low-cost environmental sensors and of data from opportunistic sensors constantly advances the quality as well as the spatial and temporal resolution of weather and climate models. However, it also creates the need for effective tools to ensure the quality of the collected data.

Quality control (QC) of time series from multiple spatially irregularly distributed sensors is a challenging task, as it requires the simultaneous integration and analysis of observations from sparse neighboring sensors and consecutive time steps. Manual QC is very often time- and labour-intensive and requires expert knowledge, which introduces subjectivity and limits reproducibility. Automatic, accurate, and robust QC solutions are therefore in high demand, and machine learning techniques stand out among them.

In this study, we present a novel approach for the quality control of time series data from multiple spatially irregularly distributed sensors using graph neural networks (GNNs). Although we applied our method to commercial microwave link attenuation data collected from a network in Germany between April and October 2021, our solution aims to be generic with respect to the number and type of sensors. The proposed approach uses an autoencoder architecture in which the GNN models the spatial relationships between the sensors, allowing contextual information to be incorporated into the quality control process.
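
A minimal sketch of such a graph autoencoder, assuming PyTorch Geometric; the window length, layer widths, and toy graph are illustrative, not the exact architecture used in the study.

    import torch
    from torch_geometric.nn import GCNConv

    class GraphAutoencoder(torch.nn.Module):
        # Encode each sensor's time-series window using neighbouring sensors;
        # a large reconstruction error marks the window as suspect.
        def __init__(self, window=32, hidden=16, latent=4):
            super().__init__()
            self.enc = GCNConv(window, hidden)
            self.bottleneck = GCNConv(hidden, latent)
            self.dec = torch.nn.Linear(latent, window)

        def forward(self, x, edge_index):
            h = torch.relu(self.enc(x, edge_index))
            z = torch.relu(self.bottleneck(h, edge_index))
            return self.dec(z)

    x = torch.randn(3, 32)                             # 3 sensors, 32 time steps
    edge_index = torch.tensor([[0, 1, 1, 2, 2, 0],     # toy ring connectivity
                               [1, 0, 2, 1, 0, 2]])
    recon = GraphAutoencoder()(x, edge_index)
    anomaly_score = ((recon - x) ** 2).mean(dim=1)     # per-sensor error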

While our model shows promising results in initial tests, further research is needed to fully evaluate its effectiveness and to demonstrate its potential in a wider range of environmental applications. Eventually, our solution will allow us to further foster the observational basis of our understanding of the natural environment.

How to cite: Lasota, E., Polz, J., Chwala, C., Schmidt, L., Lünenschloß, P., Schäfer, D., and Bumberger, J.: Enhancing environmental sensor data quality control with graph neural networks, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-9434, https://doi.org/10.5194/egusphere-egu23-9434, 2023.

EGU23-9810 | ECS | Orals | ITS1.14/CL5.8

Integration of a deep-learning-based fire model into a global land surface model 

Rackhun Son, Nuno Carvalhais, Lazaro Silva, Christian Requena-Mesa, Ulrich Weber, Veronika Gayler, Tobias Stacke, Reiner Schnur, Julia Nabel, Alexander Winkler, and Sönke Zaehle

Fire is a ubiquitous process within the Earth system that has significant impacts on terrestrial ecosystems. Process-based fire models quantify fire disturbance effects in stand-alone dynamic global vegetation models (DGVMs) and within coupled Earth system models (ESMs), and their advances have incorporated both descriptions of natural processes and anthropogenic drivers. However, skill in modeling and predicting fire at the global scale remains limited, mostly due to the stochastic nature of fire, but also due to the limits of the empirical parameterizations in these process-based models. As an alternative, statistical approaches have shown the advantages of machine learning in providing robust diagnostics of fire damage, though with limited value for process-based modeling frameworks. Here, we develop a deep-learning-based fire model (DL-fire) to estimate the gridded burned area fraction at the global scale and couple it within JSBACH4, the land surface model used in the ICON ESM. We compare the resulting hybrid model integrating DL-fire into JSBACH4 (JDL-fire) against the standard fire model within JSBACH4 and the stand-alone DL-fire results. The stand-alone DL-fire model forced with observations performs well in simulating the global burnt fraction, with a monthly correlation (Rm) with the Global Fire Emissions Database (GFED4) of 0.78 and 0.80 at the global scale during the training (2004-10) and validation (2011-15) periods, respectively. The performance remains nearly the same when evaluating the hybrid modeling approach JDL-fire (Rm=0.76 and 0.86 in the training and evaluation periods, respectively). This outperforms the standard fire model currently used in JSBACH4 (Rm=-0.16 and 0.22 in the training and evaluation periods, respectively) by far. We further evaluate the modeling results across specific fire regions and apply layer-wise relevance propagation (LRP) to quantify the importance of each predictor. Overall, land properties, such as fuel amount and water content in soil layers, stand out as the major factors determining burnt fraction in DL-fire, paralleled by meteorological conditions, over tropical and high-latitude regions. Our study demonstrates the potential of hybrid modeling in advancing the predictability of Earth system processes by integrating statistical learning approaches into physics-based dynamical systems.
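
For reference, the reported skill metric reduces to a simple correlation of monthly series; a minimal sketch with synthetic stand-ins for the model output and the GFED4 observations:

    import numpy as np

    def monthly_corr(sim, obs):
        # Rm: Pearson correlation of monthly burned-fraction time series.
        sim, obs = np.asarray(sim), np.asarray(obs)
        return np.corrcoef(sim, obs)[0, 1]

    months = 84                                        # e.g. 2004-2010, monthly
    obs = np.random.default_rng(0).gamma(2.0, 1.0, months)
    sim = obs + np.random.default_rng(1).normal(0.0, 0.5, months)
    rm = monthly_corr(sim, obs)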

How to cite: Son, R., Carvalhais, N., Silva, L., Requena-Mesa, C., Weber, U., Gayler, V., Stacke, T., Schnur, R., Nabel, J., Winkler, A., and Zaehle, S.: Integration of a deep-learning-based fire model into a global land surface model, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-9810, https://doi.org/10.5194/egusphere-egu23-9810, 2023.

EGU23-10219 | ECS | Posters on site | ITS1.14/CL5.8

Identifying compound weather prototypes of forest mortality with β-VAE 

Mohit Anand, Friedrich Bohn, Lily-belle Sweet, Gustau Camps-Valls, and Jakob Zscheischler

Forest health is affected by many interacting and correlated weather variables over multiple temporal scales. Climate change affects weather conditions and their dependencies. To better understand future forest health and status, an improved scientific understanding of the complex relationships between weather conditions and forest mortality is required. Explainable AI (XAI) methods are increasingly used to understand and simulate physical processes in complex environments given enough data. In this work, an hourly weather generator (AWE-GEN) is used to simulate 200,000 years of daily weather conditions representative of central Germany. It is capable of simulating low- and high-frequency characteristics of weather variables and also captures the inter-annual variability of precipitation. These data are then used to drive an individual-based forest model (FORMIND) to simulate the dynamics of a beech, pine, and spruce forest. A variational autoencoder (β-VAE) is used to learn representations of the generated weather conditions, which include radiation, precipitation and temperature. We learn shared and variable-specific latent representations using a decoder network that remains the same for all weather variables; the representation learning is completely unsupervised. Using the output of the forest model, we identify single and compounding weather prototypes that are associated with extreme forest mortality. We find that the prototypes associated with extreme mortality are similar for pine and spruce forests and slightly different for beech forests. Furthermore, although the compounding weather prototypes represent a larger sample size (2.4%-3.5%) than the single prototypes (1.7%-2.2%), they are associated with higher levels of mortality on average. Overall, our research illustrates how deep learning frameworks can be used to identify weather patterns that are associated with extreme impacts.
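
A minimal sketch of the β-VAE objective used for such representation learning, assuming PyTorch; the β value is illustrative.

    import torch
    import torch.nn.functional as F

    def beta_vae_loss(recon, x, mu, logvar, beta=4.0):
        # Reconstruction error plus beta-weighted KL divergence of the
        # approximate posterior N(mu, exp(logvar)) from the prior N(0, I);
        # beta > 1 encourages disentangled latent factors.
        recon_err = F.mse_loss(recon, x, reduction="sum")
        kl = -0.5 * torch.sum(1.0 + logvar - mu.pow(2) - logvar.exp())
        return recon_err + beta * kl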

 

How to cite: Anand, M., Bohn, F., Sweet, L., Camps-Valls, G., and Zscheischler, J.: Identifying compound weather prototypes of forest mortality with β-VAE, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-10219, https://doi.org/10.5194/egusphere-egu23-10219, 2023.

Hydrological models and machine learning models are widely used in streamflow simulation and data reconstruction. However, a global assessment of these models is still lacking, and no synthesized catchment-scale streamflow product derived from multiple models is available globally. In this study, we comprehensively evaluated four conceptual hydrological models (GR2M, XAJ, SAC, Alpine) and four machine learning models (RF, GBDT, DNN, CNN) based on 16,218 selected gauging stations worldwide, and then applied a multi-model weighting ensemble (MWE) method to merge the streamflow simulated by these models. Generally, the average performance of the machine learning models across all stations is better than that of the hydrological models, with more stations reaching a qualified simulation accuracy (KGE>0.2); however, the hydrological models achieve a higher percentage of stations with a good simulation accuracy (KGE>0.6). Specifically, for the average accuracy during the validation period, 67% (27%) and 74% (21%) of stations reached a "qualified" ("good") level for the hydrological models and machine learning models, respectively. The XAJ is the best-performing of the four hydrological models, particularly in tropical and temperate zones. Among the machine learning models, the GBDT model performs better at the global scale. The MWE can effectively improve the simulation accuracy and performs much better than the traditional multi-model arithmetic ensemble (MAE), especially the constrained least squares combination method (CLS), with 82% (28%) of the stations having a "qualified" ("good") accuracy. Furthermore, by exploring the factors influencing the streamflow simulation, we found that both the machine learning models and the hydrological models perform better in wetter areas.
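
For reference, a minimal sketch of the Kling-Gupta efficiency behind the "qualified" (KGE>0.2) and "good" (KGE>0.6) thresholds, assuming the standard 2009 formulation:

    import numpy as np

    def kge(sim, obs):
        # Kling-Gupta efficiency (Gupta et al., 2009): 1 is a perfect fit.
        sim, obs = np.asarray(sim), np.asarray(obs)
        r = np.corrcoef(sim, obs)[0, 1]       # linear correlation
        alpha = sim.std() / obs.std()         # variability ratio
        beta = sim.mean() / obs.mean()        # bias ratio
        return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)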

How to cite: Zhang, J. and Liu, J.: Simulation and reconstruction of global monthly runoff based on hydrological models and machine learning models, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-10391, https://doi.org/10.5194/egusphere-egu23-10391, 2023.

Physics-based numerical weather prediction models (NWPs) and radar-based probabilistic methods have mainly been used for short-term precipitation prediction. Recently, radar-based precipitation nowcasting models using advanced machine learning (ML) have been actively developed. Although ML-based models show outstanding performance in short-term rainfall prediction, their performance decreases significantly with increasing lead time, and they have the limitation of being black-box models that do not consider the physical processes of the atmosphere. To address these limitations, we aimed to develop a hybrid precipitation nowcasting model that combines NWP and an advanced ML-based model via an ML-based ensemble method. The Weather Research and Forecasting (WRF) model was used as the NWP to generate physics-based rainfall forecasts. We developed the ML-based precipitation nowcasting model with a conditional generative adversarial network (cGAN), which shows high performance in image generation tasks. Radar reflectivity data, WRF hindcast meteorological outputs (e.g., temperature and wind speed), and static information on the target basin (e.g., DEM, land cover) were used as input data for the cGAN-based model to generate physics-informed rainfall predictions at lead times of up to 6 hours. The cGAN-based model was trained on data for the summer seasons of 2014-2017. In addition, we proposed an ML-based blending method based on XGBoost that combines the cGAN-based model results and the WRF forecast results. To evaluate the hybrid model performance, we analyzed precipitation predictions for three heavy rain events in South Korea. The results confirmed that using the blending method to develop a hybrid model can provide an improved precipitation nowcasting approach.
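
A minimal sketch of such a blending step on synthetic data; the per-pixel feature choice is an illustrative assumption, not the authors' exact configuration.

    import numpy as np
    from xgboost import XGBRegressor

    rng = np.random.default_rng(1)
    n = 50_000                                           # pixels pooled over events
    cgan = rng.gamma(2.0, 2.0, n)                        # cGAN nowcast (mm/h)
    wrf = rng.gamma(2.0, 2.0, n)                         # WRF forecast (mm/h)
    lead = rng.integers(1, 7, n)                         # lead time (h)
    obs = 0.7 * cgan + 0.3 * wrf + rng.normal(0, 0.5, n) # synthetic radar "truth"

    X = np.column_stack([cgan, wrf, lead])
    blender = XGBRegressor(n_estimators=200, max_depth=4, learning_rate=0.1)
    blender.fit(X, obs)                                  # learns lead-dependent weights
    blended = blender.predict(X[:5])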

 

Acknowledgements

 This work was supported by a grant from the National Research Foundation of Korea funded by the Ministry of Science, ICT & Future Planning (2020R1A2C2007670).

How to cite: Choi, S. and Kim, Y.: Developing hybrid precipitation nowcasting model with WRF and conditional GAN-based model, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-10431, https://doi.org/10.5194/egusphere-egu23-10431, 2023.

EGU23-10568 | ECS | Orals | ITS1.14/CL5.8

Extended-range predictability of stratospheric extreme events using explainable neural networks 

Zheng Wu, Tom Beucler, and Daniela Domeisen

Extreme stratospheric events such as extremely weak and strong polar vortex events can influence tropospheric weather from weeks to months and are thus important sources of predictability for tropospheric weather on subseasonal to seasonal (S2S) timescales. However, the predictability of weak vortex events is limited to 1-2 weeks in state-of-the-art forecasting systems, while strong vortex events are more predictable than weak ones. Longer predictability timescales for stratospheric extreme events would benefit long-range surface weather prediction. Recent studies have shown promising results in the use of machine learning for improving weather prediction. The goal of this study is to explore the potential of a machine learning approach for extending the predictability of stratospheric extreme events on S2S timescales. We use neural networks (NNs) to predict the monthly stratospheric polar vortex strength with lead times of up to five months, using as precursors the first five principal components (PCs) of the sea surface temperature (SST), mean sea level pressure (MSLP), Barents-Kara sea-ice concentration (BK-SIC), poleward heat flux at 100 hPa, and zonal wind at 50, 30, and 2 hPa. These physical variables are chosen because previous studies have indicated them as potential precursors of stratospheric extremes. The results show that accuracy and Brier Skill Score decrease with longer lead times and that performance is similar between weak and strong vortex events. We then employ two different NN attribution methods to uncover feature importance (heat maps) in the inputs, which indicates the relevance of each input for the NN prediction. The heat maps suggest that precursors from the lower stratosphere are important for predicting the stratospheric polar vortex strength at a lead time of one month, while precursors at the surface and in the upper stratosphere become more important at lead times longer than one month. This result is overall consistent with previous studies suggesting that subseasonal precursors of stratospheric extreme events may come from the lower troposphere. Our study sheds light on the potential of explainable NNs for identifying opportunities for skillful prediction of stratospheric extreme events and, by extension, surface weather on S2S timescales.
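
As a simpler stand-in for the attribution analysis (not one of the two NN attribution methods used in the study), the sketch below ranks toy precursor inputs with permutation importance; data and labels are synthetic, with one toy PC per variable.

    import numpy as np
    from sklearn.neural_network import MLPClassifier
    from sklearn.inspection import permutation_importance

    rng = np.random.default_rng(0)
    names = ["SST", "MSLP", "BK-SIC", "heat flux 100 hPa", "U50", "U30", "U2"]
    X = rng.normal(size=(600, len(names)))
    y = (X[:, 4] + 0.5 * X[:, 3] + rng.normal(0, 1, 600) > 0).astype(int)

    clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000).fit(X, y)
    imp = permutation_importance(clf, X, y, n_repeats=10, random_state=0)
    for name, score in sorted(zip(names, imp.importances_mean), key=lambda t: -t[1]):
        print(f"{name}: {score:.3f}")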

How to cite: Wu, Z., Beucler, T., and Domeisen, D.: Extended-range predictability of stratospheric extreme events using explainable neural networks, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-10568, https://doi.org/10.5194/egusphere-egu23-10568, 2023.

One of the main challenges for forecasting fire activity is the tradeoff between accuracy at the finer spatial scales relevant to local decision making and predictability over seasonal (next 2-4 months) and subseasonal-to-seasonal (next 2 weeks to 2 months) timescales. To achieve predictability at long lead times and high spatial resolution, several analyses in the literature have constructed statistical models of fire activity using only antecedent climate predictors. In this talk, however, I will present preliminary seasonal forecasts of wildfire frequency and burned area for the western United States using SMLFire1.0, a stochastic machine learning (SML) fire model that relies on observed antecedent climate and vegetation predictors as well as seasonal forecasts of fire-month climate. In particular, I will discuss results obtained by forcing the SMLFire1.0 model with seasonal forecasts from: a) downscaled and bias-corrected North American Multi-Model Ensemble (NMME) outputs, and b) skill-weighted climate analogs constructed using an autoregressive ML model. I will also comment upon the relative contributions of uncertainties, from climate forecasts and fire model simulations respectively, to projections of wildfire frequency and burned area across several spatial scales and lead times.

How to cite: Buch, J., Williams, A. P., and Gentine, P.: Seasonal forecasts of wildfire frequency and burned area in the western United States using a stochastic machine learning fire model, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-11238, https://doi.org/10.5194/egusphere-egu23-11238, 2023.

EGU23-11355 | Posters on site | ITS1.14/CL5.8

Estimation of Fine Dust Concentration from BGR Images in Surveillance Cameras 

Hoyoung Cha, Jongyun Byun, Jongjin Baik, and Changhyun Jun

This study proposes a novel approach for estimating fine dust concentration from raw video data recorded by surveillance cameras. First, several regions of interest are defined from specific images extracted from videos recorded by surveillance cameras installed at Chung-Ang University. Among them, sky regions are mainly considered to capture changes in the characteristics of each color channel. After converting RGB images into BGR images, the number of discrete pixels with high brightness intensity in the blue channel is analyzed by investigating its relationship with fine dust concentrations measured at automatic monitoring stations near the campus. Threshold values from 125 to 200 are considered to find optimal conditions from changes in the values of each pixel in the blue channel. This study uses the Pearson correlation coefficient to quantify the relation between the number of pixels above the selected threshold and observed fine dust concentrations. For one example date, the coefficients show positive correlations ranging from 0.57 to 0.89 across thresholds. It should be noted that this study is a novel attempt to suggest a new, simple, and efficient method for estimating fine dust concentration from surveillance cameras, which are common in many areas around the world.
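
A minimal sketch of the pixel-counting and correlation step; the threshold, toy frame, and toy numbers are illustrative.

    import numpy as np
    from scipy.stats import pearsonr

    def bright_blue_count(bgr_image, threshold=150):
        # Count pixels whose blue channel (index 0 in BGR order) exceeds the
        # threshold; thresholds of 125-200 are scanned in the study.
        return int(np.count_nonzero(bgr_image[:, :, 0] > threshold))

    frame = np.random.default_rng(0).integers(0, 256, (480, 640, 3), dtype=np.uint8)
    n_bright = bright_blue_count(frame)                  # one frame's count

    counts = np.array([4120.0, 3890.0, 3550.0, 3300.0, 2980.0])  # toy hourly counts
    pm = np.array([18.0, 22.0, 31.0, 35.0, 44.0])                # toy station PM data
    r, p = pearsonr(counts, pm)                                  # correlation check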

 

Keywords: Fine Dust Concentration, BGR Image, Surveillance Camera, Threshold, Correlation Analysis

 

Acknowledgment

This work was supported by the National Research Foundation of Korea (NRF) grants funded by the Korea government (MSIT) (No. NRF-2022R1A4A3032838 and 2020R1G1A1013624) and by the Korea Meteorological Administration Research and Development Program under Grant KMI2022-01910.

How to cite: Cha, H., Byun, J., Baik, J., and Jun, C.: Estimation of Fine Dust Concentration from BGR Images in Surveillance Cameras, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-11355, https://doi.org/10.5194/egusphere-egu23-11355, 2023.

EGU23-12137 | ECS | Posters on site | ITS1.14/CL5.8

Identifying mechanisms of low-level jets near coast of Kurzeme using Principal Component Analysis 

Maksims Pogumirskis, Tija Sīle, and Uldis Bethers

Low-level jets are maxima in the vertical profile of wind speed in the lowest levels of the atmosphere. When present, low-level jets can have a significant impact on wind energy. Wind conditions in low-level jets depart from traditional assumptions about the wind profile, and low-level jets can also influence stability and turbulence, which are important for wind energy applications.

In the literature, a detection algorithm is commonly used to estimate the frequency of low-level jets. The algorithm searches for a wind speed maximum in the lowest levels of the atmosphere with a temperature inversion above the jet maximum; a sketch of such a rule is given below. The algorithm is useful for identifying the presence of low-level jets and estimating their frequency. However, low-level jets can be caused by a number of different mechanisms, which leads to differences in their characteristics. Therefore, additional analysis is necessary to distinguish between different types of jets and characterize their properties. We aim to automate this process using Principal Component Analysis (PCA) to identify the main patterns of wind speed and temperature. By analyzing the diurnal and seasonal cycles of these patterns, a better understanding of the climatology of low-level jets in the region can be gained.
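
A minimal sketch of such a detection rule on a single vertical profile; the fall-off and inversion criteria are illustrative, not the exact thresholds used here.

    import numpy as np

    def is_low_level_jet(z, wspd, temp, min_fall=2.0):
        # A wind-speed maximum in the lowest levels, with speeds decreasing by
        # at least `min_fall` m/s above it and a temperature inversion above
        # the jet core.
        k = int(np.argmax(wspd))
        if k == len(z) - 1:                  # maximum at the top: no jet nose
            return False
        falls_off = wspd[k] - wspd[k + 1:].min() >= min_fall
        inversion_above = np.any(np.diff(temp[k:]) > 0)  # T increasing with height
        return bool(falls_off and inversion_above)

    z = np.array([10, 50, 100, 200, 300, 400, 500])          # height (m)
    wspd = np.array([6, 9, 12, 9, 8, 7, 7])                  # wind speed (m/s)
    temp = np.array([281, 282, 283, 283.5, 283, 282.5, 282]) # temperature (K)
    print(is_low_level_jet(z, wspd, temp))                   # True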

This study focuses on the central part of the Baltic Sea. Several recent studies have identified the presence of low-level jets near the coast of Kurzeme. Typically, low-level jet maxima are located several hundred meters above the surface, while near the coast of Kurzeme they are usually within the lowest 100 meters of the atmosphere.

Data from the UERRA reanalysis, with 11 km horizontal resolution on 12 height levels in the lowest 500 meters of the atmosphere, were used. The low-level jet detection algorithm was applied to the data to estimate the frequency of low-level jets in each grid cell of the model. Jet events were grouped by wind direction to identify the main trajectories of low-level jets in the region. Several atmospheric cross-sections that low-level jets frequently flow through were chosen for further analysis.

Model data were interpolated to the chosen cross-sections, and PCA was applied to the cross-section data of wind speed, geostrophic wind speed and temperature. The main patterns of these meteorological parameters, such as a wind speed maximum, a temperature inversion above the sea surface and the temperature difference between sea and land, were identified by the PCA. Differences in the principal components between cross-sections, and their diurnal and seasonal patterns, helped to gain a better understanding of the climatology, extent and mechanisms of low-level jets in the region.

How to cite: Pogumirskis, M., Sīle, T., and Bethers, U.: Identifying mechanisms of low-level jets near coast of Kurzeme using Principal Component Analysis, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-12137, https://doi.org/10.5194/egusphere-egu23-12137, 2023.

EGU23-12528 | ECS | Orals | ITS1.14/CL5.8

Evaluation of explainable AI solutions in climate science 

Philine Bommer, Marlene Kretschmer, Anna Hedstroem, Dilyara Bareeva, and Marina M.-C. Hoehne

Explainable artificial intelligence (XAI) methods help researchers to shed light on the reasons behind the predictions made by deep neural networks (DNNs). XAI methods have already been successfully applied to climate science, revealing underlying physical mechanisms inherent in the studied data. However, evaluating and validating XAI performance is challenging, as explanation methods often lack a ground truth. As the number of XAI methods grows, a comprehensive evaluation is necessary to enable well-founded XAI applications in climate science.

In this work we introduce explanation evaluation in the context of climate research. We apply XAI evaluation to compare multiple explanation methods for a multi-layer perceptron (MLP) and a convolutional neural network (CNN). Both the MLP and the CNN assign temperature maps to classes based on their decade. We assess the respective explanation methods using evaluation metrics measuring robustness, faithfulness, randomization, complexity and localization. Based on the results of a random baseline test, we establish an explanation evaluation guideline for the climate community. We use this guideline to rank the performance in each property for similar sets of explanation methods for the MLP and the CNN. Independent of the network type, we find that Integrated Gradients, layer-wise relevance propagation and InputGradients exhibit higher robustness, faithfulness and complexity than purely gradient-based methods, while sacrificing reactivity to network parameters, i.e. showing low randomization scores. The contrary holds for Gradient, SmoothGrad, NoiseGrad and FusionGrad. Another key observation is that explanations using input perturbations, such as SmoothGrad and Integrated Gradients, do not improve robustness and faithfulness, contrary to theoretical claims. Our experiments highlight that XAI evaluation can be applied to different network tasks and offers more detailed information about different properties of explanation methods than previous research. We demonstrate that XAI evaluation helps to tackle the challenge of choosing an explanation method.
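
As an illustration of one evaluated property, the sketch below measures the robustness of an explanation function to small input perturbations; the metric is a simplified stand-in for the full evaluation suite used in the study.

    import numpy as np

    def explanation_robustness(explain, x, eps=0.01, n_samples=20, seed=0):
        # Average L2 distance between explanations of x and of noisy copies;
        # lower values mean more robust explanations.
        rng = np.random.default_rng(seed)
        base = explain(x)
        dists = [np.linalg.norm(explain(x + rng.normal(0, eps, x.shape)) - base)
                 for _ in range(n_samples)]
        return float(np.mean(dists))

    # Toy model: for a linear model the gradient explanation is constant,
    # hence perfectly robust (distance 0).
    w = np.array([0.5, -1.0, 2.0])
    gradient_explanation = lambda x: w               # d(w.x)/dx = w
    print(explanation_robustness(gradient_explanation, np.ones(3)))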

How to cite: Bommer, P., Kretschmer, M., Hedstroem, A., Bareeva, D., and Hoehne, M. M.-C.: Evaluation of explainable AI solutions in climate science, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-12528, https://doi.org/10.5194/egusphere-egu23-12528, 2023.

EGU23-12657 | Orals | ITS1.14/CL5.8 | Highlight

DeepExtremes: Explainable Earth Surface Forecasting Under Extreme Climate Conditions 

Karin Mora, Gunnar Brandt, Vitus Benson, Carsten Brockmann, Gustau Camps-Valls, Miguel-Ángel Fernández-Torres, Tonio Fincke, Norman Fomferra, Fabian Gans, Maria Gonzalez, Chaonan Ji, Guido Kraemer, Eva Sevillano Marco, David Montero, Markus Reichstein, Christian Requena-Mesa, Oscar José Pellicer Valero, Mélanie Weynants, Sebastian Wieneke, and Miguel D. Mahecha

Compound heat waves and drought events draw particular attention as they become more frequent. Co-occurring extreme events often exacerbate impacts on ecosystems and can induce a cascade of detrimental consequences. However, research to understand these events is still in its infancy. DeepExtremes is a project funded by the European Space Agency (https://rsc4earth.de/project/deepextremes/) that aims to use deep learning to gain insight into the Earth's surface under extreme climate conditions. Specifically, the goal is to forecast and explain extreme, multi-hazard, and compound events. To this end, the project leverages the existing Earth observation archive to help us better understand and represent different types of hazards and their effects on society and vegetation. The project implementation involves a multi-stage process consisting of 1) global event detection; 2) intelligent subsampling and creation of mini-data-cubes; 3) forecasting method development, interpretation, and testing; and 4) cloud deployment and upscaling. The data products will be made available to the community following reproducibility and FAIR data principles. By effectively combining Earth system science with explainable AI, the project contributes knowledge to advancing the sustainable management of the consequences of extreme events. This presentation will show the progress made so far and specifically introduce how to participate in the DeepExtremes challenges on spatio-temporal extreme event prediction.

How to cite: Mora, K., Brandt, G., Benson, V., Brockmann, C., Camps-Valls, G., Fernández-Torres, M.-Á., Fincke, T., Fomferra, N., Gans, F., Gonzalez, M., Ji, C., Kraemer, G., Marco, E. S., Montero, D., Reichstein, M., Requena-Mesa, C., Valero, O. J. P., Weynants, M., Wieneke, S., and Mahecha, M. D.: DeepExtremes: Explainable Earth Surface Forecasting Under Extreme Climate Conditions, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-12657, https://doi.org/10.5194/egusphere-egu23-12657, 2023.

EGU23-12889 | Orals | ITS1.14/CL5.8

New Berkeley Earth High Resolution Temperature Data Set 

Robert A. Rohde and Zeke Hausfather

Berkeley Earth is premiering a new high resolution analysis of historical instrumental temperatures.

This builds on our existing work on climate reconstruction by adding a simple machine learning layer to our analysis.  This new approach extracts weather patterns from model, satellite, and reanalysis data, and then layers these weather patterns on top of instrumental observations and our existing interpolation methods to produce new high resolution historical temperature fields.  This has quadrupled our output resolution from the previous 1° x 1° lat-long to a new global 0.25° x 0.25° lat-long resolution.  However, this is not simply a downscaling effort.  Firstly, the use of weather patterns derived from physical models and observations increases the spatial realism of the reconstructed fields.  Secondly, observations from regions with high density measurement networks have been directly incorporated into the high resolution field, allowing dense observations to be more fully utilized.  

This new data product uses significantly more observational weather station data and produces higher resolution historical temperature fields than any comparable product, allowing for unprecedented insights into historical local and regional climate change. In particular, the effects of geographic features such as mountains, coastlines, and ecosystem variations are resolved with a level of detail that was not previously possible. At the same time, previously established techniques for bias correction, noise reduction, and error analysis continue to be utilized. The resulting global field initially spans 1850 to present and will be updated on an ongoing basis. This project does not significantly change the global understanding of climate change, but helps to provide local detail that was often unresolved previously. The initial data product focuses on monthly temperatures, though a proposal exists to also create a high resolution daily temperature data set using similar methods.

This talk will describe the construction of the new data set and its characteristics.  The techniques used in this project are accessible enough that they are likely to be useful for other types of instrumental analyses wishing to improve resolution or leverage basic information about weather patterns derived from models or other sources.

How to cite: Rohde, R. A. and Hausfather, Z.: New Berkeley Earth High Resolution Temperature Data Set, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-12889, https://doi.org/10.5194/egusphere-egu23-12889, 2023.

EGU23-12948 | ECS | Orals | ITS1.14/CL5.8

Identifying drivers of river floods using causal inference 

Peter Miersch, Shijie Jiang, Oldrich Rakovec, and Jakob Zscheischler

River floods are among the most devastating natural hazards, causing thousands of deaths and billions of euros in damages every year. Floods can result from a combination of compounding drivers such as heavy precipitation, snowmelt, and high antecedent soil moisture. These drivers and the processes they govern vary widely both between catchments and between flood events within a catchment, making a causal understanding of the underlying hydrological processes difficult.

Modern causal inference methods, such as the PCMCI framework, are able to identify drivers from complex time series through causal discovery and to build causally aware statistical models. However, causal inference tailored to extreme events remains a challenge due to data length limitations. To overcome these limitations, here we bridge the gap between synthetic and real-world data using 1,000 years of simulated weather to drive a state-of-the-art hydrological model (the mesoscale Hydrological Model, mHM) over a wide range of European catchments. From the simulated time series, we extract high-runoff events, on which we evaluate the causal inference approach. We identify the minimum data necessary for obtaining robust causal models, evaluate metrics for model evaluation and comparison, and compare causal flood drivers across catchments. Ultimately, this work will help establish best practices in causal inference for flood research to identify meteorological and catchment-specific flood drivers in a changing climate.
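
A minimal sketch of causal discovery with PCMCI on toy driver/runoff series, assuming the tigramite package; variable relationships, lags, and thresholds are illustrative.

    import numpy as np
    from tigramite import data_processing as pp
    from tigramite.pcmci import PCMCI
    # In older tigramite versions: from tigramite.independence_tests import ParCorr
    from tigramite.independence_tests.parcorr import ParCorr

    rng = np.random.default_rng(0)
    T = 1000
    precip = rng.normal(size=T)
    snowmelt = rng.normal(size=T)
    soil = 0.6 * np.roll(precip, 1) + rng.normal(0, 0.5, T)   # wetter after rain
    runoff = 0.5 * precip + 0.3 * soil + rng.normal(0, 0.3, T)

    data = np.column_stack([precip, snowmelt, soil, runoff])
    frame = pp.DataFrame(data, var_names=["precip", "snowmelt", "soil", "runoff"])
    pcmci = PCMCI(dataframe=frame, cond_ind_test=ParCorr())
    results = pcmci.run_pcmci(tau_max=3, pc_alpha=0.05)       # lagged causal links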

How to cite: Miersch, P., Jiang, S., Rakovec, O., and Zscheischler, J.: Identifying drivers of river floods using causal inference, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-12948, https://doi.org/10.5194/egusphere-egu23-12948, 2023.

EGU23-13250 | ECS | Posters on site | ITS1.14/CL5.8

From MODIS cloud properties to cloud types using semi-supervised learning 

Julien Lenhardt, Johannes Quaas, and Dino Sejdinovic

Clouds are classified into types, classes, or regimes. The World Meteorological Organization distinguishes stratus and cumulus clouds and three altitude layers. Cloud types exhibit very different radiative properties and interact in numerous ways with aerosol particles in the atmosphere. However, it has proven difficult to define cloud regimes objectively and from remote sensing data, hindering the understanding we have of the processes and adjustments involved.

Building on the method we previously developed, we combine synoptic observations and passive satellite remote-sensing retrievals to constitute a database of cloud types and cloud properties with which to train a cloud classification algorithm. The cloud type labels come from the global marine meteorological observations dataset (UK Met Office, 2006), which comprises near-global synoptic observations. This data record reports information about cloud type and other meteorological quantities at the surface. The cloud classification model is built on different cloud-top and cloud optical properties (Level-2 products MOD06/MYD06 from the MODIS sensor) extracted temporally close to the observation time and on a 128 km x 128 km grid around the synoptic observation location. To make full use of the large quantity of remote sensing data available and to investigate the variety of cloud settings, a convolutional variational autoencoder (VAE) is applied as a dimensionality reduction tool in a first step. Such a model architecture accounts for spatial relationships while describing non-linear patterns in the input data. The cloud classification task is subsequently performed drawing on the constructed latent representation of the VAE. Associating information from underneath and above the cloud enables building a robust model to classify cloud types. For training we specify a study domain in the Atlantic Ocean around the equator and evaluate the method globally. Further experiments and evaluation are carried out on simulation data produced by the ICON model.
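
A minimal sketch of the two-step pipeline, with a stand-in encoder in place of the trained convolutional VAE; shapes, labels, and data are illustrative (smaller than the real 128 km patches for brevity).

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    patches = rng.normal(size=(200, 6, 32, 32))     # 6 toy MODIS cloud properties
    labels = rng.integers(0, 9, 200)                # toy synoptic cloud-type labels

    def encoder(batch):
        # Stand-in for the VAE encoder: returns a low-dimensional latent vector.
        return batch.reshape(len(batch), -1)[:, :32]

    z = encoder(patches)                            # latent representation
    clf = LogisticRegression(max_iter=1000).fit(z, labels)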

How to cite: Lenhardt, J., Quaas, J., and Sejdinovic, D.: From MODIS cloud properties to cloud types using semi-supervised learning, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-13250, https://doi.org/10.5194/egusphere-egu23-13250, 2023.

EGU23-13462 | ECS | Orals | ITS1.14/CL5.8

Double machine learning for geosciences 

Kai-Hendrik Cohrs, Gherardo Varando, Markus Reichstein, and Gustau Camps-Valls

Hybrid modeling describes the synergy between parametric models and machine learning [1]. Parts of a parametric equation are substituted by non-parametric machine learning models, which can represent complex functions. These are inferred together with the parameters of the equation from the data. Hybrid modeling promises to describe complex relationships while remaining scientifically interpretable. These promises, however, need to be taken with a grain of salt. With overly flexible models, such as deep neural networks, the problem of equifinality arises: there is no identifiable optimal solution. Instead, many outcomes describe the data equally well, and we obtain one of them by chance. Interpreting the result may then lead to erroneous conclusions. Moreover, studies have shown that regularization techniques can bias jointly estimated physical parameters [1].

We propose double machine learning (DML) to solve these problems [2]. DML is a theoretically well-founded technique for fitting semi-parametric models, i.e., models consisting of a parametric and a non-parametric component. DML is widely used for debiased treatment effect estimation in economics. We showcase its use for geosciences on two problems related to carbon dioxide fluxes: 

  • Flux partitioning, which aims at separating the net carbon flux (NEE) into its main contributing gross fluxes, namely ecosystem respiration (RECO) and gross primary production (GPP).
  • Estimation of the temperature sensitivity parameter Q10 of ecosystem respiration.

First, we show that for synthetic data in the Q10 estimation problem, we can consistently retrieve the true value of Q10 where the naive neural network approach fails. We further apply DML to the carbon flux partitioning problem and find that it 1) retrieves the true fluxes of synthetic data, even in the presence of strong (and more realistic) heteroscedastic noise, 2) retrieves the main gross carbon fluxes on real data consistently with established methods, and 3) allows us to causally interpret the retrieved GPP as the direct effect of the photosynthetically active radiation on NEE. In this way, the DML approach can be seen as a causally interpretable, semi-parametric version of the established daytime methods. We also investigate the functional relationships inferred with DML and the drivers modulating the obtained light-use efficiency function. In conclusion, DML offers a solid framework for developing hybrid and semi-parametric modeling and can be of widespread use in geosciences.
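
A minimal sketch of the DML partialling-out recipe for a single linear parameter, with cross-fitting via out-of-fold predictions; variable names are illustrative, not the exact flux-partitioning setup.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_predict

    rng = np.random.default_rng(0)
    n = 2000
    W = rng.normal(size=(n, 3))                     # confounders (e.g. meteorology)
    X = np.sin(W[:, 0]) + rng.normal(0, 0.5, n)     # "treatment" driver
    Y = 1.5 * X + np.cos(W[:, 1]) + rng.normal(0, 0.5, n)   # outcome; true effect 1.5

    # Regress out the confounders from both X and Y with flexible ML models,
    # then fit the parametric part on the residuals.
    res_x = X - cross_val_predict(RandomForestRegressor(n_estimators=100), W, X, cv=5)
    res_y = Y - cross_val_predict(RandomForestRegressor(n_estimators=100), W, Y, cv=5)
    theta = LinearRegression().fit(res_x[:, None], res_y).coef_[0]
    print(theta)    # close to 1.5, with the confounding influence removed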

 

[1] Reichstein, Markus, et al. “Combining system modeling and machine learning into hybrid ecosystem modeling.” Knowledge-Guided Machine Learning (2022). https://doi.org/10.1201/9781003143376-14

[2] Chernozhukov, Victor, et al. “Double/debiased machine learning for treatment and structural parameters.” The Econometrics Journal, Volume 21, Issue 1, 1 (2018): C1–C68. https://doi.org/10.1111/ectj.12097

How to cite: Cohrs, K.-H., Varando, G., Reichstein, M., and Camps-Valls, G.: Double machine learning for geosciences, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-13462, https://doi.org/10.5194/egusphere-egu23-13462, 2023.

EGU23-13622 | ECS | Posters on site | ITS1.14/CL5.8

Towards explainable marine heatwaves forecasts 

Ayush Prasad and Swarnalee Mazumder

In recent years, both the intensity and extent of marine heatwaves have increased across the world. Anomalies in sea surface temperature affect the health of marine ecosystems, which are crucial to the Earth's climate system. The devastating impacts of marine heatwaves on aquatic life have grown steadily in recent years, harming aquatic ecosystems and causing a tremendous loss of marine life. Early warning systems and operational forecasting that can foresee such events can aid in designing effective mitigation strategies. Recent studies have shown that machine learning and deep learning approaches can forecast the occurrence of marine heatwaves up to a year in advance. However, these models are black boxes and do not provide an understanding of the factors influencing MHWs. In this study, we used machine learning methods to forecast marine heatwaves. The developed models were tested on four historical marine heatwave events around the world. Explainable AI methods were then used to understand and analyze the relationships between the drivers of these events.

How to cite: Prasad, A. and Mazumder, S.: Towards explainable marine heatwaves forecasts, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-13622, https://doi.org/10.5194/egusphere-egu23-13622, 2023.

EGU23-14493 | ECS | Orals | ITS1.14/CL5.8

Interpretable probabilistic forecast of extreme heat waves 

Alessandro Lovo, Corentin Herbert, and Freddy Bouchet

Understanding and predicting extreme events is one of the major challenges for the study of climate change impacts, risk assessment, adaptation, and the protection of living beings. Extreme heatwaves are, and will likely remain, among the deadliest weather events. They also increase strain on water resources, food security and energy supply. Developing the ability to forecast their probability of occurrence a few days, weeks, or even months in advance would go a long way toward reducing our vulnerability to these events. Beyond the practical benefits of forecasting heat waves, building interpretable statistical models of extreme events is also highly beneficial from a fundamental point of view: such models enable proper studies of the processes underlying extreme events such as heat waves, improve dataset or model validation, and contribute to attribution studies. Machine learning provides tools to reach both of these goals.

We will first demonstrate that deep neural networks can predict the probability of occurrence of long-lasting 14-day heatwaves over France up to 15 days ahead of time from fast dynamical drivers (the 500 hPa geopotential height field), and at much longer lead times from slow physical drivers (soil moisture). These results represent remarkable forecasting skill. However, such machine learning models tend to be very complex and are often treated as black boxes, which limits our ability to use them for investigating the dynamics of extreme heat waves.

To gain physical understanding, we have therefore designed a network architecture that is intrinsically interpretable. The main idea of this architecture is that the network first computes an optimal index, an optimal projection of the physical fields onto a low-dimensional space. In a second step, it uses a fully non-linear representation of the probability of occurrence of the event as a function of the optimal index. This optimal index can be visualized and compared with classical heuristic understanding of the physical process, for instance in terms of geopotential height and soil moisture. This fully interpretable network is slightly less efficient than the off-the-shelf deep neural network; we fully quantify the performance loss incurred when requiring interpretability and make the connection with the mathematical notion of committor functions. A sketch of the architecture is given below.

This new machine learning tool opens the way to understanding optimal predictors of weather and climate extremes. It has potential for the study of slow drivers and of the effect of climate change on the drivers of extreme events.
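
A minimal sketch of the interpretable architecture described above, assuming PyTorch: a linear projection onto a low-dimensional index followed by a small non-linear map to the event probability (dimensions are illustrative).

    import torch
    import torch.nn as nn

    class IndexNet(nn.Module):
        def __init__(self, n_features, index_dim=2):
            super().__init__()
            # Linear projection of the physical fields: the "optimal index".
            self.projection = nn.Linear(n_features, index_dim, bias=False)
            # Fully non-linear probability of the event given the index.
            self.prob = nn.Sequential(
                nn.Linear(index_dim, 16), nn.ReLU(),
                nn.Linear(16, 1), nn.Sigmoid(),
            )

        def forward(self, x):
            index = self.projection(x)   # interpretable, can be mapped back to fields
            return self.prob(index), index

    model = IndexNet(n_features=1000)
    x = torch.randn(8, 1000)             # flattened geopotential + soil moisture (toy)
    p, index = model(x)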

How to cite: Lovo, A., Herbert, C., and Bouchet, F.: Interpretable probabilistic forecast of extreme heat waves, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-14493, https://doi.org/10.5194/egusphere-egu23-14493, 2023.

EGU23-14856 | ECS | Orals | ITS1.14/CL5.8

Classification of Indoor Air Pollution Using Low-cost Sensors by Machine Learning 

Andrii Antonenko, Viacheslav Boretskij, and Oleksandr Zagaria

Air pollution has become an integral part of modern life. Combustion processes associated with energy-intensive industrial activities can be considered its main source. Energy companies consume about one-third of the fuel produced and are a significant source of air pollution [1]. State and public air quality monitoring networks were created to monitor the situation. Public monitoring networks are cheaper and have wider coverage than governmental ones. Although the state monitoring system provides more accurate data, an inexpensive network is sufficient to inform the public about the presence or absence of pollution (air quality). With public information in mind, the idea arose to test the possibility of detecting types of pollution using data from cheap air quality monitoring sensors. In general, to use a cheap sensor for measurements, it must first be calibrated (corrected) by comparing its readings with a reference device. Various mathematical methods can be used for this. One such method is neural network training, which has proven itself well for correcting PM particle readings for the impact of relative humidity [2].

The idea of using a neural network to improve data quality is not new, but it is quite promising, as the authors showed in [3]. The main problem in implementing this method is obtaining a reliable dataset for training the network. For this, it is necessary to record sensor readings for relatively clean air and for artificially generated or known sources of pollution. A neural network trained on the collected data can then be used to determine (classify) air types: with pollution (pollutant) or without. To this end, an experiment was set up in the "ReLab" co-working space at the Taras Shevchenko National University of Kyiv. The sensors were placed in a closed box with airflow ventilation. The ZPHS01B [4] sensor module was used for inbox measurements, together with calibrated PMS7003 [5] and BME280 [6] sensors. Additionally, IPS 7100 [7] and SPS30 [8] sensors were added to enrich the database for ML training. A platform based on the HiLink 7688 was used for data collection, processing, and transmission.

Data were measured every two seconds, independently from each sensor. Before each experiment, the room was ventilated to avoid any influence on the next series of experiments.
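
A hypothetical sketch of the intended classification step on the collected dataset; the feature set, class labels, and data are illustrative assumptions.

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    rng = np.random.default_rng(0)
    # Columns could be PM1, PM2.5, PM10, CO2, TVOC, temperature, humidity.
    X = rng.uniform(0.0, 1.0, size=(5000, 7))
    y = rng.integers(0, 3, 5000)     # 0=clean air, 1=smoke, 2=dust (experiment labels)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500).fit(X_tr, y_tr)
    print(clf.score(X_te, y_te))     # held-out classification accuracy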

References

1. Zaporozhets A. Analysis of means for monitoring air pollution in the environment. Science-based technologies. 2017, Vol. 35, no3. 242-252. DOI: 10.18372/2310-5461.35.11844

2. Antonenko A, (2021) Correction of fine particle concentration readings depending on relative humidity, [Master's thesis, Taras Shevchenko National University of Kyiv], 35 pp.

3. Lee, J. Kang, S. Kim, Y. Im, S. Yoo, D. Lee, "Long-Term Evaluation and Calibration of Low-Cost Particulate Matter (PM) Sensor", Sensors, vol. 20, 3617, 24 pp., 2020.

4. ZPHS01B Datasheet URL: https://pdf1.alldatasheet.com/datasheet-pdf/view/1303697/WINSEN/ZPHS01B.html

5. Plantower PMS7003 Datasheet URL: https://www.espruino.com/datasheets/PMS7003.pdf

6. Bosch 280 Datasheet URL: https://www.mouser.com/datasheet/2/783/BST-BME280-DS002-1509607.pdf

7. https://pierasystems.com/intelligent-particle-sensors/

8. https://sensirion.com/products/catalog/SPS30/

How to cite: Antonenko, A., Boretskij, V., and Zagaria, O.: Classification of Indoor Air Pollution Using Low-cost Sensors by Machine Learning, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-14856, https://doi.org/10.5194/egusphere-egu23-14856, 2023.

EGU23-15000 | ECS | Orals | ITS1.14/CL5.8 | Highlight

Causal inference to study food insecurity in Africa 

Jordi Cerdà-Bautista, José María Tárraga, Gherardo Varando, Alberto Arribas, Ted Shepherd, and Gustau Camps-Valls

Food insecurity in Africa, and in the Horn of Africa in particular, has reached an unprecedented risk level, triggered by continuous drought events, complicated interactions between food prices, crop yield, energy inflation and a lack of humanitarian aid, along with disruptive conflicts and migration flows. Food security is a complex, multivariate, multiscale, and non-linear problem that is difficult to understand with canonical data science methodologies. We propose an alternative approach to the food insecurity problem from a causal inference standpoint to discover causal relations and evaluate the likelihood and potential consequences of specific interventions. In particular, we demonstrate the use of causal inference for understanding the impact of humanitarian interventions on food insecurity in Somalia. In the first stage of the problem, we apply different data transformations to the main drivers to achieve the highest degree of correlation with the variable of interest. In the second stage, we infer causation between the main drivers and the variables of interest by applying causal methods such as PCMCI or Granger causality. We analyze and harmonize different time series, per district of Somalia, of the global acute malnutrition (GAM) index, food market prices, crop production, conflict levels, drought and flood internal displacements, as well as climate indicators such as the NDVI, precipitation and land surface temperature. Then, assuming a causal graph between the main drivers of the food insecurity problem, we estimate the effect of increasing humanitarian interventions on the GAM index, considering the effects of a changing climate, migration flows, and conflict events. We show that causal estimation with modern methodologies allows us to quantify the impact of humanitarian aid on food insecurity.
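
A minimal sketch of one of the named methods, Granger causality, on toy district-level series, assuming statsmodels; the test asks whether the second column helps predict the first.

    import numpy as np
    from statsmodels.tsa.stattools import grangercausalitytests

    rng = np.random.default_rng(0)
    T = 200
    rain = rng.normal(size=T)
    gam = np.zeros(T)
    for t in range(2, T):            # toy: GAM worsens two months after low rain
        gam[t] = 0.5 * gam[t - 1] - 0.4 * rain[t - 2] + rng.normal(0, 0.3)

    data = np.column_stack([gam, rain])
    res = grangercausalitytests(data, maxlag=3)   # F-tests per lag: rain -> GAM?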

 

References

 

[1] Runge, J., Bathiany, S., Bollt, E. et al. Inferring causation from time series in Earth system sciences. Nat Commun 10, 2553 (2019). https://doi.org/10.1038/s41467-019-10105-3

[2] Sazib Nazmus, Mladenova lliana E., Bolten John D., Assessing the Impact of ENSO on Agriculture Over Africa Using Earth Observation Data, Frontiers in Sustainable Food Systems, 2020, 10.3389/fsufs.2020.509914. https://www.frontiersin.org/article/10.3389/fsufs.2020.509914

[3] Checchi, F., Frison, S., Warsame, A. et al. Can we predict the burden of acute malnutrition in crisis-affected countries? Findings from Somalia and South Sudan. BMC Nutr 8, 92 (2022). https://doi.org/10.1186/s40795-022-00563-2

How to cite: Cerdà-Bautista, J., Tárraga, J. M., Varando, G., Arribas, A., Shepherd, T., and Camps-Valls, G.: Causal inference to study food insecurity in Africa, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-15000, https://doi.org/10.5194/egusphere-egu23-15000, 2023.

EGU23-15185 | ECS | Posters on site | ITS1.14/CL5.8

Deep learning to support ocean data quality control 

Mohamed Chouai, Felix Simon Reimers, and Sebastian Mieruch-Schnülle

In this study, which is part of the M-VRE project [https://mosaic-vre.org/about], we aim to improve a quality control (QC) system for Arctic Ocean temperature profile data using deep learning. For the training, validation, and evaluation of our algorithms, we use the UDASH dataset [https://essd.copernicus.org/articles/10/1119/2018/]. In the classical QC setting, the ocean expert or "operator" applies a series of thresholding (classical) algorithms to identify, i.e. flag, erroneous data. In the next step, the operator visually inspects every profile where suspicious samples have been identified. The goal of this time-consuming visual QC is to find "false positives", i.e. flagged data that are actually good, because every sample/profile has not only a scientific value but also a monetary one. Finally, the operator turns all "false positive" data back to good. The crucial point is that although these samples/profiles exceed certain thresholds, they are considered good by the ocean expert. Such human expert decisions are extremely difficult, if not impossible, to reproduce with classical algorithms. However, deep-learning neural networks have the potential to learn complex human behavior. We therefore trained a deep learning system to "learn" exactly this expert behavior of finding "false positives" (identified by the classical thresholds), which can be turned back to good accordingly. The first results are promising: in a fully automated setting, deep learning improves the results and fewer data are flagged. In a subsequent visual QC setting, deep learning relieves the expert with a distinct workload reduction and offers the option to clearly increase the quality of the data.
Our long-term goal is to develop an Arctic quality control system as a series of web services and Jupyter notebooks that apply automated and visual QC online: efficiently, consistently, reproducibly, and interactively.
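
A minimal sketch of the learning task described above, with synthetic stand-ins for the profile features and the expert decisions.

    import numpy as np
    from sklearn.neural_network import MLPClassifier

    rng = np.random.default_rng(0)
    # Hypothetical per-flag features: depth of flagged sample, temperature
    # anomaly, vertical gradient, distance to climatology, neighbour agreement.
    X = rng.normal(size=(8000, 5))
    y = (X[:, 3] < 0.5).astype(int)      # toy stand-in for expert "actually good"

    model = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500).fit(X, y)
    keep_good = model.predict(X) == 1    # flags the expert would likely revert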

How to cite: Chouai, M., Simon Reimers, F., and Mieruch-Schnülle, S.: Deep learning to support ocean data quality control, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-15185, https://doi.org/10.5194/egusphere-egu23-15185, 2023.

EGU23-15286 | ECS | Orals | ITS1.14/CL5.8

Spatio-temporal downscaling of precipitation data using a conditional generative adversarial network 

Luca Glawion, Julius Polz, Benjamin Fersch, Harald Kunstmann, and Christian Chwala

Natural disasters caused by cyclones, hail, landslides or floods are directly related to precipitation. Global climate models are an important tool for adapting to these hazards in a future climate. However, they operate on spatial and temporal discretizations that limit their ability to adequately reflect these fast-evolving, highly localized phenomena, which has led to the development of various downscaling approaches.

Conditional generative adversarial networks (cGAN) have recently been applied as a promising downscaling technique to improve the spatial resolution of climate data. The ability of GANs to generate ensembles of solutions from random perturbations can be used to account for the stochasticity of climate data and quantify uncertainties. 

We present a cGAN for downscaling not only the spatial but simultaneously also the temporal dimension of precipitation data, a so-called video super-resolution approach. 3D convolutional layers are exploited to extract and generate temporally consistent rain events with realistic fine-scale structure. We downscale coarsened, gauge-adjusted and climatology-corrected precipitation data from Germany from a spatial resolution of 32 km to 2 km and a temporal resolution of 1 h to 10 min, applying a novel training routine using partly normalized and logarithmized data, which allows for improved extreme value statistics of the generated fields.
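
A hedged sketch of such a 3D-convolutional generator in PyTorch; the upsampling factors mirror the stated ratios (1 h to 10 min, 32 km to 2 km), while the layer sizes are illustrative rather than the authors' architecture:

```python
# Minimal sketch (PyTorch; illustrative layer sizes): a 3D-convolutional generator
# that upsamples precipitation fields in time (x6) and space (x16), conditioned on
# the coarse field and a noise field for the probabilistic ensemble.
import torch
import torch.nn as nn

class Generator3D(nn.Module):
    def __init__(self, noise_channels=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(1 + noise_channels, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            # upsample the (time, height, width) axes jointly
            nn.Upsample(scale_factor=(6, 16, 16), mode="trilinear", align_corners=False),
            nn.Conv3d(64, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv3d(64, 1, kernel_size=3, padding=1),
        )

    def forward(self, coarse, noise):
        # coarse, noise: (batch, 1, T, H, W)
        return self.net(torch.cat([coarse, noise], dim=1))

g = Generator3D()
fine = g(torch.rand(2, 1, 4, 8, 8), torch.randn(2, 1, 4, 8, 8))
print(fine.shape)  # torch.Size([2, 1, 24, 128, 128])
```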

Exploiting the fully convolutional nature of our model, we can generate downscaled maps for the whole of Germany in a single downscaling step at low latency. The evaluation of these maps using spatial and temporal power spectrum analysis shows that the generated temporal and spatial structures are in high agreement with the reference. Visually, the generated temporally evolving and advecting rain events are hardly recognizable as artificially generated. The model also shows high skill regarding pixel-wise error and localization of high precipitation intensities, as measured by the FSS, CRPS, KS and RMSE. Because downscaling is an underdetermined problem, a probabilistic cGAN approach yields information beyond that of the deterministic models we use for comparison. The method is also capable of preserving the climatology, e.g., expressed as the annual precipitation sum. Investigating temporal aggregations of the downscaled fields revealed an interesting effect: structures generated by networks with convolutional layers are not placed completely at random but can recur, a behavior that can also be discovered in other prominent DL downscaling models. Although these recurrent structures can be mitigated by adequate model selection, their occurrence remains an open research question.

We conclude that our proposed approach extends the application of cGANs for downscaling to the time dimension and is therefore a promising candidate to supplement conventional downscaling methods, owing to its high performance and computational efficiency.

How to cite: Glawion, L., Polz, J., Fersch, B., Kunstmann, H., and Chwala, C.: Spatio-temporal downscaling of precipitation data using a conditional generative adversarial network, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-15286, https://doi.org/10.5194/egusphere-egu23-15286, 2023.

EGU23-15540 | ECS | Posters on site | ITS1.14/CL5.8 | Highlight

USCC: A Benchmark Dataset for Crop Yield Prediction under Climate Extremes 

Adrian Höhl, Stella Ofori-Ampofo, Ivica Obadic, Miguel-Ángel Fernández-Torres, Ridvan Salih Kuzu, and Xiaoxiang Zhu

Climate variability and extremes are known to be major causes of crop yield anomalies. They can reduce crop productivity, which results in disruptions in food availability and nutritional quality, as well as rising food prices. Climate extremes will become even more severe as global warming proceeds, challenging the achievement of food security. These extreme events, especially droughts and heat waves, are already evident in major food-producing regions like the United States. Crops cultivated in this country, such as corn and soybean, are critical for both domestic use and international supply. Considering the sensitivity of crops to climate, here we present a dataset that couples remote sensing surface reflectances with climate variables (e.g. minimum and maximum temperature, precipitation, and vapor pressure) and extreme indicators. The dataset contains the crop yields of various commodities over the USA for nearly two decades. Given the advances and proven success of machine learning in numerous remote sensing tasks, our dataset constitutes a benchmark to advance the development of novel models for crop yield prediction and to analyze the relationship between climate and crop yields for gaining scientific insights. Other potential use cases include extreme event detection and climate forecasting from satellite imagery. As a starting point, we evaluate the performance of several state-of-the-art machine and deep learning models to form a baseline for our benchmark dataset.
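
A minimal baseline sketch on synthetic stand-in data, illustrating the kind of model evaluation the benchmark supports (not the authors' code):

```python
# Minimal sketch (synthetic stand-ins): a random forest baseline regressing
# region-year yield on reflectance and climate features.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.random((3000, 30))                   # seasonal reflectance + Tmin/Tmax/precip/VP stats
y = 50 * X[:, 0] + rng.normal(0, 5, 3000)    # toy yield target

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_tr, y_tr)
print("R^2:", r2_score(y_te, model.predict(X_te)))
```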

How to cite: Höhl, A., Ofori-Ampofo, S., Obadic, I., Fernández-Torres, M.-Á., Salih Kuzu, R., and Zhu, X.: USCC: A Benchmark Dataset for Crop Yield Prediction under Climate Extremes, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-15540, https://doi.org/10.5194/egusphere-egu23-15540, 2023.

EGU23-15817 | ECS | Posters on site | ITS1.14/CL5.8

Evaluating the generalization ability of a deep learning model trained to detect cloud-to-ground lightning on raw ERA5 data 

Gregor Ehrensperger, Tobias Hell, Georg Johann Mayr, and Thorsten Simon

Atmospheric conditions that are typical for lightning are commonly represented by proxies such as cloud top height, cloud ice flux, CAPE times precipitation, or the lightning potential index. While these proxies generally deliver reasonable results, they often need to be adapted to local conditions in order to perform well. This suggests a need for more complex and holistic proxies. Recent research confirms that the use of machine learning (ML) approaches for describing lightning is promising.

In a previous study, a deep learning model was trained on single spatiotemporal (30 km x 30 km x 1 h) cells in the summer periods of the years 2010–2018 and showed good results for the unseen test year 2019 within Austria. We now improve this model by using multiple neighboring vertical atmospheric columns to also account for horizontal moisture advection. Furthermore, data from successive hours are used as input to enable the model to capture the temporal development of atmospheric conditions, such as the build-up and decay of convection.
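
A minimal sketch (illustrative shapes only) of assembling such an input sample from neighboring columns and successive hours:

```python
# Minimal sketch: stacking a 3x3 neighbourhood of vertical columns over the
# preceding hours around each target cell; all shapes are dummy placeholders.
import numpy as np

era5 = np.random.rand(240, 60, 80, 138)   # (hours, lat, lon, vertical features)
hours_back, halo = 3, 1                   # 3 successive hours, 3x3 column neighbourhood

def sample(t, i, j):
    # returns a (hours_back, 3, 3, features) block centred on cell (i, j) at hour t
    return era5[t - hours_back + 1 : t + 1,
                i - halo : i + halo + 1,
                j - halo : j + halo + 1]

x = sample(10, 30, 40)
print(x.shape)  # (3, 3, 3, 138)
```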

In this work we focus on the summer months June to August and use data from parts of Central Europe. This spatial domain is thought to be representative of Continental Europe, since it covers mountainous as well as coastal regions. We take raw ERA5 parameters on levels reaching beyond the tropopause, enriched with a small amount of metadata such as the day of the year and the hour of the day, for training. The quality of the resulting parameterized model is then evaluated on Continental Europe to examine its generalization ability.

Using parts of Central Europe to train the model, we evaluate its ability to generalize to unseen parts of Continental Europe using EUCLID data. A model that generalizes well is a building block for retrospective analyses reaching back to years in which structured, unified recording of accurate lightning observations was not yet established.

How to cite: Ehrensperger, G., Hell, T., Mayr, G. J., and Simon, T.: Evaluating the generalization ability of a deep learning model trained to detect cloud-to-ground lightning on raw ERA5 data, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-15817, https://doi.org/10.5194/egusphere-egu23-15817, 2023.

EGU23-16098 | Posters on site | ITS1.14/CL5.8

Identifying Lightning Processes in ERA5 Soundings with Deep Learning 

Tobias Hell, Gregor Ehrensperger, Georg J. Mayr, and Thorsten Simon

Atmospheric environments favorable for lightning and convection are commonly represented by proxies or parameterizations based on expert knowledge, such as CAPE, wind shear, charge separation, or combinations thereof. Recent developments in the field of machine learning, high-resolution reanalyses, and accurate lightning observations open possibilities for identifying tailored proxies without prior expert knowledge. To identify vertical profiles favorable for lightning, a deep neural network links ERA5 vertical profiles of cloud physics, mass field variables and wind to lightning location data from the Austrian Lightning Detection & Information System (ALDIS), transformed into a binary target variable labelling the ERA5 cells as lightning or no-lightning cells. The ERA5 parameters are taken on model levels beyond the tropopause, forming an input layer of approx. 670 features. The data of 2010–2018 serve as training/validation data. On independent test data, from 2019, the deep network outperforms a reference with features based on meteorological expertise. Shapley values highlight the atmospheric processes learned by the network, which identifies cloud ice and snow content in the upper and mid-troposphere as relevant features. As these patterns correspond to the separation of charge in thunderstorm clouds, the deep learning model can serve as a physically meaningful description of lightning.
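
A hedged sketch of the attribution step, assuming a trained Keras-style network and the shap package (GradientExplainer is a drop-in alternative if DeepExplainer does not support the model); data and model here are dummies:

```python
# Minimal sketch (dummy model and data): Shapley-value attributions over the
# ~670 ERA5 profile features, the step that highlights cloud ice/snow levels.
import numpy as np
import shap
import tensorflow as tf

X_train = np.random.rand(1000, 670).astype("float32")   # dummy profile features
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(670,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

background = X_train[np.random.choice(len(X_train), 100, replace=False)]
explainer = shap.DeepExplainer(model, background)
shap_values = explainer.shap_values(X_train[:256])      # per-feature attributions
```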

How to cite: Hell, T., Ehrensperger, G., Mayr, G. J., and Simon, T.: Identifying Lightning Processes in ERA5 Soundings with Deep Learning, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-16098, https://doi.org/10.5194/egusphere-egu23-16098, 2023.

EGU23-16163 | ECS | Posters on site | ITS1.14/CL5.8

A comparison of methods for determining the number of classes in unsupervised classification of climate models 

Emma Boland, Dani Jones, and Erin Atkinson

Unsupervised classification is becoming an increasingly common method to objectively identify coherent structures within both observed and modelled climate data. However, the user must choose the number of classes to fit in advance. Typically, a combination of statistical methods and expertise is used to choose the appropriate number of classes for a given study; however, it may not be possible to identify a single 'optimal' number of classes. In this work we present a heuristic method for determining the number of classes unambiguously for modelled data where more than one ensemble member is available. This method requires robustness of the class definitions between simulated ensembles of the system of interest. For demonstration, we apply this to the clustering of Southern Ocean potential temperatures in a CMIP6 climate model, and compare with other common criteria such as the Bayesian Information Criterion (BIC) and the silhouette score.
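
A minimal sketch of scanning the number of classes with two of the common criteria mentioned above, on synthetic stand-in data:

```python
# Minimal sketch (synthetic stand-in profiles): scanning the number of classes
# with BIC and silhouette score, two criteria compared against the new heuristic.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.metrics import silhouette_score

X = np.random.rand(2000, 8)   # e.g. flattened potential-temperature profiles
for k in range(2, 10):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(X)
    labels = gmm.predict(X)
    print(k, gmm.bic(X), silhouette_score(X, labels))
```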

How to cite: Boland, E., Jones, D., and Atkinson, E.: A comparison of methods for determining the number of classes in unsupervised classification of climate models, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-16163, https://doi.org/10.5194/egusphere-egu23-16163, 2023.

EGU23-16186 | ECS | Posters on site | ITS1.14/CL5.8

A review of deep learning for weather prediction 

Jannik Thümmel, Martin Butz, and Bedartha Goswami

Recent years have seen substantial performance improvements of deep-learning-based weather prediction models (DLWPs). These models cover a large range of temporal and spatial resolutions—from nowcasting to seasonal forecasting and on scales ranging from single to hundreds of kilometers. DLWPs also exhibit a wide variety of neural architectures and training schemes, with no clear consensus on best practices. Focusing on the short-to-mid-term forecasting ranges, we review several recent, best-performing models with respect to critical design choices. We emphasize the importance of self-organizing latent representations and inductive biases in DLWPs: While NWPs are designed to simulate resolvable physical processes and integrate unresolvable subgrid-scale processes by approximate parameterizations, DLWPs allow the latent representation of both kinds of dynamics. The purpose of this review is to facilitate targeted research developments and understanding of how design choices influence performance of DLWPs. While there is no single best model, we highlight promising avenues towards accurate spatio-temporal modeling, probabilistic forecasts and computationally efficient training and inference.

How to cite: Thümmel, J., Butz, M., and Goswami, B.: A review of deep learning for weather prediction, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-16186, https://doi.org/10.5194/egusphere-egu23-16186, 2023.

EGU23-16443 | ECS | Orals | ITS1.14/CL5.8

Hybrid machine learning model of coupled carbon and water cycles 

Zavud Baghirov, Basil Kraft, Martin Jung, Marco Körner, and Markus Reichstein

There is evidence for a strong coupling between the terrestrial carbon and water cycles and that these cycles should be studied as an interconnected system (Humphrey et al. 2018). One of the key methods to numerically represent the Earth system is process based modelling, which is, however, still subject to large uncertainties, e.g., due to wrong or incomplete process knowledge (Bonan and Doney 2018). Such models are often rigid and only marginally informed by Earth observations. This is where machine learning (ML) approaches can be advantageous, due to their ability to learn from data in a flexible way. These methods have their own shortcomings, such as their “black-box” nature and lack of physical consistency.

Recently, it has been suggested by Reichstein et al. (2019) to combine process knowledge with ML algorithms to model environmental processes. This so-called hybrid modelling approach has already been used to model different components of terrestrial water storage (TWS) in a global hydrological model (Kraft et al. 2022). This study follows up on that work with the objective of improving the parameterization of some processes (e.g., soil moisture) and of coupling the model with the carbon cycle. The coupling could potentially reduce model uncertainties and help to better understand water-carbon interactions.

The proposed hybrid model of the coupled water and carbon cycles is forced with reanalysis data from ERA-5, such as air temperature and net radiation, and with the CO2 concentration from CAMS. Water-carbon cycle processes are constrained using observational data products of the water and carbon cycles. The hybrid model uses a long short-term memory (LSTM) model—a member of the recurrent neural network family—at its core for processing the time-series Earth observation data. The LSTM produces a number of coefficients that are used as parameters in the conceptual model of the water and carbon cycles. Some of the key processes represented in the conceptual model are evapotranspiration, snow, soil moisture, runoff, groundwater, water use efficiency (WUE), ecosystem respiration, and net ecosystem exchange. The model partitions TWS into different components, and it can be used to assess the impact of different TWS components on the CO2 growth rate. Moreover, we can assess the learned system behaviors of water and carbon cycle interactions for different ecosystems.
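
A minimal sketch of the hybrid idea, with an LSTM producing a time-varying coefficient for a toy conceptual water-balance equation; the structure and variable layout are illustrative, not the authors' model:

```python
# Minimal sketch (illustrative): an LSTM outputs a coefficient that parameterizes
# a simple conceptual-model step; the whole graph is trained end to end against
# observational constraints.
import tensorflow as tf

forcing = tf.keras.Input(shape=(None, 8))                   # e.g. ERA-5 drivers + CO2
h = tf.keras.layers.LSTM(64, return_sequences=True)(forcing)
alpha = tf.keras.layers.Dense(1, activation="sigmoid")(h)   # e.g. runoff fraction in [0, 1]

precip = forcing[..., :1]        # assume channel 0 carries precipitation (toy layout)
runoff = alpha * precip          # conceptual-model equation, differentiable

model = tf.keras.Model(forcing, runoff)
model.compile(optimizer="adam", loss="mse")   # loss against observational products
```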

References:

Bonan, Gordon B, and Scott C Doney. 2018. “Climate, Ecosystems, and Planetary Futures: The Challenge to Predict Life in Earth System Models.” Science 359 (6375): eaam8328.

Humphrey, Vincent, Jakob Zscheischler, Philippe Ciais, Lukas Gudmundsson, Stephen Sitch, and Sonia I Seneviratne. 2018. “Sensitivity of Atmospheric CO2 Growth Rate to Observed Changes in Terrestrial Water Storage.” Nature 560 (7720): 628–31.

Kraft, Basil, Martin Jung, Marco Körner, Sujan Koirala, and Markus Reichstein. 2022. “Towards Hybrid Modeling of the Global Hydrological Cycle.” Hydrology and Earth System Sciences 26 (6): 1579–1614.

Reichstein, Markus, Gustau Camps-Valls, Bjorn Stevens, Martin Jung, Joachim Denzler, Nuno Carvalhais, et al. 2019. “Deep Learning and Process Understanding for Data-Driven Earth System Science.” Nature 566 (7743): 195–204.

How to cite: Baghirov, Z., Kraft, B., Jung, M., Körner, M., and Reichstein, M.: Hybrid machine learning model of coupled carbon and water cycles, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-16443, https://doi.org/10.5194/egusphere-egu23-16443, 2023.

EGU23-16449 | Orals | ITS1.14/CL5.8

Data-driven seasonal forecasts of European heat waves 

Stefano Materia, Martin Jung, Markus G. Donat, and Carlos Gomez-Gonzalez

Seasonal forecasts are critical tools for early-warning decision support systems that can help reduce the risks associated with hot or cold weather and other events that can strongly affect a multitude of socio-economic sectors. Recent advances in both statistical approaches and numerical modeling have improved the skill of seasonal forecasts. However, especially in the mid-latitudes, they are still affected by large uncertainties that can limit their usefulness.

The MSCA-H2020 project ARTIST aims to improve our knowledge of climate predictability at the seasonal time-scale, focusing on the role of unexplored drivers, to finally enhance the performance of current prediction systems. This effort is meant to reduce uncertainties and make forecasts efficiently usable by regional meteorological services and private bodies. This study focuses on the seasonal prediction of heat extremes in Europe, and here we present a first attempt to predict heat wave accumulated activity across different target seasons. An empirical seasonal forecast is designed based on machine learning techniques. A feature selection approach is used to detect the best subset of predictors among a variety of candidates, and the relative importance of each predictor is then assessed, in different European regions for the four main seasons.
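
A hedged sketch of the two methodological steps on synthetic stand-in data, using scikit-learn's sequential feature selection and permutation importance as generic stand-ins for the approach described above:

```python
# Minimal sketch (synthetic stand-ins): greedy feature selection followed by a
# permutation-importance assessment of the retained predictors.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.inspection import permutation_importance

X = np.random.rand(400, 20)   # candidate drivers (e.g. SST and soil-moisture indices)
y = np.random.rand(400)       # heat-wave accumulated activity for one region/season

sel = SequentialFeatureSelector(GradientBoostingRegressor(), n_features_to_select=5).fit(X, y)
Xs = sel.transform(X)
model = GradientBoostingRegressor().fit(Xs, y)
imp = permutation_importance(model, Xs, y, n_repeats=20, random_state=0)
print(sel.get_support(indices=True), imp.importances_mean)
```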

Results show that many observed teleconnections are captured by the data-driven approach, while a few features that appear to be linked to the heat wave propensity of a season call for a deeper understanding of the underlying physical processes.

How to cite: Materia, S., Jung, M., Donat, M. G., and Gomez-Gonzalez, C.: Data-driven seasonal forecasts of European heat waves, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-16449, https://doi.org/10.5194/egusphere-egu23-16449, 2023.

EGU23-16846 | ECS | Orals | ITS1.14/CL5.8

Learning causal drivers of PyroCb 

Emiliano Díaz, Gherardo Varando, Fernando Iglesias-Suarez, Gustau Camps-Valls, Kenza Tazi, Kara Lamb, and Duncan Watson-Parris

Discovering causal relationships from purely observational data is often not possible. In this case, combining observational and experimental data can allow for the identifiability of the underlying causal structure. In the Earth system sciences, carrying out interventional experiments is often impossible for ethical and practical reasons. However, "natural interventions" are often present in the data; these represent regime changes caused by changes to exogenous drivers. In [4,5], the Invariant Causal Prediction (ICP) methodology was presented to identify the causes of a target variable of interest from a set of candidate causes. This methodology takes advantage of natural interventions, which result in different distributions of the cause variables across different environments. In [2], this methodology is applied to a geoscience problem, namely identifying the causes of pyrocumulonimbus (pyroCb), storm clouds resulting from extreme wildfires. Although a set of plausible causes is produced, certain heuristic adaptations of the original ICP methodology were needed to overcome some of its practical limitations: the large number of hypothesis tests required and a failure to identify causes when these have a high degree of interdependence. In this work, we try to circumvent these difficulties by taking a different approach. We use a learning paradigm similar to that presented in [3] to learn causal representations invariant across different environments. Since we often do not know exactly how best to define the different environments, we also propose to learn functions that describe their spatiotemporal extent. We apply the resulting algorithm to the pyroCb database in [1] and other Earth system science datasets to verify the plausibility of the causal representations found and of the environments that describe the so-called natural interventions.
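
A minimal sketch of the invariance penalty underlying the learning paradigm of [3] (the IRMv1 formulation), with dummy data and environments:

```python
# Minimal sketch (dummy data/environments): an IRMv1-style invariance penalty,
# the gradient of the per-environment risk w.r.t. a dummy classifier scale.
import torch

def irm_penalty(logits, y):
    w = torch.tensor(1.0, requires_grad=True)
    loss = torch.nn.functional.binary_cross_entropy_with_logits(logits * w, y)
    grad = torch.autograd.grad(loss, [w], create_graph=True)[0]
    return (grad ** 2).sum()

model = torch.nn.Linear(10, 1)
total = 0.0
for X, y in [(torch.randn(64, 10), torch.rand(64, 1).round()) for _ in range(3)]:  # 3 environments
    logits = model(X)
    total = total + torch.nn.functional.binary_cross_entropy_with_logits(logits, y) \
                  + 1.0 * irm_penalty(logits, y)
total.backward()  # step with any torch optimizer
```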

 

[1] Tazi et al. 2022. https://arxiv.org/abs/2211.13052

[2] Díaz et al. 2022 .https://arxiv.org/abs/2211.08883

[3] Arjovsky et al. 2019. https://arxiv.org/abs/1907.02893

[4] Peters et al. 2016. https://www.jstor.org/stable/4482904

[5] Heinze-Deml et al. 2018. https://doi.org/10.1515/jci-2017-0016

How to cite: Díaz, E., Varando, G., Iglesias-Suarez, F., Camps-Valls, G., Tazi, K., Lamb, K., and Watson-Parris, D.: Learning causal drivers of PyroCb, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-16846, https://doi.org/10.5194/egusphere-egu23-16846, 2023.

EGU23-17082 | ECS | Posters on site | ITS1.14/CL5.8

A statistical approach on rapid estimations of climate change indices by monthly instead of daily data 

Kristofer Hasel, Marianne Bügelmayer-Blaschek, and Herbert Formayer

Climate change indices (CCI) defined by the Expert Team on Climate Change Detection and Indices (ETCCDI) contribute profoundly to understanding climate and its change. They are used to present climate change in an easy-to-understand and tangible way, thus facilitating climate communication. Many of the indices are peak-over-threshold indices that must be calculated from daily and, if necessary, bias-corrected data. We present a method to rapidly estimate specific CCI from monthly data instead of daily data, while also performing a simple bias correction as well as a localisation (downscaling). To this end, we used ERA5-Land data with a spatial resolution of 0.1°, supplemented by a CMIP6 SSP5-8.5 climate projection, to derive different regression functions which allow a rapid estimation from monthly data. Using a climate projection as a supplement in training the regression functions allows an application not only to historical periods but also to future periods such as those provided by climate projections. Nevertheless, the presented method can be adapted to any data set, allowing an even higher spatial resolution.
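
A minimal sketch of the core idea on toy data, using a simple linear regression as a stand-in for the derived regression functions:

```python
# Minimal sketch (toy data): regressing a peak-over-threshold index (e.g. summer
# days per month) on a monthly mean, so the index can be estimated without daily data.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
monthly_tmean = rng.normal(15, 5, size=(500, 1))                  # monthly predictor
summer_days = np.maximum(0, 2.5 * monthly_tmean[:, 0] - 20)       # toy index counts
summer_days += rng.normal(0, 2, size=500)

reg = LinearRegression().fit(monthly_tmean, summer_days)
print(reg.predict([[20.0]]))  # rapid CCI estimate from a monthly mean
```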

How to cite: Hasel, K., Bügelmayer-Blaschek, M., and Formayer, H.: A statistical approach on rapid estimations of climate change indices by monthly instead of daily data, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-17082, https://doi.org/10.5194/egusphere-egu23-17082, 2023.

EGU23-17197 | Posters on site | ITS1.14/CL5.8

Machine learning workflow for deriving regional geoclimatic clusters from high-dimensional data 

Sebastian Lehner, Katharina Enigl, and Matthias Schlögl

Geoclimatic regions represent climatic forcing zones, which constitute important spatial entities that serve as a basis for a broad range of analyses in the Earth system sciences. The plethora of geospatial variables relevant for obtaining consistent clusters results in high dimensionality, especially when working with high-resolution gridded data, which can render the derivation of such regions complex. This is exacerbated by typical characteristics of geoclimatic data such as multicollinearity, nonlinear effects and potentially complex interactions between features. We therefore present a nonparametric machine learning workflow, consisting of dimensionality reduction and clustering, for deriving geospatial clusters of similar geoclimatic characteristics. We demonstrate the applicability of the proposed procedure using a comprehensive dataset featuring climatological and geomorphometric data from Austria, aggregated to the recent climatological normal from 1992 to 2021.
 
The modelling workflow consists of three major sequential steps: (1) linear dimensionality reduction using Principal Component Analysis, yielding a reduced, orthogonal sub-space, (2) nonlinear dimensionality reduction applied to the reduced sub-space using Uniform Manifold Approximation and Projection, and (3) clustering the learned manifold projection via Hierarchical Density-Based Spatial Clustering of Applications with Noise. The contribution of the input features to the cluster result is then assessed by means of permutation feature importance of random forest models. These are trained by treating the clustering result as a supervised classification problem. Results show the flexibility of the defined workflow and exhibit good agreement with both quantitatively derived and synoptically informed characterizations of geoclimatic regions from other studies. However, this flexibility does entail certain challenges with respect to hyperparameter settings, which require careful exploration and tuning. The proposed workflow may serve as a blueprint for deriving consistent geospatial clusters exhibiting similar geoclimatic attributes.
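
A hedged sketch of the workflow, assuming the umap-learn and hdbscan packages; all hyperparameters are illustrative and, as noted above, require careful tuning:

```python
# Minimal sketch (synthetic stand-in grid cells): PCA -> UMAP -> HDBSCAN, then a
# random forest + permutation importance to assess feature contributions.
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
import umap
import hdbscan

X, _ = make_blobs(n_samples=5000, n_features=40, centers=6, random_state=0)

Z = PCA(n_components=0.95).fit_transform(StandardScaler().fit_transform(X))   # (1) linear
E = umap.UMAP(n_neighbors=30, min_dist=0.0, n_components=2).fit_transform(Z)  # (2) nonlinear
labels = hdbscan.HDBSCAN(min_cluster_size=100).fit_predict(E)                 # (3) clustering

keep = labels >= 0                                   # drop HDBSCAN noise points (-1)
rf = RandomForestClassifier(random_state=0).fit(X[keep], labels[keep])
imp = permutation_importance(rf, X[keep], labels[keep], n_repeats=5, random_state=0)
```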

How to cite: Lehner, S., Enigl, K., and Schlögl, M.: Machine learning workflow for deriving regional geoclimatic clusters from high-dimensional data, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-17197, https://doi.org/10.5194/egusphere-egu23-17197, 2023.

EGU23-17333 | ECS | Posters on site | ITS1.14/CL5.8

Emulating the regional temperature responses (RTPs) of short-lived climate forcers 

Maura Dewey, Hans Christen Hansson, and Annica M. L. Ekman

Here we develop a statistical model emulating the surface temperature response to changes in emissions of short-lived climate forcers as simulated by an Earth system model. Short-lived climate forcers (SLCFs) are chemical components in the atmosphere that interact with radiation and have both an immediate effect on local air quality and regional and global effects on the climate in terms of changes in temperature and precipitation distributions. The short atmospheric residence times of SLCFs lead to high atmospheric concentrations in emission regions and a highly variable radiative forcing pattern. Regional Temperature Potentials (RTPs) are metrics that quantify the impact of emission changes in a given region on the temperature or forcing response of another, accounting for spatial inhomogeneities in both the forcing and the temperature response, while being easy to compare across models and to use in integrated assessment studies or policy briefs. We have developed a Gaussian-process emulator using output from the Norwegian Earth System Model (NorESM) to predict the temperature responses to regional emission changes in SLCFs (specifically black carbon, organic carbon, sulfur dioxide, and methane) and use this model to calculate regional RTPs and study the sensitivity of surface temperature in a given region, e.g. the Arctic, to anthropogenic emission changes in key policy regions. The main challenge in developing the emulator was creating a training data set that includes maximal SLCF variability in a realistic and policy-relevant range compared to future emission scenarios, while also obtaining a significant temperature response. We also had to account for the confounding influence of greenhouse gases (GHG), which may not follow the same future emission trajectories as SLCFs and can overwhelm the more subtle temperature response that comes from the direct and indirect effects of SLCF emissions. The emulator can potentially provide policy makers with accurate and customizable predictions of the temperature response to proposed emission changes, helping to minimize climate impact.
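
A minimal sketch of a Gaussian-process emulator on toy data, with scikit-learn standing in for the actual emulation framework:

```python
# Minimal sketch (toy data): a GP emulator mapping regional SLCF emission
# perturbations to a regional temperature response, with predictive uncertainty.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.random((60, 4))        # per-region scalings of BC, OC, SO2, CH4 emissions
y = X @ np.array([0.1, -0.05, -0.3, 0.4]) + 0.02 * rng.standard_normal(60)  # toy Arctic dT

gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True).fit(X, y)
mean, std = gp.predict(rng.random((5, 4)), return_std=True)   # prediction + uncertainty
```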

How to cite: Dewey, M., Hansson, H. C., and Ekman, A. M. L.: Emulating the regional temperature responses (RTPs) of short-lived climate forcers, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-17333, https://doi.org/10.5194/egusphere-egu23-17333, 2023.

EGU23-542 | ECS | Posters on site | ESSI1.3

Electron Temperature Inference from Fixed Bias Langmuir Probes Set-Ups in Ionospheric Conditions 

Florine Enengl, Sigvald Marholm, Sayan Adhikari, Richard Marchand, and Wojciech J. Miloch

In this work, we show the first achievement of inferring the electron temperature in ionospheric conditions from synthetic data using fixed-bias Langmuir probes operating in the electron saturation region. This was done using machine learning, as well as by altering the probe geometry. The electron temperature is inferred at the same rate as the currents are sampled by the probes. For inferring the electron temperature along with the electron density and the floating potential, a minimum of three probes is required. Furthermore, one probe geometry needs to be distinct from the other two, since otherwise the probe setup may be insensitive to temperature. This can be achieved by having either one shorter probe or a probe of a different geometry, e.g. two longer and one shorter cylindrical probe, or two cylindrical probes and a spherical probe. We use synthetic plasma parameter data and calculate the synthetic collected probe currents to train a neural network (using TensorFlow) and verify the results with a test set as well as with data from the International Reference Ionosphere (IRI) model. A table of currents collected by a spherical probe, computed by Laframboise, was extended to calculate currents for the synthetic plasma parameters at high eta values (eta > 25), covering a large altitude range (100-500 km, within Earth's ionosphere). The extrapolated values were benchmarked against particle-in-cell simulations. Finally, we evaluate the robustness and errors of different probe setups that can be used to infer the electron temperature. As the inferred temperatures are compared to results from the International Reference Ionosphere model, we verify the validity of the inferred temperature at altitudes ranging from about 100-500 km. We show that electron temperature inference from different combinations of spherical and cylindrical probes - three cylindrical probes, three spherical probes, four cylindrical and one spherical probe - can be achieved. Even minor changes in the probe sizing enable the temperature inference and result in root mean square relative errors (RMSRE) between inferred and ground truth data of under 3%. With further optimizations, the RMSRE can even be decreased to under 1%. When limiting the temperature inference to 120-450 km altitude, an RMSRE of under 0.7% is achieved for all probe setups. In the future, the multi-needle Langmuir Probe (m-NLP) instrument dimensions can be adapted for higher temperature inference accuracy.
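
A minimal sketch of the inference network on synthetic stand-ins (the real training data are the computed probe currents described above):

```python
# Minimal sketch (random stand-in data): a TensorFlow network mapping the currents
# of three fixed-bias probes to (Te, ne, floating potential); sizes are illustrative.
import numpy as np
import tensorflow as tf

I = np.random.rand(10000, 3).astype("float32")        # currents of three probes
params = np.random.rand(10000, 3).astype("float32")   # (Te, ne, floating potential)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(3,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(3),
])
model.compile(optimizer="adam", loss="mse")
model.fit(I, params, epochs=10, validation_split=0.1)
```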

How to cite: Enengl, F., Marholm, S., Adhikari, S., Marchand, R., and Miloch, W. J.: Electron Temperature Inference from Fixed Bias Langmuir Probes Set-Ups in Ionospheric Conditions, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-542, https://doi.org/10.5194/egusphere-egu23-542, 2023.

EGU23-850 | Posters on site | ESSI1.3

Unsupervised learning of active-region nesting on the Sun 

Emre Isik, Nurdan Karapinar, and Selim Göktug Cankurtaran

Active-region emergence on the Sun shows a degree of clumpiness in both space and time. At a given time, multiple active regions can be seen in what is called active-region- or sunspot-group nests. This tendency also increases the potential to produce large flares and associated CMEs. In the literature, the nesting tendency of active regions is reported in the range of 30-50 per cent, but no statistically robust and ML-based approaches exist so far. Quantifying the nesting degree along an activity cycle and determining its spatial and temporal scales are important to investigate the processes that cause this phenomenon. 

In this study, we estimate the latitudinal and longitudinal extents of active-region nesting from both continuum and magnetogram data, using SDO/HMI synoptic magnetograms and Kislovodsk Mountain Astronomical Station (KMAS) sunspot group data. We carry out kernel density estimation (Fig. 1) and apply unsupervised ML techniques (e.g., DBSCAN and Gaussian mixtures) in the spatial and spatio-temporal domains. Our study reveals trends in the emergence characteristics of sunspot groups on the Sun.


Figure 1: Kernel density estimation with a Gaussian kernel on the time-longitude plane. The dot size indicates sunspot group areas in MSH. 
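
A minimal sketch of the estimation on random stand-in data, combining kernel density estimation on the time-longitude plane with DBSCAN as one of the clustering options mentioned above:

```python
# Minimal sketch (random stand-ins): KDE of active-region emergence on the
# time-longitude plane, plus DBSCAN as one candidate nest-clustering method.
import numpy as np
from scipy.stats import gaussian_kde
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
t = rng.random(300) * 4000      # emergence time [days]
lon = rng.random(300) * 360     # Carrington longitude [deg]

kde = gaussian_kde(np.vstack([t, lon]))         # emergence density (as in Fig. 1)
density = kde(np.vstack([t, lon]))
# rescale time so both axes contribute comparably (eps in "degrees")
nests = DBSCAN(eps=30, min_samples=5).fit_predict(np.column_stack([t / 10, lon]))
```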

How to cite: Isik, E., Karapinar, N., and Cankurtaran, S. G.: Unsupervised learning of active-region nesting on the Sun, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-850, https://doi.org/10.5194/egusphere-egu23-850, 2023.

EGU23-2719 | Posters on site | ESSI1.3

Different types of PCA-NN model for TEC with space weather parameters as predictors: advantages and disadvantages of different NN algorithms 

Anna Morozova, Ricardo Gafeira, Teresa Barata, and Tatiana Barlyaeva

The PCA-NN model for the total electron content (TEC) over the midlatitude region (Iberian Peninsula) presented here uses principal component analysis (PCA) to decompose TEC variations into different modes and reconstructs/forecasts the amplitudes of these modes using neural networks (NN) with different sets of space weather parameters as predictors.

Feedforward, convolutional and recurrent NN algorithms are tested with different sets of predictors. The performance of the models is tested on 3.5 years of observational data obtained during the declining phase of solar cycle 24, which allows us to estimate the models' performance in relation to the solar activity level. The advantages and disadvantages of the different NN algorithms are discussed.
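
A minimal sketch of the PCA-NN idea on synthetic stand-ins, with a simple multilayer perceptron in place of the tested NN variants:

```python
# Minimal sketch (synthetic stand-ins): decompose TEC into PCA modes, then
# regress the mode amplitudes on space-weather drivers and reconstruct TEC.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor

tec = np.random.rand(3000, 24)      # daily TEC curves (e.g. hourly values)
drivers = np.random.rand(3000, 5)   # e.g. F10.7, Kp, Dst, seasonal terms

pca = PCA(n_components=3).fit(tec)
amps = pca.transform(tec)                                    # mode amplitudes
nn = MLPRegressor(hidden_layer_sizes=(32,), max_iter=500).fit(drivers, amps)
tec_pred = pca.inverse_transform(nn.predict(drivers))        # reconstructed TEC
```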

How to cite: Morozova, A., Gafeira, R., Barata, T., and Barlyaeva, T.: Different types of PCA-NN model for TEC with space weather parameters as predictors: advantages and disadvantages of different NN algorithms, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-2719, https://doi.org/10.5194/egusphere-egu23-2719, 2023.

EGU23-2756 | ECS | Orals | ESSI1.3

SuNeRF: AI enables 3D reconstruction of the solar EUV corona 

Robert Jarolim, Benoit Tremblay, Andres Munoz-Jaramillo, Kyriaki-Margarita Bintsi, Anna Jungbluth, Miraflor Santos, James Paul Mason, Sairam Sundaresan, Cooper Downs, Ronald Caplan, and Angelos Vourlidas

To understand solar evolution and the effects of solar eruptive events, the Sun is continuously observed by multiple satellite missions. The optically-thin emission of the solar plasma and the limited number of viewpoints make it challenging to reconstruct the geometry and structure of the solar atmosphere; however, this information is the missing link to understand the Sun as it is: a three-dimensional, evolving star. We present a method that enables a complete 3D representation of the uppermost solar layer observed in extreme ultraviolet (EUV) light. We use a deep learning approach for 3D scene representation that accounts for radiative transfer to map the entire solar atmosphere from three simultaneous observations. We demonstrate that our approach provides unprecedented reconstructions of the solar poles and directly enables height estimates of coronal structures, solar flux ropes, coronal hole profiles, and coronal mass ejections. We validate the approach using model-generated synthetic EUV images, finding that our method accurately captures the 3D geometry even from a limited number of viewpoints. We quantify the uncertainties of our model using an ensemble approach that allows us to estimate the model performance in the absence of a ground truth. Our method enables a novel view of our closest star, and is a breakthrough technology for the efficient use of multi-instrument datasets, which paves the way for future cluster missions.

How to cite: Jarolim, R., Tremblay, B., Munoz-Jaramillo, A., Bintsi, K.-M., Jungbluth, A., Santos, M., Mason, J. P., Sundaresan, S., Downs, C., Caplan, R., and Vourlidas, A.: SuNeRF: AI enables 3D reconstruction of the solar EUV corona, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-2756, https://doi.org/10.5194/egusphere-egu23-2756, 2023.

EGU23-2897 | Orals | ESSI1.3

Automatic Classification of THEMIS All-Sky Images via Self-Supervised Semi-Supervised Learning 

Jeremiah Johnson, Dogacan Ozturk, Hyunju Connor, Donald Hampton, Matthew Blandin, and Amy Keesee

Dynamic interactions between the solar wind and the magnetosphere give rise to dramatic auroral forms that have been instrumental in the ground-based study of magnetospheric dynamics. The general mechanisms of aurora types and their large-scale patterns are well known, but the morphology of small- to meso-scale auroral forms observed in all-sky imagers and their relation to magnetospheric dynamics and the coupling of the magnetosphere to the upper atmosphere remain in question. Machine learning has the potential to provide answers to these questions, but most existing auroral image data lack the ground-truth labels required for supervised learning and conventional statistical analyses. To mitigate this issue, we propose a novel self-supervised semi-supervised algorithm to automatically label the THEMIS all-sky image database. Specifically, we adapt the self-supervised Simple framework for Contrastive Learning of Representations (SimCLR) algorithm to learn latent representations of THEMIS all-sky images. These representations are fine-tuned using a small set of manually labeled data from the Oslo Aurora THEMIS (OATH) dataset, after which semi-supervised classification is used to train a classifier, beginning by training on the manually labeled OATH dataset and gradually incorporating the classifier's most confident predictions on unlabeled data into the training dataset as ground truth. We demonstrate that (a) classifiers fit to the learned representations of the manually labeled images achieve state-of-the-art performance, improving the classification accuracy by almost 10% over the current benchmark on labeled data; and (b) our model's learned representations naturally cluster into more clusters than manually assigned categories, suggesting that existing categorizations are coarse and may obscure important connections between auroral types and their drivers. Finally, we introduce AuroraClick, a citizen science project with the goal of manually annotating a large representative sample of THEMIS all-sky images for the validation of our current models and the training of future models.
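
A minimal sketch of the semi-supervised self-training step on dummy representations, with a linear classifier standing in for the actual model and an illustrative confidence threshold:

```python
# Minimal sketch (dummy SimCLR representations): train on labeled data, then fold
# the most confident unlabeled predictions back in as pseudo-labels.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_lab, y_lab = rng.random((500, 128)), rng.integers(0, 6, 500)   # OATH-like labels
X_unl = rng.random((5000, 128))                                  # unlabeled images

for _ in range(5):
    clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
    proba = clf.predict_proba(X_unl)
    conf = proba.max(axis=1) > 0.95                 # confidence threshold (illustrative)
    X_lab = np.vstack([X_lab, X_unl[conf]])
    y_lab = np.concatenate([y_lab, proba[conf].argmax(axis=1)])
    X_unl = X_unl[~conf]
```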

How to cite: Johnson, J., Ozturk, D., Connor, H., Hampton, D., Blandin, M., and Keesee, A.: Automatic Classification of THEMIS All-Sky Images via Self-Supervised Semi-Supervised Learning, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-2897, https://doi.org/10.5194/egusphere-egu23-2897, 2023.

EGU23-3379 | ECS | Posters on site | ESSI1.3

Estimation and Prediction of Solar Wind Propagation from L1 Point to Earth’s Bow Shock 

Samira Tasnim, Ying Zou, Claudia Borries, Carsten Baumann, Brian Walsh, Krishna Khanal, Connor O'Brien, and Huaming Zhang

Having precise knowledge of the near-Earth solar wind (SW) and the embedded interplanetary magnetic field (IMF) is of critical importance to space weather operations, because SW and IMF data are used in almost all magnetospheric and ionospheric models. The most widely used data source, OMNI, propagates SW properties from the Lagrangian point L1 to the Earth's bow shock by estimating the propagation time of the SW. However, the time difference between the OMNI time-shifted IMF and the best match-up of the IMF can reach ~15 min. Firstly, we aim to develop an improved statistical algorithm to address the SW propagation delay problem of space weather prediction. The algorithm focuses on matching SW features around the L1 point and upstream of the bow shock by computing the variance, the cross-correlation coefficient, the plateau-shaped magnitude index, and the non-dimensional measure of average error index between the measurements at the two locations. The obtained propagation times are then compared to OMNI. Factors that limit the OMNI accuracy are also examined. Secondly, the automatic algorithm allows us to generate large sets of input and target variables using multiple spacecraft pairs at L1 and near-Earth locations to train, validate, and test machine learning models that specify and forecast near-Earth SW conditions. Finally, we offer a machine learning (ML) approach to specify and predict the propagation time from L1 monitors to a given location upstream of or at the bow shock and to forecast near-Earth SW conditions, using gradient boosting and random forest models, i.e., ensembles of decision trees.
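
A minimal sketch of the feature-matching idea: on synthetic series, the propagation delay is estimated as the lag maximizing the cross-correlation between L1 and near-Earth measurements:

```python
# Minimal sketch (synthetic series): estimating the propagation delay as the lag
# that maximizes the cross-correlation between the two locations.
import numpy as np

rng = np.random.default_rng(0)
l1 = rng.standard_normal(2000)                                   # e.g. IMF Bz at L1, 1-min cadence
near_earth = np.roll(l1, 45) + 0.2 * rng.standard_normal(2000)   # true delay: 45 min

lags = np.arange(20, 90)
cc = [np.corrcoef(l1[:-lag], near_earth[lag:])[0, 1] for lag in lags]
print("estimated delay [min]:", lags[int(np.argmax(cc))])
```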

How to cite: Tasnim, S., Zou, Y., Borries, C., Baumann, C., Walsh, B., Khanal, K., O'Brien, C., and Zhang, H.: Estimation and Prediction of Solar Wind Propagation from L1 Point to Earth’s Bow Shock, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-3379, https://doi.org/10.5194/egusphere-egu23-3379, 2023.

EGU23-4069 | ECS | Posters on site | ESSI1.3

Plasma-Sheet Bubble Identification Using Muitivariate Time Series Classification 

Feng Xuedong and Yang Jian

Abstract: Plasma-sheet bubbles play a major role in the process of magnetotail particle injections. They are defined as fast flows with reduced plasma density or pressure, accompanied by magnetic field dipolarization. Typically, these bubbles can be detected from in-situ observations, but the subjective uncertainty requires human verification. In this study, we combine three different methods, namely the MINImally RandOm Convolutional KErnel Transform (MiniRocket) and 1D and 2D convolutional neural networks (CNNs), to identify bubbles. The imbalanced training dataset consists of bubble and non-bubble events with a ratio of 1:40 from the years 2007 to 2020. The results indicate that the accuracy of all three models is around 99%, and the precision and recall rates of all three models are above 80% on both the validation and test datasets. The three methods are combined, with the intersection set as the minimum set of predictions and the union set as the maximum set, which greatly reduces the number of false positives. In identifying bubbles in the observations of the year 2021, our neural network models are found to be comparable to the traditional criteria and manual inspection. Using joint machine learning forecasting methods, we can easily and automatically identify bubbles without the a priori knowledge of a domain expert.

Keywords: plasma-sheet bubble, multivariate time series classification, class imbalance, image identification
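
A hedged sketch of the MiniRocket branch, assuming the sktime implementation and dummy multivariate time series:

```python
# Minimal sketch (dummy data; assumes the sktime package): MiniRocket features
# with a linear classifier, one of the three combined methods.
import numpy as np
from sktime.transformations.panel.rocket import MiniRocketMultivariate
from sklearn.linear_model import RidgeClassifierCV

rng = np.random.default_rng(0)
X = rng.random((200, 6, 128))     # (events, variables e.g. B/V/density, time steps)
y = rng.integers(0, 2, 200)       # bubble vs non-bubble (heavily imbalanced in reality)

trf = MiniRocketMultivariate().fit(X)
clf = RidgeClassifierCV(alphas=np.logspace(-3, 3, 10)).fit(trf.transform(X), y)
```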

How to cite: Xuedong, F. and Jian, Y.: Plasma-Sheet Bubble Identification Using Muitivariate Time Series Classification, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-4069, https://doi.org/10.5194/egusphere-egu23-4069, 2023.

EGU23-5254 | Posters on site | ESSI1.3

AI Assisted Data Selection of Laser Altimeter Observations 

Oliver Stenzel, Lukas Maes, and Martin Hilchenbach

Laser altimeters create large amounts of data that often have to be preprocessed and checked before further use. The BepiColombo mission to Mercury is set to arrive in December 2025, and observations with the BepiColombo Laser Altimeter (BELA; Benkhoff et al., 2010; Thomas et al., 2021) will start during the following spring. These measurements are planned to be used to derive information about the tides of Mercury (Thor et al., 2020). Careful assessment, selection, and filtering of the raw data is needed to extract the small tidal signal. Until the BELA data become available, artificial data and records from other missions have to be used to study the data selection strategy. We present our work on Mercury Laser Altimeter data (MLA; Cavanaugh et al., 2007), using a convolutional neural network to sort observations on an orbit-by-orbit basis into different classes. The already existing neural network (Stenzel and Hilchenbach, 2021; Stenzel, Thor and Hilchenbach, 2021) is tuned, and a new test data set is created.
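
A minimal sketch of such an orbit-wise profile classifier on dummy data; the architecture and class labels are illustrative:

```python
# Minimal sketch (dummy data): a 1D CNN sorting altimeter observations into
# quality classes on an orbit-by-orbit basis.
import numpy as np
import tensorflow as tf

profiles = np.random.rand(500, 1024, 1).astype("float32")  # range returns per orbit
labels = np.random.randint(0, 3, 500)                      # e.g. good / noisy / corrupted

model = tf.keras.Sequential([
    tf.keras.layers.Conv1D(16, 7, activation="relu", input_shape=(1024, 1)),
    tf.keras.layers.MaxPooling1D(4),
    tf.keras.layers.Conv1D(32, 7, activation="relu"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(profiles, labels, epochs=5)
```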

 

Benkhoff, J. et al. (2010) ‘BepiColombo—Comprehensive exploration of Mercury: Mission overview and science goals’, Planetary and Space Science, 58(1), pp. 2–20. Available at: https://doi.org/10.1016/j.pss.2009.09.020.

Cavanaugh, J.F. et al. (2007) ‘The Mercury Laser Altimeter Instrument for the MESSENGER Mission’, Space Science Reviews, 131(1), pp. 451–479. Available at: https://doi.org/10.1007/s11214-007-9273-4.

Stenzel, O. and Hilchenbach, M. (2021) 'Towards machine learning assisted error identification in orbital laser altimetry for tides derivation', pp. EPSC2021-688. Available at: https://doi.org/10.5194/epsc2021-688.

Stenzel, O., Thor, R. and Hilchenbach, M. (2021) ‘Error identification in orbital laser altimeter data by machine learning’, pp. EGU21-14749. Available at: https://doi.org/10.5194/egusphere-egu21-14749.

Thomas, N. et al. (2021) ‘The BepiColombo Laser Altimeter’, Space Science Reviews, 217(1), p. 25. Available at: https://doi.org/10.1007/s11214-021-00794-y.

Thor, R.N. et al. (2020) ‘Prospects for measuring Mercury’s tidal Love number h2 with the BepiColombo Laser Altimeter’, Astronomy & Astrophysics, 633, p. A85. Available at: https://doi.org/10.1051/0004-6361/201936517.

 

How to cite: Stenzel, O., Maes, L., and Hilchenbach, M.: AI Assisted Data Selection of Laser Altimeter Observations, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-5254, https://doi.org/10.5194/egusphere-egu23-5254, 2023.

EGU23-6968 | ECS | Posters on site | ESSI1.3

Forecasting solar wind speed by machine learning based on coronal hole characteristics 

Daniel Collin, Stefano Bianco, Guillermo Gallego, and Yuri Shprits

One of the main sources of solar wind disturbances are coronal holes, which can be identified in extreme ultraviolet (EUV) images of the Sun. Previous research has shown the connection between coronal holes and an increase of the solar wind speed at Earth. The time lag between the appearance of coronal holes on the visible side of the Sun and their effects at Earth is 2-5 days. In this study, a machine learning model predicting the solar wind speed originating from coronal holes is proposed. It is based on the analysis of solar EUV images. A segmentation algorithm is applied to the images in order to identify coronal holes and derive their characteristics (e.g. area, location). We also present a new method to calculate the geoeffective coronal hole area: instead of specifying in advance a sector of the solar surface in which the area is measured and a lag time between the area measurement and the arrival of the solar wind, the specification of this sector and the corresponding delay are formulated as a mathematical optimization problem and included in the machine learning model. This approach improves the prediction accuracy and also prolongs the prediction horizon, as the solar wind speed can be predicted up to approximately 5 days in advance of the disturbance. Several machine learning model architectures are explored. We also study how the time evolution can be included in the model.
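
A minimal sketch of treating the sector and delay as free parameters, on synthetic stand-in data and via a plain grid search maximizing correlation (the study embeds the optimization in the ML model itself):

```python
# Minimal sketch (synthetic stand-ins): choosing the geoeffective sector (start,
# width) and the time lag that maximize correlation with the solar wind speed.
import numpy as np

rng = np.random.default_rng(0)
days = 700
ch_area = rng.random((days, 36))             # daily CH area per 10-deg longitude sector
speed = rng.random(days) * 400 + 300         # solar wind speed at Earth [km/s]

best = max(
    ((np.corrcoef(ch_area[:days - lag, s0:s0 + w].sum(axis=1), speed[lag:])[0, 1], s0, w, lag)
     for s0 in range(36) for w in (3, 5, 7) for lag in range(1, 6) if s0 + w <= 36),
    key=lambda t: t[0],
)
print("corr=%.2f sector=%d width=%d lag=%d days" % best)
```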

How to cite: Collin, D., Bianco, S., Gallego, G., and Shprits, Y.: Forecasting solar wind speed by machine learning based on coronal hole characteristics, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-6968, https://doi.org/10.5194/egusphere-egu23-6968, 2023.

EGU23-7529 | ECS | Posters on site | ESSI1.3

Landform detection on Mars using image segmentation with a u-net convolutional neural network architecture 

Florian Auer-Welsbach, Andreas Windisch, and Giacomo Nodjoumi

The detection and classification of landforms on planetary surfaces is a time-consuming task that relies deeply on expert knowledge. Such a process can be partially automated and optimized in a resource-efficient way using image processing algorithms. By classifying the surface into different landforms, such as volcanic craters, asteroid impact craters, dunes, and more, several analyses can be performed, for instance the widely used crater-counting age estimation method. In addition, by conducting these analyses, information about the characteristics and properties of a planet can be revealed. One of the major challenges for the implementation of these algorithms is to provide a generalized model. In many cases the generalization error tends to be very large, and therefore a satisfactory accuracy on the test data set cannot be achieved, preventing reliable evaluation of new, unseen data. In this work, a multi-class image segmentation algorithm is presented, which is based on a U-net convolutional neural network architecture. U-nets classify each pixel of a given input image and can thus produce segmentation masks for various landforms. Given that enough labeled data are available, such a classifier can replace manual detection and classification, thereby saving resources by providing a fast method for landform detection.
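
A minimal sketch of a small U-net for per-pixel multi-class segmentation in Keras; depth and filter counts are illustrative, not the presented architecture:

```python
# Minimal sketch (illustrative depth/filters): a tiny U-net producing per-pixel
# class probabilities, i.e. segmentation masks for landform classes.
import tensorflow as tf
from tensorflow.keras import layers

def tiny_unet(n_classes=4, size=128):
    inp = layers.Input((size, size, 1))
    c1 = layers.Conv2D(16, 3, padding="same", activation="relu")(inp)
    p1 = layers.MaxPooling2D()(c1)
    c2 = layers.Conv2D(32, 3, padding="same", activation="relu")(p1)
    p2 = layers.MaxPooling2D()(c2)
    b = layers.Conv2D(64, 3, padding="same", activation="relu")(p2)
    u2 = layers.concatenate([layers.UpSampling2D()(b), c2])      # skip connection
    c3 = layers.Conv2D(32, 3, padding="same", activation="relu")(u2)
    u1 = layers.concatenate([layers.UpSampling2D()(c3), c1])     # skip connection
    c4 = layers.Conv2D(16, 3, padding="same", activation="relu")(u1)
    out = layers.Conv2D(n_classes, 1, activation="softmax")(c4)  # per-pixel classes
    return tf.keras.Model(inp, out)

model = tiny_unet()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```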

How to cite: Auer-Welsbach, F., Windisch, A., and Nodjoumi, G.: Landform detection on Mars using image segmentation with a u-net convolutional neural network architecture, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-7529, https://doi.org/10.5194/egusphere-egu23-7529, 2023.

EGU23-7761 | ECS | Posters virtual | ESSI1.3

Comparison study on the deep-learning-based detection of Mars craters 

Hind AlRiyami, Claus Gebhardt, and Christopher Lee

Deep-learning methods are of interest for the analysis of imagery and digital elevation models from Mars-orbiting satellites. They can detect various atmospheric and surface features, for instance dust storms and craters [1,2]. We approach this topic by using the deep-learning-based crater detection algorithm DeepMars2 [3,4]. The algorithm is applied to two digital elevation models (DEMs) of the Mars surface. The DEMs are based on the satellite instruments MOLA/MGS (Mars Orbiter Laser Altimeter/Mars Global Surveyor) and HRSC/MEX (High Resolution Stereo Camera/Mars Express) and have different resolutions. Crater detection statistics are compared between the two DEMs.

[1] Alshehhi, R., Gebhardt, C. Detection of Martian dust storms using mask regional convolutional neural networks. Prog Earth Planet Sci 9, 4 (2022). https://doi.org/10.1186/s40645-021-00464-1

[2] R. Alshehhi and C. Gebhardt, "Automated Geological Landmarks Detection on Mars Using Deep Domain Adaptation From Lunar High-Resolution Satellite Images," in IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 15, pp. 2274-2283, 2022, doi: 10.1109/JSTARS.2022.3156371.

[3] Lee, C. (2019). Automated crater detection on Mars using deep learning. Planetary and Space Science, 170, 16-28. https://doi.org/10.1016/j.pss.2019.03.008

[4] Lee, C. & Hogan, J. (2021). Automated crater detection with human level performance. Computers & Geosciences, 147, 104645. https://doi.org/10.1016/j.cageo.2020.104645

How to cite: AlRiyami, H., Gebhardt, C., and Lee, C.: Comparison study on the deep-learning-based detection of Mars craters, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-7761, https://doi.org/10.5194/egusphere-egu23-7761, 2023.

EGU23-7941 | ECS | Orals | ESSI1.3

Machine learning ensemble models for solar wind speed prediction 

Federico Sabbatini and Catia Grimani

Machine learning models trained to reproduce space mission observations are precious resources to fill gaps of missing data in measurement time series or to perform data forecasting within a reasonable degree of uncertainty. The latter option is of particular importance for future space missions that will not host instrumentation dedicated to interplanetary medium parameter monitoring. The future LISA mission for low-frequency gravitational wave detection, for instance, will benefit from particle detectors to measure the galactic cosmic-ray integral flux variations and from magnetometers that will allow monitoring of the passage of large-scale magnetic structures through the three LISA spacecraft, as part of a diagnostics subsystem. Unfortunately, no instruments dedicated to solar wind speed measurements will be present on board the spacecraft constellation. Moreover, LISA, scheduled for launch in 2035, will trail Earth on the ecliptic at a distance of 50 million km, far from the orbits of other space missions dedicated to interplanetary medium monitoring.

Based on precious lessons learned with LISA Pathfinder, the ESA LISA precursor mission, about the correlation between galactic cosmic-ray flux short-term variations and solar wind speed increases, we built a machine learning ensemble model able to reconstruct the solar wind speed trend solely on the basis of contemporaneous and preceding observations of galactic cosmic-ray flux variations. Details about the model creation and performance will be presented, together with a description of the underlying data set, the weak predictors and the training phase. Advantages and limitations will be discussed, showing that the model performance may be enhanced by providing interplanetary magnetic field intensity observations as additional input data, with the goal of providing the LISA mission with an effective solar wind speed predictive tool.
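
A minimal sketch of such an ensemble on synthetic stand-ins, with lagged cosmic-ray flux features as the only inputs:

```python
# Minimal sketch (synthetic stand-ins): an ensemble of weak regressors predicting
# solar wind speed from contemporaneous and preceding cosmic-ray flux variations.
import numpy as np
from sklearn.ensemble import VotingRegressor, GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
flux = rng.random(3000)                                               # GCR integral flux
X = np.column_stack([np.roll(flux, k) for k in range(0, 48, 6)])[48:]  # lagged inputs
y = (rng.random(3000) * 400 + 300)[48:]                               # solar wind speed [km/s]

ens = VotingRegressor([
    ("gbr", GradientBoostingRegressor()),
    ("rf", RandomForestRegressor(n_estimators=200)),
    ("ridge", Ridge()),
]).fit(X, y)
```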

How to cite: Sabbatini, F. and Grimani, C.: Machine learning ensemble models for solar wind speed prediction, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-7941, https://doi.org/10.5194/egusphere-egu23-7941, 2023.

EGU23-8430 | Orals | ESSI1.3

Modelling Jupiter's global and regional magnetic fields using physics-informed neural networks 

Longwei Chen, Phil Livermore, Leyuan Wu, Sjoerd de Ridder, and Chong Zhang

Neural networks can universally approximate complex functions, which naturally makes them suitable candidates for representing solutions governed by complex partial differential equations (PDEs). For the planetary magnetic field modelling problem, spherical harmonic functions are the standard modelling method. The spherical harmonic method requires globally, nearly uniformly distributed observations, and it has quite limited ability for regional field modelling. Neural networks, instead, have great potential to deal with both global and regional modelling problems. In this work, we thoroughly investigate the ability of neural networks to represent magnetic fields at global and regional scales, and concentrate on a specific type of neural network, namely physics-informed neural networks (PINNs), for the implementation. PINNs make it easier to incorporate different kinds of physics within a uniform optimization framework. Through synthetic model tests and partial mathematical proof, we showcase the importance of employing the natural boundary condition, the Laplace equation constraint and the Poisson equation constraint at suitable collocation points for a reasonable and accurate magnetic field representation, and we introduce the detailed implementation scheme. Finally, we use newly released Juno mission measurements and present a global PINNs model of Jupiter's magnetic field and a regional PINNs model for the Great Blue Spot (GBS) region. Comparisons with spherical harmonic models have been conducted to evaluate the correctness and flexibility of the PINNs models.
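
A minimal sketch of the PINN ingredients on dummy data: a network for a scalar potential V, with B = -grad V fitted to observations and a Laplace-equation penalty at collocation points (the published model's architecture and constraints are richer):

```python
# Minimal sketch (dummy data): data misfit on B = -grad V plus a Laplace penalty
# |lap V|^2 at collocation points, both via automatic differentiation.
import torch

net = torch.nn.Sequential(torch.nn.Linear(3, 64), torch.nn.Tanh(),
                          torch.nn.Linear(64, 64), torch.nn.Tanh(),
                          torch.nn.Linear(64, 1))

def laplacian(xyz):
    xyz = xyz.requires_grad_(True)
    g = torch.autograd.grad(net(xyz).sum(), xyz, create_graph=True)[0]
    lap = 0.0
    for i in range(3):  # sum of second derivatives d2V/dxi2
        lap = lap + torch.autograd.grad(g[:, i].sum(), xyz, create_graph=True)[0][:, i]
    return lap

obs_xyz, obs_B = torch.randn(128, 3), torch.randn(128, 3)   # dummy Juno-like samples
colloc = torch.randn(512, 3)                                # collocation points

xyz = obs_xyz.requires_grad_(True)
B_pred = -torch.autograd.grad(net(xyz).sum(), xyz, create_graph=True)[0]
loss = ((B_pred - obs_B) ** 2).mean() + (laplacian(colloc) ** 2).mean()
loss.backward()  # step with any torch optimizer
```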

How to cite: Chen, L., Livermore, P., Wu, L., de Ridder, S., and Zhang, C.: Modelling Jupiter's global and regional magnetic fields using physics-informed neural networks, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-8430, https://doi.org/10.5194/egusphere-egu23-8430, 2023.

To accurately predict potential future impacts with the Earth, it is crucial to continuously examine the area around it for Near Earth Objects (NEOs) and particularly Near Earth Asteroids (NEAs). Large data sets of astronomical images must be analyzed in order to accomplish this task. NEARBY [1] offers such a processing and analysis platform based on Cloud computing. Although this method is automated, the results are validated by human observers after potential asteroids have been identified from the raw data. It is crucial that the number of candidate objects does not outweigh the available human resources. We believe we can maximize the advantages of having access to enormous amounts of data in the field of astronomy by combining artificial intelligence with the use of high-performance distributed processing infrastructures such as Cloud-based solutions. This research is carried out as part of the CERES project, which aims to design and implement a software solution that can classify objects found in astronomical images. The objective is to identify and recognize asteroids. To achieve this goal, we use machine learning techniques to develop an asteroid classification model. It is essential to reduce the number of false negative findings. The major objective of the current paper is to assess how well deep CNNs perform when it comes to categorizing astronomical objects, particularly asteroids. We will compare the outcomes of several of the most well-known deep convolutional neural networks (CNNs), including InceptionV3, Xception, InceptionResNetV2, and ResNet152V2. These cutting-edge classification CNNs are used to investigate the best approach to this specific classification challenge, either through full training or through fine-tuning.
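
A minimal sketch of the fine-tuning variant for one of the listed networks, on dummy image cutouts:

```python
# Minimal sketch (dummy data): fine-tuning a pretrained InceptionV3 head for
# asteroid vs. artefact classification; image size and head are illustrative.
import numpy as np
import tensorflow as tf

base = tf.keras.applications.InceptionV3(weights="imagenet", include_top=False,
                                         input_shape=(128, 128, 3), pooling="avg")
base.trainable = False                   # train only the new head first
model = tf.keras.Sequential([base,
                             tf.keras.layers.Dense(1, activation="sigmoid")])
model.compile(optimizer="adam", loss="binary_crossentropy")

X = np.random.rand(64, 128, 128, 3).astype("float32")   # cutouts around detections
y = np.random.randint(0, 2, 64)                         # asteroid = 1, artefact = 0
model.fit(X, y, epochs=2)
```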

Acknowledgment: This work was partially supported by a grant of the Romanian Ministry of Education and Research, CCCDI - UEFISCDI, project number PN-III-P2-2.1-PED-2019-0796, within PNCDI III. This research was partially supported by the project 38 PFE in the frame of the programme PDI-PFE-CDI 2021.

References:

1. Bacu, V., Sabou, A., Stefanut, T., Gorgan, D., Vaduvescu, O., NEARBY platform for detecting asteroids in astronomical images using cloud-based containerized applications, 2018 IEEE 14th International Conference on Intelligent Computer Communication and Processing (ICCP), pp. 371-376

How to cite: Bacu, V.: Software solution for detecting asteroids using machine learning techniques, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-8676, https://doi.org/10.5194/egusphere-egu23-8676, 2023.

To monitor the results of our instrument, we automatically generate a series of daily plots; in our case, two plot types for four spacecraft for five different species. Unfortunately, due to circumstances beyond our control (primarily network and system issues), plot generation would sometimes fail, and unless checked daily, the plots were unavailable when finally needed.

To solve this problem, we investigated using Computer Vision (OpenCV) to validate our generation of daily plots. It proved surprisingly easy and more advantageous than either monitoring the plots manually each day or using simpler heuristics. By using the cloud, we were able to improve throughput as well. Future work would be to use Computer Vision to analyze the data within the plots for actual scientific study.
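A minimal sketch of such a check is shown below, using OpenCV and NumPy; the validity heuristic, threshold and file name are placeholders rather than our production code.

import cv2
import numpy as np

def plot_is_valid(path, min_std=5.0):
    # Heuristic validation: the file must load and must not be (near-)blank.
    img = cv2.imread(path)                    # returns None if missing/corrupt
    if img is None:
        return False
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    return float(np.std(gray)) >= min_std     # blank plots have ~zero variance

# Example: check one spacecraft/species plot for a given day.
print(plot_is_valid("plots/sc1_species_H_2023-04-23.png"))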

How to cite: Mukherjee, J.: Using Artificial Intelligence/Computer Vision for Automated Plot Validation, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-8946, https://doi.org/10.5194/egusphere-egu23-8946, 2023.

EGU23-10654 | ECS | Orals | ESSI1.3

Predicting the 1 AU Arrival Time of Coronal Mass Ejections Based on Convolutional Neural Network 

Yi Yang, Fang Shen, Yucong Li, and Rongpei Lin

Coronal mass ejections (CMEs) are among the most violent solar eruptions, bursting out large amounts of magnetized plasma at speeds up to thousands of kilometers per second. When it reaches the Earth, a CME can cause a geomagnetic storm, affecting aviation safety, satellite operations, communications systems and power facilities. Therefore, fast and accurate prediction of CME arrival time is crucial for avoiding severe damaging effects and reducing economic losses. The initial morphology and kinematics of a CME in the corona can be observed by the coronagraphs on the Solar and Heliospheric Observatory (SOHO), so coronagraph observations should be useful for predicting CME arrival times. In this study, a convolutional neural network (CNN) is used to extract features of SOHO/LASCO coronagraph images related to the CME transit time and to establish a model capable of predicting the CME arrival time. The influence of different CNN hyperparameters on the prediction results is studied. Further, we add a physical constraint, the initial velocity of the CME, to the basic CNN outputs and find that this yields smaller prediction errors.
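One common way to append such a physical constraint to CNN image features is sketched below in PyTorch; the layer sizes, image dimensions and units are illustrative assumptions, not the authors' exact architecture.

import torch
import torch.nn as nn

class CMETransitNet(nn.Module):
    # CNN on coronagraph images, with CME initial speed as an extra input.
    def __init__(self):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.head = nn.Sequential(nn.Linear(32 + 1, 32), nn.ReLU(),
                                  nn.Linear(32, 1))   # transit time (hours)

    def forward(self, image, v_init):
        feats = self.cnn(image)
        return self.head(torch.cat([feats, v_init], dim=1))

model = CMETransitNet()
img = torch.rand(4, 1, 128, 128)     # placeholder LASCO difference images
v = torch.rand(4, 1) * 2000.0        # placeholder initial speeds (km/s)
pred_hours = model(img, v)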

How to cite: Yang, Y., Shen, F., Li, Y., and Lin, R.: Predicting the 1 AU Arrival Time of Coronal Mass Ejections Based on Convolutional Neural Network, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-10654, https://doi.org/10.5194/egusphere-egu23-10654, 2023.

Solar eruptive events are complex phenomena, which most often include solar flares, filament eruptions, coronal mass ejections (CMEs), and CME-driven shock waves. CME-driven shocks in the corona and interplanetary space are considered to be the main producer of solar energetic particles (SEPs). A number of fundamental questions remain about how SEPs are produced. Current understanding points to CME-driven shocks and compressions in the solar corona.

CME kinematics typically show three phases: an initial rising phase (weakly accelerated motion), an impulsive phase, and a residual propagation phase with constant or decreasing speed.

Despite the significant amount of data available from ground-based instruments (COSMO K-Cor, LOFAR) and remote instruments onboard heliospheric space missions (SDO AIA, SOHO), processing the data still requires considerable effort. Most algorithms currently used for solar feature detection and tracking are known for their limited applicability and the complexity of their processing chains, while the use of data-driven approaches for tracking CME-related phenomena is currently limited by the insufficiency of training sets.

Recently (Stepanyuk et al., J. Space Weather Space Clim., Vol. 12, 20, 2022), we demonstrated a method and software (https://gitlab.com/iahelio/mosaiics/wavetrack) for smart characterization and tracking of solar eruptive features based on the à trous wavelet decomposition technique, intensity rankings, and a set of filtering techniques. In this work we use Wavetrack to generate training sets for data-driven feature extraction and characterization. We utilize U-Net, a fully convolutional network whose training strategy relies on heavy use of data augmentation to exploit the available annotated samples more efficiently (a minimal U-Net is sketched below). U-Net can be trained end-to-end from a very limited set of images, while feature engineering allows this approach to be improved even further by expanding the available training sets.
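For reference, the sketch below shows the essential U-Net pattern (an encoder-decoder with a skip connection) in PyTorch; a real model would use more depth and channels, and all sizes here are illustrative assumptions.

import torch
import torch.nn as nn

class MiniUNet(nn.Module):
    # One down/up level with a skip connection; real U-Nets stack several.
    def __init__(self):
        super().__init__()
        self.down1 = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU())
        self.pool = nn.MaxPool2d(2)
        self.down2 = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.out = nn.Conv2d(32, 1, 1)    # per-pixel logit: feature mask

    def forward(self, x):
        d1 = self.down1(x)
        d2 = self.down2(self.pool(d1))
        u = self.up(d2)
        return self.out(torch.cat([u, d1], dim=1))   # skip connection

model = MiniUNet()
images = torch.rand(4, 1, 128, 128)                   # placeholder EUV frames
masks = (torch.rand(4, 1, 128, 128) > 0.5).float()    # Wavetrack-style targets
loss = nn.functional.binary_cross_entropy_with_logits(model(images), masks)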

Here we present pre-trained models and demonstrate data-driven characterization and tracking of solar eruptive features on a set of CME-events.

How to cite: Stepanyuk, O. and Kozarev, K.: Advanced Multi-Instrument and Multi-Wavelength Image Processing and Feature Tracking for Remote CME Characterization with Convolutional Neural Network, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-10705, https://doi.org/10.5194/egusphere-egu23-10705, 2023.

EGU23-11898 | ECS | Orals | ESSI1.3

Composition Analysis of an Apatite Crystal using a Space-Prototype Mass Spectrometric Instrument and Machine Learning for Unsupervised Mineralogical Phase Detection 

Salome Gruchola, Marek Tulej, Peter Keresztes Schmidt, Rustam Lukmanov, Andreas Riedo, and Peter Wurz

We present the analysis of a 2.06 Ga apatite crystal obtained from an ultramafic phoscorite rock from the Phalaborwa Complex (Limpopo Province, South Africa) [1]. A space-prototype laser ablation ionisation mass spectrometer (LIMS) [2,3] was used to study the chemical composition of the sample. Mass spectra were recorded from a sample area of 0.6 × 0.6 mm², with a spatial resolution of 30 μm and sub-micrometre depth resolution.

Apatite is a calcium phosphate mineral expressed by the stoichiometric chemical formula [Ca5(PO4)3(F, Cl, OH)]. The halogen site, occupied by F, Cl, and OH, corresponds to an isomorphous series with fluor-, chlor- and hydroxyl-apatite end members, respectively. Apatite, being an accessory mineral in igneous and other rocks, commonly contains a range of other elements that do not fit well into the major rock-forming minerals, such as rare earth elements (REE). These are suitable targets for investigating physical and chemical conditions in igneous rocks and the volatile evolution of magmas.

The analysis of the spectra recorded with our LIMS system for the abundances of the elements of interest at each location was performed in two steps. First, the abundances of each element across the sampled area were compiled in element maps. Second, an unsupervised machine learning algorithm based on clustering and network analysis was applied to the data set of analysed mass spectra to separate it into groups of distinct chemical composition. Subsequently, a more detailed analysis was conducted on each of the recovered groups to assign the corresponding mineral. In addition to the group of spectra belonging to apatite, which was assigned to fluorapatite, other minerals were identified, among them olivine. This method yields an unsupervised approach to identifying the different mineralogical entities present within a sample. The same network analysis method was previously applied to a 1.88 Ga Gunflint sample (Ontario, Canada) to separate spectra recorded from the host (chert) from spectra containing signatures of organic matter from fossilized microbes [4].

Given that the data were recorded using a miniature mass spectrometer designed for space flight, this analysis demonstrates the analytical capabilities of our LIMS system that could be achieved in situ on other planetary bodies in our Solar System, for example on the Moon or on Mars. The current performance of this miniature LIMS instrument for studying the chemical composition of apatite is sufficiently high to measure volatiles (H, F, Cl) and nearly all relevant mineral-forming and some trace elements (Na, C, Mg, Si, S, K, Mn, Fe, Sr, Ba), including REE (La, Ce, Pr, Sm), which allows for a systematic quantitative analysis of their distribution.

[1] Tulej, M. et al., 2022, https://doi.org/10.3390/universe8080410.

[2] Riedo, A. et al., 2012, https://doi.org/10.1002/jms.3104.

[3] Tulej, M. et al., 2021, https://doi.org/10.3390/app11062562.

[4] Lukmanov, R.A. et al., 2022, https://doi.org/10.3389/frspt.2022.718943

How to cite: Gruchola, S., Tulej, M., Keresztes Schmidt, P., Lukmanov, R., Riedo, A., and Wurz, P.: Composition Analysis of an Apatite Crystal using a Space-Prototype Mass Spectrometric Instrument and Machine Learning for Unsupervised Mineralogical Phase Detection, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-11898, https://doi.org/10.5194/egusphere-egu23-11898, 2023.

The origin of the cold materials identified by different criteria is unclear. They are strongly suspected to be erupted prominence material; however, some cold materials defined by charge depletion exist in both the solar wind and ICMEs. Recent solar observations show failed prominence eruptions in CMEs, in which the prominence does not propagate into interplanetary space. Besides, the prominence eruptions related to Earth-directed ICMEs at 1 au were difficult to identify before the launch of the STEREO mission. This work uses Random Forest (RF), an interpretable supervised machine learning classifier, to study the distinct signatures of prominence cold materials (PCs) compared to the quiet solar wind (SW) and ICMEs. Twelve parameters measured by ACE at 1 au are used in this study: proton moments, the magnetic field component Bz, He/H, He/O, Fe/O, the mean charges of oxygen and carbon, C6+/C5+, C6+/C4+, and O7+/O6+. From the weights returned by the RF classifier and the training accuracy of one black-box classifier, the most important in situ signatures of PCs are obtained. Next, the trained RF classifier is used to check the category of the origin-unknown cold materials in ICMEs. The results show that most of the cold materials are from prominences, but two of them possibly come from the quiet solar wind. The most distinct signatures of PCs are lower charge states of C and O, lower proton temperature, and He/O. This work provides quantitative evidence for the charge states of C and O being the most effective fixed criteria. Considering the obvious overlaps in key parameters between SW, ICMEs, and PCs, multi-parameter machine learning classifiers show an advantage over fixed criteria in separating them.
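A minimal sketch of the classification step follows, using scikit-learn; the feature matrix, labels and parameter names are placeholders standing in for the 12 ACE parameters and the SW/ICME/PC classes.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

FEATURES = ["n_p", "v_p", "T_p", "Bz", "He/H", "He/O", "Fe/O",
            "<Q_O>", "<Q_C>", "C6+/C5+", "C6+/C4+", "O7+/O6+"]

rng = np.random.default_rng(0)
X = rng.random((600, 12))        # placeholder: one row per 1 au measurement
y = rng.integers(0, 3, 600)      # placeholder labels: 0 = SW, 1 = ICME, 2 = PC

clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)

# Returned weights: which parameters separate the classes best.
for name, w in sorted(zip(FEATURES, clf.feature_importances_),
                      key=lambda t: -t[1]):
    print(f"{name:10s} {w:.3f}")

# The trained classifier can then categorize origin-unknown cold materials.
unknown = rng.random((5, 12))
print(clf.predict(unknown))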

How to cite: Meng, S., Yao, S., and Cheng, Z.: Key Signatures of Prominence Materials and Category of Unknown-origin Cold Materials identified by Machine Learning Classifier, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-12354, https://doi.org/10.5194/egusphere-egu23-12354, 2023.

EGU23-14927 | ECS | Orals | ESSI1.3

Automatically Calculating Depths of Martian and Lunar Pits with Satellite Imagery 

Daniel Le Corre, Nigel Mason, Jeronimo Bernard-Salas, Nick Cox, and David Mary

Pits, or pit craters, are roughly circular depressions in planetary surfaces which are generally formed through gravitational collapse. Pits will be primary targets for future space exploration and habitation because they are present on most rocky Solar System surfaces and are potential entrances to sub-surface cavities. This is particularly true on the Moon and Mars, where future astronauts will also be exposed to high radiation dosages whilst on the surface. However, since pits rarely have corresponding high-resolution elevation data, tools are required for approximating their depths in order to find the ideal candidates for exploration and habitation.

We develop a tool that automatically calculates a pit's apparent depth (the depth at the edge of its shadow) by measuring the shadow's width as it appears in satellite imagery. The tool can produce a profile of the apparent depth along the entire length of the shadow, using just one cropped single- or multi-band image of a pit. This allows the search for possible cave entrances to continue where altimetry or stereo image data are not available. Shadows are automatically extracted using k-means clustering with silhouette analysis for automatic cluster validation. We will present the results of testing the shadow extraction upon shadow-labelled Mars Reconnaissance Orbiter HiRISE imagery of Martian pits, as well as the findings of applying the tool to HiRISE images of Atypical Pit Craters (APCs) from the Mars Global Cave Candidate Catalog (MGC3) [1]. We will also present preliminary results of applying our tool to Lunar Reconnaissance Orbiter Narrow Angle Camera data of Lunar pits catalogued in the Lunar Pit Atlas [2].
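The shadow-extraction step can be sketched as below with scikit-learn; the image, cluster range and geometry are placeholder assumptions. The apparent depth then follows from the shadow width w and the solar elevation angle via h = w * tan(elevation).

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

img = np.random.rand(64, 64)            # placeholder single-band pit image
pixels = img.reshape(-1, 1)

best = (None, -1.0, None, None)         # (k, score, labels, centers)
for k in range(2, 6):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pixels)
    s = silhouette_score(pixels, km.labels_, sample_size=2000, random_state=0)
    if s > best[1]:
        best = (k, s, km.labels_, km.cluster_centers_)

k, score, labels, centers = best
# The darkest cluster is taken as the shadow.
shadow = (labels == np.argmin(centers[:, 0])).reshape(img.shape)

# Apparent depth from shadow width (metres) and solar elevation angle.
w, elevation = 40.0, np.deg2rad(35.0)   # placeholder geometry
depth = w * np.tan(elevation)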

[1] – Cushing et al. (2015). Atypical pit craters on Mars: New insights from THEMIS, CTX, and HiRISE observations, Journal of Geophysical Research: Planets, 120, 1023–1043

[2] – Wagner & Robinson (2021). Occurrence and Origin of Lunar Pits: Observations from a New Catalog, in 52nd Lunar and Planetary Science Conference, Lunar and Planetary Science Conference, p. 2530

How to cite: Le Corre, D., Mason, N., Bernard-Salas, J., Cox, N., and Mary, D.: Automatically Calculating Depths of Martian and Lunar Pits with Satellite Imagery, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-14927, https://doi.org/10.5194/egusphere-egu23-14927, 2023.

EGU23-15160 | ECS | Orals | ESSI1.3

Detecting the magnetopause of Mercury by neural network — using MESSENGER data to train for BepiColombo. 

Lukas Maes, Markus Fraenz, and Daniel Heyner

The BepiColombo mission will arrive at Mercury in 2025. It consists of two spacecraft, both carrying a magnetometer. One of the science objectives of these instruments is to study the structure of Mercury’s magnetosphere and its dynamical interaction with the solar wind. To study this statistically, a large dataset of observations of the magnetopause (the magnetosphere’s outer boundary) is needed. However, identifying such magnetopause crossings in magnetic field data requires visual inspection by human experts and as such is a very time-consuming process. We therefore design an algorithm to automatically detect the Hermean magnetopause in magnetometer time series data, making use of a convolutional neural network.

Since no in-orbit BepiColombo data are available yet, we train the network on MESSENGER magnetometer data. However, we formulate the problem and design the architecture of the network in such a way that the algorithm should be easily transferable to BepiColombo magnetometer data, avoiding the possible impact of any instrumental particularities or orbital biases.

The goal is to have a neural network which is directly applicable to BepiColombo magnetometer data, as soon as the observations start and without any further training, thereby eliminating the necessity of manually creating a new dataset of BepiColombo magnetopause crossings.
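Such a detector could take the form of the PyTorch sketch below: a 1-D CNN scoring fixed-length windows of the three field components for the presence of a crossing. The window length, layers and per-window normalization (intended to reduce instrument-specific offsets) are illustrative assumptions.

import torch
import torch.nn as nn

# Window classifier: input (batch, 3 field components, 256 time steps).
detector = nn.Sequential(
    nn.Conv1d(3, 16, kernel_size=7, padding=3), nn.ReLU(), nn.MaxPool1d(2),
    nn.Conv1d(16, 32, kernel_size=7, padding=3), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(),
    nn.Linear(32, 1),                   # logit: crossing inside window or not
)

windows = torch.randn(8, 3, 256)        # placeholder MESSENGER B-field windows
# Per-window normalization (an assumption) reduces instrument-specific offsets.
windows = (windows - windows.mean(dim=2, keepdim=True)) / \
          windows.std(dim=2, keepdim=True)
p_crossing = torch.sigmoid(detector(windows))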

How to cite: Maes, L., Fraenz, M., and Heyner, D.: Detecting the magnetopause of Mercury by neural network — using MESSENGER data to train for BepiColombo., EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-15160, https://doi.org/10.5194/egusphere-egu23-15160, 2023.

EGU23-16941 | ECS | Orals | ESSI1.3

Mars Perseverance Panoramic Image for Self-Determination Mission Algorithm 

Okta Bramantio Swida, Bernard Foing, and Constantijn Vleugels

Aiming to unravel the astrobiology of Mars, the Perseverance mission came with a lot of unknowns. With our current surface-level knowledge, the High Resolution Imaging Science Experiment (HiRISE) can already determine observation or experiment sites from the images generated by the orbiter. However, although its resolution is high enough to resolve objects around 1 meter in size, we can always expect much more from ground-level observation.

The Mars Perseverance rover carries a pair of Mastcam-Z cameras arranged to emulate the human eyes for depth determination in image processing. The instruments can produce stereo colour images at ground level, which can be used to make detailed, high-precision maps of the Martian surface scenery.

Building and analyzing these images manually can take days on Earth, but utilising machine learning tools and on-site computation could save the mission a lot of time. The current model used on Mars Perseverance is AutoNav Mark 4, which handles many tasks, including spacecraft positioning, in-flight orbit determination, target tracking, and ephemeris calculations, all of which can be computationally expensive. The aim of this research is therefore to develop a simple algorithm for object and slope determination that feeds into an autonomous path determination process. The data fed into the algorithm are panoramic images captured by the Mastcam-Z mounted on Mars Perseverance.

How to cite: Swida, O. B., Foing, B., and Vleugels, C.: Mars Perseverance Panoramic Image for Self-Determination Mission Algorithm, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-16941, https://doi.org/10.5194/egusphere-egu23-16941, 2023.

EGU23-2909 | Posters on site | ESSI1.5

Long-Term Forecasting of Environment variables of MERRA2 based on Transformers 

Tsengdar Lee, Sujit Roy, Ankur Kumar, Rahul Ramachandran, and Udaysankar Nair

Transformers in general have shown great promise in sequence modeling. The recently proposed vision transformer (ViT) by Dosovitskiy et al. has shown strong performance in image recognition [1]. Transformers with Fourier neural operator based token mixing, built on a ViT backbone, were proposed by Guibas et al. and have been used to predict wind and precipitation on the ERA5 dataset [2,3]. Following this work, we trained FourCastNet from scratch on the MERRA2 dataset with 3 vertical levels (z450, z500, z550) and 11 variables (adding u, v, and temperature). We trained on data from 2005 to 2015 and made predictions by providing initial conditions from 2017. Predictions were made up to 7 days ahead. For the first 24 hours of model prediction, the mean correlation was 0.998. The root mean squared error (RMSE) was 8.779 for the 6-hour prediction and 19.581 for 24 hours, on data ranging from -575.6 to 330.6. The model was further tested on 11 variables on the same training data to evaluate the prediction of major events such as hurricanes. Initial conditions for a category 5 hurricane (Sep 28 to Oct 10, 2016) were given to the model, which was able to predict the hurricane for 18 hours. Further work will tune the model and add more environmental variables from MERRA2 to make the prediction more robust over longer periods.
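For reference, the two reported skill measures can be computed as in the sketch below, assuming NumPy arrays of predicted and target fields; the array shapes are placeholders.

import numpy as np

def rmse(pred, target):
    # Root mean squared error between forecast and analysis fields.
    return float(np.sqrt(np.mean((pred - target) ** 2)))

def correlation(pred, target):
    # Pearson correlation between the flattened fields.
    return float(np.corrcoef(pred.ravel(), target.ravel())[0, 1])

pred = np.random.rand(11, 91, 144)      # placeholder: variables x lat x lon
target = np.random.rand(11, 91, 144)
print(rmse(pred, target), correlation(pred, target))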

References:
1. Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S. and Uszkoreit, J., 2020. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929.
2. Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A. and Catanzaro, B., 2021. Efficient Token Mixing for Transformers via Adaptive Fourier Neural Operators. In International Conference on Learning Representations.
3. Pathak, J., Subramanian, S., Harrington, P., Raja, S., Chattopadhyay, A., Mardani, M., Kurth, T., Hall, D., Li, Z., Azizzadenesheli, K. and Hassanzadeh, P., 2022. FourCastNet: A global data-driven high-resolution weather model using adaptive Fourier neural operators. arXiv preprint arXiv:2202.11214.

How to cite: Lee, T., Roy, S., Kumar, A., Ramachandran, R., and Nair, U.: Long-Term Forecasting of Environment variables of MERRA2 based on Transformers, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-2909, https://doi.org/10.5194/egusphere-egu23-2909, 2023.

EGU23-2944 | Orals | ESSI1.5

Foundation AI Models for Science 

Manil Maskey, Rahul Ramachandran, Tsengdar Lee, and Raghu Ganti

Foundation Models (FMs) are AI models designed to replace task- or application-specific models. FMs can be applied to many different downstream applications. They are trained using self-supervised techniques and can be built on any type of sequence data; the use of self-supervised learning removes the hurdle of developing a large labeled dataset for training. Most FMs use the transformer architecture, which utilizes self-attention to allow the network to model the influence of distant data points on each other in both space and time. FMs exhibit emergent properties that are induced from the data.

FMs can be an important tool for science. The scale of these models results in better performance for different downstream applications, which show better accuracy than models built from scratch. FMs drastically reduce the cost of entry to build different downstream applications, both in time and effort. FMs for selected science datasets, such as optical satellite data, can accelerate applications ranging from data quality monitoring to feature detection and prediction. FMs can make it easier to infuse AI into scientific research by removing the training data bottleneck and increasing the use of science data.

How to cite: Maskey, M., Ramachandran, R., Lee, T., and Ganti, R.: Foundation AI Models for Science, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-2944, https://doi.org/10.5194/egusphere-egu23-2944, 2023.

EGU23-5443 | Orals | ESSI1.5

Earth System Deep Learning towards a Global Digital Twin of Wildfires 

Ioannis Prapas, Ilektra Karasante, Akanksha Ahuja, Spyros Kondylatos, Eleanna Panagiotou, Charalampos Davalas, Lazaro Alonso, Rackhun Son, Michail Dimitrios, Nuno Carvalhais, and Ioannis Papoutsis

Due to climate change, we expect an exacerbation of fire in Europe and around the world, with major wildfire events extending to northern latitudes and boreal regions [1]. In this context, it is important to improve our capabilities to anticipate fire danger and understand its driving mechanisms at a global scale. As the Earth is an interconnected system, large-scale processes can have an effect on the global climate and fire seasons. For example, extreme fires in Siberia have been linked to previous-year surface moisture conditions and anomalies in the Arctic Oscillation [2]. As part of the ESA-funded project SeasFire (https://seasfire.hua.gr), we gather and harmonize data related to seasonal fire drivers and develop deep learning models that are able to capture spatiotemporal associations, with the goal of forecasting burned area sizes on a seasonal scale, globally. We publish a global analysis-ready datacube for seasonal fire forecasting for the years 2001-2021 at a spatiotemporal resolution of 0.25 deg x 0.25 deg x 8 days [3]. The datacube includes a combination of variables describing the seasonal fire drivers, namely climate, vegetation, oceanic indices, human factors, land cover and burned areas. We leverage the availability of big EO data and advances in deep learning [4, 5] to forecast global burned areas, capture the spatio-temporal interactions of the Earth system variables and identify potential teleconnections that determine wildfire regimes in the light of climate change. We present deep learning models that handle the Earth as a system, such as graph neural networks and transformer-based architectures. Applied to the prediction of wildfires at different temporal horizons, our deep learning models skillfully predict burned area patterns. Exploring the explanations of the models, we reveal important spatio-temporal links.

Our approach, using AI to model the earth as a system and capture long spatio-temporal interactions, showcases the potential of an application-specific digital twin. The SeasFire datacube can be exploited as a baseline digital twin for modeling different natural hazards, including floods, heatwaves, and droughts. Thus, we will discuss insights and future directions for digital twins in anticipating climate extremes, inspired by our global wildfire prediction paradigm. 

 

[1] Wu, Chao, et al. "Historical and future global burned area with changing climate and human demography." One Earth 4.4 (2021): 517-530.

[2] Kim, Jin-Soo, et al. "Extensive fires in southeastern Siberian permafrost linked to preceding Arctic Oscillation." Science advances 6.2 (2020): eaax3308.

[3] Alonso, Lazaro, et al. Seasfire Cube: A Global Dataset for Seasonal Fire Modeling in the Earth System. Zenodo, 15 July 2022, doi:10.5281/zenodo.6834584.

[4] Kondylatos, Spyros et al. “Wildfire Danger Prediction and Understanding with Deep Learning.” Geophysical Research Letters, 2022.  doi: 10.1029/2022GL099368

[5] Prapas, Ioannis et al. “Deep Learning for Global Wildfire Forecasting.” NeurIPS 2022 workshop on Tackling Climate Change with Machine Learning, doi:  10.48550/arXiv.2211.00534

How to cite: Prapas, I., Karasante, I., Ahuja, A., Kondylatos, S., Panagiotou, E., Davalas, C., Alonso, L., Son, R., Dimitrios, M., Carvalhais, N., and Papoutsis, I.: Earth System Deep Learning towards a Global Digital Twin of Wildfires, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-5443, https://doi.org/10.5194/egusphere-egu23-5443, 2023.

EGU23-5746 | Orals | ESSI1.5

Confidence estimation of DNN predictions for on-board applications 

Nicolas Dublé, François De Vieilleville, Adrien Lagrange, and Bertrand Le Saux

Most DNNs are designed to predict a class, a segmentation map or detections, regardless of whether they are interpolating or extrapolating. A confidence score answers the need for interpretable outputs and can help an AI4EO end user make decisions.

The first investigated use case was binary classification of small Sentinel-2 tiles containing ships or not (with the two classes “tile containing ship” and “tile not containing ship”). The database gathered 16,947 small 140x140 tiles extracted from 37 Sentinel-2 products. The ground truth was generated using Danish AIS data and then checked by eye. It was divided into several datasets for training, validation, testing, and active learning.

The second investigated use case was the classification of 10 geophysical phenomena from Sentinel-1 wave mode [Wang et al., 2018]. The database gathered 30,032 images with a fairly balanced distribution across the 10 classes.

Classification networks (VGG16) were trained on the training datasets of both use cases, reaching high performance (>95% accuracy). We added several Out-Of-Distribution (OOD) examples for the ship classification use case and used the test database provided for the ocean features use case. The models reach around 70% accuracy on these two harder datasets, which contain many examples of wrong classifications, so regressing a confidence score is of clear interest.

The solution developed uses the ConfidNet approach of Corbière et al. Without retraining the classification DNN, we added a second DNN, composed of several dense layers, which takes the latent space of the classification network as input and estimates a confidence score by trying to approach the True Class Probability. It proved easy to train when enough failure examples are available in the database.
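A minimal sketch of such an auxiliary confidence network follows, in PyTorch; the latent dimension, layer widths and training data are assumptions. The regression target is the True Class Probability (TCP), i.e. the frozen classifier's softmax probability for the ground-truth class.

import torch
import torch.nn as nn

latent_dim = 512                       # assumed size of the classifier features
confidnet = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 1), nn.Sigmoid(),   # confidence score in [0, 1]
)

# One training step: the classifier stays frozen, only confidnet is updated.
feats = torch.randn(32, latent_dim)    # latent features from the frozen VGG16
tcp = torch.rand(32, 1)                # softmax probability of the true class
loss = nn.functional.mse_loss(confidnet(feats), tcp)
loss.backward()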

The main objective of the ConfidNet is to find the “ID”/“OOD” boundary, qualifying which examples the classifier should be able to predict (interpolation) and which it should fail to predict (extrapolation). Substantial work went into qualifying the quality of the ConfidNet's predictions (the confidence score), to ensure that it did not simply learn to map the subset of the dataset where the classifier fails and the one where the classifier is right. It presented interesting generalization properties and turned out to be less “dataset-dependent” than a classical DNN.

21 different network configurations were tested, with architecture sizes varying from 4k to 2.5M parameters. Many of these configurations reached similar results, and the number of layers was more decisive than the number of parameters in the intermediate feature maps.

The main results of this study are the relevance of the ConfidNet approach in AI4EO scenarios, the possibility of shrinking the network for on-board deployment, and a first indication that the ConfidNet can learn in a different way from classification networks, with interesting generalization properties. This study demonstrates that confidence scores can be associated with the predictions of a DNN in a satisfying way.

How to cite: Dublé, N., De Vieilleville, F., Lagrange, A., and Le Saux, B.: Confidence estimation of DNN predictions for on-board applications, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-5746, https://doi.org/10.5194/egusphere-egu23-5746, 2023.

EGU23-6060 | ECS | Posters on site | ESSI1.5

A machine learning-powered Digital Twin for extreme weather events analysis 

Gabriele Accarino, Donatello Elia, Davide Donno, Francesco Immorlano, and Giovanni Aloisio

In recent years, climate change has been leading to an exacerbation of Extreme Weather Events (EWEs), such as storms and wildfires, raising major concerns about increases in their intensity, frequency and duration. Detecting and predicting EWEs is challenging due to the rare occurrence of these events and, consequently, the lack of related historical data. Additionally, gathering data while an event unfolds is not straightforward, due to the intrinsic difficulty of positioning and operating acquisition systems. Advances in Machine Learning (ML) can provide cutting-edge modeling techniques to deal with EWE detection and prediction tasks, offering the cost-effective and fast-computing solutions that policy makers strongly require for taking timely and informed actions in the presence of EWEs.

Solutions based on ML could, thus, support studies of such extreme events, providing scientists, policy makers and also the general public with powerful and innovative data-driven tools. However, from an infrastructural point of view, supporting such types of applications requires a wide set of integrated software components including data gathering and harmonisation pipelines, data pre-processing and augmentation modules, computing platforms for model training, results visualization tools, etc.

A Digital Twin for the analysis of extreme weather events, focusing on storms and wildfires, is being developed in the context of the EU-funded InterTwin project. The InterTwin project aims at defining a Digital Twin Engine for supporting scientific applications from different fields. In particular, for the EWEs, neural networks are being adopted as modeling tools capable of learning the underlying mapping between drivers and outcomes from past data and generalizing it to future projection data. This contribution will present the early concept behind the design of this machine learning-powered Digital Twin for EWE studies.

How to cite: Accarino, G., Elia, D., Donno, D., Immorlano, F., and Aloisio, G.: A machine learning-powered Digital Twin for extreme weather events analysis, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-6060, https://doi.org/10.5194/egusphere-egu23-6060, 2023.

EGU23-8777 | ECS | Orals | ESSI1.5

Deep Learning for Verification of Earth's surfaces 

Margarita Choulga, Tom Kimpson, Matthew Chantry, Gianpaolo Balsamo, Souhail Boussetta, Peter Dueben, and Tim Palmer

Ever-increasing computing capabilities and the demand for high-resolution numerical weather prediction and climate information make the representation of Earth's surfaces especially interesting. Accurate and up-to-date knowledge of the surface state for ecosystems such as forests, agriculture, lakes and cities strongly influences skin temperature and the turbulent latent and sensible heat fluxes, providing the lower boundary conditions for energy and moisture availability near the surface. We developed a quick and automatic tool to assess the benefits of updating different surface fields, which makes use of a neural network regression model trained to simulate satellite-observed surface skin temperatures. This tool was deployed to determine the accuracy of several global datasets for lake, forest, and urban distributions, and comparison results will be shown. The neural network regression model has proven useful and easily adaptable for assessing unforeseen impacts of ancillary datasets, also detecting erroneous regional areas over the globe, and is thus a valuable support to model development.
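Schematically, such a surrogate can be as simple as the following PyTorch sketch; the number and identity of the input predictors are assumptions for illustration, not the actual configuration.

import torch
import torch.nn as nn

# Hypothetical surrogate: per-grid-cell surface predictors in (e.g. lake,
# forest and urban fractions plus meteorological drivers), satellite-observed
# skin temperature out. Eight inputs is an assumption, not the actual set.
surrogate = nn.Sequential(
    nn.Linear(8, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)

x = torch.rand(1024, 8)      # placeholder predictors for 1024 grid cells
t_skin = surrogate(x)        # predicted skin temperature
# Swapping one ancillary field (e.g. a candidate lake map) into x and comparing
# predictions against observations scores the benefit of that dataset.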

How to cite: Choulga, M., Kimpson, T., Chantry, M., Balsamo, G., Boussetta, S., Dueben, P., and Palmer, T.: Deep Learning for Verification of Earth's surfaces, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-8777, https://doi.org/10.5194/egusphere-egu23-8777, 2023.

EGU23-10092 | Posters on site | ESSI1.5

An Information Management Framework for Environmental Digital Twins (IMFe)  as a concept and pilot 

Justin Buck, Andrew Kingdon, John Siddorn, Gordon Blair, Alexandra Kokkinaki, John Blower, Matt Fry, Ben Marchant, Sam Pepler, John Watkins, and James Byrne

Environmental science is concerned with assessing the impacts of changing environmental conditions upon the state of the natural world. Environmental Digital Twins (EDT) are a new technology that enables environmental change scenarios for real systems to be modelled and their impacts visualised. They will be particularly effective at delivering understanding of these impacts on the natural environment to non-specialist stakeholders.

The UK Natural Environment Research Council (NERC) recently published its first digital strategy, which sets out a vision for digitally enabled environmental science for the next decade. This strategy places data and digital technologies at the heart of UK environmental science.

EDT have been made possible by the emergence of increasingly large and diverse static data sources, networks of dynamic environmental data from sensors, and time-variant process modelling. Combined with visualisation capabilities, these provide the basis of the digital twin technologies that enable the environmental science community to make a step change in understanding of the environment. Components may be developed separately by a network but can be combined to improve understanding, provided development follows agreed standards to facilitate data exchange and integration.

Replicating the behaviours of environmental systems is inevitably a multi-disciplinary activity. To enable this, an Information Management Framework for Environmental Digital Twins (IMFe) is needed that establishes the components for effective information management within and across the EDT ecosystem. This must enable secure, resilient interoperability of data, and act as a reference point to facilitate data use in line with security, legal, commercial, privacy and other relevant concerns. We present recommendations for developing an IMFe, including the application of concepts such as an asset commons and a balanced approach to standards to facilitate minimum interoperability requirements between twins while iteratively implementing an IMFe. Achieving this requires components to be developed that follow agreed standards, to ensure that information can be trusted by the user, and that are semantically interoperable so data can be shared. A digital Asset Register will be defined to provide access to and enable linking of such components.

This previously conceptual project has now been advanced into the Pilot IMFe project, aiming to define the architectures, technologies, standards and hardware infrastructure needed to develop a fully functioned environmental digital twin. During the project lifespan this will be tested by constructing a pilot EDT for the Haig Fras Marine Conservation Zone (MCZ), which both enables testing of the proposed IMFe concepts and provides a clear demonstration of the power of EDT to monitor and scenario-test a complex environmental system for the benefit of stakeholders.

How to cite: Buck, J., Kingdon, A., Siddorn, J., Blair, G., Kokkinaki, A., Blower, J., Fry, M., Marchant, B., Pepler, S., Watkins, J., and Byrne, J.: An Information Management Framework for Environmental Digital Twins (IMFe)  as a concept and pilot, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-10092, https://doi.org/10.5194/egusphere-egu23-10092, 2023.

EGU23-10138 | Orals | ESSI1.5 | Highlight

Open-Source Framework For Earth System Digital Twins Applied to Surface Water Hydrology 

Thomas Huang and the NASA AIST IDEAS and SCO FloodDAM Teams

An Earth System Digital Twin (ESDT) is a dynamic, interactive, digital replica of the state and temporal evolution of Earth systems. It integrates multiple models along with observational data, and connects them with analysis, AI, and visualization tools. Together, these enable users to explore the current state of the Earth system, predict future conditions, and run hypothetical scenarios to understand how the system would evolve under various assumptions. The NASA Advanced Information Systems Technology (AIST) program’s Integrated Digital Earth Analysis System (IDEAS) project partners with the Space for Climate Observatory (SCO) (https://www.spaceclimateobservatory.org/) FloodDAM Digital Twin effort led by CNES to establish an extensible open-source framework to develop digital twins of our physical environment for Earth Science with an initial focus on surface water hydrology in Earth’s rivers and lakes. The joint effort delivers an open-source system architecture with mechanisms for the outputs of one model to feed into others, for driving models with observation data, and for harmonizing observation data and model outputs for analysis. Water resource science is multidisciplinary in nature, and it not only assesses the impact from our changing climate using measurements and modeling, but it also offers opportunities for science-guided, data-driven decision support. The joint effort uses flood prediction and analysis as its primary use case. The work presents a multi-agency joint effort to define and develop a federated Earth System Digital Twin solution between NASA and CNES that powers advanced immersive science and custom user applications for scenario-based analysis.

How to cite: Huang, T. and the NASA AIST IDEAS and SCO FloodDAM Teams: Open-Source Framework For Earth System Digital Twins Applied to Surface Water Hydrology, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-10138, https://doi.org/10.5194/egusphere-egu23-10138, 2023.

EGU23-10488 | ECS | Orals | ESSI1.5

Statistical downscaling of precipitation with deep neural networks 

Bing Gong, Yan Ji, Michael Langguth, and Martin Schultz

Accurate weather predictions are essential for many aspects of society. Providing a reliable high-resolution precipitation field is essential to capture the finer scales of heavy precipitation events, which are normally poorly represented in numerical models. Statistical downscaling is an appealing tool since it is computationally inexpensive; thus, it has been widely used over the last three decades. In recent years, super-resolution with deep learning has been successfully applied to generate high-resolution from low-resolution images in the computer vision domain. This task is somewhat analogous to downscaling in the meteorological domain.

Inspired by this, we explore the use of deep neural networks with a super-resolution approach for statistical precipitation downscaling. We apply the Swin transformer architecture (SwinIR) as well as a convolutional neural network (U-Net) with a Generative Adversarial Network (GAN) and a diffusion component for probabilistic downscaling. We use short-range forecasts from the Integrated Forecast System (IFS) on a regular spherical grid with ΔxIFS=0.1° and map them to the high-resolution radar observation data RADKLIM (ΔxRK=0.01°). The neural networks are fed with nine static and dynamic predictors, similar to the study by Harris et al., 2022. All models are comprehensively evaluated by grid-point-level errors as well as error metrics for spatial variability and the generated probability distribution. Our results demonstrate that the Swin transformer model can improve accuracy at lower computational cost compared to the U-Net architecture. The GAN and diffusion models both further help the model capture the strong spatial variability of the observed data. Our results encourage further development of DNNs that can potentially be leveraged to downscale other challenging Earth system data, such as cloud cover or wind.
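A minimal super-resolution building block of the kind these architectures rely on is sketched below in PyTorch: a convolutional upsampler mapping the nine 0.1° predictor channels to a 0.01° precipitation field in a single 10x step. Channel counts and the non-negativity activation are illustrative assumptions.

import torch
import torch.nn as nn

# Nine predictor channels at 0.1 deg in, one precipitation field at 0.01 deg out.
upscaler = nn.Sequential(
    nn.Conv2d(9, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 100, 3, padding=1),
    nn.PixelShuffle(10),    # (N, 100, H, W) -> (N, 1, 10H, 10W)
    nn.Softplus(),          # keeps the precipitation output non-negative
)

coarse = torch.rand(2, 9, 16, 16)    # placeholder IFS predictor patches
fine = upscaler(coarse)              # shape (2, 1, 160, 160)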

How to cite: Gong, B., Ji, Y., Langguth, M., and Schultz, M.: Statistical downscaling of precipitation with deep neural networks, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-10488, https://doi.org/10.5194/egusphere-egu23-10488, 2023.

EGU23-11489 | Posters on site | ESSI1.5

Towards a benchmark dataset for statistical downscaling of meteorological fields 

Michael Langguth, Bing Gong, Yan Ji, Martin G. Schultz, and Olaf Stein

The representation of the atmospheric state at high spatial resolution is of particular relevance in various domains of Earth science. While global reanalysis datasets such as ERA5 provide comprehensive repositories of meteorological data, their spatial resolution (∆x≥25 km) is too coarse to capture relevant local features, mainly over complex terrain (e.g. cold pools in valleys, low-level jets, local heavy precipitation events).
Recently, various studies have started to apply deep neural networks adapted from computer vision to increase the spatial resolution of meteorological fields. Although these studies reveal great potential in the domain of statistical downscaling, intercomparison of the approaches is impeded by the large variety of methods and datasets deployed. Comparisons to classical downscaling methods, developed over decades in the meteorological community, are also often underrepresented.

Inspired by the available benchmark datasets for various computer vision tasks and for weather forecasting (e.g. WeatherBench and WeatherBench Probability), our study aims to provide a benchmark dataset for statistical downscaling of meteorological fields. We choose the coarse-grained ERA5 reanalysis (∆xERA5≃30 km) and the fine-scaled COSMO-REA6 (∆xCREA6≃6km) as input and target datasets. Both datasets enable the formulation of a real downscaling task: super-resolve the data and correct for model biases.
The benchmark dataset provides a collection of predictors and predictands for a couple of standard downscaling tasks. These comprise downscaling of the 2m temperature, the surface irradiance, the near-surface wind field and precipitation. Along with the dataset, benchmark deep neural networks, namely variants of U-Nets and GANs, will be provided. Well-chosen sets of evaluation metrics including baseline scores of the benchmarked deep neural networks are presented to enable comparison between different methods.
The envisioned benchmark dataset will provide a comprehensive basis for comparing neural network approaches on statistical downscaling of meteorological fields. This, in turn, is considered to enhance confidence and transparency in the application of deep learning methods on Earth system problems.

How to cite: Langguth, M., Gong, B., Ji, Y., Schultz, M. G., and Stein, O.: Towards a benchmark dataset for statistical downscaling of meteorological fields, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-11489, https://doi.org/10.5194/egusphere-egu23-11489, 2023.

EGU23-12333 | ECS | Posters virtual | ESSI1.5

An AI hybrid predictive tool for extreme hurricane forecasting 

Javier Martinez Amaya, Cristina Radin, Veronica Nieves, Nicolas Longépé, and Jordi Muñoz-Marí

Hurricanes, and more generally tropical cyclones, are among the most destructive natural hazards, and are arguably changing under the influence of climate change. Applying the power of AI to predict the extreme behavior of these events could be key to helping minimize hurricane damage. AI tools are a significant opportunity to: 1) identify non-linear relationships between changing hurricane-related characteristics and tropical storm intensification, and 2) anticipate responses to these changes. Another key part of this AI-based system is uncertainty quantification for decision-making processes. In this context, we present an improved hybrid ML model for predicting the development of extreme hurricane events, which includes effective information on the spatio-temporal evolution of structural parameters extracted from IR satellite images. This approach, which combines Convolutional Neural Networks (CNNs) and a Random Forest (RF) classification framework, has been trained and tested with data from 1995 onwards over the North Atlantic and Northeast Pacific regions. Results from the CNN-RF model show a performance of 80% or better for lead times of up to three days ahead (every 6 hours). With the proposed configuration, the overall precision has increased by at least 8%. This model could be further improved by including new variables linked to environmental factors, which will be explored progressively.
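The hybrid structure can be sketched as below (PyTorch for the CNN features, scikit-learn for the RF); the architecture, image size and labels are placeholder assumptions, not the trained model.

import numpy as np
import torch
import torch.nn as nn
from sklearn.ensemble import RandomForestClassifier

# A (frozen) CNN turns IR images into feature vectors; an RF classifies them.
cnn = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)

ir_images = torch.rand(200, 1, 64, 64)    # placeholder IR cyclone snapshots
with torch.no_grad():
    feats = cnn(ir_images).numpy()

labels = np.random.randint(0, 2, 200)     # placeholder: 1 = extreme development
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(feats, labels)
proba = rf.predict_proba(feats)[:, 1]     # class probability per sample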

How to cite: Martinez Amaya, J., Radin, C., Nieves, V., Longépé, N., and Muñoz-Marí, J.: An AI hybrid predictive tool for extreme hurricane forecasting, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-12333, https://doi.org/10.5194/egusphere-egu23-12333, 2023.

EGU23-13106 | Posters virtual | ESSI1.5

An Unsupervised Anomaly Detection Problem in Urban InSAR-PSP Long Time-series 

Ridvan Kuzu, Yi Wang, Octavian Dumitru, Leonardo Bagaglini, Giorgio Pasquali, Filippo Santarelli, Francesco Trillo, Sudipan Saha, and Xiao Xiang Zhu

Interferometric Synthetic Aperture Radar (InSAR) satellite measurements are an effective tool for monitoring ground motion with millimetric resolution over long periods of time. The Persistent Scatterer Pair (PSP) method, developed in [1], is particularly useful for detecting differential displacements of buildings at multiple positions with few assumptions about the background environment. As a result, anomalous behaviours in building motion can be detected through PSP time series, which are commonly used to perform risk assessments in hazardous areas and diagnostic analyses after damage or collapse events. However, current autonomous early warning systems based on PSP-InSAR data are limited to detecting changes in linear trends and rely on sinusoidal and polynomial models [2]. This can be problematic if background signals exhibit more complex behaviours, as anomalous displacements may be difficult to identify. To address this issue, we propose an unsupervised anomaly detection method using Artificial Intelligence algorithms to identify potentially anomalous building motions based on PSP long time series data.

To identify anomalous building motions, we applied two different AI algorithms: one based on a Long Short-Term Memory (LSTM) Autoencoder inspired by [3], and a Graph Neural Network version of it. The LSTM Autoencoder is an unsupervised representation learning framework that captures data representations by reconstructing the correct order of shuffled time series. Its encoder extracts feature representations of a time series, while its decoder reconstructs the time series. Assuming that most stable samples exhibit similar temporal changes, this algorithm can be used for anomaly detection, as the reconstruction loss will be high for anomalous time series.

The data used in this study were provided by the European Ground Motion Service over a rectangular area surrounding the city of Rome and include approximately 500,000 time series aggregated over more than 80,000 buildings. The time period covered is from 2015 to 2020.

In our proposed approach, we first extract deep feature representations for each timestamp of a non-anomalous time series. The feature sequence is then shuffled and passed through an LSTM encoder-decoder network. By learning to reconstruct the feature sequence with the correct order, the network is able to recognize high-level representations of the time series. In the second step, the pre-trained network is used to reconstruct another time series. If the time series is non-anomalous, the correct order can be reconstructed with high confidence; otherwise, it is difficult to reconstruct the correct order. By selecting an appropriate threshold, anomalies can be detected with high reconstruction losses.
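A simplified sketch of this shuffle-and-reconstruct autoencoder follows, in PyTorch; the series length, feature and hidden sizes are assumptions, and the real pipeline operates on learned per-timestamp features rather than raw displacements.

import torch
import torch.nn as nn

class LSTMAutoencoder(nn.Module):
    # Encode a (shuffled) displacement series, decode it in original order.
    def __init__(self, n_features=1, hidden=64):
        super().__init__()
        self.encoder = nn.LSTM(n_features, hidden, batch_first=True)
        self.decoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_features)

    def forward(self, x_shuffled, seq_len):
        _, (h, _) = self.encoder(x_shuffled)           # summary of the series
        # Repeat the summary at every step and decode the ordered series.
        z = h[-1].unsqueeze(1).repeat(1, seq_len, 1)
        dec, _ = self.decoder(z)
        return self.out(dec)

model = LSTMAutoencoder()
x = torch.randn(16, 120, 1)                   # placeholder PSP series batch
perm = torch.randperm(x.size(1))
recon = model(x[:, perm, :], x.size(1))
# High reconstruction error w.r.t. the ordered series flags an anomaly.
score = ((recon - x) ** 2).mean(dim=(1, 2))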

Overall, our proposed AI-based approach shows promising results for identifying anomalous building motions in PSP long time-series data. The use of unsupervised learning allows for more accurate statistical representations of the data and more reliable detection of anomalous behaviours. This approach has the potential to improve autonomous early warning systems for risk assessments and diagnostic analyses in dangerous areas.

This work is part of the RepreSent project funded by the European Space Agency (NO:4000137253/22/I-DT).

[1] https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=4779025
[2] https://www.mdpi.com/2072-4292/10/11/1816/pdf
[3] https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9307226

How to cite: Kuzu, R., Wang, Y., Dumitru, O., Bagaglini, L., Pasquali, G., Santarelli, F., Trillo, F., Saha, S., and Zhu, X. X.: An Unsupervised Anomaly Detection Problem in Urban InSAR-PSP Long Time-series, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-13106, https://doi.org/10.5194/egusphere-egu23-13106, 2023.

EGU23-13921 | Orals | ESSI1.5 | Highlight

Towards a local, dated and thematic digital twins factory 

Jean-Marc Delvit, Pierre-Marie Brunet, Pierre Lassalle, Dimitri Lallement, and Simon Baillarin

The notion of a digital twin can be ambiguous because it is defined in various ways, and recent months have seen the emergence of many global digital twin initiatives. The challenge of these global digital twins is to create a qualified digital replica of our planet, making it possible to monitor, simulate and anticipate natural phenomena and human activities. The target users are either scientists or decision makers. Through the digital twin, they have access to a digital representation of an environment using all available spatial and non-spatial data, accompanied by a set of physical and statistical models to calculate projections, replay past events or simulate future ones.

Refining and evaluating the accuracy of these projections is a major challenge for digital twins. In addition to knowledge of the physical modelling, suitable data must also be available. Complementary to the global approach, the notion of local and dated digital twins therefore appears essential. Considering a digital representation of a restricted geographical area of interest (an urban area, watershed, coastline, etc.) gives access to very high-resolution "fresh" data in 2D and 3D, in-situ data and fine-mesh physical models. This user-centered and naturally thematic approach responds more finely and more pragmatically to the objectives presented. These local, dated and thematic digital twins are by essence ephemeral: a way to meet a specific need.

The challenge is therefore to set up a Digital Twin Factory (DTF). This DTF relies on a data lake and high computing capacity via clouds and/or HPC, and provides thematic algorithms and methodologies able to generate registered and coherent layers of information that enrich a datacube from which physical indicators can be computed spatially. Thanks to its thematic, local and on-demand characteristics, the DTF can mitigate the need for a universal metadata model. The datacube allows local physical and artificial intelligence models to be applied. The overarching architecture of the DTF will be presented, along with specific examples on coastal, urban and risk topics. These digital twins rely on a large amount of expertise in both data and modelling, involving various French (CNES, IGN, SHOM, IRD, CEA, INRAE, METEOFRANCE, CERFACS, BRGM, etc.) and international organizations (ESA, NASA, NOAA, ...).

For coastal areas, the goal is to describe the bathymetry-topography continuum well, taking into account the intertidal zones and specialized dynamic models together with 3D coastal land cover characterisation. For urban areas, the ambition is first to automatically produce a qualified 3D map together with its additional layers of information: 3D objects and related semantics (land cover and land use), including temporal dynamics and thermal information. For issues related to the management of natural risks (such as floods or fires), similar data layers can be used. Finally, new hypotheses can be injected into these digital replicas and multiple scenarios can be applied to assess the causal relationship between hypothesis and prediction. Very promising results will also be presented.

How to cite: Delvit, J.-M., Brunet, P.-M., Lassalle, P., Lallement, D., and Baillarin, S.: Towards a local, dated and thematic digital twins factory, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-13921, https://doi.org/10.5194/egusphere-egu23-13921, 2023.

EGU23-14998 | Posters on site | ESSI1.5

Digital twincubator eWaterCycle 

Niels Drost, Peter Kalverla, Bart Schilperoort, Barbara Vreede, Sarah Alidoost, Stefan Verhoeven, Yang Liu, and Rolf Hut

Recently there’s been a lot of enthusiasm for the concepts of digital twins, virtual research environments, serious games, and other inspiring ideas to improve “the way we do science.” With eWaterCycle, we are no stranger to the cause. We’ve worked hard to build a platform that makes the scientific process (specifically, hydrological modelling) more accessible and engaging.

eWaterCycle gives users access to a centralized platform where they can perform hydrological experiments: simulating how water flows through a catchment area of choice, complete with data, a suite of models, an interactive scripting environment, and a graphical explorer to quickly set up an experiment. It shares many characteristics with what is commonly understood as a digital twin. But is it, really?

Sadly, the concept of digital twins suffers from linguistic inflation. At a recent event on the topic, the main coffee chatter was along the lines of “but what actually is it?” In an arena filled with resonating buzz, a clear image can help to regain focus and a common frame of reference. Is eWaterCycle, as a platform that supports working with each other’s models and data, a digital twin? Or is it more an incubator of digital twins? Either way, eWaterCycle can help make things concrete and specific, because it already exists.  

With several new projects promising to build digital twins of all sorts, we hope our experience can feed into the discussions on and development of new digital twins. Therefore, at EGU, we would like to reflect on the essence of our platform and our experience in building it. What is it (not)? What’s in it for you? What challenges did we face? And what does that mean for open science and collaborative research? 

How to cite: Drost, N., Kalverla, P., Schilperoort, B., Vreede, B., Alidoost, S., Verhoeven, S., Liu, Y., and Hut, R.: Digital twincubator eWaterCycle, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-14998, https://doi.org/10.5194/egusphere-egu23-14998, 2023.

EGU23-15688 | Posters virtual | ESSI1.5

Digital Twins of the Ocean – Opportunities to Inform Sustainable Ocean Governance 

Joana Kollert, Martin Visbeck, and Ute Brönner

Recent advances in High Performance Computing and Earth System Model resolution have enabled the Earth Science community to envision Digital Twins as an innovative approach to global environmental problems. This is also true of the Ocean Science community.

A Digital Twin of the Ocean (DTO) merges marine system models with observational data and machine learning analytics to produce a digital replica of the real ocean. In addition to natural phenomena, DTOs can include socio-economic factors (e.g. ocean-use, pollution). Thus, DTOs can be used to monitor the current ocean state, but also to simulate future ‘what-if’ scenarios for varying human interventions. Another benefit of DTOs is that they can be used by a variety of stakeholders: by scientists to understand the ocean, by policymakers to make well-informed decisions, and by citizens to improve ocean literacy. As such, DTOs are a powerful tool in future-proofing sustainable development. Moreover, they provide strong motivation to improve the marine data landscape and build an interoperable system with agreed upon formats and standards. DTOs are tailored to a specific ocean area or purpose, such that a DTO framework is needed to implement data connectivity and interoperability, ease of access, standards and to highlight gaps. The UN Ocean Decade Program DITTO aims to provide such a framework.  Specifically, DITTO advances worldwide collaboration between scientists, data and IT experts to develop a common understanding of DTOs, to establish best practices in their development, and to advance a digital framework for DTOs to empower ocean professionals from all sectors around the world to effectively create their own digital twins.

DTOs offer the technology for building a social-ecologically integrated ocean ecosystem with observation and modelling networks that support sustainable ocean governance.

How to cite: Kollert, J., Visbeck, M., and Brönner, U.: Digital Twins of the Ocean – Opportunities to Inform Sustainable Ocean Governance, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-15688, https://doi.org/10.5194/egusphere-egu23-15688, 2023.

Reservoir simulations often require statistical predictions to quantify production uncertainty or assess potential risks. Most existing uncertainty quantification procedures aim to decompose the input random field into independent random variables, provided the correlation scale is small compared to the domain size. In this work, we develop a K-means-based aggregation model for efficiently estimating multiphase flow performance across multiple geological realizations. The approach performs a number of single-phase flow simulations and uses K-means clustering to select only a few representatives, on which multiphase flow simulations are performed. An empirical model is then employed to describe the relationship between the single-phase and multiphase solutions using these representatives. Finally, the multiphase solution in all realizations can be easily predicted using the empirical model. The method is applicable to both 2D and 3D synthetic models and has been shown to perform well in reproducing the trusted interval of productivity and the probability distribution, as indicated by the cumulative distribution function. It captures the ensemble statistics of a large number of Monte Carlo realizations at significantly reduced computational cost.
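
A minimal Python sketch of this aggregation idea is given below, using synthetic lognormal single-phase responses and a hypothetical power-law single-to-multiphase relation in place of actual reservoir simulators; it shows the three steps of clustering, simulating only the representatives, and regressing to predict all realizations.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical stand-in: one cheap single-phase summary (e.g. a flow-capacity
# proxy) per geological realization, for 1000 realizations
single_phase = rng.lognormal(mean=0.0, sigma=0.5, size=(1000, 1))

# Cluster the single-phase responses and keep one representative per cluster
km = KMeans(n_clusters=10, n_init=10, random_state=0).fit(single_phase)
reps = [int(np.argmin(np.linalg.norm(single_phase - c, axis=1)))
        for c in km.cluster_centers_]

# Expensive multiphase simulations would run only on `reps`; a hypothetical
# power-law relation stands in for those results here
multi_reps = 2.0 * single_phase[reps, 0] ** 0.8

# Empirical model (linear in log space) maps single-phase -> multiphase
coef = np.polyfit(np.log(single_phase[reps, 0]), np.log(multi_reps), 1)
multi_all = np.exp(np.polyval(coef, np.log(single_phase[:, 0])))
print(multi_all.mean(), multi_all.std())
```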

How to cite: Liao, Q.: Clustering aggregation model for statistical forecasting of multiphase flow problems, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-973, https://doi.org/10.5194/egusphere-egu23-973, 2023.

EGU23-3891 | Posters on site | HS3.5

Estimating groundwater response time in humid climate by using spectral analysis 

Mariaines Di Dato, Timo Houben, and Sabine Attinger

During dry periods, river flow comprises baseflow, which typically generates from shallow aquifers. Understanding how such aquifers respond to climate events is key to managing environmental issues related to water supply or water quality. A typical indicator of groundwater response to climate events is the characteristic response time, which indicates the rate of depletion of shallow aquifers.

The traditional method to infer the characteristic response time analyzes the slope of the hydrograph recession curve. Such a method does not account for stormwater contribution in recession analysis, thereby assuming that the catchment is dry and the only contribution to discharge originates from groundwater. As a consequence, the recession analysis might underestimate the groundwater response time, owing to the presence of faster discharge components, i.e. surface runoff or interflow, in the falling limbs.

In this work, we propose an alternative methodology to calculate the characteristic response time, determined by analyzing the behavior of the baseflow time series in the frequency domain. The aquifer can be conceptualized as a low-pass filter that smooths the rapidly fluctuating components of the recharge signal. This behavior causes a cut-off frequency in the baseflow spectrum, which corresponds to the aquifer's characteristic time. We applied this approach to several gauging stations in Germany, whose humid climate is ideal for comparing the results with the classical recession analysis.
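
As a toy illustration of this frequency-domain approach (under the stand-in assumption of a linear reservoir driven by white-noise recharge, not the authors' catchment data), the following Python sketch computes the baseflow periodogram and fits a low-pass (Lorentzian) spectrum whose cut-off yields the characteristic time tc.

```python
import numpy as np
from scipy.signal import periodogram
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)

# Hypothetical daily baseflow: white-noise recharge routed through a linear
# reservoir with characteristic time tc (days)
tc_true, n = 60.0, 20 * 365
q = np.zeros(n)
for t in range(1, n):
    q[t] = q[t - 1] * (1.0 - 1.0 / tc_true) + rng.standard_normal()

f, pxx = periodogram(q, fs=1.0)          # fs = 1 sample/day
f, pxx = f[1:], pxx[1:]                  # drop the zero frequency

def lorentzian(f, a, tc):
    # Low-pass spectrum with cut-off frequency 1 / (2*pi*tc)
    return a / (1.0 + (2.0 * np.pi * f * tc) ** 2)

(a_hat, tc_hat), _ = curve_fit(lorentzian, f, pxx,
                               p0=(pxx[0], 30.0), bounds=(0, np.inf))
print(f"estimated tc ~ {tc_hat:.1f} days (true {tc_true})")
```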

We observed that spectral analysis yields characteristic response times systematically larger than those calculated with recession analysis; on average there is a factor of two between the estimates provided by the two methods. Overall, our study emphasizes the need for careful consideration when estimating groundwater response times, especially in humid and sub-humid river basins.

How to cite: Di Dato, M., Houben, T., and Attinger, S.: Estimating groundwater response time in humid climate by using spectral analysis, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-3891, https://doi.org/10.5194/egusphere-egu23-3891, 2023.

EGU23-3933 | Posters on site | HS3.5

Towards identification of dominant hydrological mechanisms in ungauged catchments 

Cristina Prieto, Le Vine Nataliya, Kavetski Dmitri, Fenicia Fabrizio, Scheidegger Andreas, and Vitolo Claudia

Modelling hydrological processes in ungauged catchments is a major challenge in environmental sciences and engineering. An ungauged catchment is a catchment that lacks streamflow data suitable for traditional modelling methods. Predicting streamflow in ungauged catchments requires some form of extrapolation ("regionalisation") from other "similar" catchments, with the variables of interest being flow "indices" or "signatures", such as quantiles of the flow duration curve.

Another major question in hydrology is the estimation of model structure that reflects the hydrological processes relevant to the catchment of interest. This question is intimately tied to process representation. To paraphrase a common saying, all models are wrong, but some model mechanisms (process representations) might be useful. Our previous study contributed a Bayesian framework for the identification of individual model mechanisms from streamflow data.

In this study we extend the mechanism identification method to operate in ungauged basins based on regionalized flow indices. Candidate mechanisms and model structures are generated, and the "dominant" (more a posteriori probable) model mechanisms are then identified using statistical hypothesis testing. As part of the derivation, it is assumed that the error in the regionalization of flow indices dominates the structural error of the hydrological model.

The proposed method is illustrated with real data and synthetic experiments based on 92 catchments from northern Spain, of which 16 are treated as ungauged. We use 624 model structures from the flexible hydrological model framework FUSE. Flow indices are regionalised using random forest regression in principal component (PC) space; we select the 4 leading indices in PC space. The case study setup includes an experiment using real data (where the true mechanisms are unknown) and a set of synthetic experiments with different error levels (where the "true" mechanisms are known).
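
The regionalization step described above can be sketched as follows; all data here are synthetic stand-ins, with only the dimensions (92 catchments, 16 ungauged, 4 leading principal components) taken from the abstract.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Hypothetical stand-ins: catchment attributes and flow signatures
n_catch, n_attr, n_sig = 92, 8, 12
attributes = rng.standard_normal((n_catch, n_attr))
signatures = (attributes[:, :3] @ rng.standard_normal((3, n_sig))
              + 0.2 * rng.standard_normal((n_catch, n_sig)))

gauged = np.arange(76)                 # donor catchments
ungauged = np.arange(76, 92)           # 16 treated as ungauged

pca = PCA(n_components=4).fit(signatures[gauged])   # 4 leading PCs
z_gauged = pca.transform(signatures[gauged])

# Random forest regression from attributes to signatures in PC space
rf = RandomForestRegressor(n_estimators=500, random_state=0)
rf.fit(attributes[gauged], z_gauged)

# Regionalized indices for the "ungauged" catchments, back in signature space
z_pred = rf.predict(attributes[ungauged])
sig_pred = pca.inverse_transform(z_pred)
print(sig_pred.shape)
```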

Across the real and synthetic experiments, routing is usually among the most identifiable processes, whereas the least identifiable processes are percolation and unsaturated zone processes. The precision, i.e. the probability of making an identification (whether correct or not), remains stable at around 25%. In the synthetic experiments we can calculate the (conditional) reliability of the identification method, i.e. the probability that, when the method makes an identification, the true mechanism is identified. The conditional reliability varies from 60% to 95%, depending on the magnitude of the combined regionalization and hydrological error. Our study contributes perspectives on hydrological mechanism identification under data-scarce conditions; we discuss limitations and opportunities for improvement.

 

Prieto, C., N. Le Vine, D. Kavetski, F. Fenicia, A. Scheidegger, and C. Vitolo (2022) An Exploration of Bayesian Identification of Dominant Hydrological Mechanisms in Ungauged Catchments, Water Resources Research, 58(3), e2021WR030705, doi: https://doi.org/10.1029/2021WR030705.

How to cite: Prieto, C., Nataliya, L. V., Dmitri, K., Fabrizio, F., Andreas, S., and Claudia, V.: Towards identification of dominant hydrological mechanisms in ungauged catchments, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-3933, https://doi.org/10.5194/egusphere-egu23-3933, 2023.

EGU23-5100 | ECS | Orals | HS3.5

Characterising errors using satellite metadata for eco-hydrological modelling 

Hui Zou, Lucy Marshall, and Ashish Sharma

Understanding the origin of errors in model predictions is a critical element in hydrologic model calibration and uncertainty estimation. While there exists a variety of plausible error sources, only one measure of the total residual error can be ascertained when the observed response is known. Here we show that collecting extra information a priori to characterise the data error before calibration can assist in improved model calibration and uncertainty estimation. A new model calibration strategy using satellite metadata is proposed as a means to inform the model prior, and subsequently to decompose the data error from the total residual error. This approach, referred to as the Bayesian ecohydrological error model (BEEM), is first examined in a synthetic setting to establish its validity, and then applied to three real catchments across Australia. Results show that 1) BEEM is valid in a synthetic setting, as it can perfectly ascertain the true underlying error; 2) in real catchments the model error is reduced when utilizing the observation error variance as an added contribution to the total error variance, while the magnitude of the total residual error is more robust when utilizing metadata about data quality proportionality as the basis for assigning the total error variance; 3) BEEM improves model calibration by estimating the model error appropriately and estimating the uncertainty interval more precisely. Overall, our work demonstrates a new approach to collecting prior error information from satellite metadata and reveals the potential for fully utilizing metadata about error sources in uncertainty estimation.
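
A heavily simplified reading of the error decomposition, assuming independent observation and model errors so that their variances add (the actual BEEM formulation is not given in the abstract), could look like this:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical residuals between simulated and observed fluxes
residuals = rng.normal(0.0, 0.9, size=3000)

# Observation-error variance assigned a priori from satellite metadata
# quality information (a hypothetical value, not BEEM's actual inputs)
var_obs = 0.4 ** 2

var_total = residuals.var(ddof=1)
var_model = max(var_total - var_obs, 0.0)   # assumes independent error sources
print(f"total {var_total:.3f} = obs {var_obs:.3f} + model {var_model:.3f}")
```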

How to cite: Zou, H., Marshall, L., and Sharma, A.: Characterising errors using satellite metadata for eco-hydrological modelling, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-5100, https://doi.org/10.5194/egusphere-egu23-5100, 2023.

EGU23-5635 | ECS | Orals | HS3.5

Spectral analysis of groundwater level time series reveals hydrogeological parameters 

Timo Houben, Mariaines Di-Dato, Christian Siebert, Thomas Kalbacher, Thomas Fischer, and Sabine Attinger

Groundwater resources are heavily exploited to supply domestic, industrial and agricultural water consumption. Climate and societal changes and the associated higher abstraction will alter subsurface storage in terms of quantity and quality in currently unpredictable ways. In order to ensure sustainable groundwater management, we must evaluate the intrinsic and spatially variable vulnerability of aquifers in terms of water quality issues and the resilience of groundwater volumes to external perturbations, such as severe droughts in connection with intensive irrigation. For this purpose, physically based numerical groundwater models are of great importance, especially on the regional scale. The equations applied in these models must be fed with the hydrogeological parameters: the transmissivity T and the storativity S.

Both parameters are typically obtained through time-consuming and cost-intensive hydrogeological in-situ tests or by laboratory analysis of core samples, yielding point information (drillings and wells) and thus parameters with limited transferability to regional settings. Instead, we propose to determine the parameters by spectral analysis of groundwater level fluctuations using (semi-)analytical solutions in the frequency domain. We developed a fully automated workflow that takes groundwater level and recharge time series, together with little information about the geometry of the aquifer, and derives T and S as well as tc (the characteristic response time). While the first two are used for hydrogeological modelling, the third can serve as an indicator of the resilience of the groundwater system directly, without additional modelling. The methodology was tested with great success in simplified numerical environments and was applied to real groundwater time series in southern Germany. The response times and storativities could be robustly estimated, while the transmissivities carry quantifiable uncertainties. Depending on the hydrogeological regime, the parameters represented effective, regional estimates.

How to cite: Houben, T., Di-Dato, M., Siebert, C., Kalbacher, T., Fischer, T., and Attinger, S.: Spectral analysis of groundwater level time series reveals hydrogeological parameters, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-5635, https://doi.org/10.5194/egusphere-egu23-5635, 2023.

EGU23-6986 | Orals | HS3.5

On the elaboration of a robust calibration strategy for the large-scale GEM-Hydro model 

Etienne Gaborit, Daniel Princz, Juliane Mai, Hongren Shen, Bryan Tolson, and Vincent Fortin

As part of the Great Lakes Runoff Intercomparison Project (GRIP-GL; Mai et al., 2022), which compares the performance of different hydrologic models over the Great Lakes when calibrating them with the same meteorological inputs and geophysical databases, the GEM-Hydro hydrologic model used at Environment and Climate Change Canada (ECCC) for operational hydrologic forecasts was calibrated using different strategies. Following the calibration work of GRIP-GL, progress has been made in improving the calibration of the GEM-Hydro model.

The work presented here focuses on improvements in calibrating the GEM-Hydro model, compared to the default version of the model and to the performance obtained during the GRIP-GL project. For various reasons explained here, the GEM-Hydro calibration performed as part of GRIP-GL was suboptimal. The general calibration framework remains the same as in GRIP-GL, for example using the MESH-SVS-Raven model to speed up simulation times and transferring the calibrated parameters into GEM-Hydro afterwards, and relying on global calibrations for each of the six Great Lakes subdomains. However, several important changes have been made compared to the work performed in GRIP-GL, such as a new approach to represent the effect of tile drains, a revised set of flow stations used for calibration, and a revised objective function.

The proposed calibration methodology updates significantly improve GEM-Hydro streamflow performance across the Great Lakes domain, while also improving on, or maintaining, the performance of the default version of the model with respect to auxiliary variables and surface fluxes: snow, soil moisture, evapotranspiration, 2 m air temperature and dew point. Indeed, the model relies on 40 m atmospheric forcings for wind speed, temperature and humidity, and simulates its own 2 m atmospheric variables. To achieve this, it was necessary to constrain some parameter intervals during calibration, in order to prevent the calibration algorithm from choosing physically irrelevant parameter values that improve streamflow performance while degrading other hydrologic variables, due to equifinality.

Reference:

Mai, J., Shen, H., Tolson, B. A., Gaborit, E., Arsenault, R., Craig, J. R., Fortin, V., Fry, L. M., Gauch, M., Klotz, D., Kratzert, F., O'Brien, N., Princz, D. G., Rasiya Koya, S., Roy, T., Seglenieks, F., Shrestha, N. K., Temgoua, A. G. T., Vionnet, V., and Waddell, J. W. (2022). The Great Lakes Runoff Intercomparison Project Phase 4: The Great Lakes (GRIP-GL). Hydrol. Earth Syst. Sci., 26, 3537–3572. https://doi.org/10.5194/hess-26-3537-2022

How to cite: Gaborit, E., Princz, D., Mai, J., Shen, H., Tolson, B., and Fortin, V.: On the elaboration of a robust calibration strategy for the large-scale GEM-Hydro model, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-6986, https://doi.org/10.5194/egusphere-egu23-6986, 2023.

EGU23-8423 | Orals | HS3.5

Time-varying sensitivity analysis across different hydrological model structures, variables and time scales 

Björn Guse, Anna Herzog, Stephan Thober, Diana Spieler, Lieke Melsen, Jens Kiesel, Maria Staudinger, Paul Wagner, Ralf Loritz, Sebastian Müller, Michael Stölzle, Larissa Scholz, Justine Berg, Tobias Pilz, Uwe Ehret, Doris Düthmann, Tobias Houska, Sandra Pool, and Larisa Tarasova and the other members of the DFG Scientific network IMPRO

Temporal sensitivity analyses can be used to detect dominant model parameters at different time steps (e.g. daily or monthly), providing insights into their temporal patterns and reflecting the temporal variability of dominant hydrological processes. However, hydrological processes not only vary in time under different hydrometeorological conditions; the time scales of the implemented processes also differ. Here, the impact of different time scales (e.g. daily vs. monthly) on sensitivity patterns is investigated.

A temporal parameter sensitivity analysis is applied to three hydrological models (HBV, mHM and SWAT) for nine catchments in Germany. These catchments represent the variability of landscapes in Germany and are dominated by different runoff generation processes. In addition to discharge, further model fluxes and states such as evapotranspiration or soil moisture are used as target variables for the sensitivity analysis.

To analyse the impact of different time scales, two approaches are compared. In the first approach, daily simulated time series are used for the sensitivity analysis and the resulting sensitivities are then aggregated to monthly averages (Post-Agg). In the second approach, the simulated time series is first aggregated to a monthly time series and then used as input for the sensitivity analysis (Pre-Agg).
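
The difference between the two aggregation orders can be illustrated with a toy two-parameter model, shown below; the model, forcings, and the squared-correlation sensitivity proxy are all illustrative assumptions, not the study's actual models or sensitivity method.

```python
import numpy as np

rng = np.random.default_rng(0)
n_days, n_samples = 360, 2000
days = np.arange(n_days)

fast = rng.standard_normal(n_days)             # day-to-day forcing (shared)
slow = np.sin(2 * np.pi * days / 360.0)        # seasonal forcing

a = rng.uniform(0.5, 1.5, n_samples)           # fast-process parameter
b = rng.uniform(0.5, 1.5, n_samples)           # slow-process parameter
sims = a[:, None] * fast + b[:, None] * slow   # toy model output

def sensitivity(par, y):
    """First-order sensitivity proxy: squared correlation per time step."""
    return np.array([np.corrcoef(par, y[:, t])[0, 1] ** 2
                     for t in range(y.shape[1])])

# Post-Agg: daily sensitivities, averaged to months afterwards
post = sensitivity(a, sims).reshape(12, 30).mean(axis=1)

# Pre-Agg: aggregate the simulations to monthly means first
monthly = sims.reshape(n_samples, 12, 30).mean(axis=2)
pre = sensitivity(a, monthly)

# The fast-process parameter `a` loses sensitivity once the daily
# variability it controls is averaged out
print(post.mean(), pre.mean())
```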

Our analysis shows that the monthly averaged sensitivity patterns of different model outputs vary between the Post- and Pre-Aggregation approaches. Model parameters related to fast-reacting runoff processes, e.g. surface runoff or fast subsurface flow, are more sensitive when daily time series are used for the sensitivity analysis (Post-Agg). In contrast, model parameters related to processes with longer time scales, such as snowmelt or evapotranspiration, are more emphasized in monthly time series (Pre-Agg). These differences between the Post-Agg and Pre-Agg results are particularly pronounced when using the integrated value of discharge as the target variable; the differences are smaller when applying the sensitivity analysis directly to model fluxes.

Moreover, our analysis shows changes in dominant parameters along a north-south gradient which can be explained by the physiographic characteristics of the catchments. The differences in the sensitivity results between the models can be related to the different model structures.

Based on our analysis, we recommend using either model outputs of the major hydrological variables or different time scales for the sensitivity analysis, in order to derive the maximum information from the diagnostic model analysis and to understand how model parameters describe hydrological systems.

How to cite: Guse, B., Herzog, A., Thober, S., Spieler, D., Melsen, L., Kiesel, J., Staudinger, M., Wagner, P., Loritz, R., Müller, S., Stölzle, M., Scholz, L., Berg, J., Pilz, T., Ehret, U., Düthmann, D., Houska, T., Pool, S., and Tarasova, L. and the other members of the DFG Scientific network IMPRO: Time-varying sensitivity analysis across different hydrological model structures, variables and time scales, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-8423, https://doi.org/10.5194/egusphere-egu23-8423, 2023.

EGU23-10001 | Posters on site | HS3.5

Investigating the spectral analysis of groundwater level fluctuations in a numerical model of the upper Danube catchment in Germany 

Rao Ali Javed, Timo Houben, Thomas Kalbacher, and Sabine Attinger

Common in-situ methods such as pumping tests, slug tests and laboratory analyses reveal aquifer parameters (that is, the transmissivity and storativity) that are localized and specific to the measurement location. A need for regionally valid aquifer parameters arises when setting up regional-scale, physically based groundwater models. Such models help water resource managers plan and predict the quality and quantity of groundwater resources, thus supporting decision making as well as a sustainable freshwater supply. A study by Houben et al. (2022) indicates that regional aquifer parameters can be obtained by analysing the frequency content of groundwater level time series. Their work builds upon a semi-analytical solution for the groundwater head spectrum, stochastically derived from the Boussinesq equation evoking the Dupuit assumptions. They found that the solution can be used to infer the transmissivity and storativity from groundwater level fluctuations, and validated their hypothesis in simplified numerical environments of different complexity.

In this work, we extended the numerical experiments and applied the semi-analytical solution in homogeneous and heterogeneous 2D (x-y plane) aquifers, as well as in a complex numerical 2D (x-y plane) model of the upper Danube catchment. We tested the hypothesis that certain locations can reveal regional aquifer parameters. In a homogeneous simulated model, the semi-analytical solution effectively recovers the model input parameters, which serves as a proof of concept. In a heterogeneous numerical model, the obtained parameters show the complex interplay between zones of different permeability: the effects of highly permeable zones can be observed in low-permeability zones further apart, and vice versa. The obtained parameters were in the range of the model input parameters and followed the trend of the input parameters along the direction of flow. In the model of the upper Danube, the obtained parameters were systematically larger than the input parameters. This shift was attributed to a violation of the assumptions of the semi-analytical solution; the complexity of the model leads to a breakdown of the semi-analytical solution in some areas. Analyses on a sub-catchment scale revealed that when the assumptions of the analytical solution are met, the obtained parameters reflect the effective parameters.

How to cite: Javed, R. A., Houben, T., Kalbacher, T., and Attinger, S.: Investigating the spectral analysis of groundwater level fluctuations in a numerical model of the upper Danube catchment in Germany, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-10001, https://doi.org/10.5194/egusphere-egu23-10001, 2023.

 

Hydrologic models are often used to estimate streamflows at ungauged locations for infrastructure planning. These models can contain a multitude of parameters that themselves need to be estimated through calibration. Yet multiple sets of parameter values may perform nearly equally well in simulating flows at gauged sites, making these parameters highly uncertain. Markov Chain Monte Carlo (MCMC) algorithms can quantify parameter uncertainties; however, this can be computationally expensive for hydrological models. Thus, it is important to select an MCMC algorithm that is effective (converges to the true posterior parameter distribution), efficient (fast), reliable (consistent across random seeds) and controllable (insensitive to the algorithm's hyperparameters). These characteristics can be assessed through algorithm diagnostics, but current MCMC diagnostics mostly focus on evaluating the convergence of an individual search process, not on diagnosing general problems of the algorithms. Therefore, additional diagnostics are required to capture an algorithm's sensitivity to its hyperparameters and to compare performance across problems.

Here, we propose new diagnostics to assess the effectiveness, efficiency, reliability and controllability of four MCMC algorithms: Adaptive Metropolis, Sequential Monte Carlo, Hamiltonian Monte Carlo, and DREAM(ZS). The diagnostic method builds on diagnostics used to assess the performance of Multi-Objective Evolutionary Algorithms (MOEAs), and allows us to evaluate the sensitivity of the algorithms to their hyperparameterization and to compare their performance on multiple metrics, such as the Gelman-Rubin diagnostic and the Wasserstein distance from the true posterior. We illustrate our diagnostics using the simple hydrological model HYMOD and several analytical test problems. This allows us to see which algorithms perform well on problems with different characteristics (e.g. known vs. unknown posterior shapes, uni- vs. multi-modality, low- vs. high-dimensionality). Since posterior shapes and modality are often unknown for hydrological problems, it is important to calibrate such models with an MCMC algorithm that is robust across a wide variety of posterior shapes, and our new diagnostics allow for this identification.
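
Two of the metrics mentioned, the Gelman-Rubin diagnostic and the 1-D Wasserstein distance, can be computed compactly; the sketch below uses synthetic well-mixed chains and is not the authors' diagnostic suite.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def gelman_rubin(chains):
    """Potential scale reduction factor R-hat for chains shaped (m, n)."""
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    B = n * chain_means.var(ddof=1)          # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()    # within-chain variance
    var_hat = (n - 1) / n * W + B / n        # pooled posterior variance estimate
    return np.sqrt(var_hat / W)

rng = np.random.default_rng(1)
chains = rng.normal(0.0, 1.0, size=(4, 2000))    # 4 well-mixed chains
print("R-hat:", gelman_rubin(chains))            # ~1.0 indicates convergence

# Distance between the sampled posterior and a known truth (1-D case)
truth = rng.normal(0.0, 1.0, size=8000)
print("Wasserstein:", wasserstein_distance(chains.ravel(), truth))
```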

How to cite: Kavianihamedani, H., Quinn, J., and Smith, J.: New Diagnostic Assessment of MCMC Algorithms Effectiveness, Efficiency, Reliability, and Controllability in Calibrating Hydrological Models, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-10326, https://doi.org/10.5194/egusphere-egu23-10326, 2023.

EGU23-10510 | ECS | Posters on site | HS3.5

Uncertainty Quantification in Hydrological and Environmental Modeling based on Polynomial Chaos Expansion 

Zoe Li, Pengxiao Zhou, and Maysara Ghaith

There are significant uncertainties associated with the estimates of model parameters in hydrological and environmental modeling. Such uncertainties can propagate within a modeling framework, leading to considerable deviation of predicted values from their real values. Quantifying the uncertainties associated with model parameters can be computationally expensive and is still a daunting challenge for hydrological and environmental engineers. In this study, a series of Polynomial Chaos Expansion (PCE) methods, which have a significant advantage in computational efficiency, is developed to assess the propagation of parameter uncertainty. The proposed approaches were applied to two hydrological/environmental modeling case studies. The uncertainty quantification results will be compared with those from the traditional Monte Carlo simulation technique to demonstrate the effectiveness and efficiency of the proposed approaches. This work provides an efficient and reliable alternative for assessing the impacts of parameter uncertainties in hydrological and environmental modeling.
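
A minimal PCE example for a single standard-normal input, using a probabilists' Hermite basis fitted by least squares (the specific PCE variants of the study are not detailed in the abstract), illustrates why the approach is cheap: the surrogate's mean and variance follow directly from the coefficients.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermevander
from math import factorial

rng = np.random.default_rng(0)

def model(xi):
    # Hypothetical scalar model with a standard-normal input (a stand-in
    # for a hydrological simulator with one uncertain parameter)
    return np.exp(0.3 * xi) + 0.1 * xi ** 2

deg = 6
xi = rng.standard_normal(500)            # training samples of the input
A = hermevander(xi, deg)                 # probabilists' Hermite basis He_k(xi)
coef, *_ = np.linalg.lstsq(A, model(xi), rcond=None)

# Moments follow from orthogonality: E[He_j(X) He_k(X)] = k! * delta_jk
mean = coef[0]
var = sum(coef[k] ** 2 * factorial(k) for k in range(1, deg + 1))

mc = model(rng.standard_normal(200_000))   # Monte Carlo reference
print(mean, mc.mean(), var, mc.var())
```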

How to cite: Li, Z., Zhou, P., and Ghaith, M.: Uncertainty Quantification in Hydrological and Environmental Modeling based on Polynomial Chaos Expansion, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-10510, https://doi.org/10.5194/egusphere-egu23-10510, 2023.

EGU23-10644 | Orals | HS3.5

Impact of Model Parameters on Runoff Sensitivities in the Community Land Model: A Study on the Upper Colorado River Basin 

Yadu Pokhrel, Ahmed Elkouk, Lifeng Luo, Liz Payton, Ben Livneh, and Yifan Cheng

Understanding how land surface models (LSMs) partition precipitation into evapotranspiration and runoff under a changing climate is key to improved future hydrologic predictions. This sensitivity is rarely tuned in land models, as evidenced by prevalent biases in the sensitivity of simulated runoff to precipitation and temperature changes compared to observational estimates. Here, using the Community Land Model (CLM5) over the Colorado River basin (CRB), we investigate which model parameters are informative for runoff sensitivities and how their choice affects the sensitivities under changing temperature and precipitation. We focus on the headwater region of the CRB, motivated by inconsistent model estimates of runoff sensitivities in the region and the critical need to better understand runoff changes to address the ongoing water crises in the CRB. In each headwater basin, a set of informative parameters was identified through parameter perturbations using a “one-at-a-time” method within an adaptive surrogate-based model optimization scheme (ASMO). The perturbation results highlight that different parameter sets with similar performance (with respect to water-year discharge) provide very different runoff sensitivities to temperature and precipitation during the 1951-2010 period. Additionally, both the precipitation and temperature sensitivities of runoff respond to similar parameters across the region. The most sensitive parameters control the conductance-photosynthesis relationship, the soil surface resistance for direct evaporation, the partitioning of runoff into surface and subsurface components, and the soil hydraulic properties. We show how the importance of each parameter varies through the parameter space and derive parameter estimates by maximizing the “fit to observed sensitivities” within the ASMO scheme. Our results provide key insights into parameter optimization to improve long-term hydrologic sensitivities in LSMs.
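
A one-at-a-time perturbation is simple to sketch; the toy runoff function and parameter names below are hypothetical stand-ins for CLM5 parameters, not the actual model.

```python
import numpy as np

def runoff_model(p):
    # Hypothetical stand-in for a CLM5 water-year runoff simulation
    return p["b_runoff"] * 1.4 + p["soil_k"] * 0.6 - p["stomatal_g"] * 0.9

defaults = {"b_runoff": 1.0, "soil_k": 1.0, "stomatal_g": 1.0}

# One-at-a-time perturbation: vary each parameter by +/-10 % around defaults,
# holding all other parameters fixed
for name in defaults:
    lo, hi = dict(defaults), dict(defaults)
    lo[name] *= 0.9
    hi[name] *= 1.1
    delta = runoff_model(hi) - runoff_model(lo)
    print(f"{name}: d(runoff) = {delta:+.3f}")
```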

How to cite: Pokhrel, Y., Elkouk, A., Luo, L., Payton, L., Livneh, B., and Cheng, Y.: Impact of Model Parameters on Runoff Sensitivities in the Community Land Model: A Study on the Upper Colorado River Basin, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-10644, https://doi.org/10.5194/egusphere-egu23-10644, 2023.

EGU23-11129 | ECS | Orals | HS3.5

Pitfalls and Opportunities in the Use of Markov-Chain Monte Carlo Ensemble Samplers for Vadose Zone Model Calibration 

Giuseppe Brunetti, Jiri Simunek, Thomas Wöhling, and Christine Stumpp

Bayesian inference has become the most popular approach to uncertainty assessment in vadose zone hydrological modeling. By combining prior information with observations and model predictions, it enables hydrologists to infer parameter posterior distributions, verify model adequacy, and assess a model's predictive uncertainty. In particular, the posterior distribution is frequently the variable of interest for modelers, as it describes the epistemic uncertainty of model parameters conditioned on measurements. Gradient-free Markov-Chain Monte Carlo (MCMC) ensemble samplers based on Differential Evolution (DE) or Affine Invariant (AI) strategies have been used to approximate the posterior distribution, which is frequently anisotropic and correlated in vadose zone-related problems. However, a rigorous benchmark of different MCMC algorithms providing guidelines for their application in vadose zone hydrological model calibration is still missing. In this study, we elucidate the behavior of MCMC ensemble samplers by performing an in-depth comparison of four samplers that use AI moves or DE-based strategies to approximate the target density. Two Rosenbrock distributions, and one synthetic and one actual case study focusing on the inverse estimation of soil hydraulic parameters using HYDRUS-1D, are used to compare the algorithms in different dimensions. The analysis reveals that AI-based samplers are immune to affine transformations of the target density, which instead double the autocorrelation time for DE-based samplers. This behavior is reiterated in the synthetic scenario, for which AI-based algorithms outperform DE-based strategies. However, this performance gain disappears when the number of soil parameters increases from 7 to 16, with both samplers exhibiting poor acceptance rates, which are not improved by increasing the number of chains from 50 to 200 or by mixing different strategies.
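
The AI-vs-DE comparison on a Rosenbrock-type target can be reproduced in miniature with the emcee package, which provides both affine-invariant stretch moves and differential-evolution moves; the tempering constant and run lengths below are arbitrary choices, not the study's setup.

```python
import numpy as np
import emcee

def log_rosenbrock(theta):
    """Tempered 2-D Rosenbrock log-density, a standard hard MCMC target."""
    x, y = theta
    return -(100.0 * (y - x * x) ** 2 + (1.0 - x) ** 2) / 20.0

ndim, nwalkers, nsteps = 2, 50, 10_000
rng = np.random.default_rng(0)
p0 = np.array([1.0, 1.0]) + 1e-2 * rng.standard_normal((nwalkers, ndim))

for name, move in [("affine-invariant stretch", emcee.moves.StretchMove()),
                   ("differential evolution", emcee.moves.DEMove())]:
    sampler = emcee.EnsembleSampler(nwalkers, ndim, log_rosenbrock, moves=move)
    sampler.run_mcmc(p0, nsteps, progress=False)
    # Integrated autocorrelation time: the efficiency measure compared here
    tau = sampler.get_autocorr_time(quiet=True)
    print(f"{name}: acceptance {sampler.acceptance_fraction.mean():.2f}, tau {tau}")
```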

How to cite: Brunetti, G., Simunek, J., Wöhling, T., and Stumpp, C.: Pitfalls and Opportunities in the Use of Markov-Chain Monte Carlo Ensemble Samplers for Vadose Zone Model Calibration, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-11129, https://doi.org/10.5194/egusphere-egu23-11129, 2023.

EGU23-13104 | Posters on site | HS3.5

Combining water and pesticide data with coupled surface/subsurface hydrological modeling to reduce its uncertainty. 

Claire Lauvernet, Claudio Paniconi, Emilie Rouzies, Laura Gatel, and Antoine Caisson

In small agricultural catchments across Europe, the intensive use of pesticides leads to widespread contamination of rivers and groundwater, largely due to hydraulic transfers of these reactive solutes from plots to rivers. These transfers must be better understood and described at the watershed scale in order to propose best management practices adapted to the catchment and to reduce its contamination. The physically based model CATHY simulates interactions between surface and subsurface hydrology and reactive solute transport. However, the high sensitivity of pesticide transfers to spatially heterogeneous soil properties induces uncertainty that should be quantified and reduced. In situ data on pesticides in a catchment are usually rare and not continuous in time and space. Likewise, satellite imagery can provide spatial observations of hydrologic variables, but generally not of pesticide fluxes and concentrations, and only at limited scales and time frequencies. The objective of this work is to combine these three types of information (model, in situ data, images) and their associated errors with data assimilation methods, in order to reduce the uncertainties of pesticide and hydrological variables. The sensitivity to the spatial density and temporal frequency of the data will be evaluated, as well as the efficiency of coupled data assimilation, i.e., the effect of assimilating hydrological data on pesticide-related variables. The methods will be developed using a Python package and compared/evaluated in twin experiments using virtual data generated over a real vineyard catchment in Beaujolais, France, in order to ensure the realism of the experiments, data, and associated errors.

How to cite: Lauvernet, C., Paniconi, C., Rouzies, E., Gatel, L., and Caisson, A.: Combining water and pesticide data with coupled surface/subsurface hydrological modeling to reduce its uncertainty., EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-13104, https://doi.org/10.5194/egusphere-egu23-13104, 2023.

EGU23-13589 | Orals | HS3.5 | Highlight

A comparison of sensitivity analysis methods and their value for comparing denitrification models 

Jesús Carrera and Jordi Petchamé

Numerous methods exist to gain insight into a model's performance. Sensitivity analysis (SA) tools provide information on how a model output depends on model parameters, and it is widely argued that SA is an essential tool for assessing model uncertainty. Here, we review global SA using Variogram Analysis of Response Surfaces (VARS), variance-based methods (Sobol' indices) and polynomial chaos expansion. For the comparison, we use a set of denitrification models, which are needed to assess the fate of nitrate, a global challenge. For each of the models, we assess the uncertainty and reliability of predictions, and the use of SA tools in designing experiments to reduce model uncertainty.
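
As an example of the variance-based branch of this comparison, Sobol' indices for a toy model can be computed with the open-source SALib package (one common implementation; the authors' tooling is not specified). The Michaelis-Menten-style rate function and its parameter bounds are hypothetical.

```python
import numpy as np
from SALib.sample import saltelli
from SALib.analyze import sobol

# Hypothetical 3-parameter denitrification-rate toy model
problem = {
    "num_vars": 3,
    "names": ["vmax", "km", "theta"],
    "bounds": [[0.1, 2.0], [0.5, 5.0], [1.0, 1.2]],
}

def rate(X, no3=2.0, temp=15.0):
    vmax, km, theta = X.T
    # Michaelis-Menten kinetics with a temperature correction factor
    return vmax * no3 / (km + no3) * theta ** (temp - 20.0)

X = saltelli.sample(problem, 1024)   # N * (2D + 2) parameter sets
Y = rate(X)
Si = sobol.analyze(problem, Y)
print("first-order:", Si["S1"])      # main effects
print("total-order:", Si["ST"])      # main effects plus interactions
```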

How to cite: Carrera, J. and Petchamé, J.: A comparison of sensitivity analysis methods and their value for comparing denitrification models, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-13589, https://doi.org/10.5194/egusphere-egu23-13589, 2023.

EGU23-13981 | Posters on site | HS3.5

Sensitivity analysis of water balance components under climate change in Saxony 

Niels Schuetze, Corina Hauffe, Sofie Pahner, Clara Brandes, Kan Lei, and Mellentin Udo

Catchments in Saxony differ regarding their physiographic characteristics (topography, geomorphology, geology, land use, soils, etc.) and their climatic boundary conditions. Both factors influence the flow behavior and the water balance components of catchments. How sensitively the water balance of a catchment responds to current and future changes in climatic boundary conditions is difficult to predict and is associated with significant uncertainties. In Saxony, the pronounced drought in groundwater and surface water from 2018 to 2020 led to considerable regional problems in water supply and quality.

Schwarze et al. (2017) investigated trends in the observed discharge and in variables derived by hydrograph separation (e.g. baseflow) in an earlier sensitivity study. In this presentation, we show the results of an extension of this analysis with current observational data until 2020. The following research questions are investigated: (i) Are catchments in Saxony already responding to changing climatic conditions? (ii) Which regions show the most significant changes in discharge behavior relative to other water balance components? (iii) What are the factors and drivers of changes in the water balance of Saxon catchments?

The study is based solely on observational data for precipitation, temperature, and discharge in the period 1961 to 2020 in Saxony. Breakpoint analysis, hydrograph separation, and sensitivity analysis of hydrological signatures are performed for different sets of climate periods to quantify changes in, and the elasticity of, the water balance components. As a result, a decreasing trend in mean flow can be seen for almost all 88 investigated, undisturbed catchments in Saxony. This trend is more pronounced in the mountainous regions than in the Saxon lowlands. Despite a slight increase in mean annual precipitation, the temperature rise of about 1 °C in 1991-2020 compared to 1961-1990 leads in all catchments to increasing evapotranspiration, reduced discharge, and reduced groundwater recharge.
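
As an illustration of the breakpoint-analysis step (the abstract does not specify which test is used), a generic non-parametric Pettitt change-point test can be sketched as follows:

```python
import numpy as np

def pettitt_change_point(x):
    """Pettitt (1979) non-parametric change-point test.

    Returns the most probable break index (1-based) and an approximate
    p-value. A generic sketch of 'breakpoint analysis', not necessarily
    the authors' exact method.
    """
    n = x.size
    r = np.argsort(np.argsort(x)) + 1.0                  # ranks of the series
    # U_t = 2 * sum(ranks up to t) - t * (n + 1), for t = 1..n-1
    U = 2.0 * np.cumsum(r)[:-1] - np.arange(1, n) * (n + 1)
    t_break = int(np.argmax(np.abs(U))) + 1
    K = np.abs(U).max()
    p = min(2.0 * np.exp(-6.0 * K ** 2 / (n ** 3 + n ** 2)), 1.0)
    return t_break, p

rng = np.random.default_rng(0)
# Hypothetical annual mean flow with a downward shift after year 30
flow = np.concatenate([rng.normal(10, 2, 30), rng.normal(8, 2, 30)])
print(pettitt_change_point(flow))
```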

 

References:

Schwarze, R., Wagner, M. and Röhm, P. (2017). Adaptation strategies to climate change - Analysis of the sensitivity of water balance variables of Saxon gauge catchments with respect to the increased temperature level from 1988 onwards compared to the reference state of 1961-1987. Ed.: Saxon State Office for Environment, Agriculture and Geology (LfULG), 2017.

How to cite: Schuetze, N., Hauffe, C., Pahner, S., Brandes, C., Lei, K., and Udo, M.: Sensitivity analysis of water balance components under climate change in Saxony, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-13981, https://doi.org/10.5194/egusphere-egu23-13981, 2023.

EGU23-15689 | ECS | Posters on site | HS3.5

Adaptive Surrogate Likelihood Function for Blended Hydrologic Models 

Rezgar Arabzadeh, Jonathan Romero-Cuellar, Robert Chlumsky, James Craig, and Bryan Tolson

This abstract introduces a recipe for an adaptive general likelihood function and its application to the Bayesian inference of model parameter and structure uncertainty. The proposed methodology focuses on a special class of likelihood function, hereinafter referred to as the adaptive general likelihood function (AGL), which requires minimal a priori assumptions and knowledge about the model residuals. The goal of the AGL is to characterize the model residuals independently from the inference framework, in order to avoid incorrect posterior estimation resulting from the joint inference of model and error-model parameters. Mathematically, the AGL is structured as a mixture of Gaussian distributions joined with a first-order autoregressive model, accounting for the shape and the autocorrelation of the error model, respectively. To assess the AGL, it is benchmarked against a formal likelihood function formulated by Schoups and Vrugt (2010) and evaluated for 24 CAMELS basins where the blended model has been applied deterministically with success (Chlumsky et al., 2022). Both approaches are compared with the residuals' empirical distributions using various statistical tests. The model used here is a blended hydrologic model introduced by Mai et al. (2021), a class of hydrologic models constructed by averaging (blending) various process options at the process flux level. This blending means that model calibration identifies traditionally calibrated process parameters as well as the weights used to average the multiple process options. The model is deployed in the Raven hydrologic framework (Craig et al., 2020), and both process weights and parameters were calibrated deterministically for both high flows and low flows using the PADDS algorithm (Asadzadeh and Tolson, 2013). This multi-objective calibration yields a suite of calibrated blended models, which is then utilized for error-model development and testing. The test results indicated statistically comparable performance of both methods for t-distributed, highly skewed, and long-tailed residual errors, which are apparent in many hydrologic model residuals. Finally, to decouple the Bayesian inference framework from the error-model parameters, an epsilon-support vector regression (eps-SVR) is trained deterministically as a surrogate model to map the structural/parametric variability to the residual error-model parameters. The eps-SVR calibration performance metrics on the training set indicate a high-quality surrogate with promising performance.
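
A compact sketch of the AGL's mathematical core, an AR(1) error model with Gaussian-mixture innovations, is given below; the mixture settings and synthetic residuals are illustrative, and this is a reading of the abstract rather than the authors' exact formulation.

```python
import numpy as np
from scipy.stats import norm

def agl_loglik(residuals, phi, w, mu, sigma):
    """Log-likelihood of residuals under an AR(1) model whose innovations
    follow a Gaussian mixture (weights w, means mu, std devs sigma).
    A sketch of the AGL idea, not the authors' exact formulation."""
    eta = residuals[1:] - phi * residuals[:-1]       # AR(1) innovations
    comp = np.stack([wk * norm.pdf(eta, mk, sk)
                     for wk, mk, sk in zip(w, mu, sigma)])
    return np.log(comp.sum(axis=0)).sum()

# Synthetic autocorrelated, skewed residuals to evaluate the function on
rng = np.random.default_rng(0)
r = np.zeros(2000)
for t in range(1, r.size):
    heavy = rng.random() < 0.2                       # 20 % heavy-tailed component
    eps = rng.normal(0.5, 1.0) if heavy else rng.normal(-0.2, 0.3)
    r[t] = 0.7 * r[t - 1] + eps

print(agl_loglik(r, phi=0.7, w=[0.8, 0.2], mu=[-0.2, 0.5], sigma=[0.3, 1.0]))
```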

How to cite: Arabzadeh, R., Romero-Cuellar, J., Chlumsky, R., Craig, J., and Tolson, B.: Adaptive Surrogate Likelihood Function for Blended Hydrologic Models, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-15689, https://doi.org/10.5194/egusphere-egu23-15689, 2023.

Applications of statistical learning theory (SLT) in hydrology have taken two forms: Support Vector Machines and other complexity-regularized machine learning algorithms that learn and predict input-output patterns such as rainfall-runoff time series, and the identification of the optimal complexity of low-order models, such as k-nearest-neighbour models, to predict hydrological time series such as streamflow. The regularization of model complexity offers a way to identify the minimal complexity a model needs to accurately predict a time series of interest. However, such applications often assume that the modelled residuals are independent of each other. This limits their application to conceptual hydrological models, where residuals are often autocorrelated. This paper applies recent results on risk bounds for time series forecasting, and SLT approaches to dynamical system identification, to conceptual hydrological models, offering a means to identify the optimal complexity of conceptual models and complexity-regularized streamflow predictions based on it.

Basins from the CAMELS data set are used to demonstrate the effect of regularizing the problem of hydrological model calibration on streamflow prediction over unseen data. SAC-SMA and SIXPAR (a lower-order version of SAC-SMA) are used as model examples. Preliminary results show that prediction uncertainty bounds are narrower when regularization does not improve the performance of a calibrated model over unseen data. This effect is stronger in drier basins than in humid ones. Also, as expected, the effect is stronger when the training data size is small, and it holds for both SAC-SMA and SIXPAR.

How to cite: Pande, S. and Moayeri, M.: Complexity-based robust hydrologic prediction: extension of statistical learning theory to conceptual hydrological models, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-16039, https://doi.org/10.5194/egusphere-egu23-16039, 2023.

Due to the lack of accurate representation of hydrological processes and of parameter measurements, physically based hydrological models contain many parameters requiring calibration to historical observations so that reliable hydrological inferences can be obtained. With increasing data availability from various sources (e.g., satellite remote sensing, climate model reanalyses), additional information on different water balance components (e.g., soil moisture, groundwater storage) is used to constrain and validate hydrological models, resulting in better model performance and parameter identifiability. However, given the emergence of multiple datasets for the various water budget components, and their differences in temporal and spatial resolution, the uncertainties in these datasets, when used together to drive and evaluate hydrological models, can introduce inconsistencies in water balance estimation and lead to a non-closure problem, which in turn can produce biased estimates of parameters and water balance components.

This study addresses this issue by examining the impact of inconsistent water balance component data on model performance and by exploring the importance of hydrologically consistent data for robust hydrological inference. The assessment is done using a Canadian hydrologic land surface model, MESH, in the Saskatchewan River basin, Canada, over the period 2002 to 2016. Seven precipitation datasets, seven evapotranspiration products, one source of water storage data (GRACE, from three different centers using spherical harmonic and mass concentration approaches) and observed discharge data from hydrometric stations are selected as the input and evaluation data. A reference water balance dataset is developed to optimally combine all available data sources for each water balance component and to obtain water balance closure through a constrained Kalman filter data assimilation technique. The MESH model is rerun with this reference dataset, and the results are assessed and compared to different combinations of input and evaluation data. Preliminary results reveal large variations in model performance for the water balance components when using different combinations of input and evaluation data, and results using the reference dataset are expected to have less biased water balance component estimates. This study highlights the necessity of using hydrologically consistent data before any model runs and model evaluation.
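
The closure step can be sketched per pixel or per basin as a constrained Kalman-filter-type update that redistributes the budget residual according to each component's error variance; the numbers below are hypothetical, and independent errors are assumed.

```python
import numpy as np

# Hypothetical monthly water-balance estimates (mm): P, ET, Q, dS
x = np.array([85.0, 42.0, 30.0, 5.0])
sd = np.array([8.0, 10.0, 2.0, 6.0])        # assumed independent errors
R = np.diag(sd ** 2)

H = np.array([[1.0, -1.0, -1.0, -1.0]])     # closure: P - ET - Q - dS = 0

# Constrained Kalman update: project the estimate onto the closure
# constraint, weighting each component by its error variance
K = R @ H.T @ np.linalg.inv(H @ R @ H.T)
x_closed = x - K @ (H @ x)

print("residual before:", (H @ x).item(), "after:", (H @ x_closed).item())
```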

How to cite: Wong, J. S., Yassin, F., and Famiglietti, J. S.: Does hydrologically consistent data improve model performance? The importance of closing the water balance of input and evaluation data, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-16924, https://doi.org/10.5194/egusphere-egu23-16924, 2023.

EGU23-1668 | Posters virtual | ESSI1.7

Assessing the retrieval accuracy of SMOS soil moisture product in a Greek agricultural setting 

Triantafyllia Petsini and George P. Petropoulos

Soil moisture is an important parameter of the Earth system and plays a key role in understanding soil-atmosphere interactions through the energy balance and the hydrological cycle. Information on its spatiotemporal variability is of crucial importance in several research topics and applications. Remote sensing today provides a very promising avenue towards obtaining information on the variability of soil moisture at varying spatial and temporal resolutions, and a number of relevant operational products are currently available from different satellite sensors.

The objective of the present study has been to evaluate one such product, specifically that from the SMOS satellite, in a typical Mediterranean setting located in Greece. In particular, this study examines the agreement of the SMOS soil moisture product with collocated field measurements from the Prefecture of Larisa for the calendar year 2020, acquired from Neuropublic S.A. The agreement between the two datasets was evaluated on the basis of several statistical measures. The effects of topographical and geomorphological features, land use/cover, the relative satellite orbit type, and Radio Frequency Interference (RFI) were also examined as part of our analysis.

To our knowledge, this study is one of the few providing insight into the accuracy of the SMOS soil moisture product in a Greek setting. Our findings can provide important insights towards understanding the practical value of such products in agricultural and arid/semi-arid Mediterranean environments such as that of Greece, and can also help efforts directed towards improving their retrieval accuracy.

Keywords: soil moisture; operational product; remote sensing; SMOS; validation; agriculture; Mediterranean setting

How to cite: Petsini, T. and Petropoulos, G. P.: Assessing the retrieval accuracy of SMOS soil moisture product in a Greek agricultural setting, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-1668, https://doi.org/10.5194/egusphere-egu23-1668, 2023.

EGU23-1761 | ECS | Posters virtual | ESSI1.7

Exploring the synergy of EnMAP hyperspectral imagery with Machine Learning for land use- land cover mapping in a Mediterranean setting 

Christina Lekka, Spyridon E. Detsikas, and George P. Petropoulos

The Environmental Mapping and Analysis Program (EnMAP) is a new spaceborne German hyperspectral satellite mission for monitoring and characterizing the Earth’s environment on a global scale. The EnMAP mission provides high-quality, spectrally detailed information in the VNIR and SWIR ranges over large areas, with wide temporal coverage and high spatial resolution. This high-quality data, freely available to the scientific community, reveals great potential for a wide range of ecological and environmental applications, such as accurate and up-to-date LULC thematic maps.

The objective of the present study is to explore the accuracy of EnMAP for land cover mapping over a heterogeneous landscape. A typical Mediterranean setting located in Greece is used as a case study. The methodology is based on the synergistic use of machine learning techniques and EnMAP imagery coupled with other ancillary data, and was carried out in EnMAP-Box 3, an open-source toolbox designed for a GIS environment. Validation of the derived LULC maps has been carried out using the standard error matrix approach and also via comparisons against existing operational LULC products.
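
The error-matrix validation mentioned above is standard and easy to sketch; the class labels and simulated agreement below are hypothetical, not the study's validation data.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score, cohen_kappa_score

rng = np.random.default_rng(0)

# Hypothetical reference vs. classified labels for 500 validation pixels
classes = ["urban", "forest", "cropland", "water"]
ref = rng.integers(0, 4, 500)
pred = np.where(rng.random(500) < 0.85, ref, rng.integers(0, 4, 500))

cm = confusion_matrix(ref, pred)              # rows: reference, cols: classified
producers = np.diag(cm) / cm.sum(axis=1)      # producer's accuracy (omission)
users = np.diag(cm) / cm.sum(axis=0)          # user's accuracy (commission)

print(cm)
print("overall accuracy:", accuracy_score(ref, pred))
print("kappa:", cohen_kappa_score(ref, pred))
```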

To our knowledge, this research is among the first to explore the advantages of the hyperspectral EnMAP satellite mission in the context of LULC mapping. The results of the present study are expected to provide valuable input for LULC mapping applications and to demonstrate the potential of hyperspectral EnMAP data for improved performance and high accuracy in LULC mapping.

 

KEYWORDS: EnMAP, Land cover, Land use, Hyperspectral remote sensing, Machine Learning

How to cite: Lekka, C., Detsikas, S. E., and Petropoulos, G. P.: Exploring the synergy of EnMAP hyperspectral imagery with Machine Learning for land use- land cover mapping in a Mediterranean setting, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-1761, https://doi.org/10.5194/egusphere-egu23-1761, 2023.

Information on Impervious Surface Areas (ISAs) is required in various studies related to the urban environment. The continuous expansion of these surfaces is noticeable in large urban centers as a result of urbanization. The development of automated methodologies for mapping ISAs using remote sensing data has experienced considerable growth in recent years.

The aim of the present study is the long-term mapping of ISA changes in Athens, Greece, from 1984 to 2022, exploiting the Landsat archive and contemporary geospatial data processing methods, such as machine learning. The study is implemented in the Google Earth Engine cloud platform, and the final results are presented in a WebGIS environment.
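
A minimal sketch of such a workflow in the GEE Python API is shown below; the geometry, date range, band selection, and the training-sample asset are hypothetical placeholders (and authentication/project setup is assumed), so this is an illustration of the approach rather than the study's implementation.

```python
import ee
ee.Initialize()  # assumes prior authentication and project configuration

# Area of interest around Athens (hypothetical extent)
athens = ee.Geometry.Point([23.73, 37.98]).buffer(20000)

# Cloud-reduced summer composite from the Landsat archive
image = (ee.ImageCollection("LANDSAT/LC08/C02/T1_L2")
         .filterBounds(athens)
         .filterDate("2022-06-01", "2022-09-01")
         .median()
         .select(["SR_B2", "SR_B3", "SR_B4", "SR_B5", "SR_B6", "SR_B7"]))

# Labelled samples (property "isa": 0/1); the asset name is hypothetical
training = ee.FeatureCollection("users/example/athens_isa_samples")
samples = image.sampleRegions(collection=training, properties=["isa"], scale=30)

# Random forest classifier mapping reflectance to impervious / pervious
clf = ee.Classifier.smileRandomForest(100).train(
    features=samples, classProperty="isa",
    inputProperties=image.bandNames())

isa_map = image.classify(clf)
```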

The results of the present study can contribute to a better understanding of urban expansion dynamics and the key drivers of urban sprawl that affect cities such as Athens. Furthermore, they can serve as a reference for the further development of applications related to urban environments using machine learning techniques combined with remote sensing data.

 

KEYWORDS: ISA, urban sprawl, Landsat, GEE, WebGIS, Greece

How to cite: Dermosinoglou, K. and Petropoulos, G. P.: Long term monitoring of the changes in Impervious Surface Areas in a Greek setting using Machine Learning and Remote Sensing data: the case of Athens Greece, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-1785, https://doi.org/10.5194/egusphere-egu23-1785, 2023.

EGU23-3434 | ECS | Orals | ESSI1.7

The benefit of textural features for SAR-based tropical forest disturbance mapping 

Johannes Balling, Martin Herold, and Johannes Reiche

Cloud-penetrating Synthetic Aperture Radar (SAR) imagery has proven effective for tropical forest monitoring at national and pan-tropical scales. Current SAR-based disturbance detection methods rely on identifying decreased post-disturbance backscatter values as an indicator of forest disturbance. However, these methods suffer from a major shortcoming: they show omission errors and delayed detections for some disturbance types (e.g., logging or fires) where post-disturbance debris or tree remnants result in stable SAR backscatter values similar to those of stable forest. Despite fairly stable backscatter values, we hypothesize that the different orientations and arrangements of tree remnants lead to an increased heterogeneity of adjacent disturbed pixels. Increased heterogeneity can be quantified by textural features. We assessed six Gray-Level Co-occurrence Matrix (GLCM) textural features utilizing Sentinel-1 C-band SAR time series. We used a pixel-based probabilistic change detection algorithm to detect forest disturbances based on each GLCM feature and compared them against forest disturbances detected using only backscatter data. We further developed a method to combine both backscatter and GLCM features to detect forest disturbances. GLCM Sum Average (SAVG) performed best out of the tested GLCM features. Omission errors were reduced by up to 36% and the timeliness of detections improved by up to 30 days when applying the combination of backscatter and GLCM SAVG. Test sites characterized by large unfragmented disturbance patches (e.g., large-scale clearings, fires and mining) showed the greatest spatial and temporal improvement. A GLCM kernel size of 5 led to the best trade-off, improving the timeliness of detections and reducing omission errors while not introducing commission errors. The robustness of the developed method was verified for a variety of natural and human-induced forest disturbance types in the Amazon Biome. Our results show that combined SAR-based textural features and backscatter can overcome omission errors caused by post-disturbance tree remnants, and can support law enforcement activities by improving the spatial and temporal accuracy of operational SAR-based disturbance monitoring and alerting systems.
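
The GLCM Sum Average statistic itself is straightforward to compute; the sketch below quantizes a backscatter patch and evaluates SAVG with scikit-image, using synthetic homogeneous and heterogeneous patches (illustrating the computation only, not the detection skill reported above).

```python
import numpy as np
from skimage.feature import graycomatrix

def glcm_sum_average(patch_db, lo=-15.0, hi=0.0, levels=32):
    """GLCM Sum Average (Haralick SAVG) of a SAR backscatter patch (dB)."""
    edges = np.linspace(lo, hi, levels + 1)
    q = (np.digitize(patch_db, edges) - 1).clip(0, levels - 1).astype(np.uint8)
    glcm = graycomatrix(q, distances=[1], angles=[0, np.pi / 2],
                        levels=levels, symmetric=True, normed=True)
    p = glcm.mean(axis=(2, 3))                   # average over the offsets
    # SAVG = sum_k k * p_{x+y}(k), with p_{x+y}(k) = sum_{i+j=k} p(i, j)
    # (0-based gray levels; Haralick's 1-based definition shifts k by 2)
    savg = 0.0
    for k in range(2 * levels - 1):
        i = np.arange(max(0, k - levels + 1), min(k, levels - 1) + 1)
        savg += k * p[i, k - i].sum()
    return savg

rng = np.random.default_rng(0)
forest = rng.normal(-7.0, 0.3, (25, 25))         # homogeneous stable forest
disturbed = rng.normal(-9.0, 1.5, (25, 25))      # heterogeneous tree remnants
print(glcm_sum_average(forest), glcm_sum_average(disturbed))
```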

How to cite: Balling, J., Herold, M., and Reiche, J.: The benefit of textural features for SAR-based tropical forest disturbance mapping, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-3434, https://doi.org/10.5194/egusphere-egu23-3434, 2023.

EGU23-5175 | ECS | Orals | ESSI1.7

A comparative study of SMAP and ASCAT satellite soil moisture products with cosmic-ray neutron sensing and in-situ data in a Mediterranean setting 

Spyridon E. Detsikas, George P. Petropoulos, Nikos Koutsias, Dionisios Gasparatos, Vasilis Pisinaras, Heye Bogena, Frank Wendland, Frank Herrmann, and Andreas Panagopoulos

Obtaining Soil Moisture Content (SMC) over large scales is of key importance in several environmental and agricultural applications, especially in the context of climate change and the transition to digital farming. Remote sensing (RS) has a demonstrated capability in retrieving SMC over large areas, with several operational products already available at different spatiotemporal resolutions. At the same time, cosmic-ray neutron sensing is a recently emerged approach for retrieving high temporal resolution SMC at intermediate spatial scales. The present study conducts an intercomparison between different RS-based soil moisture products, daily SMC retrievals from a cosmic-ray neutron sensor (CRNS) station, and a network of in situ SoilNet wireless sensors installed at the Pinios Hydrologic Observatory ILTER site in central Greece, for the period 2018-2019. The RS-based soil moisture products included herein are from NASA's Soil Moisture Active Passive (SMAP) and the Metop-A/B Advanced Scatterometer (ASCAT) satellite missions. The methodological workflow adopts standardized validation procedures, employing a series of statistical measures to quantify the agreement between the RS-based soil moisture products, the CRNS-based SMC and the SoilNet ground truth data. Our study results contribute towards global efforts aiming at exploiting CRNS data in the context of soil moisture retrievals and their potential synergies with RS-based products. Furthermore, our findings provide valuable insights into the capability of CRNS to retrieve more accurate SMC estimates in arid and semi-arid environments, such as those found in the Mediterranean basin, while also supporting ongoing global validation efforts.
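
The statistical measures typically used in such soil moisture validation exercises (bias, RMSE, unbiased RMSE, and correlation; the abstract does not list its exact set) can be sketched as follows:

```python
import numpy as np
from scipy.stats import pearsonr

def sm_validation_metrics(sat, ref):
    """Common soil-moisture validation scores (units: m3/m3)."""
    bias = np.mean(sat - ref)
    rmse = np.sqrt(np.mean((sat - ref) ** 2))
    ubrmse = np.sqrt(rmse ** 2 - bias ** 2)     # unbiased RMSE
    r, _ = pearsonr(sat, ref)
    return {"bias": bias, "RMSE": rmse, "ubRMSE": ubrmse, "R": r}

rng = np.random.default_rng(0)
# Hypothetical daily reference SMC and a noisy, biased satellite product
ref = 0.25 + 0.08 * np.sin(np.linspace(0, 6 * np.pi, 365))
sat = ref + rng.normal(0.02, 0.04, ref.size)
print(sm_validation_metrics(sat, ref))
```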

Keywords: Cosmic Ray Neutron Sensors; SMAP; ASCAT; SoilNet; Soil Moisture Content

How to cite: Detsikas, S. E., Petropoulos, G. P., Koutsias, N., Gasparatos, D., Pisinaras, V., Bogena, H., Wendland, F., Herrmann, F., and Panagopoulos, A.: A comparative study of SMAP and ASCAT satellite soil moisture products with cosmic-ray neutron sensing and in-situ data in a Mediterranean setting, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-5175, https://doi.org/10.5194/egusphere-egu23-5175, 2023.

The upper range limit of trees is one of the most conspicuous boundaries on Earth. However, publicly available forest extent and forest cover datasets systematically underestimate sparse tree cover, which hinders our understanding of the distribution of the tree limit and its drivers over cold and arid regions. Here, we built a three-step upscaling strategy that integrates in situ measured vegetation types with spaceborne Light Detection and Ranging (LiDAR), microwave, and Landsat images in a Convolutional Neural Network (CNN) classification algorithm, to develop a new map of the upper range limit of trees over the Three-River-Source National Park circa 2020 at 30 m resolution. The new multi-satellite product incorporates vertical structure information, which allows it to better detect sparse trees and to better distinguish between shrub, grass, and forest. Validation shows that our result is highly consistent with manual interpretations from Google Earth high-resolution images (R2 = 0.97, slope = 0.99, ME = 18 m). Our proposed method provides a fast and effective tree limit mapping solution at the global scale.

How to cite: Xu, J., Wang, X., Lv, G., and Wang, T.: High-resolution map of the upper range limit of trees over the cold and arid region, a case study in the Three-River-Source National Park, Tibetan Plateau, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-6800, https://doi.org/10.5194/egusphere-egu23-6800, 2023.

EGU23-8451 | Orals | ESSI1.7

Enhanced and gap-free Sentinel-2 reflectance data at vast scales with GEE 

Emma Izquierdo-Verdiguier, Álvaro Moreno-Martínez, Jordi Muñoz-Mari, Nicolas Clinton, Francesco Vuolo, Clement Atzberger, and Gustau Camps-Valls

The presence of clouds and aerosols in satellite imagery hampers its use for monitoring, observing and analyzing the Earth's surface. Multisensor fusion can alleviate this problem. The HISTARFM algorithm developed by Moreno-Martinez et al. (2020) can generate monthly gap-filled reflectance data at 30 m spatial resolution by blending Landsat (30 m pixel size every 16 days) and MODIS (500 m pixel size daily) data using a bias-aware Kalman filter.
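
At its core, the fusion step is a per-pixel Kalman update that weights each sensor by its uncertainty. The sketch below shows only that core gain/update step under illustrative numbers; the actual HISTARFM filter additionally estimates and removes inter-sensor bias.

    def kalman_update(x_prior, p_prior, z, r):
        # Blend a coarse-sensor prediction (x_prior, variance p_prior)
        # with a fine-sensor observation (z, variance r).
        k = p_prior / (p_prior + r)           # Kalman gain
        x_post = x_prior + k * (z - x_prior)  # fused reflectance
        p_post = (1.0 - k) * p_prior          # fused uncertainty
        return x_post, p_post

    # e.g., coarse-sensor prediction 0.28 (std 0.03), fine observation 0.22 (std 0.01)
    x, p = kalman_update(0.28, 0.03**2, 0.22, 0.01**2)   # -> x = 0.226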

Cloud computing platforms such as Google Earth Engine (GEE) help us to efficiently process public data archives from different remote sensing data sources. GEE therefore allows us to adapt the HISTARFM algorithm to obtain gap-filled data at higher spatial resolution. To reduce the massive number of images involved in the process, the bias-aware Kalman filter blends the available, preprocessed HISTARFM monthly gap-filled reflectance (30 m pixel size every month) with Sentinel-2 (10 m pixel size every five days) data. The resulting gap-filled images at very high spatial resolution provide reflectance information at scales suitable for new products that improve decision-making in varied territories with complex topographies. In addition, new derivative products (e.g. land cover maps, biophysical parameters, or phenological indicators) will give the scientific community a better understanding and monitoring of the bio-geographical and ecoclimatic characteristics of the Earth.

Additionally, the temporal resolution of the series can be refined with this approach by linear interpolation, producing gap-filled Sentinel-2 reflectance data every five days. The proposed approach shows promising preliminary results and provides gap-free Sentinel-2 reflectance images with their associated uncertainties. These results foster the development of improved near-real-time applications for crop and natural vegetation monitoring at continental scales.

How to cite: Izquierdo-Verdiguier, E., Moreno-Martínez, Á., Muñoz-Mari, J., Clinton, N., Vuolo, F., Atzberger, C., and Camps-Valls, G.: Enhanced and gap-free Sentinel-2 reflectance data at vast scales with GEE, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-8451, https://doi.org/10.5194/egusphere-egu23-8451, 2023.

EGU23-8825 | ECS | Orals | ESSI1.7

PASSION: a workflow for the estimation of rooftop photovoltaic potential from satellite imagery. 

Rodrigo Pueblas, Jann Weinand, Patrick Kuckertz, and Detlef Stolten

Photovoltaic (PV) and wind are currently the fastest-growing renewable energies according to the annual World Energy Outlook 2022. However, in the case of solar PV in Europe, this growth is mainly driven by utility-scale installations. Distributed residential generation has many benefits, such as relieving the electrical grid and increasing self-sufficiency. A key challenge is to accurately estimate the rooftop PV potential of different regions, in order to best allocate economic resources and regulate accordingly. Multiple approaches have been proposed in the past, such as inferring potential from proxy variables like population density, automatically analyzing residential 3D point clouds, or automatically analyzing satellite images. The latter has gained popularity in recent years given the increased availability of satellite imagery and the improvement of computer vision methods. However, in research, the analysis of satellite imagery is impeded by the lack of transparency, reproducibility, and standardization of methods. Studies are heterogeneous, target different types of potential with redundant efforts, and are mostly not open source or rely on private datasets for training. This makes it challenging for users of various backgrounds to find and use the existing approaches.

For these reasons, this paper proposes a conceptual framework that describes and categorizes the tasks that need to be considered when estimating PV potential, thus creating a clear structure along which the contents of this research can be classified. Additionally, the open-source workflow PASSION is introduced, which integrates the assessment of the geographical, technical and economic potentials of the regions under consideration along with the calculation of surface areas, orientations and slopes of individual rooftop sections. It also includes the detection of obstacles and existing PV installations. It is based on a novel two-look approach, in which three independent models are deployed in parallel for the identification of rooftops, sections and superstructures. The three models show a mean Intersection over Union (IoU) between classes of 0.847, 0.753 and 0.462, respectively, and, more importantly, show consistent results on non-selected real-life samples.
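
For reference, the mean IoU reported for each model can be computed per class and averaged, as in this minimal sketch (function and argument names are illustrative):

    import numpy as np

    def mean_iou(pred, target, n_classes):
        ious = []
        for c in range(n_classes):
            inter = np.logical_and(pred == c, target == c).sum()
            union = np.logical_or(pred == c, target == c).sum()
            if union > 0:                 # skip classes absent from both maps
                ious.append(inter / union)
        return float(np.mean(ious))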

How to cite: Pueblas, R., Weinand, J., Kuckertz, P., and Stolten, D.: PASSION: a workflow for the estimation of rooftop photovoltaic potential from satellite imagery., EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-8825, https://doi.org/10.5194/egusphere-egu23-8825, 2023.

EGU23-8916 | ECS | Orals | ESSI1.7

Separating tree systems in agricultural lands from forests using Deep learning 

Wanting Yang, Daniel Ortiz Gonzalo, Xiaoye Tong, Dimitri Pierre Johannes Gominski, Martin Brandt, Ankit Kariryaa, Florian Reiner, and Rasmus Fensholt

Distinguishing trees on agricultural land from forests is essential for a better understanding of the relationship between forests and human farming activities. However, it is difficult to separate them with remote sensing imagery since they share similar canopy cover, especially on the edge of the Amazon rainforest, which has a highly complicated agricultural pattern. Besides annual crops and pasture, there are also many agroforestry applications and shifting cultivation practices that integrate tree systems, and those tree systems are not well separated from forest in existing land cover maps. Recent techniques allow for the mapping of single trees outside of forests; we now take the next step by identifying these diverse tree-involved systems on agricultural land. Here we aim to develop a robust, cost-efficient method to distinguish trees within agricultural land from the forest. We started our exploration in the Peruvian Amazon, where competition for land has increased in the last decades, with possible adverse effects on livelihoods and ecosystem services. Deep learning models, data sampling, and fine-tuning strategies are tested and optimized with PlanetScope satellite imagery. Our research target is to provide a tool for separating tree systems in farmland from the forest. It can also be used as a base map to explore the dynamics of agricultural transition and its impact on livelihoods and ecosystem services.

How to cite: Yang, W., Ortiz Gonzalo, D., Tong, X., Pierre Johannes Gominski, D., Brandt, M., Kariryaa, A., Reiner, F., and Fensholt, R.: Separating tree systems in agricultural lands from forests using Deep learning, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-8916, https://doi.org/10.5194/egusphere-egu23-8916, 2023.

EGU23-13610 | ECS | Orals | ESSI1.7

Unsupervised Segmentation Of Microwave Brightness Temperatures To Study The Changes In The Water Cycle 

Vibolroth Sambath, Nicolas Viltard, Laurent Barthès, Audrey Martini, and Cécile Mallet

Due to climate change, understanding the changes in the water cycle has become a pressing issue. It is increasingly important to study prolonged periods of intense precipitation or dry spells to better manage water supply, infrastructure and agriculture. However, obtaining fine-scale precipitation data is challenging due to the intermittent nature of rain in time and space. Ground-based instruments can have mismatches between different regions due to spatial distribution, calibration, and complex topography. On the other hand, space-borne observations have uncertainties in their retrieval algorithms. This study proposes to work directly with microwave images from space remote sensing, as this type of data makes it possible to study the evolution of the atmospheric water cycle on a global scale and with a temporal coverage of several decades while avoiding the uncertainties of retrieval methods. In recent years, convolutional neural networks have shown promising capabilities in identifying cyclones and weather fronts in large labelled climate datasets. However, these models require large labelled datasets for training and testing. The present study aims to test unsupervised approaches that segment microwave images into different classes. Instead of focusing on only one aspect, for example precipitation, the obtained classes capture many physical properties, because microwave brightness temperatures contain essential information on the atmospheric water cycle that can be used to derive many products such as rain intensity, water vapour, cloud fraction, and sea surface temperature. The unsupervised segmentation model consists of blocks of fully convolutional networks serving as feature extractors. Without labels, pseudo-targets from the feature extractors are used to train the model. The performance of the model in terms of intra-class and inter-class distances is compared with that of simpler models such as k-means. A major challenge in the unsupervised approach is validating and interpreting the resulting classes. Most of the obtained cluster patterns provide geographically coherent regions whose modes of variability in geophysical quantities can be highlighted. The presented study will then explore how the different classes computed by the unsupervised methods can be labelled and how the properties of the said classes change through time and space.
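
A k-means baseline with an intra-/inter-class distance score can be set up in a few lines; the file name, channel layout and cluster count below are illustrative assumptions.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.metrics import davies_bouldin_score

    # Hypothetical input: brightness temperatures reshaped to (n_pixels, n_channels).
    X = np.load("tb_pixels.npy")
    labels = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(X)
    # Davies-Bouldin relates intra-class scatter to inter-class separation (lower is better).
    print(davies_bouldin_score(X, labels))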

How to cite: Sambath, V., Viltard, N., Barthès, L., Martini, A., and Mallet, C.: Unsupervised Segmentation Of Microwave Brightness Temperatures To Study The Changes In The Water Cycle, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-13610, https://doi.org/10.5194/egusphere-egu23-13610, 2023.

EGU23-14137 | ECS | Posters virtual | ESSI1.7

The Global Distribution and Trajectory of Aquaculture Ponds 

Yang Xu and Lian Feng

The development of global aquaculture ponds provides valuable socio-economic benefits in the Anthropocene epoch, but also causes potential environmental and ecological impacts. However, the extent and trajectory of aquaculture ponds over the past 37 years remain unknown on a global scale. Our study maps the global distribution of aquaculture ponds over 9 periods (1984-1994, 1995-2000, and every 3 years from 2001 to 2021) based on a deep-learning method and Landsat observations. The total area of global aquaculture ponds expanded from 10043.3 km2 to 18779.7 km2, with a slowing growth rate. Fishpond area in Asia accounts for up to 82% of the global total. The extent of aquaculture ponds in Asia and South America has doubled since 1984. China, Vietnam, and Indonesia - the three countries with the largest fishpond areas - exhibited their largest fishpond area in 2004-2006. Our study provides a critical basis for assessing the spatial-temporal trajectory and potential influences of aquaculture ponds.

How to cite: Xu, Y. and Feng, L.: The Global Distribution and Trajectory of Aquaculture Ponds, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-14137, https://doi.org/10.5194/egusphere-egu23-14137, 2023.

EGU23-14375 | ECS | Orals | ESSI1.7

Self-Supervised Contrastive Model for Flood Mapping and Monitoring on SAR Time-Series 

Ritu Yadav, Andrea Nascetti, Hossein Azizpour, and Yifang Ban

Flooding is a natural disaster that has been increasing in recent years due to climate and land-use changes. Earth observations, such as Synthetic Aperture Radar (SAR) data, are valuable for assessing and mitigating the negative impacts of flooding. Cloud cover is highly correlated with flooding events, making SAR a preferable choice over optical data for flood mapping and monitoring.

Traditional methods for flood mapping and monitoring using SAR data, such as Otsu thresholding and change vector analysis (CVA), can be affected by noise, false detections due to shadows and occlusions, and geometric distortions. While automatic thresholding can be effective with these methods, manual adjustment of the threshold is often required to produce an accurate change map.

Supervised deep learning methods using large amounts of labeled data could potentially improve the accuracy of flood mapping and monitoring. We have a large amount of Earth observation data, but the availability of labeled data is limited, and labeling data is time-consuming and requires domain expertise. On the other hand, supervised model training on small datasets causes severe generalizability issues when inference is performed on a new site.

To address these challenges, we propose a novel self-supervised method for mapping and monitoring floods on Sentinel-1 SAR time-series data. We propose a probabilistic model trained on unlabeled data using self-supervised techniques such as reconstruction and contrastive learning. The model is trained to learn the spatiotemporal features of the area. It monitors changes by comparing the latent feature distribution at each time stamp and generates change maps that reflect the changes in the area.
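
The abstract does not spell out the contrastive objective; a common formulation for paired embeddings is the InfoNCE loss, sketched below, where corresponding rows of the two embedding batches are positives and all other rows are negatives.

    import torch
    import torch.nn.functional as F

    def info_nce(z1, z2, tau=0.1):
        # z1, z2: (n, d) embeddings of two views of the same n samples.
        z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
        logits = z1 @ z2.t() / tau                       # scaled cosine similarities
        targets = torch.arange(z1.size(0), device=z1.device)
        return F.cross_entropy(logits, targets)          # match each row to its pair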

We also propose a framework for flood monitoring that continuously monitors the area using time-series data. This framework automatically detects the change point, i.e., when a major change first becomes visible in the available SAR data. Our continuous monitoring framework, combined with better temporal resolution (better than Sentinel-1), could potentially detect flood events at an early stage, allowing more time for evacuation planning.

The model is evaluated on nine recent flood events from the 'Mekong', 'Somalia', 'Scotland', 'Australia', 'Bosnia', 'Germany', 'Spain', 'Bolivia', and 'Slovakia' sites. We compared our results with traditional methods and with existing supervised and unsupervised methods. Our detailed evaluation indicates that our model is more accurate and generalizes better to new sites. The model achieves an average Intersection over Union (IoU) of 70% and an F1 score of 81.14%, both higher than the scores of the previous best-performing method. Overall, our proposed model's improvements range from 7-26% in terms of F1 and 8-31% in terms of IoU.

How to cite: Yadav, R., Nascetti, A., Azizpour, H., and Ban, Y.: Self-Supervised Contrastive Model for Flood Mapping and Monitoring on SAR Time-Series, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-14375, https://doi.org/10.5194/egusphere-egu23-14375, 2023.

EGU23-15020 | ECS | Orals | ESSI1.7

Transfer Learning for LULC Classification on multi-modal data in the Amazon Basin 

Maximilian Hell, Melanie Brandmeier, and Andreas Nüchter

Mapping land use and land cover (LULC) changes over time requires automated processes and has been investigated using various machine learning algorithms and, more recently, deep learning models for semantic classification. New applications of these models to different satellite data and areas are regularly published. However, studies on the transfer of these models to other data and study areas are rather scarce. In a previous study [1], we used multi-modal and -temporal Sentinel data for LULC classification using traditional and novel deep learning models. The data covered parts of the Amazon basin and comprised a twelve-month time series of radar imagery (Sentinel-1) combined with a single multi-spectral image (Sentinel-2). All satellite images were captured throughout the year 2018. The label map (Collection 4) of the Amazon produced by the MapBiomas project [2] was used for training and test labels. Besides state-of-the-art models, we developed five variations of a deep learning model - DeepForest - which leverages the multi-temporal and -modal aspects of the data. The best model variation (DF1c) reached an overall accuracy of 74.4% on the test data.

Currently, we are investigating the transferability of these models to more recent data of the same region. The new dataset was processed in the same way as in the previous study. It comprises a Sentinel-1 time series and a single Sentinel-2 image from 2020, with an updated version of the MapBiomas label map (Collection 6). This posed some challenges, as the classification scheme changed and is not fully backwards compatible with the one used to train the DeepForest models. A test dataset was chosen in the state of Mato Grosso, as the satellite scenes cover most classes used in the classification scheme. However, this data exhibits considerable class imbalance, as two of the eleven classes dominate the scene. All five DeepForest variations reached accuracies higher than 79% and thus generalize well on the major LULC classes. For comparison, and to further improve our models, we are currently retraining them on the new, larger dataset (114,376 training image tiles compared to 18,074). Preliminary results will be shown during the session.
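
One common way to counter such class imbalance during retraining is inverse-frequency class weighting in the loss; this is a general technique, not necessarily the authors' approach, and the pixel counts below are hypothetical placeholders.

    import torch
    import torch.nn as nn

    # Hypothetical pixel counts for the eleven classes (two dominate the scene).
    counts = torch.tensor([9.1e7, 5.5e7, 8.0e5, 6.2e5, 4.0e5, 3.1e5,
                           2.5e5, 2.0e5, 1.5e5, 1.0e5, 8.0e4])
    weights = counts.sum() / (len(counts) * counts)   # inverse-frequency weights
    criterion = nn.CrossEntropyLoss(weight=weights)   # down-weights dominant classes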


References

  • [1] Cherif, E.; Hell, M.; Brandmeier, M. DeepForest: Novel Deep Learning Models for Land Use and Land Cover Classification Using Multi-Temporal and -Modal Sentinel Data of the Amazon Basin. Remote Sensing 2022, 14, 5000, doi:10.3390/rs14195000.
  • [2] MapBiomas Brasil. Available online: https://mapbiomas.org/en

How to cite: Hell, M., Brandmeier, M., and Nüchter, A.: Transfer Learning for LULC Classification on multi-modal data in the Amazon Basin, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-15020, https://doi.org/10.5194/egusphere-egu23-15020, 2023.

Riparian ecosystems are biodiversity hotspots and provide crucial services to human wellbeing. Currently, knowledge of how riparian ecosystems respond to, and in turn influence, variations in the environment remains considerably limited. As a first step toward filling this gap, this research aims to characterize the dynamics of riparian vegetation during the past several decades across multiple aquatic sites operated by the National Ecological Observatory Network (NEON) of the US. Specifically, it leverages high-resolution hyperspectral and lidar data collected by NEON's Airborne Observation Platform (AOP) surveys, the long-term records of satellite optical and radar imagery, and advanced data fusion and classification techniques to generate a time-series record of riparian vegetation on a seasonal-to-yearly basis. The maps derived will provide a new basis for understanding how riparian vegetation has changed across the continental US, and for predicting how it is likely to change in the future. This work is sponsored by NSF's Macrosystems Biology and NEON-Enabled Science (MSB-NES) Program (2021/9-2024/8), and the overarching goal of the project is to mechanistically link riparian vegetation dynamics to hydroclimate variations and assess the functional importance of riparian ecosystems to macrosystem fluxes of carbon and water.

How to cite: Jin, H. and Tai, X.: Spatiotemporal mapping of riparian vegetation through multi-sensor data fusion and deep learning techniques, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-17425, https://doi.org/10.5194/egusphere-egu23-17425, 2023.

EGU23-888 | ECS | PICO | ESSI1.8

Generating self-labeled geological datasets for semantic segmentation using pretrained GANs 

Ivan Ferreira and Ardiansyah Koeshidayatullah

Recent advancements in deep generative models (GANs) have drawn the attention of many researchers to exploring the feasibility of using realistic synthetic data as (i) a digital twin of the original dataset and (ii) a new approach to augmenting the original dataset. Previous works highlighted that GANs can replicate both the aesthetic and statistical characteristics of datasets, to the point of being indistinguishable from real samples, even when examined by domain experts. In addition, the weights learned during the unsupervised training of these generative models are useful for further extracting specific features of interest from the given dataset. In geosciences, many computer vision tasks are related to semantic segmentation, from pore quantification to fossil characterization. In such tasks, the labeling process becomes the main limiting factor, being both time-consuming and requiring domain experts. Hence, in this study, we repurpose GANs to obtain self-labeled geological datasets for semantic segmentation that are readily applicable in geological machine learning workflows. In this work, we used trained style-based GANs of foraminifera specimens, ooids, and mudstones. Our experiments show that with one or a few labels, we can successfully generate self-labeled, synthetic datasets featuring the labels of interest. This achievement is pivotal in geosciences for exploring GANs for one-shot and few-shot segmentation and for minimizing the manual labeling effort that segmentation requires from domain experts.

How to cite: Ferreira, I. and Koeshidayatullah, A.: Generating self-labeled geological datasets for semantic segmentation using pretrained GANs, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-888, https://doi.org/10.5194/egusphere-egu23-888, 2023.

EGU23-2203 | PICO | ESSI1.8

Object detection and classification applying AI (computer vision) to underwater images 

Young-Tae Son, Sang-Yeop Jin, and Tae-Soon Kang

The visual AI (artificial intelligence) algorithm YOLOv5 was used to detect marine organisms in underwater images, and the test results showed a high average detection rate (>90%). As performance indicators of the AI model, both precision and recall exceeded 0.95, showing very good performance. To minimize variation in object detection performance under changing underwater conditions, image correction was conducted, and more objects could be detected after correction.

To determine which species an object detected in a video or image corresponds to, performance was evaluated with an AI classification model (YOLO-Classification), a deep learning algorithm; accuracy improved by approximately 3% after image correction. We sought to identify the taxonomic species of organisms using deep learning and, although the number of target species was small, achieved a classification accuracy of about 80% or more based on the data collected so far.

High-quality image database data for the target species have to be established from a long-term perspective in order to accurately classify object (fish) species, and images taken from various angles of the target species must be collected to further improve performance.

As a prerequisite for measuring the size of an object detected in an image, MDE (Monocular Depth Estimation), a deep learning approach for estimating depth from a monocular camera image, was applied, and the distance from a given reference point was calculated with the MiDaS v3 algorithm. In tests of MiDaS v3, the excessive error was reduced compared to before its application, and a distance measurement accuracy of up to 2 m, longer than the guide stick length, was obtained.
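
Both model families used here are published on torch.hub, so a minimal detection-plus-depth pipeline can be sketched as below; the frame path is a placeholder, and MiDaS returns relative inverse depth, which still needs calibration against a known reference as described above.

    import cv2
    import torch

    yolo = torch.hub.load("ultralytics/yolov5", "yolov5s")
    detections = yolo("frame.jpg")        # boxes, confidences, class ids
    detections.print()

    midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
    midas.eval()
    transform = torch.hub.load("intel-isl/MiDaS", "transforms").small_transform
    img = cv2.cvtColor(cv2.imread("frame.jpg"), cv2.COLOR_BGR2RGB)
    with torch.no_grad():
        inv_depth = midas(transform(img)) # relative inverse depth map, not metric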

How to cite: Son, Y.-T., Jin, S.-Y., and Kang, T.-S.: Object detection and classification applying AI (computer vision) to underwater images, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-2203, https://doi.org/10.5194/egusphere-egu23-2203, 2023.

EGU23-3254 | ECS | PICO | ESSI1.8

Construction of Interactive Websites for Remote Sensing Datasets 

Kai Norman Clasen and Begüm Demir

As a result of advancements in satellite technology, archives of remote sensing (RS) images are continuously growing, providing a valuable source of information for monitoring the Earth's surface. Researchers construct well-designed and ready-to-use datasets from the plethora of RS images for the broader community to make it easier to develop and compare novel algorithms, models, and architectures to further deepen our understanding of our planet from space. However, the descriptions of these datasets are often published in scientific papers as PDF files with several limitations:

  • The target audience is typically domain experts familiar with scientific jargon;
  • The work is required to adhere to a specific page limit;
  • Once the document is published, it is difficult to update sections or to centralize discussions around it. 

To overcome these issues, here we introduce the concept of interactive dataset websites that aim at making the dataset and research based on it more accessible. With visual and interactive examples, users can see exactly how the data is structured and how it can be used in different contexts. For example, when working with RS data, it is beneficial to get a quick overview of the geographical distribution. By providing more in-depth background information about data sources and product specifications, these websites can also help users understand the context in which the data was collected, how it might be relevant to their work, and how to avoid common pitfalls. Another important aspect of interactive dataset websites is the inclusion of example code for using, loading, and visualizing the data. Especially when working with RS images (e.g., multispectral, hyperspectral, or synthetic aperture radar data), it is often not trivial to visualize the data. Providing example code can be especially useful for researchers unfamiliar with the specific tools required to work with the data, or to introduce to the community tools specifically written to make it easier to work with the dataset. Quick feedback can be vital, as it allows researchers to report problems or ask questions that the authors or community can address in an open and centralized manner. Creating these "living, ever-evolving documents" makes them an increasingly valuable resource for anyone working with the dataset, leading to more robust and reliable research.
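
As an illustration of the kind of example code such a website might embed, the sketch below renders an RGB composite from three single-band GeoTIFFs; the file names are placeholders, not the actual dataset layout.

    import numpy as np
    import rasterio
    from matplotlib import pyplot as plt

    bands = []
    for path in ("patch_B04.tif", "patch_B03.tif", "patch_B02.tif"):  # R, G, B
        with rasterio.open(path) as src:
            bands.append(src.read(1).astype(float))
    rgb = np.stack(bands, axis=-1)
    rgb = np.clip(rgb / np.quantile(rgb, 0.98), 0, 1)  # simple contrast stretch
    plt.imshow(rgb)
    plt.axis("off")
    plt.show()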

It might seem daunting at first to create such an interactive dataset website, but due to recent open-source projects such as Executable Books (https://executablebooks.org/) and free hosting providers such as GitHub Pages (https://pages.github.com/), it has become relatively easy to produce and host such websites. The HTML content can be generated from Jupyter Notebooks, a tool that many researchers and data scientists are familiar with. To provide an example, in our talk we will showcase an interactive dataset website for the BigEarthNet-MM dataset, which you can find here: https://docs.kai-tub.tech/ben-docs/

How to cite: Clasen, K. N. and Demir, B.: Construction of Interactive Websites for Remote Sensing Datasets, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-3254, https://doi.org/10.5194/egusphere-egu23-3254, 2023.

EGU23-3493 | PICO | ESSI1.8

OGC Testbed-18 Machine Learning Training Datasets Task: Application of standards to Machine Learning training datasets 

Samantha Lavender, Caitlin Adams, Ivana Ivánová, and Kate Williams

Training datasets are a crucial component of any machine learning approach, with significant human effort spent creating and curating these for specific applications. However, a historical absence of standards has resulted in inconsistent and heterogeneous training datasets with limited discoverability and interoperability. Therefore, there is a need for best practices and guidelines for generating, structuring, describing, and curating training datasets.

The Open Geospatial Consortium (OGC) Testbed-18 initiative covered several topics related to geospatial data, focussing on issues around cataloguing and interoperability. Within Testbed-18, the Machine Learning Training Datasets task aimed to develop a foundation for future standardization of training datasets for Earth observation applications.

For this task, members from Pixalytics, FrontierSI, and Curtin University authored an Engineering Report that reviewed:

  • Examples of how training datasets have been used in Earth observation applications
  • The current best-practice methods for documenting training datasets
  • The various requirements for training dataset metadata
  • How the Findability, Accessibility, Interoperability, and Reuse (FAIR) principles apply to training datasets

The Engineering Report provides a foundation that OGC can leverage in creating the future standard for machine learning training data for Earth observation applications. The Engineering Report also provides a useful overview of the state of work and key considerations for anyone wishing to improve how they document their training datasets.

In our presentation, we discuss the key findings from the Engineering Report, including key metadata identified from Earth observation use cases, the current state of the art, thoughts on cataloguing and describing training data quality, and how the FAIR principles apply to training data. 

How to cite: Lavender, S., Adams, C., Ivánová, I., and Williams, K.: OGC Testbed-18 Machine Learning Training Datasets Task: Application of standards to Machine Learning training datasets, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-3493, https://doi.org/10.5194/egusphere-egu23-3493, 2023.

EGU23-5394 | ECS | PICO | ESSI1.8

AwesomeGeodataTable - Towards a community-maintained searchable table for data sets easily usable as predictors for spatial machine learning 

Maximilian Nölscher, Anne-Karin Cooke, Sandra Willkommen, Mariana Gomez, and Stefan Broda

In the field of spatial machine learning, access to high-quality data sets is a crucial factor in the success of any analysis or modeling project, especially in subsurface hydrology. However, finding and utilizing such data sets can be a challenging and time-consuming process. This is where AwesomeGeodataTable comes in. AwesomeGeodataTable aims to establish a community-maintained searchable table of data sets that are easily usable as predictors for spatial machine learning, starting with a focus on subsurface hydrology. With its user-friendly interface and currently small but growing number of data sets, AwesomeGeodataTable will make it easier for researchers and practitioners to find and use the data they need for their work. It brings the usability of existing data set collections to the next level by adding features for filtering and searching meta-information on data sets. This talk will introduce attendees to the AwesomeGeodataTable project, its goals and features, and how they can get involved in maintaining and extending its database and in expanding its features and user experience. Overall, AwesomeGeodataTable is a valuable resource for anyone working in the field of spatial machine learning, and we hope to see it become a widely used and respected resource in the community.

How to cite: Nölscher, M., Cooke, A.-K., Willkommen, S., Gomez, M., and Broda, S.: AwesomeGeodataTable - Towards a community-maintained searchable table for data sets easily usable as predictors for spatial machine learning, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-5394, https://doi.org/10.5194/egusphere-egu23-5394, 2023.

EGU23-7299 | PICO | ESSI1.8

Towards generation of synthetic hyperspectral image datasets with GAN 

François De Vieilleville, Adrien Lagrange, Nicolas Dublé, and Bertrand Le Saux

In the context of the CORTEX project, a study was carried out to build a method to generate synthetic images with associated labels for hyperspectral use cases. Such a method is interesting in cases where too few annotated data are available to train a deep neural network (DNN). Hyperspectral imaging is particularly suited to this problem since labeled datasets of hyperspectral images are scarce and generally very small.

Therefore, the first step of the project was to define an interesting hyperspectral use case for the study. More concretely, generative models must be trained to achieve this objective, which means a set of hyperspectral images and their associated ground truth are necessary to train the models. A dataset was created from PRISMA images associated with the IGN BD Forêt v2. The result is a segmentation dataset of 1268 images of size 256x256 pixels with 234 spectral bands. The associated ground truth includes 4 classes: not-forest, broad-leaved forest, coniferous forest and mixed forest. To correctly match the ground truth and the images, substantial work was done to improve the geolocation of the PRISMA images by coregistering patches with Sentinel-2 images. We want to underline the value of this database, which remains, to our knowledge, one of the few large-scale hyperspectral databases, and which is made available on the Zenodo platform.

Then, a segmentation model was trained on the dataset to assess its quality and the feasibility of forest-type segmentation. Good results were obtained using a UNet-EfficientNet segmentation DNN. This showed that the dataset is coherent, but the problem remains difficult, since the 'mixed forest' class is challenging to identify.

Finally, an important research effort was conducted to develop a Generative Adversarial Network method able to generate synthetic hyperspectral images. The state-of-the-art StyleGAN2 was modified for this purpose: an additional discriminator was added and tasked with discriminating between synthetic and real images in a reduced image space. Good results were obtained for the generation of 32-band images, but the results worsen as the number of bands increases. The difficulty of the problem appears directly linked to the number of bands to be generated.

The final goal was to generate synthetic ground truth masks alongside the images, and the SemanticGAN method was selected to address this problem. Since this method is based on StyleGAN2, the improvements of StyleGAN2 for hyperspectral images were included in the method. In the end, a modified version of SemanticGAN was proposed: the discriminator assessing the coherence between masks and images was modified to use an image of reduced dimension, and a specific training strategy was introduced to help convergence. The initial expectation was that the generation of masks would help stabilize the generation of images, but the experiments showed the contrary. Early results are promising, but more research will be necessary to obtain pairs of images and masks that could be used to train a DNN.

How to cite: De Vieilleville, F., Lagrange, A., Dublé, N., and Le Saux, B.: Towards generation of synthetic hyperspectral image datasets with GAN, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-7299, https://doi.org/10.5194/egusphere-egu23-7299, 2023.

EGU23-12352 | ECS | PICO | ESSI1.8

Point-Cloud Class Separability: Identifying the Most Discriminative Features 

Max Hess, Aljoscha Rheinwalt, and Bodo Bookhagen

The global availability of dense point-clouds provides the potential to better assess changes in our dynamic world, particularly environmental changes and natural hazards. A core step in making use of modern point-clouds is to have a reliable classification and to identify the features of importance for a successful classification. However, the quality of classification is affected by both the classifier and the complexity of the features which describe the classes. To address the limitations of classification performance, we attempt to answer the question: to what extent can a classifier learn the separation into different classes based on the available features in a given training dataset?

We compare several measures of class separability to assess the descriptive value of each feature. A ranked list is generated that includes all individual features as well as all possible combinations within specific groups. Selecting high-ranked features based on their descriptive value allows us to summarize datasets without losing essential information about the individual classes. This is an important step in processing existing training data or in setting priorities for future data collection.
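
A simple instance of such a separability measure is the two-class Fisher ratio, sketched below for ranking individual features; the study compares several measures, of which this is only one illustrative example.

    import numpy as np

    def fisher_ratio(x, y):
        # Squared gap between class means over the summed class variances.
        a, b = x[y == 0], x[y == 1]
        return (a.mean() - b.mean()) ** 2 / (a.var() + b.var() + 1e-12)

    def rank_features(X, y):
        # X: (n_points, n_features); y: 0 = ground, 1 = vegetation.
        scores = np.array([fisher_ratio(X[:, k], y) for k in range(X.shape[1])])
        return np.argsort(scores)[::-1]   # most discriminative features first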

In our application experiments, we compare geometric and echo-based features of lidar point-clouds to obtain the most useful sets of features for separating ground and vegetation points into their respective classes. Different scenarios of suburban and natural areas are studied to collect insights for different classification tasks. In addition, we group features based on attributes such as acquisition or computational cost and evaluate the benefits of these efforts in terms of possible improvements in the classification result.

How to cite: Hess, M., Rheinwalt, A., and Bookhagen, B.: Point-Cloud Class Separability: Identifying the Most Discriminative Features, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-12352, https://doi.org/10.5194/egusphere-egu23-12352, 2023.

EGU23-14061 | PICO | ESSI1.8

The Earth Observation Training Data Lab (EOTDL) - addressing training data related needs in the Earth Observation community. 

Patrick Griffiths, Juan Pedro, Gunnar Brandt, Stephan Meissl, Grega Milcinski, and Laura Moreno

The availability of large training datasets (TDS) has enabled much of the innovative use of Machine Learning (ML) and Artificial Intelligence (AI) in fields such as computer vision or language processing. In Earth Observation (EO) and geospatial science and applications, the availability of TDS has generally been limited, and there are a number of specific geospatial challenges to consider (e.g. spatial reference systems and spatial/spectral/radiometric/temporal resolutions). Creating TDS for EO applications commonly involves labor-intensive processes, and the willingness to share such datasets has been very limited. While the current open accessibility of EO datasets is unprecedented, the availability of training and ground truth datasets has not improved much over the last years, and this is limiting the potentially innovative impact that new ML/AI methodologies could have in the EO domain. Next to general availability and accessibility, further challenges need to be addressed in terms of making TDS interoperable and findable and lowering the barriers for non-geospatial experts.


In response to these challenges, ESA has initiated the development of the Earth Observation Training Data Lab (EOTDL). EOTDL is being developed on top of federated European cloud infrastructure and aims to address the EO community's requirements for working with TDS in EO workflows, adopting FAIR data principles and following open science best practices.

The specific capabilities that EOTDL will support include:

  • Repository and Curation: host, import and maintain training datasets, ground truth data, pretrained models and benchmarks, providing versioning, tracking and provenance.
  • Tooling: provide a set of integrated open-source tools compatible with the major ML/AI frameworks to create, analyze and optimize TDS and to support data ingestion, model training and inference operations.
  • Feature engineering: Link with the main EO data archives and EO analytics platforms to support feature engineering and large-scale inference.
  • Quality assurance: embed QA throughout the offered capabilities, also taking advantage of automated deterministic checks and defined levels of TDS maturity.

To achieve these goals, EOTDL is building on proven technologies, such as STAC (SpatioTemporal Asset Catalog) to support data cataloguing and discoverability, the openEO and Sentinel Hub APIs for EO data access and feature engineering, GeoDB for vector geometry and attribute handling, and EoxHub to support interactive tooling. The EOTDL functionality will be exposed via web-based GUIs, Python libraries and command line interfaces.
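
To make the cataloguing idea concrete, a training sample could be registered as a STAC item with image and label assets, along these lines; the identifiers, hrefs and asset layout are illustrative assumptions, not the EOTDL schema.

    from datetime import datetime
    import pystac

    item = pystac.Item(
        id="tds-sample-0001",
        geometry={"type": "Polygon", "coordinates": [[
            [13.3, 45.7], [13.5, 45.7], [13.5, 45.9], [13.3, 45.9], [13.3, 45.7]]]},
        bbox=[13.3, 45.7, 13.5, 45.9],
        datetime=datetime(2023, 6, 1),
        properties={},
    )
    item.add_asset("image", pystac.Asset(
        href="s3://bucket/sample-0001.tif", media_type=pystac.MediaType.COG))
    item.add_asset("labels", pystac.Asset(
        href="s3://bucket/sample-0001-labels.geojson",
        media_type=pystac.MediaType.GEOJSON))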

A central objective is also the incentivization of community engagement to support quality assurance and encourage the contribution of datasets; for this, award mechanisms are being established. The initial data population consists of around 100 datasets, while intuitive data ingestion pipelines allow for continuous community contributions. Three defined product maturity levels are linked to QA procedures and support the trustworthiness of the data population. The development is coordinated with Radiant MLHub to seek synergies rather than duplicate the offered capabilities.

This presentation will showcase the current development status of EOTDL and discuss in detail some key aspects such as the data curation with STAC and the adopted quality assurance and feature engineering approaches. A set of use cases that establish new TDS creation tools and result in large scale datasets are presented as well.

How to cite: Griffiths, P., Pedro, J., Brandt, G., Meissl, S., Milcinski, G., and Moreno, L.: The Earth Observation Training Data Lab (EOTDL) - addressing training data related needs in the Earth Observation community., EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-14061, https://doi.org/10.5194/egusphere-egu23-14061, 2023.

EGU23-16998 | PICO | ESSI1.8

The OGC Training Data Markup Language for Artificial Intelligence (TrainingDML-AI) Standard 

Peng Yue, Boyi Shangguan, and Danielle Ziebelin

The development of Artificial Intelligence (AI), and especially Machine Learning (ML) technology, has injected new vitality into the geospatial domain. Training Data (TD) play a fundamental role in geospatial AI/ML: they are key items for training, validating, and testing AI/ML models. At present, open-access Training Datasets (TDS) are usually packaged in public or personal file repositories without a standardized method for expressing their metadata and data content, making them difficult to find, access, interoperate with, and reuse.

Therefore, building on the Open Geospatial Consortium (OGC) standards baseline, the OGC Training Data Markup Language for AI (TrainingDML-AI) Standard Working Group (SWG) set out to develop a TD model and encoding methods to exchange and retrieve TD in the Web environment. The scope includes: how TD are prepared, how to specify the different metadata used for different AI/ML tasks, and how to differentiate the high-level TD information model from extended information models specific to various AI/ML applications. This contribution describes the latest progress and status of the standard's development.

The TrainingDML-AI conceptual model includes the most relevant entities of the TD, covering everything from the dataset level down to individual training samples and labels. It specifies how the TD should be decomposed into parts and classified. The core concepts include: AI_TrainingDataset, which represents a collection of training samples; AI_TrainingData, which is an individual training sample in a TDS; AI_Task, which identifies what task the TDS is used for; AI_Label, which represents the label semantics for TD; AI_Labeling, which provides the provenance for the TD; AI_TDChangeset, which records TD changes between two TDS versions; and DataQuality, which can be associated with the TDS to document its quality.

The TrainingDML-AI content model focuses on implementations, with basic attributes defined for off-the-shelf deployment. Concepts related to EO AI/ML applications are defined as additional elements. Six key components are highlighted:

  • Training Dataset/Data. AI_AbstractTrainingDataset indicates the TDS, while each training sample is represented as AI_AbstractTrainingData. AI_EOTrainingDataset and AI_EOTrainingData are defined to convey attributes specific to the EO domain.
  • AI_EOTask is proposed by extending AI_AbstractTask to represent specific AI/ML tasks in the EO domain. The task type can refer to a particular type defined by an external category.
  • Labels for each individual training sample can be represented using features, coverages, or semantic classes. The AI_AbstractLabel is extended to specify AI_SceneLabel, AI_ObjectLabel, and AI_PixelLabel respectively.
  • AI_Labeling records basic provenance information on how the TDS was created. It includes the labeler and labeling procedure, which can be mapped to the agent and activity, respectively, in W3C PROV.
  • DataQuality and QualityElements defined in ISO 19157-1 are used to align with existing efforts on geographic data quality.
  • Change procedures of the TDS are documented in the AI_TDChangeset, which is composed of changed training samples at the collection level.

Finally, use case scenarios and best practices are provided to illustrate the intended use and benefits of TrainingDML-AI for EO AI/ML applications. In total, five different tasks are covered: scene classification, object detection, semantic segmentation, change detection, and 3D model reconstruction. Some software implementations, including pyTDML and LuojiaSet, are also presented.

How to cite: Yue, P., Shangguan, B., and Ziebelin, D.: The OGC Training Data Markup Language for Artificial Intelligence (TrainingDML-AI) Standard, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-16998, https://doi.org/10.5194/egusphere-egu23-16998, 2023.

EGU23-17570 | PICO | ESSI1.8

A dataset of Earth Observation Data for Lithological Mapping using Machine Learning 

Ioannis Vernikos, Georgios Giannopoulos, Aikaterini Christopoulou, Anxhelo Begaj, Marianthi Stefouli, Emmanuel Bratsolis, and Eleni Charou

Machine Learning (ML) algorithms have successfully contributed to the creation of automated methods for recognizing patterns in high-dimensional data. Remote sensing data cover wide geographical areas and could be used to reduce the demand for various in-situ data. Lithological mapping using remotely sensed data is one of the most challenging applications of ML algorithms. In the framework of the "AI for Geoapplications" project, ML and especially Deep Learning (DL) methodologies are investigated for the identification and characterization of lithology based on remote sensing data in various pilot areas in Greece. In order to train and test the various ML algorithms, a dataset consisting of 30 ROIs, selected mainly from low-vegetated areas and covering 2% of the total area of Greece, was created. For each ROI, the following are provided:

  • the corresponding shapefile with the lithological units
  • the corresponding Sentinel-2 (10 bands) and/or ASTER (14 bands) images

The dataset is publicly available in the cloud, along with the necessary code for visualization and processing.

How to cite: Vernikos, I., Giannopoulos, G., Christopoulou, A., Begaj, A., Stefouli, M., Bratsolis, E., and Charou, E.: A dataset of Earth Observation Data for Lithological Mapping using Machine Learning, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-17570, https://doi.org/10.5194/egusphere-egu23-17570, 2023.

EGU23-3441 | Posters on site | ESSI1.9

A Programming Model for Geospatial Machine-Learning with Scalability in Hybrid Multiclouds 

Michiaki Tatsubori, Daiki Kimura, Takao Moriyama, Naomi Simumba, and Tatsuya Ishikawa

While deep machine learning approaches are becoming pervasive in remote sensing and in modeling the Earth, difficulties due to the size of satellite data are a constant pain for scientists implementing such experimental software. We present a programming model for geospatial machine learning based on TorchGeo and PyTorch, which are becoming the de facto standards for programming in Python. TorchGeo is open-sourced and designed to make it simple for remote sensing experts to explore machine learning solutions. Our objective is to allow machine-learning programs using TorchGeo to scale by leveraging proprietary high-performance computing (HPC) and multicloud HPC resources from one's notebook. One of the key technologies specifically needed in geospatial machine learning is the smart integration of peta-scale data services and data-distributed parallel frameworks. We implement such a platform as a part of the IBM Research Geospatial Discovery Network (GDN) and experiment with segmentation tasks such as flood detection from satellite data to show its scalability.
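
For orientation, a typical TorchGeo entry point looks like the sketch below, which samples random tiles from a directory of Sentinel-2 scenes; the directory path and sampler settings are placeholders.

    from torch.utils.data import DataLoader
    from torchgeo.datasets import Sentinel2, stack_samples
    from torchgeo.samplers import RandomGeoSampler

    dataset = Sentinel2("data/s2")                 # placeholder directory of scenes
    sampler = RandomGeoSampler(dataset, size=256, length=1000)
    loader = DataLoader(dataset, sampler=sampler, collate_fn=stack_samples)
    for batch in loader:
        images = batch["image"]                    # (batch, bands, 256, 256) tensors
        break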

How to cite: Tatsubori, M., Kimura, D., Moriyama, T., Simumba, N., and Ishikawa, T.: A Programming Model for Geospatial Machine-Learning with Scalability in Hybrid Multiclouds, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-3441, https://doi.org/10.5194/egusphere-egu23-3441, 2023.

EGU23-3501 | ECS | Posters on site | ESSI1.9

EarthNets: An Open Deep Learning Platform for Earth Observation 

Zhitong Xiong and Xiao Xiang Zhu

Earth observation (EO) data are critical for monitoring the state of planet Earth and can be helpful for various real-world applications [1]. Although numerous benchmark datasets have been released, there is no unified platform for developing and fairly comparing deep learning models on EO data [2]. For deep learning methods, the backbone networks, hyper-parameters, and training details are influential factors when comparing performance. However, existing works usually neglect these details and even evaluate performance with different training/validation/test dataset splits. This makes it difficult to compare different algorithms fairly and reliably. In this study, we introduce the EarthNets platform, an open deep-learning platform for remote sensing and Earth observation. The platform is based on PyTorch [3] and TorchData. There are about ten different libraries, covering different tasks in remote sensing. Among them, Dataset4EO is designed as a standard, easy-to-use data-loading library, which can be used alone or together with other high-level libraries like RSI-Classification (for image classification), RSI-Detection (for object detection), and RSI-Segmentation (for semantic segmentation). Two factors were considered in the design of the EarthNets platform. The first is the decoupling of dataset loading from high-level EO tasks: as there are more than 400 RS datasets with different data modalities, research domains, and download links, efficient preparation of analysis-ready data can greatly accelerate research for the whole community. The other factor is bringing advances in machine learning to EO by providing new deep-learning models. The EarthNets platform provides a fair and consistent evaluation of deep learning methods on remote sensing and Earth observation data [4]. It also helps bring together the remote sensing community and the larger machine-learning community. The platform and dataset collections are publicly available at https://earthnets.github.io.

[1] Zhu, Xiao Xiang, et al. "Deep learning in remote sensing: A comprehensive review and list of resources." IEEE Geoscience and Remote Sensing Magazine 5.4 (2017): 8-36.

[2] Long, Yang, et al. "On creating benchmark dataset for aerial image interpretation: Reviews, guidances, and million-aid." IEEE Journal of selected topics in applied earth observations and remote sensing 14 (2021): 4205-4230.

[3] Paszke, Adam, et al. "Pytorch: An imperative style, high-performance deep learning library." Advances in neural information processing systems 32 (2019).

[4] Xiong, Zhitong, et al. "EarthNets: Empowering AI in Earth observation." arXiv preprint arXiv:2210.04936 (2022).

How to cite: Xiong, Z. and Zhu, X. X.: EarthNets: An Open Deep Learning Platform for Earth Observation, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-3501, https://doi.org/10.5194/egusphere-egu23-3501, 2023.

EGU23-4160 | Posters on site | ESSI1.9

Sentinel Hub - federated on-demand ARD generation 

Grega Milcinski and Primoz Kolaric

Every experiment starts with data, which need to be fine-tuned for the specific use case. We call this "analysis ready data (ARD)". In some cases, for the sake of reusability and comparability, the specifications for ARD are well defined. In many other cases, however, the procedures are not yet mature enough to support standardisation. In the Earth Observation (EO) field this is especially true, as the whole community is moving from (semi)manually analysing individual scenes, dating from the time when barely any data were available, to processing of time series, now that Landsat and Sentinel have made this possible. We now even face a problem where there is simply too much data, with PBs of open and commercial imagery readily available. With the data being distributed across different places (Copernicus Data Access Service for Sentinel, AWS for Landsat), the challenge is further magnified. A machine learning (ML) approach can address the challenge of sifting through data, but ML also requires data to be pre-processed for the purpose and made available at the place where the ML is running. Therefore, it is essential to have a facility which can generate ARD customised for the specific requirements of an analysis.

Sentinel Hub (SH) is a satellite imagery processing service capable of on-the-fly gridding, re-projection, re-scaling, mosaicking, compositing, orthorectification and other actions required either for integration in web applications, where mostly pictures are served, or in ML and similar analysis processes, where pixel values and statistics are essential. SH works with original satellite data files and does not require replication or pre-processing. It uses cloud infrastructure and innovative methods to efficiently process and distribute data in a matter of seconds. Sentinel Hub gives access to a rich collection of satellite data, including the full set of Sentinel satellites, Landsat collections, commercial VHR collections and other complementary collections. It also provides the ability for users to onboard their own data in one of the standardised formats. Furthermore, data located on different clouds can be fused together in a single process, benefiting from the variability and volume of different sensors.

There are two main capabilities which make SH especially fit for the purpose of generating on-demand ARD. The first is support for user-provided processing scripts, which are recipes for what should happen with the sensor data (band composites, indices, even simple neural networks combining the available data). The second is a set of processing orchestration options. There is a Process API for immediate access to the pixel values. The Statistical API is optimised for time-series analysis: it aggregates the data over a specific area of interest and provides configurable statistics through time. And then there are asynchronous siblings of these services, fine-tuned for large-scale processing - if one wants to prepare ML features for an entire continent or get time series for millions of agricultural parcels.
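
With the sentinelhub Python package, a Process API request pairs such a user-provided script with an area, time range and output format; the NDVI evalscript, bounding box and dates below are illustrative, and credentials are assumed to be configured in SHConfig.

    from sentinelhub import (BBox, CRS, DataCollection, MimeType,
                             SentinelHubRequest, SHConfig)

    # The user-provided recipe, executed server-side (a simple NDVI here).
    evalscript = """
    //VERSION=3
    function setup() {
      return {input: ["B04", "B08"], output: {bands: 1, sampleType: "FLOAT32"}};
    }
    function evaluatePixel(s) {
      return [(s.B08 - s.B04) / (s.B08 + s.B04)];
    }
    """

    request = SentinelHubRequest(
        evalscript=evalscript,
        input_data=[SentinelHubRequest.input_data(
            data_collection=DataCollection.SENTINEL2_L2A,
            time_interval=("2023-06-01", "2023-06-30"))],
        responses=[SentinelHubRequest.output_response("default", MimeType.TIFF)],
        bbox=BBox((13.35, 45.75, 13.45, 45.85), crs=CRS.WGS84),
        size=(512, 512),
        config=SHConfig(),
    )
    ndvi = request.get_data()[0]    # numpy array for the requested window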

We will present the technology behind the scenes that makes the processing possible, as well as several use-cases showing how one can efficiently make use of the service in ML.

How to cite: Milcinski, G. and Kolaric, P.: Sentinel Hub - federated on-demand ARD generation, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-4160, https://doi.org/10.5194/egusphere-egu23-4160, 2023.

EGU23-7233 | Posters on site | ESSI1.9

A flexible, scalable, cloud-native framework for geospatial modelling 

Blair Edwards, Paolo Fraccaro, Nikola Stoyanov, Anne Jones, Junaid Butt, Julian Kuehnert, Andrew Taylor, and Bhargav Garikipati

Understanding and quantifying the risk of the physical impacts of climate change, and their subsequent consequences, is of crucial importance in a changing climate for both businesses and society more widely. Historically, modelling workflows to assess such impacts have been bespoke and constrained by the data they can consume, the compute infrastructure, the expertise required to run them and the specific ways they are configured. Here we present a cloud-native modelling framework for running geospatial models in a flexible, scalable, configurable, user-friendly manner. This enables models (physical or ML/AI) to be rapidly onboarded and composed into workflows. These workflows can be flexible, dynamic and extendable, running for historical events or as forecast ensembles, with varying data inputs, or extended to model impact in the real world (e.g. on infrastructure and populations). The framework supports the streamlined training and deployment of AI models, which can be seamlessly integrated with physical models to create hybrid workflows. We demonstrate the application and features of the framework for the examples of flooding and wildfire.

How to cite: Edwards, B., Fraccaro, P., Stoyanov, N., Jones, A., Butt, J., Kuehnert, J., Taylor, A., and Garikipati, B.: A flexible, scalable, cloud-native framework for geospatial modelling, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-7233, https://doi.org/10.5194/egusphere-egu23-7233, 2023.

EGU23-11709 | Posters on site | ESSI1.9

Microservice architecture to enable fast assessment of the NatCat events based on EO and geospatial data 

Karolina Sarna, Johannes Hiekkasaari, and Joni Taajamo

Fast response to natural catastrophe events is crucial in our fast-changing world. Creating comprehensible solutions based on Earth Observation (EO) and geospatial data is complex: it requires combining multiple data sources and maintaining a large set of configuration parameters.

In this talk we discuss the application of a microservices architecture to tackle some of the issues inherent in building products based on EO and geospatial data. We will present how decomposing sophisticated algorithms into small services can help with the continuous delivery, scaling and deployment of large, complex applications that can be reused across products. This architecture enables reproducibility of analysis, which is a crucial component for applying machine learning and automation to any EO-based product. We will also address the additional complexity of creating a distributed system, as well as the high dependency on data consistency and availability.
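
To make the decomposition concrete (an illustrative sketch rather than the actual system; the service name, route and fields are invented), a single responsibility, such as computing a vegetation statistic over an area of interest, can be exposed as a small, independently deployable service:

    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI(title="ndvi-stats")  # hypothetical microservice

    class AOIRequest(BaseModel):
        polygon: dict  # GeoJSON-like geometry; fields are illustrative
        start: str
        end: str

    @app.post("/mean-ndvi")
    def mean_ndvi(req: AOIRequest) -> dict:
        # A real service would query an EO data backend here; a stub keeps
        # the example self-contained.
        return {"mean_ndvi": 0.42, "start": req.start, "end": req.end}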

How to cite: Sarna, K., Hiekkasaari, J., and Taajamo, J.: Microservice architecture to enable fast assessment of the NatCat events based on EO and geospatial data, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-11709, https://doi.org/10.5194/egusphere-egu23-11709, 2023.

EGU23-13481 | Posters on site | ESSI1.9

Satellite data as a predictor for monitoring tree health and stress in reforestation projects 

H. Gijs J. Van den Dool and Deepali Bidwai

In many parts of the world, reforestation is an ongoing activity, but due to deforestation processes (e.g. changes in soil conditions, agricultural expansion, and infrastructure expansion such as urbanisation or road building), the success rate of replanting is far from certain; it is therefore essential to:

  • have a good idea of the pre-planting conditions at the location,
  • monitor the growth,
  • improve the growing conditions whenever possible, and
  • adapt the site selection criteria

In the proposed method, it is not possible to change the site selection of already planted locations, but it is possible to monitor the selected locations and check under which conditions the trees grow best.

Several data sources are identified to predict plant health and stress, first to establish a baseline and then to project from this baseline into the future (short and mid-term). We compute the main vegetation index (NDVI) from the high-resolution image data provided by Planet (through the NICFI Basemaps for Tropical Forest Monitoring program). The historical NDVI values are obtained from Sentinel-2 (and potentially Landsat) data at lower resolutions. Environmental conditions are added to the stress index by extracting the relevant meteorological parameters from the ERA5 database (temperature and precipitation) to compute drought indices (e.g. KBDI/SPI/SPEI) and water availability (AWC) with the dominant soil type, supplemented with supporting indices from the satellite data (e.g. NDWI/SAVI/EVI-2).
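
The index computations themselves are simple band arithmetic; for instance, NDVI from near-infrared and red reflectances (a generic sketch, independent of the data provider):

    import numpy as np

    def ndvi(nir: np.ndarray, red: np.ndarray) -> np.ndarray:
        """Normalized Difference Vegetation Index: (NIR - red) / (NIR + red)."""
        denom = nir + red
        return np.where(denom == 0, 0.0, (nir - red) / denom)

    # The supporting indices follow the same pattern, e.g. NDWI from green and NIR.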

For reforestation projects it is vital to monitor the impact of environmental parameters on plant health and stress. To assist with the forest maintenance of the sites, we built time-series models for temperature, precipitation and various vegetation indices to create a baseline for site-specific growing conditions. Deep learning (DL) models, such as semantic segmentation based on a Convolutional Neural Network (CNN), can be built on top of this using transfer learning to extract features from models pre-trained on large (global) datasets. The model can not only predict tree health but can also be used to predict growing conditions in the near future by flagging potential dry periods before they happen.

The high-resolution remotely sensed products are available in the (sub)tropical zone [30N-30S], while the lower-resolution products and the ERA5 data have global coverage. The test sites in this study are example sites, but the developed method can be applied to any reforestation monitoring project. The result of the analysis is a near-term growth indicator, which can be used to adjust the growing conditions of a site, as well as to assist with site selection for new reforestation projects (based on the established baseline and predictions).

The next step, after validation, is to create a dashboard where the user can select any location (within the data domain) and construct the baseline and prediction based on the available information.

How to cite: Van den Dool, H. G. J. and Bidwai, D.: Satellite data as a predictor for monitoring tree health and stress in reforestation projects, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-13481, https://doi.org/10.5194/egusphere-egu23-13481, 2023.

EGU23-13852 | Posters on site | ESSI1.9

Can machine learning help us to create improved and trustworthy satellite-based precipitation products? 

Ioannis Tsoukalas, Panagiotis Kossieris, Luca Brocca, Silvia Barbetta, Hamidreza Mosaffa, and Christos Makropoulos

A key variable of earth observation (EO) systems is precipitation, as indicated by the wide spectrum of applications in which it is involved (e.g., water resources and early-warning systems for flood/drought events). During the last decade, the EO community has put significant research effort into the development of satellite-based precipitation products (SPPs); however, their deployment in real-world applications has not yet reached its full potential, despite their ever-growing availability, spatiotemporal coverage and resolution. This may be associated with the reluctance of end-users to employ SPPs, either worrying about the uncertainty and biases inherent in SPPs or because of the existence of multiple SPPs whose performance fluctuates across the globe, making it difficult to select the most appropriate SPP (some sort of choice paradox). To address this issue, this work targets the development of an explainable machine learning approach capable of integrating multiple satellite-based precipitation (P) and soil moisture (SM) products into a single precipitation product. Hence, in principle, it creates a new dataset that optimally combines the properties of each individual satellite dataset (used as predictors), better matching the ground-based observations (used as predictand, i.e., the reference dataset). The proposed approach is showcased via a benchmark dataset consisting of 1009 cells/locations around the world (Europe, USA, Australia and India), highlighting its robustness as well as an applicability that is independent of specific climatic regimes and local peculiarities.
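
As a hedged illustration of the merging step (the abstract does not name the learner; the products and data below are synthetic placeholders), one could regress gauge precipitation on several satellite predictors with a gradient-boosted model and inspect feature importances for explainability:

    import numpy as np
    from sklearn.ensemble import HistGradientBoostingRegressor
    from sklearn.inspection import permutation_importance

    rng = np.random.default_rng(0)
    X = rng.random((1000, 4))  # columns, e.g.: SPP 1, SPP 2, SPP 3, soil moisture
    y = X @ np.array([0.5, 0.2, 0.2, 0.1]) + 0.05 * rng.standard_normal(1000)

    model = HistGradientBoostingRegressor().fit(X, y)
    imp = permutation_importance(model, X, y, n_repeats=10, random_state=0)
    print(imp.importances_mean)  # which product drives the merged estimate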

How to cite: Tsoukalas, I., Kossieris, P., Brocca, L., Barbetta, S., Mosaffa, H., and Makropoulos, C.: Can machine learning help us to create improved and trustworthy satellite-based precipitation products?, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-13852, https://doi.org/10.5194/egusphere-egu23-13852, 2023.

EGU23-14353 | Posters on site | ESSI1.9

MLOps in practice: how to scale your geospatial practice with cloud-based shared MLOps platform 

Frank de Morsier and Julien Rebetez

The key to driving innovation in EO science and applications, boosting geospatial mass adoption and in turn 'geo-enabling' companies, researchers and institutions, is moving away from complex, inefficient and expensive workflows and making fundamental changes in ML practices. This is where geospatial MLOps, and platforms such as Picterra, play a crucial role: cloud-native, shared platforms offer user-friendly and efficient interfaces and a smart toolkit, paired with auto-scaling infrastructure and state-of-the-art deep learning architectures. They make it possible to create and operate geospatial ML models at scale, enabling organizations to complete geospatial ML projects faster than ever before.

MLOps platforms systematize the process of building and training experimental machine learning models, as well as translating them into production. This workflow efficiency empowers teams working with massive datasets and allows organizations to leverage data analytics for decision-making and for building better customer experiences.

Achieving productivity and speed requires streamlining and automating processes, as well as building reusable assets that can be managed closely for quality and risk. When significant model drift is detected, the ability to retrain and redeploy ML models in an automated fashion is crucial to ensure business continuity.

A shared platform, managed infrastructure and an integrable architecture result in streamlined pipelines and straightforward integration. This agility reduces the time to value and frees up time to serve more use cases, leading to increased value to the business. Companies implementing geospatial MLOps can speed up model training times, dramatically improve accuracy, and go from an idea to a live solution in just days, without increasing headcount or technical debt. Over time, they will also collect a library of strategic ML assets that will enable them to act on timely data, fast.

Using Picterra as a prime example of a geospatial ML platform built with MLOps processes at its core, we will dive into how it facilitates the key steps of ML workflows, including:

  • Direct access to a diverse range of satellite imagery sources via the platform, i.e. Sentinel-1/2, PlanetScope, open aerial imagery campaigns, and ingestion of WMS/XYZ server streams.
  • Compatibility with any geospatial imagery source (e.g. optical, SAR, hyperspectral, thermal infrared, etc.), with the possibility to connect to cloud data storage or to upload directly via the web interface, in addition to the image servers mentioned above.
  • An unequalled MLOps interface to prototype the extraction of new information from imagery around any custom-defined use case, i.e. biodiversity monitoring, crop mapping and classification, asset management and many more. Trained models are directly served and made available for inference at large scale.
  • An extensive toolset for explainable and interpretable AI, for example dataset exploration, bringing robustness and efficiency to the creation of geospatial machine learning models.
  • Fast turnaround time in creating and validating machine learning models, saving time and resources thanks to the auto-scaling infrastructure leveraging Kubernetes and an intuitive interface for fast prototyping.
  • A unique set of advanced GIS pre-/post-processing tools to manage imagery and the extracted geospatial outputs.
  • A complete API and Python library to further integrate with existing workflows or software (e.g. ESRI ArcGIS, Safe FME, etc.), as sketched below.
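
As a minimal illustration of such an API integration from Python (the endpoint path, payload fields and authentication header below are assumptions for the sketch, not Picterra's documented API):

    import os
    import requests

    BASE = "https://app.picterra.ch/public/api"  # assumed base URL
    HEADERS = {"X-Api-Key": os.environ["PICTERRA_API_KEY"]}  # assumed auth header

    # Hypothetical call: run a trained detector on a previously uploaded raster.
    resp = requests.post(
        f"{BASE}/detectors/my-detector-id/run",
        headers=HEADERS,
        json={"raster_id": "my-raster-id"},
        timeout=60,
    )
    resp.raise_for_status()
    print(resp.json())  # e.g. an operation to poll until GeoJSON results are ready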

How to cite: de Morsier, F. and Rebetez, J.: MLOps in practice: how to scale your geospatial practice with cloud-based shared MLOps platform, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-14353, https://doi.org/10.5194/egusphere-egu23-14353, 2023.

EGU23-16802 | Posters on site | ESSI1.9

eo-grow - Earth Observation framework for scaled-up processing in Python 

Matej Batič, Žiga Lukšič, and Grega Milcinski

Analysing EO data is a complex process, and solutions often require custom-tailored algorithms. On top of that, most problems in the EO domain come with an additional challenge: how can the solution be applied at a large scale?

Within the H2020 project Global Earth Monitor (GEM), we have updated and extended eo-learn with additional functionalities that allow new approaches to scalable and cost-effective Earth Observation data processing. We have tied it to Sentinel Hub's unified main data interface (Process API), to the Data Cube processing engine for constructing analysis-ready adjustable data cubes using the Batch Process API, and, finally, to the Statistical API and Batch Statistical API to streamline access to spatio-temporally aggregated satellite data.

As part of the GEM processing framework, we have built eo-grow, which facilitates the extraction of valuable information from satellite imagery. eo-grow tackles the issue of scalability by coordinating clusters that run EO workflows over large areas using Ray. At the same time, the framework provides reproducibility and traceability of experiments through schema-based input configurations and their validation.

In eo-grow, a workflow-based solution is wrapped into a pipeline object, which takes care of parametrization, logging, storage, multi-processing, data management and more. The pipeline object is configured via a well-defined schema, allowing straightforward experimentation and scaling up: moving to a larger area of interest, running on a different time interval, or tweaking any other pipeline parameter becomes just a matter of updating the (JSON) configuration, which additionally serves as a record of the experiment.
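
As a hedged illustration of such a configuration (the pipeline name and keys below are invented for the example rather than taken from the eo-grow schema):

    import json

    # Hypothetical pipeline configuration; field names are illustrative only.
    config = {
        "pipeline": "DownloadAndIndexPipeline",
        "area": {"name": "slovenia", "buffer": 0.1},
        "time_interval": ["2022-01-01", "2022-12-31"],
        "workers": 32,
        "storage": {"project_folder": "s3://my-bucket/eo-grow-demo"},
    }

    with open("config.json", "w") as f:
        json.dump(config, f, indent=2)

    # The pipeline would then be launched from the command line, e.g.
    # `eogrow config.json` (command name assumed from the repository docs).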

The eo-grow library has been publicly released on GitHub: https://github.com/sentinel-hub/eo-grow. The documentation available in the repository provides an overview of the general structure of eo-grow and its core objects, together with instructions on installation and on using eo-grow via the command line interface. An additional repository, https://github.com/sentinel-hub/eo-grow-examples, showcases eo-grow on a few use-cases.

In the presentation we will introduce the framework and showcase its usability on concrete examples. We will illustrate how eo-grow is used in large-scale research experiments, explain its role in reproducibility, and show how the no-code approach and code reuse facilitate the productionalization of workflows.

How to cite: Batič, M., Lukšič, Ž., and Milcinski, G.: eo-grow - Earth Observation framework for scaled-up processing in Python, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-16802, https://doi.org/10.5194/egusphere-egu23-16802, 2023.

EGU23-17046 | ECS | Posters on site | ESSI1.9

Wayang AgoraEO Plugin: The Framework for Scalable EO Workflows 

Rodrigo Pardo Meza, Jorge-Arnulfo Quiané-Ruiz, Begüm Demir, and Volker Markl

Currently, Earth Observation (EO) platforms provide datasets, algorithms, and processing capabilities. Nevertheless, each platform offers its own exclusive habitat in which to discover, process, and run EO elements. We recently proposed AgoraEO [2], a decentralized, open, and unified ecosystem where users can find EO elements, compose cross-platform EO pipelines, and execute them efficiently. With the ambition of supporting cross-platform federated analytics, AgoraEO relies on Apache Wayang [1] as its main analytical processing platform. Within AgoraEO, we are developing and enabling Apache Wayang with EO features, exposing the internals of BigEarthNet [2] to the Earth Observation community.

Here we present our Wayang AgoraEO plugin, which follows the BigEarthNet workflow to achieve all its benefits in a scalable and parameterizable (reusable) way. The Wayang AgoraEO plugin empowers users to create EO workflows using any EO platform in a simple way: with operators and an intuitive API that follows the behaviour of the EO platforms it exploits. The execution of sub-tasks is controlled but isolated in whichever data processing system is required, in tandem with the rest of the platform. In addition, one can fetch datasets from several independent sources. By design, Apache Wayang works as a declarative framework for ML: users specify ML tasks at a high level, using the most convenient API to write a workflow (Java/Scala, Python, and Postgres are supported). Wayang then models an ML task as a mathematical optimization problem and uses its gradient-descent-based optimizer to invoke the appropriate physical algorithms and system configurations to execute the task, thereby decoupling the user's specification of ML tasks from their execution.

We believe the Wayang AgoraEO plugin can be a game changer for the tedious task of implementing and deploying EO workflows within today's EO platforms: it makes it easy to reuse resources and share them. Likewise, it is an easily extensible solution in which new operators covering new EO platforms and tasks can be included. As a result, this solution can be a great leap in the democratization of EO technologies, contributing to their integration, scalability, and access to high-performance computing.

References

[1] S. Kruse, Z. Kaoudi, J.-A. Quiané-Ruiz, S. Chawla, F. Naumann and B. Contreras-Rojas, "Optimizing Cross-Platform Data Movement," IEEE 35th International Conference on Data Engineering, 2019, pp. 1642-1645.

[2] A. Wall, B. Deiseroth, E. Tzirita Zacharatou, J.-A. Quiané-Ruiz, B. Demir and V. Markl, "AGORA-EO: A Unified Ecosystem for Earth Observation - A Vision For Boosting EO Data Literacy," Big Data from Space Conference, 2021.

How to cite: Pardo Meza, R., Quiané-Ruiz, J.-A., Demir, B., and Markl, V.: Wayang AgoraEO Plugin: The Framework for Scalable EO Workflows, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-17046, https://doi.org/10.5194/egusphere-egu23-17046, 2023.

An LSTM-based distributed hydrologic model was developed for an urban watershed in Korea. The inputs of the model are the time series of 10-minute radar-gauge composite rainfall and 10-minute temperature at the 239 model grid cells, and the output is the 10-minute flow discharge at the watershed outlet. The Nash-Sutcliffe Efficiency (NSE) coefficients for the calibration period (2013-2016) and validation period (2017-2019) were 0.99 and 0.67, respectively. Normal events were predicted better than extreme ones. Further in-depth analyses revealed that: (1) the model composes the watershed outlet flow discharge by linearly superimposing multiple time series, one created by each of the LSTM units; unlike in conventional hydrologic models, most of these time series fluctuate greatly in both the positive and negative domains; (2) the runoff-to-rainfall ratio of each model grid cell does not reflect the counterpart parameters of conceptual hydrologic models, revealing that the model simulates the watershed responses in a unique manner; (3) the model successfully reproduced the soil-moisture-dependent runoff processes, an essential prerequisite of continuous hydrologic models; (4) each of the LSTM units has a different temporal sensitivity to a unit rainfall stimulus, and the LSTM units that are sensitive to rainfall input have greater output weight factors near the watershed outlet, and vice versa. This means that the model learned a mechanism to separately consider hydrologic components with distinct response times, such as direct runoff and low-frequency baseflow.
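
For reference, the NSE reported above is the standard skill score comparing simulated and observed discharge; a minimal implementation (not tied to the study's code) is:

    import numpy as np

    def nse(sim: np.ndarray, obs: np.ndarray) -> float:
        """Nash-Sutcliffe Efficiency:
        1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
        return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)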

Acknowledgement

This research was supported by the Basic Science Research Program (Grant Number: 2021R1A2C2003471) and the Basic Research Laboratory Program (Grant Number: 2022R1A4A3032838) through the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT.

How to cite: Kim, D. and Lee, Y.: Machines simulate hydrologic processes using a simple structure but in a unique manner – a case study of predicting fine scale watershed response on a distributed framework, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-245, https://doi.org/10.5194/egusphere-egu23-245, 2023.

This study developed a distributed hydrologic model based on long short-term memory (LSTM) to predict the flow discharge of the Joongrang stream, located in a highly urbanized area of Seoul, Korea. The model inputs are the time series of 10-minute radar-gauge composite precipitation data at 239 grid cells (1 km2) in the watershed and Normalized Difference Vegetation Index (NDVI) data derived from Landsat 8 images, and the model output is the 10-minute flow discharge at the watershed outlet. The model was trained for the calibration period 2013-2016 and validated for the period 2017-2019. The NSE values over the validation period for the optimal model architecture (256 LSTM hidden units) with and without NDVI input data were 0.68 and 0.52, respectively, which suggests that the machine can learn dynamic processes of soil infiltration and plant interception from the remotely sensed information provided by the satellite.

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. NRF-2022R1A4A3032838).

How to cite: Lee, J. and Kim, D.: Effectiveness of Satellite-based Vegetation Index for Simulating Watershed Response Using an LSTM-based model in a Distributed Framework, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-339, https://doi.org/10.5194/egusphere-egu23-339, 2023.

EGU23-1218 | Posters on site | HS3.3

Exploring the Value of Natural Language Processing for Urban Water Research 

Ina Vertommen, Xin Tian, Tessa Pronk, Siddharth Seshan, Sotirios Paraskevopoulos, and Bas Wols

Natural Language Processing (NLP), empowered by the most recent developments in deep learning, has demonstrated its effectiveness for handling texts. Urban water research benefits from both subfields of NLP, namely Natural Language Understanding (NLU) and Natural Language Generation (NLG). In this work, we present three recent studies that use NLP for: (1) automated processing of and response to customer complaints registered with Dutch water utilities, (2) automated collection of up-to-date water-related information from the Internet, and (3) extraction of key information about chemical compounds and pathogen characteristics from scientific publications. These applications, using the latest NLP models and tools (e.g., Rasa, spaCy), take into account studies on both water quality and quantity for the water sector. According to our findings, NLU and rule-based text mining are effective in extracting information from unstructured texts. In addition, NLU and NLG can be integrated to build human-computer interfaces, such as a value-based chatbot that understands and addresses the demands made by customers of water utilities.
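
As a small illustration of the rule-based text-mining side (a sketch only; the pattern below is an invented example, not one of the studies' actual pipelines), spaCy's Matcher can pull concentration-style statements out of publication text:

    import spacy
    from spacy.matcher import Matcher

    nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed
    matcher = Matcher(nlp.vocab)

    # Match a number followed by "mg" "/" "L" (spaCy splits "mg/L" on the slash).
    matcher.add("CONCENTRATION", [[
        {"LIKE_NUM": True}, {"LOWER": "mg"}, {"ORTH": "/"}, {"LOWER": "l"},
    ]])

    doc = nlp("Chlorine residual was kept at 0.5 mg/L during the trial.")
    for _, start, end in matcher(doc):
        print(doc[start:end].text)  # -> "0.5 mg/L"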

How to cite: Vertommen, I., Tian, X., Pronk, T., Seshan, S., Paraskevopoulos, S., and Wols, B.: Exploring the Value of Natural Language Processing for Urban Water Research, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-1218, https://doi.org/10.5194/egusphere-egu23-1218, 2023.

EGU23-1278 | ECS | Orals | HS3.3

Evaluating Machine Learning Approach for Regional Flood Frequency Analysis in Data-sparse Regions 

Nikunj K. Mangukiya and Ashutosh Sharma

Accurate flood frequency analysis is essential for developing effective flood management strategies and designing flood protection infrastructure, but it is challenging due to the complex, nonlinear hydrological system. In regional flood frequency analysis (RFFA), the flood quantiles at ungauged sites can be estimated by establishing a relationship between interdependent physio-meteorological variables and the observed flood quantiles at gauged sites in the region. However, this regional approach implies a loss of information due to the prior aggregation of hydrological data at gauged locations, and it can be difficult in data-sparse regions due to limited data. In this study, we evaluated an alternative approach to RFFA in two case studies: a data-sparse region in India and a data-dense region in the USA. In this approach, daily streamflow is predicted first using a deep learning-based hydrological model, and flood quantiles are then estimated from the predicted daily streamflow using statistical methods. We compared the results of this alternative approach to those of the traditional RFFA technique, which used the Random Forest (RF) and eXtreme Gradient Boosting (XGB) algorithms to model the nonlinear relationship between flood quantiles and relevant physio-meteorological predictor variables such as meteorological forcings, topography, land use, and soil properties. The results showed that the alternative approach produces more reliable results, with the lowest mean absolute error and the highest coefficient of determination, in the data-sparse region. In the data-dense region, both approaches produced comparable results. However, the alternative approach has the advantage of being flexible and of providing the complete time series of daily flow at the ungauged location, which can be used to estimate other flow characteristics, develop flow duration curves, or estimate flood quantiles of any return period without building a separate traditional RFFA model. This study shows that the alternative approach can provide accurate flood frequency estimates in data-sparse regions, offering a promising solution for flood management in these areas.
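
For a flavour of the traditional RFFA baseline described above (an illustrative sketch with synthetic stand-ins for the predictors and quantiles, not the study's data):

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(42)
    X = rng.random((200, 5))  # e.g. area, mean rainfall, slope, land use, soil
    q100 = 10 * X[:, 0] + 5 * X[:, 1] ** 2 + rng.normal(0, 0.5, 200)  # synthetic 100-yr quantile

    rf = RandomForestRegressor(n_estimators=500, random_state=0)
    print(cross_val_score(rf, X, q100, scoring="r2", cv=5).mean())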

How to cite: Mangukiya, N. K. and Sharma, A.: Evaluating Machine Learning Approach for Regional Flood Frequency Analysis in Data-sparse Regions, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-1278, https://doi.org/10.5194/egusphere-egu23-1278, 2023.

EGU23-1526 | ECS | Orals | HS3.3

Extrapo… what? Predictions beyond the support of the training data 

Ralf Loritz and Hoshin Gupta

Neural networks are among the best available methods for numerous hydrological modelling challenges. However, although they have been shown to outperform classical hydrological models in several applications, there is still some doubt as to whether neural networks, despite their excellent interpolation skills, are capable of making predictions beyond the support of the training data. This study addresses this issue and proposes an approach to infer the ability of a neural network to predict unusual, extreme system states. We show how the concepts of data surprise and model surprise can be used in a complementary manner to assess which unusual events a neural network can predict, which it can predict only with additional data, and which it cannot predict at all, the latter hinting at a wrong model choice or an incomplete description of the system by the data.

How to cite: Loritz, R. and Gupta, H.: Extrapo… what? Predictions beyond the support of the training data, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-1526, https://doi.org/10.5194/egusphere-egu23-1526, 2023.

A continuous and complete karst spring discharge record is necessary to understand the hydrological behaviour of a karst aquifer and to manage karst water resources. However, owing to problems such as equipment errors and observation failures, many hydrological research datasets contain missing spring discharge values, which becomes a main barrier for further environmental and hydrological modeling and studies. In this work, a novel approach that integrates deep learning algorithms with ensemble empirical mode decomposition (EEMD) is developed to reconstruct missing karst spring discharge values from local precipitation. EEMD is first employed to decompose the precipitation data, extract useful features, and remove noise. The decomposed precipitation components are then fed as input data to various deep learning models, including convolutional neural network (CNN), long short-term memory (LSTM), and hybrid CNN-LSTM models, to reconstruct the missing discharge values and compare performance. Root mean squared error (RMSE) and the Nash-Sutcliffe efficiency coefficient (NSE) are calculated as metrics to evaluate reconstruction performance. The models are validated with spring discharge and precipitation data collected at Barton Springs in Texas, and the reconstruction performance of the various deep learning models with and without EEMD is compared and evaluated. The main conclusions are: 1) by using EEMD, the integrated deep models significantly improve reconstruction performance and outperform the simple deep models; 2) among the three integrated models, the LSTM-EEMD model obtains the best reconstruction results; 3) for models with monthly data, reconstruction performance decreases greatly as the missing rate increases: the best results are obtained when the missing rate is low, and at a missing rate of 50% the reconstructions become notably poorer; 4) for models with daily data, reconstruction performance is less affected by the missing rate, and the models obtain satisfactory results for missing rates ranging from 10% to 50%.
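
As a sketch of the decomposition step (using the open-source PyEMD package; the signal here is synthetic):

    import numpy as np
    from PyEMD import EEMD  # pip install EMD-signal

    t = np.linspace(0, 1, 500)
    precip = np.sin(2 * np.pi * 5 * t) + 0.5 * np.random.default_rng(0).standard_normal(500)

    eemd = EEMD(trials=100)   # size of the noise-assisted ensemble
    imfs = eemd.eemd(precip)  # intrinsic mode functions, one per row
    print(imfs.shape)         # each IMF becomes one input channel for the deep model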

How to cite: Zhou, R. and Zhang, Y.: Reconstruct karst spring discharge data with hybrid deep learning models and ensemble empirical mode decomposition method, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-2382, https://doi.org/10.5194/egusphere-egu23-2382, 2023.

Machine learning and deep learning have been proving their potential for streamflow modelling in various studies. In particular, long short-term memory (LSTM) models have shown exceptionally good results. However, machine learning models are often considered "black boxes" with limited interpretability. Explainable artificial intelligence (XAI) comprises methods that analyze the internal processes of a machine learning network and allow a glance into the "black box". Most proposed XAI techniques are designed for the analysis of images, and there is currently only limited work available on time series data.

In our study, we applied various XAI algorithms, including gradient-based methods (Saliency, InputXGradient, Integrated Gradients, GradientSHAP) as well as perturbation-based methods (Feature Ablation, Feature Permutation), to compare their applicability for reasonable interpretation in the hydrological context. To our knowledge, only Integrated Gradients has been applied to an LSTM in hydrology so far. Gradient-based methods analyze the gradient of the output with respect to the input features, whereas perturbation-based methods gain information by altering or masking specific input features. The different methods were applied to an LSTM trained for the lowland Ems catchment in Germany, where baseflow makes up the major share of total streamflow.
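
The method names above correspond to attribution classes in the Captum library for PyTorch; a minimal attribution call (with a stand-in model and invented input shape) looks like this:

    import torch
    from captum.attr import IntegratedGradients

    # Stand-in for the trained LSTM: one year of 3 forcing variables in, one flow value out.
    model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(365 * 3, 1))
    x = torch.randn(1, 365, 3, requires_grad=True)

    ig = IntegratedGradients(model)
    attributions = ig.attribute(x)  # importance of each past day and feature
    print(attributions.shape)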

We analyzed the results with respect to their "timestep of influence", which describes the number of past days that are important for the prediction of streamflow on a particular day. All of the algorithms applied result in a comparable annual pattern, characterized by relatively small timesteps of influence in spring (wet season) and increasing timesteps of influence in summer and autumn (dry season). However, the range of absolute days of attribution varies between the methods. In conclusion, all methods produce reasonable results and appear to be suitable for interpretation purposes.

Furthermore, we compared the results to ERA5 reanalysis data and found evidence that the LSTM recognizes soil water storage as the main driver of streamflow generation in the catchment: we found an inverse seasonality between soil moisture and the timestep of influence.

How to cite: Ley, A., Bormann, H., and Casper, M.: Exploring different explainable artificial intelligence algorithms applied to a LSTM for streamflow modelling, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-3125, https://doi.org/10.5194/egusphere-egu23-3125, 2023.

EGU23-4137 | ECS | Orals | HS3.3

Sequential optimization of temperature measurements to estimate groundwater-surface water interactions 

Robin Thibaut, Ty Ferré, Eric Laloy, and Thomas Hermans

Groundwater-surface water (GW-SW) exchange fluxes are driven by a complex interplay of subsurface processes and their interactions with surface hydrology, which has a significant impact on water and contaminant exchanges. Due to the complexity of these systems, the accurate estimation of GW-SW fluxes is important for quantitative hydrological studies and should be based on relevant data and careful experimental design. The effective design of monitoring networks that can identify relevant subsurface information is therefore essential for the optimal protection of our water resources. In this study, we present novel deep learning (DL)-driven approaches for sequential and static Bayesian optimal experimental design (BOED) in the subsurface, with the goal of estimating GW-SW exchange fluxes from a set of temperature measurements. We apply probabilistic Bayesian neural networks (PBNN) to conditional density estimation (CDE) within a BOED framework, and the predictive performance of the PBNN-based CDE model is evaluated with a custom objective function based on the Kullback-Leibler divergence to determine optimal temperature sensor locations from the information gain provided by the measurements. This evaluation is used to determine the optimal sequential sampling strategy for estimating GW-SW exchange fluxes in the 1D case, and the results are compared to the static optimal sampling strategy for a 3D conceptual riverbed-aquifer model based on a real case study. Our results indicate that probabilistic DL is an effective method for estimating GW-SW fluxes from temperature data and for designing efficient monitoring networks. The proposed framework can be applied to other cases involving surface or subsurface monitoring and experimental design.

How to cite: Thibaut, R., Ferré, T., Laloy, E., and Hermans, T.: Sequential optimization of temperature measurements to estimate groundwater-surface water interactions, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-4137, https://doi.org/10.5194/egusphere-egu23-4137, 2023.

Rainfall-runoff (RR) modeling remains a challenging task in the field of hydrology, especially when it comes to regional-scale hydrology. Recently, the long short-term memory (LSTM) network, known for its ability to learn sequential and temporal relations, has been widely adopted in RR modeling. Convolutional neural networks (CNN) have matured in computer vision tasks, and trials have been conducted to use them in hydrological applications. Different combinations of CNN and LSTM have been proven to work; however, questions remain about the suitability of different model architectures, the input variables needed by the model, and the interpretability of the models' learning process at the regional scale.

In this work we trained a sequential CNN-LSTM deep learning architecture to predict daily streamflow between 1980 and 2014, regionally and simultaneously, over 86 catchments from the CAMELS dataset in the US. The model was forced with a year-long, spatially distributed (gridded) input of precipitation, maximum temperature and minimum temperature for each day to predict one day of streamflow. The model takes advantage of the CNN to encode the spatial patterns in the input tensor and feeds them to the LSTM, which learns the temporal relations between them. The trained model was further fine-tuned to predict for 3 local sub-clusters of the 86 stations, in order to test the significance of fine-tuning for performance and for the model's learning process. Also, to interpret the learning of spatial patterns, a perturbation was introduced into the gridded input data and the sensitivity of the model output to the perturbation was shown in spatial heat maps. Finally, to evaluate the performance of the model, different benchmark models were trained using, as far as possible, a training setup similar to that of the CNN-LSTM model. These are a CNN without the LSTM part (regional model), an LSTM without the CNN part (regional model), a simple single-layer ANN (regional model), and an LSTM trained for individual stations (considered the state of the art). All of these benchmark models were fine-tuned for the 3 clusters as well.
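
An illustrative PyTorch skeleton of such a CNN encoder feeding an LSTM (the grid size, channel counts and layer sizes are invented; this is not the authors' exact architecture):

    import torch
    import torch.nn as nn

    class CNNLSTM(nn.Module):
        def __init__(self):
            super().__init__()
            # Encode each daily grid (3 forcings on a 16x16 grid) into a feature vector.
            self.cnn = nn.Sequential(
                nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(4), nn.Flatten(),  # -> 16 * 4 * 4 = 256 features
            )
            self.lstm = nn.LSTM(input_size=256, hidden_size=64, batch_first=True)
            self.head = nn.Linear(64, 1)  # one day of streamflow

        def forward(self, x):  # x: (batch, days, 3, 16, 16)
            b, t = x.shape[:2]
            feats = self.cnn(x.flatten(0, 1)).view(b, t, -1)
            out, _ = self.lstm(feats)
            return self.head(out[:, -1])  # predict from the last timestep

    y = CNNLSTM()(torch.randn(2, 365, 3, 16, 16))
    print(y.shape)  # torch.Size([2, 1])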

The CNN-LSTM model, after fine-tuning, performed well in predicting daily streamflow over the test period, with a median Nash-Sutcliffe efficiency (NSE) of 0.62 and 65% of the 86 stations reaching NSE > 0.6, outperforming all benchmark models trained regionally with the same setup. The model also achieved performance comparable to the state-of-the-art LSTM trained for individual stations. Fine-tuning improved the performance of all models over the test period. The CNN-LSTM model was shown to be more sensitive to input perturbations near the stations for which the prediction is intended. This was even clearer for the fine-tuned model, indicating that the model learns spatially relevant information from the gridded input data and that fine-tuning helps guide the model to focus more on the relevant input.

This work shows the potential of CNN and LSTM for regional rainfall-runoff modeling by capturing the spatiotemporal patterns involved in the RR process. The work also contributes towards a more physically interpretable data-driven modeling paradigm.

How to cite: Mohammed, A. and Corzo, G.: Evaluation of regional Rainfall-Runoff modelling using convolutional long short-term memory:  CAMELS dataset in US as a case study., EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-4177, https://doi.org/10.5194/egusphere-egu23-4177, 2023.

EGU23-4179 | Orals | HS3.3

Improving Data-Driven Flow Forecasting in Large Basins using Machine Learning to Route Flows 

David Lambl, Mostafa Elkurdy, Phil Butcher, Laura K Read, and Alden Keefe Sampson

Producing accurate hourly streamflow forecasts in large basins is difficult without a distributed model to represent both streamflow routing through the river network and the spatial heterogeneity of land and weather conditions. HydroForecast is a theory-guided deep learning flow forecasting product that consists of short-term (hourly predictions out to 10 days), seasonal (10-day predictions out to a year) and daily reanalysis models. This work focuses primarily on the short-term model, which has award-winning accuracy across a wide range of basins.

In this work, we discuss the implementation of a novel distributed flow forecasting capability in HydroForecast, which splits basins into smaller sub-basins and routes flows from each sub-basin to the downstream forecast points of interest. The entire model is implemented as a deep neural network, allowing end-to-end training of both sub-basin runoff prediction and flow routing. The model's routing component predicts a unit hydrograph of flow travel time at each river reach and timestep, allowing us to inspect and interpret the learned river routing and to seamlessly incorporate any upstream gauge data.
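
The routing idea can be pictured with a short sketch (a simplification under assumed values, not the HydroForecast implementation): the learned unit hydrograph is a normalized travel-time distribution with which sub-basin runoff is convolved on its way to the outlet.

    import numpy as np

    uh = np.exp(-np.arange(24) / 6.0)
    uh /= uh.sum()                                 # assumed 24-hour unit hydrograph
    runoff = np.random.default_rng(1).random(100)  # hourly sub-basin runoff
    routed = np.convolve(runoff, uh)[:100]         # causal routing to the outlet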

We compare the accuracy of this distributed model to our original flow forecasting model at selected sites and discuss future improvements that will be made to this model.

How to cite: Lambl, D., Elkurdy, M., Butcher, P., Read, L. K., and Sampson, A. K.: Improving Data-Driven Flow Forecasting in Large Basins using Machine Learning to Route Flows, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-4179, https://doi.org/10.5194/egusphere-egu23-4179, 2023.

EGU23-4801 | Posters on site | HS3.3

Improving Streamflow Predictions over Indian Catchments using Long Short Term Memory Networks 

Bhanu Magotra, Manabendra Saharia, and Chandrika Thulaseedharan Dhanya

Streamflow modelling plays a critical role in water resource management activities. Physically based models require substantial computational resources and large amounts of input meteorological data, which results in high operating costs and long run times. On the other hand, with advances in deep learning techniques, data-driven models such as long short-term memory (LSTM) networks have been shown to successfully model non-linear rainfall-runoff relationships from historically observed data at a fraction of the computational cost. Moreover, using physics-informed machine learning techniques, the physical consistency of data-driven models can be further improved. In this study, one such method is applied: we trained a physics-informed LSTM network over 278 Indian catchments to simulate streamflow at a daily timestep using historically observed precipitation and streamflow data. The ancillary data included meteorological forcings, static catchment attributes, and Noah-MP-simulated land surface states and fluxes such as soil moisture, latent heat, and total evapotranspiration. The LSTM model's performance was evaluated using error metrics such as the Nash-Sutcliffe efficiency (NSE), the Kling-Gupta efficiency (KGE) and its components, along with skill scores based on a 2x2 contingency matrix for hydrological extremes. The trained LSTM model shows improved performance in simulating streamflow over the catchments compared to the physically based model. This will be the first study over India to generate reliable streamflow simulations using a hybrid state-of-the-art approach, which will be beneficial to policy makers for effective water resource management in India.
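
For reference, the KGE mentioned above decomposes model skill into correlation, variability and bias terms; a standard implementation (not tied to the study's code) is:

    import numpy as np

    def kge(sim: np.ndarray, obs: np.ndarray) -> float:
        """Kling-Gupta Efficiency: 1 - sqrt((r-1)^2 + (alpha-1)^2 + (beta-1)^2)."""
        r = np.corrcoef(sim, obs)[0, 1]  # correlation component
        alpha = sim.std() / obs.std()    # variability ratio
        beta = sim.mean() / obs.mean()   # bias ratio
        return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)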

How to cite: Magotra, B., Saharia, M., and Dhanya, C. T.: Improving Streamflow Predictions over Indian Catchments using Long Short Term Memory Networks, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-4801, https://doi.org/10.5194/egusphere-egu23-4801, 2023.

EGU23-4842 | ECS | Orals | HS3.3

Introducing DL-GLOBWB: a deep-learning surrogate of a process-based global hydrological model 

Bram Droppers, Myrthe Leijnse, Marc F.P. Bierkens, and Niko Wanders

Process-based global hydrological models are an important tool for sustainable development and policy making in today's water-scarce world. These models are able to inform national- to regional-scale water management with basin-scale accounting of water availability and demand, and to project the impacts of climate change and adaptation on water resources. However, the increasing need for better and higher-resolution hydrological information is proving difficult for these state-of-the-art process-based models, as the associated computational requirements are significant.

Recently, the deep-learning community has shown that neural networks (in particular the LSTM network) can provide hydrological information with an accuracy that rivals, if not exceeds, that of process-based hydrological models. Although the training of these neural networks takes time, prediction is fast compared to process-based simulation. Nevertheless, training is mostly done on historical observations, and thus projections under climate change and adaptation are uncertain.

Inspired by the complementary strengths and weaknesses of the process-based and deep-learning approaches, we present DL-GLOBWB: a deep-learning surrogate of the state-of-the-art PCR-GLOBWB global hydrological model. DL-GLOBWB predicts all water-balance components of the process-based model, including human water demand and abstraction, with an nRMSE of 0.05 (range between 0.0001 and 0.32). The DL-GLOBWB surrogate is orders of magnitude faster than its process-based counterpart, especially as surrogates trained at low resolution (e.g. 30 arc-minutes) can effectively be downscaled to higher resolutions (e.g. 5 arc-minutes).

In addition to introducing DL-GLOBWB, our presentation will explore future applications of this deep-learning surrogate, such as (1) improving model calibration and performance by comparing DL-GLOBWB outputs with in-situ data and satellite observations; (2) training DL-GLOBWB on future model projections to include global change; and (3) using DL-GLOBWB to dynamically, and at high resolution, visualize the impact of climate change and adaptation for stakeholders.

How to cite: Droppers, B., Leijnse, M., Bierkens, M. F. P., and Wanders, N.: Introducing DL-GLOBWB: a deep-learning surrogate of a process-based global hydrological model, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-4842, https://doi.org/10.5194/egusphere-egu23-4842, 2023.

IMERG is a global satellite-based precipitation dataset produced by NASA. It provides valuable rainfall information to support the design and operation of disaster and risk management worldwide. In operation, NASA offers three types of IMERG Level 3 (L3) products, with different trade-offs between latency and accuracy: Early run (4-hour latency), Late run (14-hour latency) and Final run (3.5-month latency). The Final-run product integrates multi-sensor retrievals and provides the highest-quality precipitation estimates of the three IMERG products. However, it suffers from a long processing latency, which hinders its applicability to near-real-time applications. In the past 10 years, deep learning techniques have made significant breakthroughs in various scientific fields, including short-term rainfall forecasting. Deep learning models have been shown to have the potential to learn the complex variations in weather systems and to outperform Numerical Weather Prediction (NWP) in terms of short-lead-time predictability and the computational resources required for operation.

In this research, we explore the potential of deep learning (DL) for generating a high-quality satellite-based precipitation product with low latency. More specifically, we investigate whether DL models can learn the difference between the Final- and Early-run products and thus predict a Final-run-like product using the Early-run product as input; a low-latency yet high-quality IMERG precipitation product can thereby be obtained. Various DL techniques are being tested in this work, including auto-encoders (AE), ConvLSTM and deep generative models. IMERG data between 2018 and 2020 over a rectangular area centred on the UK are used for model training and testing, and ground rain gauge records are used to evaluate the performance of the original and predicted products. This pilot area includes both ocean and land regions, which enables a comparison of model performance between two different surface conditions. Preliminary analysis suggests that patterns do exist in the differences between the Early- and Final-run products, and the capacity of the selected DL models to learn these differences will be further investigated. The proposed work has great potential to improve the applicability of IMERG products in an operational context.

How to cite: Hung, H. T. and Wang, L.-P.: IMERG Run Deep: Can we produce a low-latency IMERG Final run product with a deep learning based prediction model?, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-4887, https://doi.org/10.5194/egusphere-egu23-4887, 2023.

EGU23-4970 | ECS | Posters on site | HS3.3

Use of Long-Short Term Memory network (LSTM) in the reconstruction of missing water level data in the Seine River. 

Imad Janbain, Julien Deloffre, Abderrahim Jardani, Minh Tan Vu, and Nicolas Massei

Missing data is a major problem that appears in many database fields, for a variety of reasons. It has always been necessary to fill the gaps, and this becomes unavoidable and more complicated when the missing periods are longer. Several machine-learning-based approaches have been introduced to deal with this problem.

The purpose of this paper is to discuss the effectiveness of a new methodology, added prior to the LSTM deep learning algorithm, for filling in the missing data in the hourly surface water level time series of stations installed along the Seine River in Normandy, France. In our study, due to a lack of data, a challenging situation was faced where only the water level data from the same station, which themselves contain many missing parts, were used as input and output variables to fill the station itself in a self-learning approach. This contrasts with common work on imputing missing data, where several features are available to take advantage of in a multivariate and spatiotemporal way, e.g. using the same variable from other stations or exploiting other physical variables and meteorological data. The reconstruction accuracy of the proposed method depends on both the size of the available/missing data and the parameters of the networks. We therefore performed sensitivity analyses on both the properties of the networks and the structuring of the input and output data to determine the appropriate strategy. During this analysis, a data preprocessing method was developed and added prior to the LSTM model. This method was arrived at by testing many scenarios, each an updated version of the last; along the way, limitations were identified and overcome. Finally, the last model version was able to impute missing periods of up to one year of hourly data with high accuracy (one-year RMSE = 0.14 m), regardless of the location of the missing part in the series or of its size.

How to cite: Janbain, I., Deloffre, J., Jardani, A., Vu, M. T., and Massei, N.: Use of Long-Short Term Memory network (LSTM) in the reconstruction of missing water level data in the Seine River., EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-4970, https://doi.org/10.5194/egusphere-egu23-4970, 2023.

The objective function plays an important role in the training process of deep learning models, since it largely determines the trained values of the model parameters and influences model performance. In this study, we establish two application-orientated objective functions, namely the high flow balance error (HFBE) and the transformed mean absolute percentage error (MAPE*), for forecasts of high flows and low flows, respectively, in an LSTM model. We examine the strengths and weaknesses of streamflow forecast models trained on HFBE, MAPE* and the mean square error (MSE) based on multiple performance metrics. Furthermore, we propose an objective function-based ensemble model (OEM) framework that integrates the models trained on different objective functions, so as to take advantage of trained models focusing on different aspects of streamflow and thus achieve better overall performance. Our results for 273 catchments over the USA show that the models trained on HFBE can alleviate the underestimation of high flows found in models trained on MSE, and perform remarkably better for high flows. It is also found that models trained on MAPE* outperform the other two models in low-flow forecasting, no matter what algorithm is used to establish the model. By incorporating the three models trained on HFBE, MAPE* and MSE, our proposed OEM performs well in forecasts of both high and low flows, and realistically captures the mean and variability of the observed streamflow under a variety of hydrometeorological conditions. This study highlights the necessity of applying application-orientated objective functions for given projects and the great potential of ensemble learning methods for multi-optimization in hydrological modeling.
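
The abstract does not give the formulas for HFBE and MAPE*; purely as a hedged sketch of how such application-orientated objectives plug into training, one might write custom losses along these lines (the exact functional forms below are assumptions):

    import torch

    def mape_star(sim, obs, eps=1.0):
        # Assumed form: MAPE on offset flows, stabilising the division near zero.
        return torch.mean(torch.abs((obs - sim) / (obs + eps)))

    def hfbe(sim, obs, q=0.9):
        # Assumed form: balance error computed only over flows above the q-quantile.
        mask = obs >= torch.quantile(obs, q)
        return torch.abs((sim[mask] - obs[mask]).sum() / obs[mask].sum())

    # Either loss can replace MSE in a standard training loop:
    # loss = hfbe(model(x), y); loss.backward()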

How to cite: Wang, D.: The role of ensemble learning in multi-optimization for streamflow prediction, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-5044, https://doi.org/10.5194/egusphere-egu23-5044, 2023.

EGU23-5199 | ECS | Posters virtual | HS3.3

How do machine learning models deal with inter-catchment groundwater flows? 

Nicolas Weaver, Taha-Abderrahman El-Ouahabi, Thibault Hallouin, François Bourgin, Charles Perrin, and Vazken Andréassian

Machine learning models have recently gained popularity in hydrological modelling at the catchment scale, fuelled by the increasing availability of large-sample data sets and the increasing accessibility of deep learning frameworks, computing environments and open-source tools. In particular, several large-sample studies at daily and monthly time scales across the globe have shown successful applications of the LSTM architecture as a regional model that learns the hydrological behaviour at the catchment scale. Yet a deeper understanding of how machine learning models close the water balance, and of how they deal with inter-catchment groundwater flows, is needed to move towards better process understanding. We investigate the performance and behaviour of the LSTM architecture at a monthly time step on a large-sample French data set coined CHAMEAU, following the CAMELS initiative. To provide additional information to the learning step of the LSTM, we use the parameter sets and fluxes from the conceptual GR2M model, which has a dedicated formulation for dealing with inter-catchment groundwater flows. We see this study as a contribution towards the development of hybrid hydrological models.

How to cite: Weaver, N., El-Ouahabi, T.-A., Hallouin, T., Bourgin, F., Perrin, C., and Andréassian, V.: How do machine learning models deal with inter-catchment groundwater flows?, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-5199, https://doi.org/10.5194/egusphere-egu23-5199, 2023.

EGU23-5445 | ECS | Posters on site | HS3.3

Physics-Informed Neural Networks for Statistical Emulation of Hydrodynamical Numerical Models 

James Donnelly, Alireza Daneshkhah, and Soroush Abolfathi

The application of numerical models for flood and inundation modelling has become widespread in the past decades as a result of significant improvements in computational capabilities. Computational approaches to flood forecasting have significant benefits compared to empirical approaches, which estimate statistical patterns of hydrological variables from observed data. However, there is still a significant computational cost associated with numerical flood modelling at high spatio-temporal resolutions. This limitation of numerical modelling has led to the development of statistical emulators: machine learning (ML) models designed to learn the underlying generating process of the numerical model. The data-driven approach to ML relies entirely upon a set of training data to inform decisions about model selection and parameterisation. Deep learning models have leveraged data-driven learning methods, improvements in hardware and an increasing abundance of data to obtain breakthroughs in various fields such as computer vision, natural language processing and autonomous driving. In many scientific and engineering problems, however, the cost of obtaining data is high, so there is a need for ML models that are able to generalise in the 'small-data' regime common to many complex problems. In this study, to overcome the extrapolation and over-fitting issues of data-driven emulators, a physics-informed neural network model is adopted for the emulation of two-dimensional hydrodynamic models that simulate fluid flow according to the shallow water equations. This study introduces a novel approach to encoding the conservation of mass into a deep learning model, with additional terms included in the optimisation criterion acting to regularise the model, avoid over-fitting and produce more physically consistent predictions by the emulator.
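
A hedged sketch of the idea (the discrete penalty below is an illustration under simplifying assumptions of a uniform grid, not the paper's exact criterion): the training loss augments the data misfit with the residual of the mass-conservation (continuity) equation.

    import torch

    def pinn_loss(h_pred, h_obs, u, v, dx, dt, lam=0.1):
        """Data misfit plus a penalty on the discrete continuity residual
        dh/dt + d(hu)/dx + d(hv)/dy = 0; tensors have shape (time, ny, nx)."""
        data = torch.mean((h_pred - h_obs) ** 2)
        dhdt = (h_pred[1:] - h_pred[:-1]) / dt
        flux_x = (h_pred[:-1] * u[:-1]).diff(dim=2) / dx  # d(hu)/dx
        flux_y = (h_pred[:-1] * v[:-1]).diff(dim=1) / dx  # d(hv)/dy
        residual = dhdt[:, 1:, 1:] + flux_x[:, 1:, :] + flux_y[:, :, 1:]
        return data + lam * torch.mean(residual ** 2)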

How to cite: Donnelly, J., Daneshkhah, A., and Abolfathi, S.: Physics-Informed Neural Networks for Statistical Emulation of Hydrodynamical Numerical Models, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-5445, https://doi.org/10.5194/egusphere-egu23-5445, 2023.

EGU23-5736 | ECS | Orals | HS3.3

A Novel Workflow for Streamflow Prediction in the Presence of Missing Gauge Observations 

Rendani Mbuvha, Peniel Julien Yise Adounkpe, Mandela Coovi Mahuwetin Houngnibo, and Nathaniel Newlands

Streamflow predictions are a vital tool for detecting flood and drought events. Such predictions are even more critical for Sub-Saharan African regions that are vulnerable to the increasing frequency and intensity of such events. These regions are sparsely gauged, with few available gauging stations that are often plagued with missing data due to various causes, such as harsh environmental conditions and constrained operational resources.

This work presents a novel workflow for predicting streamflow in the presence of missing gauge observations. We leverage bias correction of the GEOGloWS ECMWF Streamflow Service (GESS) forecasts for missing-data imputation and predict future streamflow using state-of-the-art Temporal Fusion Transformers at ten river gauging stations in the Benin Republic.

We show, by simulating missingness in a testing period, that GESS forecasts have a significant bias that results in poor imputation performance over the ten Beninese stations. Our findings suggest that overall bias correction by Elastic Net and Gaussian Process regression achieves superior performance relative to traditional imputation by established methods such as Random Forest, k-Nearest Neighbour, and GESS lookup. We also show that the Temporal Fusion Transformer yields high predictive skill and further provides explanations for predictions through the weights of its attention mechanism. The findings of this work provide a basis for integrating global streamflow prediction model data and state-of-the-art machine learning models into operational early-warning decision-making systems (e.g., flood/drought alerts) in resource-constrained countries vulnerable to drought and flooding due to extreme weather events.
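
A minimal sketch of the bias-correction-for-imputation step (scikit-learn; the lagged covariates and hyperparameters are illustrative assumptions, and wrap-around at the series edges from np.roll is ignored here):

    import numpy as np
    from sklearn.linear_model import ElasticNet

    def bias_correct_impute(gess, obs):
        """Fit an ElasticNet on days where gauge data exist, then fill gaps
        with bias-corrected GESS forecasts. gess, obs: 1-D daily arrays,
        obs containing NaN where observations are missing."""
        X = np.column_stack([gess, np.roll(gess, 1), np.roll(gess, 2)])  # lags
        valid = ~np.isnan(obs)
        model = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X[valid], obs[valid])
        filled = obs.copy()
        filled[~valid] = model.predict(X[~valid])
        return filled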

How to cite: Mbuvha, R., Adounkpe, P. J. Y., Houngnibo, M. C. M., and Newlands, N.: A Novel Workflow for Streamflow Prediction in the Presence of Missing Gauge Observations, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-5736, https://doi.org/10.5194/egusphere-egu23-5736, 2023.

EGU23-6313 | ECS | Posters on site | HS3.3

Moving away from deterministic solutions: A probabilistic machine learning approach to account for geological model uncertainty in groundwater modelling 

Mathias Busk Dahl, Troels Norvin Vilhelmsen, Rasmus Bødker Madsen, and Thomas Mejer Hansen

Decision-making related to groundwater management often relies on results from a deterministic groundwater model representing one ‘optimal’ solution. However, such a single deterministic model lacks representation of subsurface uncertainties. The simplicity of such a model is appealing, as typically only one is needed, but comes with the risk of overlooking critical scenarios and possible adverse environmental effects. Instead, we argue that groundwater management should be based on a probabilistic model that incorporates the uncertainty of the subsurface structures to the extent that it is known. If such a probabilistic model exists, it is, in principle, simple to propagate the uncertainties of the model parameters using multiple numerical simulations, allowing a quantitative and probabilistic basis for decision-makers. However, in practice, such an approach can become computationally intractable. Thus, there is a need for quantifying and propagating uncertainty through numerical simulations and presenting outcomes without losing the speed of the deterministic approach.

This presentation provides a probabilistic approach to the specific groundwater modelling task of determining well recharge areas, accounting for the geological uncertainty associated with the model by means of a deep neural network. The results of such a task are often part of an investigation for new abstraction well locations and should, therefore, present all possible outcomes to give informative decision support. We advocate the use of a probabilistic approach over a deterministic one by comparing results and presenting examples where probabilistic solutions are essential for proper decision support. To overcome the significant increase in computation time, we argue that this problem can be solved using a probabilistic neural network trained on examples of model outputs. We present a way of training such a network and show how it performs in terms of speed and accuracy. Ultimately, this presentation aims to contribute a method for incorporating model uncertainty in groundwater modelling without compromising the speed of the deterministic models.

How to cite: Busk Dahl, M., Norvin Vilhelmsen, T., Bødker Madsen, R., and Mejer Hansen, T.: Moving away from deterministic solutions: A probabilistic machine learning approach to account for geological model uncertainty in groundwater modelling, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-6313, https://doi.org/10.5194/egusphere-egu23-6313, 2023.

EGU23-6466 | ECS | Orals | HS3.3

Neural ODE Models in Large-Sample Hydrology 

Marvin Höge, Andreas Scheidegger, Marco Baity-Jesi, Carlo Albert, and Fabrizio Fenicia

Neural Ordinary Differential Equation (ODE) models have demonstrated high potential in providing accurate hydrologic predictions and process understanding for single catchments (Höge et al., 2022). Neural ODEs fuse a neural network model core with a mechanistic equation framework. This hybrid structure offers both traceability of model states and processes, as in conceptual hydrologic models, and the high flexibility of machine learning to learn and refine model interrelations. Aside from the functional dependence of internal processes on driving forces, such as that of evapotranspiration on temperature, Neural ODEs are also able to learn the effect of catchment-specific attributes, e.g. land cover types, on processes when trained over multiple basins simultaneously.
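
For the general construction, a minimal single-bucket sketch, assuming the torchdiffeq package: the water-balance shell dS/dt = P - ET - Q stays mechanistic while the evapotranspiration and discharge fluxes are small neural networks (the actual model of the study is richer than this):

    import torch
    import torch.nn as nn
    from torchdiffeq import odeint   # assumes the torchdiffeq package

    class BucketODE(nn.Module):
        """dS/dt = P(t) - ET(S, T) - Q(S): mechanistic balance, learned fluxes."""
        def __init__(self, forcings):
            super().__init__()
            self.forcings = forcings   # (n_days, 2): precipitation, temperature
            self.et_net = nn.Sequential(nn.Linear(2, 16), nn.Tanh(),
                                        nn.Linear(16, 1), nn.Softplus())
            self.q_net = nn.Sequential(nn.Linear(1, 16), nn.Tanh(),
                                       nn.Linear(16, 1), nn.Softplus())

        def forward(self, t, S):
            idx = torch.clamp(t.long(), 0, len(self.forcings) - 1)
            p, temp = self.forcings[idx]   # piecewise-constant daily forcing
            et = self.et_net(torch.stack([S.squeeze(), temp]).unsqueeze(0)).squeeze()
            q = self.q_net(S.view(1, 1)).squeeze()
            return (p - et - q).view(1)

    forcings = torch.rand(365, 2)
    func = BucketODE(forcings)
    S = odeint(func, torch.tensor([10.0]), torch.arange(0.0, 365.0))  # storage path
    runoff = func.q_net(S)   # discharge; differentiable w.r.t. the network weights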

 

We demonstrate the performance of a generic Neural ODE architecture in a hydrologic large-sample setup with respect to both predictive accuracy and process interpretability. Using several hundred catchments, we show the capability of Neural ODEs to learn the general interplay of catchment-specific attributes and hydrologic drivers in order to predict discharge in out-of-sample basins. Further, we show how functional relations learned (encoded) by the neural network can be translated (decoded) into an interpretable form, and how this can be used to foster understanding of processes and the hydrologic system.

 

Höge, M., Scheidegger, A., Baity-Jesi, M., Albert, C., and Fenicia, F.: Improving hydrologic models for predictions and process understanding using Neural ODEs, Hydrol. Earth Syst. Sci., 26, 5085–5102, https://hess.copernicus.org/articles/26/5085/2022/, 2022.

How to cite: Höge, M., Scheidegger, A., Baity-Jesi, M., Albert, C., and Fenicia, F.: Neural ODE Models in Large-Sample Hydrology, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-6466, https://doi.org/10.5194/egusphere-egu23-6466, 2023.

EGU23-7347 | ECS | Orals | HS3.3

Deep learning for mapping water bodies in the Sahel 

Mathilde de FLEURY, Laurent Kergoat, Martin Brandt, Rasmus Fensholt, Ankit Kariryaa, Gyula Mate Kovács, Stéphanie Horion, and Manuela Grippa

Inland surface waters, especially lakes and small water bodies, are essential resources and have impacts on biodiversity, greenhouse gas emissions and health. This is particularly true in the semi-arid Sahelian region, where these resources remain largely unassessed and little is known about their number, size and quality. Remote sensing monitoring methods remain a promising tool to address these issues at the large scale, especially in areas where field data are scarce. Thanks to technological advances, current remote sensing systems provide data for regular monitoring over time and offer a high spatial resolution, up to 10 metres.

Several water detection methods have been developed, many of them using spectral information to differentiate water surfaces from soil, through thresholding on water indices (MNDWI, for example) or classification by clustering. These methods are sensitive to optical reflectance variability and are not straightforwardly applicable to regions, such as the Sahel, where the lakes and their environment are very diverse. In particular, the presence of aquatic vegetation is an important challenge and source of error for many of the existing algorithms and available databases.

Deep learning, a subset of machine learning methods for training deep neural networks, has emerged as the state-of-the-art approach for a large number of remote sensing tasks. In this study, we apply a deep learning model based on the U-Net architecture to detect water bodies in the Sahel using Sentinel-2 MSI data, and 86 manually defined lake polygons as training data. This framework was originally developed for tree mapping (Brandt et al., 2020, https://doi.org/10.1038/s41586-020-2824-5).   
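
A minimal sketch of the U-Net idea used here (pure PyTorch, one encoder level; the band count and channel widths are illustrative, and the study's actual framework is the Brandt et al. implementation):

    import torch
    import torch.nn as nn

    def block(c_in, c_out):
        return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(),
                             nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU())

    class TinyUNet(nn.Module):
        """One-level U-Net: encoder, bottleneck, skip connection, decoder."""
        def __init__(self, in_bands=4):
            super().__init__()
            self.enc = block(in_bands, 32)
            self.down = nn.MaxPool2d(2)
            self.bottleneck = block(32, 64)
            self.up = nn.ConvTranspose2d(64, 32, 2, stride=2)
            self.dec = block(64, 32)          # 64 = 32 (skip) + 32 (upsampled)
            self.head = nn.Conv2d(32, 1, 1)   # per-pixel water logit

        def forward(self, x):
            e = self.enc(x)
            b = self.bottleneck(self.down(e))
            d = self.dec(torch.cat([self.up(b), e], dim=1))
            return self.head(d)   # train with BCEWithLogitsLoss against lake masks

    logits = TinyUNet()(torch.randn(1, 4, 128, 128))   # -> (1, 1, 128, 128)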

Our preliminary analyses indicate that our models achieve good accuracy (98%). Problems with aquatic vegetation no longer appear, and each lake is thus well delimited irrespective of water type and characteristics. Using the water delineations obtained, we then classify different optical water types and thereby highlight different types of waterbodies, which appear to be mostly turbid and eutrophic waters, allowing a better understanding of the eco-hydrological processes in this region.

This method demonstrates the effectiveness of deep learning in detecting water surfaces in the study region. Deriving water masks that account for all kinds of waterbodies offers a great opportunity to further characterize different water types. The method is easily reproducible thanks to the availability of the satellite data and algorithm, and can be further applied to detect dams and other human-made features in relation to lake environments.

How to cite: de FLEURY, M., Kergoat, L., Brandt, M., Fensholt, R., Kariryaa, A., Kovács, G. M., Horion, S., and Grippa, M.: Deep learning for mapping water bodies in the Sahel, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-7347, https://doi.org/10.5194/egusphere-egu23-7347, 2023.

EGU23-7828 | ECS | Posters on site | HS3.3

Sub-seasonal daily precipitation forecasting based on Long Short-Term Memory (LSTM) models 

Claudia Bertini, Gerald Corzo, Schalk Jan van Andel, and Dimitri Solomatine

Water managers need accurate rainfall forecasts for a wide spectrum of applications, ranging from water resources evaluation and allocation to flood and drought predictions. In the past years, several frameworks based on Artificial Intelligence have been developed to improve the traditional Numerical Weather Prediction (NWP) forecasts, thanks to their ability to learn from past data, unravel hidden relationships among variables and handle large amounts of inputs. Among these approaches, Long Short-Term Memory (LSTM) models have emerged for their ability to predict sequence data, and have been successfully used for rainfall and flow forecasting, mainly with short lead times. In this study, we explore three different multivariate LSTM-based models, i.e. vanilla LSTM, stacked LSTM and bidirectional LSTM, to forecast daily precipitation for the upcoming 30 days in the area of the Rhine Delta, the Netherlands. We use both local atmospheric and global climate variables from the ERA5 reanalysis dataset to predict rainfall, and we introduce a fuzzy index for the models to account for seasonality effects. The framework is developed within the H2020 project CLImate INTelligence (CLINT), and its outcomes have the potential to improve forecasting of precipitation deficit in the study area.
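
A minimal sketch of the three variants compared (PyTorch; feature counts, hidden sizes and window length are illustrative assumptions):

    import torch
    import torch.nn as nn

    class PrecipLSTM(nn.Module):
        """Forecast the next `horizon` daily precipitation values from a
        multivariate predictor sequence (e.g. ERA5-derived features)."""
        def __init__(self, variant="vanilla", n_features=10, hidden=64, horizon=30):
            super().__init__()
            layers = 2 if variant == "stacked" else 1
            bidir = variant == "bidirectional"
            self.lstm = nn.LSTM(n_features, hidden, num_layers=layers,
                                bidirectional=bidir, batch_first=True)
            self.head = nn.Linear(hidden * (2 if bidir else 1), horizon)

        def forward(self, x):                  # x: (batch, time, n_features)
            out, _ = self.lstm(x)
            return self.head(out[:, -1, :])    # forecast from the last time step

    y = PrecipLSTM("bidirectional")(torch.randn(4, 90, 10))   # -> (4, 30)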

How to cite: Bertini, C., Corzo, G., van Andel, S. J., and Solomatine, D.: Sub-seasonal daily precipitation forecasting based on Long Short-Term Memory (LSTM) models, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-7828, https://doi.org/10.5194/egusphere-egu23-7828, 2023.

Terrestrial water storage (TWS) anomalies from the Gravity Recovery and Climate Experiment (GRACE) and its follow-on (GRACE-FO) satellite missions provide a unique opportunity to measure the impact of different climate extremes and human intervention on water use at regional and continental scales. However, temporal gaps within the GRACE and GRACE-FO missions (GRACE: 20 months, between GRACE and GRACE-FO: 11 months, GRACE-FO: 2 months) pose difficulties in analyzing spatiotemporal variations in TWS. In this study, a Convolutional Long Short-Term Memory (CNN-LSTM) model was developed to fill these gaps and reconstruct TWS for the Indian subcontinent (April 2002-July 2022). Various meteorological and climatic variables, such as precipitation, temperature, run-off, evapotranspiration, and vegetation, were integrated to predict GRACE TWS. The performance of the models was evaluated with Pearson’s correlation coefficient (PR), Nash-Sutcliffe efficiency (NSE), and normalised root mean square error (NRMSE). Results indicate that the CNN-LSTM model yielded a mean PR of 0.94 and 0.89, NSE of 0.87 and 0.8, and NRMSE of 0.075 and 0.101 in training and testing, respectively. Overall, the CNN-LSTM achieved good performance except in the northwestern region of India, where relatively poor performance might be due to high anthropogenic activity and arid climatic conditions. The reconstructed time series were further used to study the spatiotemporal variations of TWS over the Indian subcontinent.

Keywords: GRACE; Deep Learning; TWSA; Indian subcontinent
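
A minimal sketch of the convolutional-LSTM building block behind such a model (pure PyTorch; channel counts and the readout layer are illustrative assumptions, as the abstract does not specify the exact CNN-LSTM configuration):

    import torch
    import torch.nn as nn

    class ConvLSTMCell(nn.Module):
        """Convolutional LSTM cell: gate pre-activations come from a single
        convolution over the concatenated input and hidden state, so spatial
        structure in the driver fields is preserved."""
        def __init__(self, c_in, c_hidden, k=3):
            super().__init__()
            self.gates = nn.Conv2d(c_in + c_hidden, 4 * c_hidden, k, padding=k // 2)

        def forward(self, x, h, c):
            i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
            c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
            h = torch.sigmoid(o) * torch.tanh(c)
            return h, c

    # One pass over a year of monthly driver grids (precip, T, ET, ... as channels)
    cell = ConvLSTMCell(c_in=5, c_hidden=16)
    h = c = torch.zeros(1, 16, 64, 64)
    for t in range(12):
        h, c = cell(torch.randn(1, 5, 64, 64), h, c)
    tws = nn.Conv2d(16, 1, 1)(h)   # map the final state to a TWS anomaly grid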

How to cite: Moudgil, P. S. and Rao, G. S.: Filling Temporal Gaps within and between GRACE and GRACE-FO Terrestrial Water Storage Changes over Indian Sub-Continent using Deep Learning., EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-8218, https://doi.org/10.5194/egusphere-egu23-8218, 2023.

Recent years have seen an increase in deep learning applications for flow forecasting. Large-sample hydrological (LSH) studies typically try to predict the runoff of a catchment using some selection of hydrometeorological features from the respective catchment. One aspect of these models that has received little attention in LSH is the effect that data from upstream catchments have on model performance. The number of available stations and the distance between stations are highly variable between catchments, which creates a unique modelling challenge. Existing LSH studies either use some form of linear aggregation of upstream flows as input features or omit them altogether. The potential of upstream data to improve the performance of real-time flow forecasts has not yet been systematically evaluated on a large scale. The objective of our study is to evaluate methods for integrating upstream features into real-time, data-driven flow forecasting models. Our study uses a subset of Canadian catchments (n>150) from the HYSETS database. For each catchment, long short-term memory networks (LSTMs) are used to generate flow forecasts for lead times of 1 to 3 days. We evaluate methods for identifying, selecting, and integrating relevant upstream input features within a deep-learning modelling framework, which include using neighbouring upstream stations, using all upstream stations, and using all stations with embedded dimensionality reduction. Early results indicate that while the inclusion of upstream data often yields improvements in model performance, including too much upstream information can easily have detrimental effects.
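
One simple realisation of upstream-feature integration with dimensionality reduction (scikit-learn PCA; station and feature counts are random stand-ins, and the study's embedded reduction may instead be learned end-to-end inside the network):

    import numpy as np
    from sklearn.decomposition import PCA

    upstream = np.random.rand(3650, 12)   # daily flows at 12 upstream stations (stand-in)
    local = np.random.rand(3650, 5)       # hydrometeorological features of the target catchment

    pca = PCA(n_components=3).fit(upstream)             # compress the upstream network
    features = np.hstack([local, pca.transform(upstream)])
    # `features` then forms the input sequence of the LSTM forecast model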

How to cite: Snieder, E. and Khan, U.: A large sample study of the effects of upstream hydrometeorological input features for LSTM-based daily flow forecasting in Canadian catchments, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-8746, https://doi.org/10.5194/egusphere-egu23-8746, 2023.

EGU23-9726 | ECS | Posters on site | HS3.3

Flood Forecasting with Deep Learning LSTM Networks: Local vs. Regional Network Training Based on Hourly Data 

Tanja Morgenstern, Jens Grundmann, and Niels Schütze

Floods are among the most frequently occurring natural disasters in Germany. Therefore, predicting their occurrence is a crucial task for efficient disaster management and for the protection of life, property, infrastructure and cultural assets. In recent years, Deep Learning methods have gained popularity in the research field of flood forecasting – Long Short-Term Memory (LSTM) networks being among them.

Efficient disaster management needs a fine temporal resolution of runoff predictions. Past work at TU Dresden on LSTM networks shows certain challenges when using input data with hourly resolution, such as systematically poor timing in peak flow prediction (Pahner et al., 2019; Morgenstern et al., 2021). At times, disaster management even requires flood forecasts for hitherto unobserved catchments, so in total a regionally transferable rainfall-runoff model with a fine temporal resolution is needed. We derived the idea for a potential approach from Kratzert et al. (2019) and Fang et al. (2022): they demonstrate that LSTM networks for rainfall-runoff (R-R) modeling benefit from the integration of multiple diverse catchments in the training dataset instead of a strictly local dataset, as this allows the networks to learn universal hydrologic catchment behavior. However, their training datasets consist of daily-resolution data.

Following this approach, in this study we train the LSTM networks using single catchments ("local network training") as well as combinations of diverse catchments in Saxony, Germany ("regional network training"). The training data (hourly resolution) consist of area averages of observed precipitation as well as of observed discharge at long-term observation gauges in Saxony. The gauges belong to small, fast-responding Saxon catchments and vary in their hydrological and geographical properties, which in turn are part of the network training as well.

We present preliminary results and investigate the following questions:

  • With a finer temporal resolution than daily values, characteristics of flood waves become more pronounced. Concerning the detailed simulation of flood waves, do regional LSTM-based R-R-models enable more accurate and robust flow predictions compared to local LSTM-based R-R-models – especially for rare extreme events?
  • Are regional LSTM-based R-R-models – trained at this temporal resolution – able to generalize to unobserved areas or areas with discharge observations unsuitable for network training?

 

References

Fang, K., Kifer, D., Lawson, K., Feng, D., Shen, C. (2022). The Data Synergy Effects of Time-Series Deep Learning Models in Hydrology. Water Resources Research, 58. DOI: 10.1029/2021WR029583

Kratzert, F., Klotz, D., Shalev, G., Klambauer, G., Hochreiter, S., Nearing, G. (2019). Towards learning universal, regional, and local hydrological behaviors via machine learning applied to large-sample datasets. Hydrology and Earth System Sciences, 23, 5089–5110. DOI: 10.5194/hess-23-5089-2019

Morgenstern, T., Pahner, S., Mietrach, R., Schütze, N. (2021). Flood forecasting in small catchments using deep learning LSTM networks. DOI: 10.5194/egusphere-egu21-15072

Pahner, S., Mietrach, R., Schütze, N. (2019). Flood forecasting in small catchments: a comparative application of long short-term memory networks and artificial neural networks. DOI: 10.13140/RG.2.2.36770.89286

How to cite: Morgenstern, T., Grundmann, J., and Schütze, N.: Flood Forecasting with Deep Learning LSTM Networks: Local vs. Regional Network Training Based on Hourly Data, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-9726, https://doi.org/10.5194/egusphere-egu23-9726, 2023.

EGU23-10317 | ECS | Posters on site | HS3.3

A convolutional LSTM model with high accuracy to predict extreme precipitation space-time fields 

Hyojeong Choi and Dongkyun Kim

Precipitation forecast models based on meteorological radar data and machine learning architectures accurately predict the spatio-temporal evolution of precipitation. However, these data-driven forecasting models tend to underestimate the magnitude of extreme precipitation events, because their training is based on observed precipitation data in which normal precipitation events are significantly more frequent than rare extreme events. This study proposes a ConvLSTM-based precipitation nowcasting model that can accurately predict the space-time field of extreme precipitation. First, precipitation events were classified into 5 subsets using the k-means clustering algorithm based on their statistical properties, such as mean, standard deviation, skewness, duration, and the calendar month in which the precipitation event occurred. Then, a ConvLSTM-based neural network was trained on the subset containing extreme precipitation events (events with large mean, variance, and duration, occurring in summer months). The model was trained and tested on the 4 km-10 minute resolution radar-gauge composite precipitation field of the central part of South Korea (200 km x 200 km) for the periods 2009-2015 and 2016-2020, respectively. The NSE of the model trained on the whole precipitation dataset was 0.55, while that of the model trained on the extreme precipitation subset was 0.78, a significant improvement.
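
A minimal sketch of the event-classification step (scikit-learn; the event summary statistics below are randomly generated stand-ins):

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    # one row per event: mean, std, skewness, duration (h), calendar month
    stats = np.random.rand(500, 5)                   # stand-in for real event statistics
    X = StandardScaler().fit_transform(stats)        # put features on equal footing
    labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)

    # pick the cluster with the largest mean intensity as the "extreme" subset
    extreme = int(np.argmax([stats[labels == k, 0].mean() for k in range(5)]))
    train_events = np.where(labels == extreme)[0]    # events used to train the ConvLSTM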

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. NRF-2021R1A2C2003471).

How to cite: Choi, H. and Kim, D.: A convolutional LSTM model with high accuracy to predict extreme precipitation space-time fields, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-10317, https://doi.org/10.5194/egusphere-egu23-10317, 2023.

EGU23-12315 | Posters on site | HS3.3

Meta-modeling with data-driven methods in hydrology 

Tobias Krueger, Mark Somogyvari, Ute Fehrenbach, and Dieter Scherer

Process-based models are the standard tools today when trying to understand how physical systems work. There are situations, however, when system understanding is not a primary focus and it is worth substituting existing process-based models with computationally more efficient meta-models (or emulators), i.e. proxies designed for specific applications. In our research we have explored potential data-driven meta-modeling approaches for applications in hydrology, designed to solve specific research questions.

In order to find a suitable meta-modeling approach, we have experimented with a set of different data-driven methods. We have employed a multi-fidelity modeling approach, where we gradually increased the complexity of our models. In total five different approaches were investigated: linear model with ordinary least squares regression, linear model with two different Bayesian methods (Hamiltonian Monte Carlo and transdimensional Monte Carlo) and two machine learning approaches (dense artificial neural network and long short-term memory (LSTM) neural network).

For method development, the project case study of the Groß Glienicker Lake was used. This is a glacial lake near Berlin, with a strong negative trend in water levels in the last decades. Supported by the observation model from the Central European Refined analysis, we had a daily, high-resolution meteorological dataset (precipitation and actual evapotranspiration) and lake level observations for 16 years.

All models are designed similarly: they predict lake level changes one day ahead using precipitation and evapotranspiration data from the previous 70 days. This interval was selected after an extensive parameter test with the linear model. By predicting the change in stored water, we linearize the problem, and by using a longer time interval we allow the methods to automatically compensate for any lag or memory effects inside the catchment. The different methods are evaluated by comparing the fits between the observed and the reconstructed lake levels.
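
A minimal sketch of this shared design, shown for the simplest (ordinary least squares) member of the model hierarchy; the array lengths are illustrative stand-ins:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    def lagged_design(p, et, level, n_lags=70):
        """Stack the previous 70 days of P and ET to predict tomorrow's level change."""
        X, y = [], []
        for t in range(n_lags, len(level) - 1):
            X.append(np.concatenate([p[t - n_lags:t], et[t - n_lags:t]]))
            y.append(level[t + 1] - level[t])   # linearised target: storage change
        return np.array(X), np.array(y)

    p, et, level = np.random.rand(3, 5840)   # ~16 years of daily data (stand-in)
    X, y = lagged_design(p, et, level)
    baseline = LinearRegression().fit(X, y)  # the OLS member of the model hierarchy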

As expected, increasing the model and inversion complexity improves the quality of the reconstruction. The use of nonlinear models was especially advantageous: the artificial neural network outperformed every other method. However, in our example these improvements were relatively small, meaning that in practice the simplest linear method was advantageous due to its computational efficiency, robustness, and ease of use and interpretation.

In this presentation we discuss the challenges of data preparation and optimal model design (especially the memory of the hydrological system), while finding the hyperparameters of the specific methods themselves was relatively straightforward. Our results suggest that problem linearization should be a preferred first step in any meta-modeling application, as it helps the training of nonlinear models as well. We also discuss data requirements, because we found that the size of our dataset was too small for the most complex LSTM method, which yielded unstable results and learned spurious background trends.

How to cite: Krueger, T., Somogyvari, M., Fehrenbach, U., and Scherer, D.: Meta-modeling with data-driven methods in hydrology, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-12315, https://doi.org/10.5194/egusphere-egu23-12315, 2023.

EGU23-12952 | ECS | Orals | HS3.3

On the generalization of hydraulic-inspired graph neural networks for spatio-temporal flood simulations 

Roberto Bentivoglio, Elvin Isufi, Sebastian Nicolaas Jonkman, and Riccardo Taormina

The high computational cost of detailed numerical models for flood simulation hinders their use in real-time and limits uncertainty quantification. Deep-learning surrogates have thus emerged as an alternative to speed up simulations. However, most surrogate models currently work only for a single topography, meaning that they need to be retrained for different case studies, ultimately defeating their purpose. In this work, we propose a graph neural network (GNN) inspired by the shallow water equations used in flood modeling, that can generalize the spatio-temporal prediction of floods over unseen topographies. The proposed model works similarly to finite volume methods by propagating the flooding in space and time, given initial and boundary conditions. Following the Courant-Friedrichs-Lewy condition, we link the time step between consecutive predictions to the number of GNN layers employed in the model. We analyze the model's performance on a dataset of numerical simulations of river dike breach floods, with varying topographies and breach locations. The results suggest that the GNN-based surrogate can produce high-fidelity spatio-temporal predictions, for unseen topographies, unseen breach locations, and larger domain areas with respect to the training ones, while reducing computational times.
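
A minimal sketch of the layer-per-time-step idea (pure PyTorch with a dense adjacency matrix; the state variables, message function and topology are illustrative assumptions, and the actual model follows the shallow water equations more closely):

    import torch
    import torch.nn as nn

    class FloodGNNLayer(nn.Module):
        """One message-passing step ~ one hydraulic time step."""
        def __init__(self, n_state=3, hidden=32):
            super().__init__()
            self.msg = nn.Sequential(nn.Linear(2 * n_state, hidden), nn.ReLU(),
                                     nn.Linear(hidden, n_state))

        def forward(self, h, adj):
            # h: (n_nodes, n_state) water depth/discharge; adj: (n_nodes, n_nodes)
            agg = adj @ h                                     # sum over neighbours
            return h + self.msg(torch.cat([h, agg], dim=-1))  # residual update

    layers = nn.ModuleList(FloodGNNLayer() for _ in range(10))   # 10 time steps
    h = torch.zeros(500, 3); h[42, 0] = 2.0                      # breach inflow at node 42
    adj = (torch.rand(500, 500) < 0.01).float()                  # illustrative topology
    for layer in layers:
        h = layer(h, adj)                                        # propagate the flood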

How to cite: Bentivoglio, R., Isufi, E., Jonkman, S. N., and Taormina, R.: On the generalization of hydraulic-inspired graph neural networks for spatio-temporal flood simulations, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-12952, https://doi.org/10.5194/egusphere-egu23-12952, 2023.

EGU23-13493 | ECS | Posters on site | HS3.3

Comparison of a conceptual rainfall-runoff model with an artificial neural network model for streamflow prediction

Fadil Boodoo, Carole Delenne, Renaud Hostache, and Julien Freychet

Accurate streamflow forecasting can help minimize the negative impacts of hydrological events such as floods and droughts. To address this challenge, we explore artificial neural network models (ANNs) for streamflow forecasting. These models, which have proven successful in other fields, may offer improved accuracy and efficiency compared to traditional conceptually based forecasting approaches.

The goal of this study is to compare the performance of a traditional conceptual rainfall-runoff (hydrological) model with an artificial neural network (ANN) model for streamflow forecasting. As a test case, we use the Severn catchment in the United Kingdom. The adopted ANN model has a long short-term memory (LSTM) architecture with two hidden layers, each with 256 neurons. The model is trained on a 25-year dataset from 1988 to 2013 and tested on a 3-year dataset (from 2014 to 2016). It is further validated on data from 2017 to 2020 (2019 being a particularly wet year) to assess its performance in extreme hydrological conditions. The study focuses on daily and hourly predictions.

To conduct this study, the conceptual hydrological model Superflex is used as a benchmark. Both models are first evaluated using the Nash-Sutcliffe Efficiency (NSE) score. To enable a fair and accurate comparison, both models share the same inputs (i.e. meteorological forcings: total precipitation, daily maximum and minimum temperatures, daylight duration, mean surface downward short-wave radiation flux, and vapor pressure). The ANN model was implemented using the NeuralHydrology library developed by F. Kratzert.
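
For reference, the NSE score used to compare both models has a direct implementation; a minimal sketch:

    import numpy as np

    def nse(sim, obs):
        """Nash-Sutcliffe Efficiency: 1 is perfect; 0 is no better than the mean of obs."""
        sim, obs = np.asarray(sim), np.asarray(obs)
        return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)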

In our study, we found that the LSTM model is able to provide more accurate one-day forecasts than the hydrological model Superflex. For the daily predictions, the average NSE score of the LSTM model is 0.85 (with an average NSE score of 0.99 for the training period and 0.85 for the validation period), which is higher than the NSE score of 0.74 achieved by the Superflex model (with a score of 0.84 for the training period).

For the hourly predictions, the Superflex model achieved an NSE score of 0.88, with a score of 0.7 during training. The LSTM model had an average NSE score of 0.87, with an average score of 0.99 during training and 0.85 during validation.

These results were obtained without adjusting the hyperparameters and by training the model only on data from the Severn watershed. The ANN model has demonstrated promising results compared to a state-of-the-art conceptual hydrological model in our study. We will further compare both models using different training dataset periods and different catchments. These additional tests will provide more information on the capabilities of the LSTM model and help to confirm its effectiveness.

How to cite: Boodoo, F., Delenne, C., Hostache, R., and Freychet, J.: Comparison of a conceptual rainfall-runoff model with an artificial neural network model for streamflow prediction, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-13493, https://doi.org/10.5194/egusphere-egu23-13493, 2023.

EGU23-14399 | ECS | Orals | HS3.3

LSTMs for Hydrological Modelling in Swiss Catchments 

Christina Lott, Leonardo Martins, Jonas Weiss, Thomas Brunschwiler, and Peter Molnar

Simulation of the catchment rainfall-runoff transformation with physically based watershed models is a traditional way to predict streamflow and other hydrological variables at catchment scales. However, the calibration of such models requires large data inputs and computational power and contains many parameters which are often impossible to constrain or validate. An alternative approach is to use data-driven machine learning for streamflow prediction.

In the past few years, LSTM (long short-term memory) models and their variants have been explored in rainfall-runoff modelling. Typical applications use daily climate variables as inputs and model the rainfall-runoff transformation processes with different timescales of memory. This is especially useful as delays in runoff production by snow accumulation and melt, soil water storage, evapotranspiration, etc., can be included. In contrast to feed-forward ANNs (artificial neural networks), LSTMs are capable of maintaining the sequential temporal order of inputs and, compared to RNNs (recurrent neural networks), of learning long-term dependencies [1].

However, current work on LSTMs mostly focuses on the USA, the UK and Brazil, where CAMELS datasets are available [1, 2, 3]. Catchments at higher altitudes with snow-driven dynamics and sometimes glaciers are present in small numbers in these datasets (if at all). Systematic applications of LSTMs for streamflow prediction in climates where a significant part of the catchments are snow- and ice-dominated are missing. In this work, an FS-LSTM (fast-slow LSTM) previously applied in Brazil is adapted for Swiss catchments to fill this gap [3]. The FS-LSTM explored builds on the work of Hoedt et al. (2021), which imposed mass constraints on an LSTM, called MC-LSTM [4]. The FS-LSTM adds fast and slow parts for streamflow, containing rainfall and soil moisture respectively. We will discuss benchmark results against an existing semi-distributed conceptual model widely used in Switzerland for streamflow simulation [5].
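
A minimal sketch of the mass-conserving recurrence from Hoedt et al. (2021) that the FS-LSTM builds on (dimensions illustrative): the input gate distributes new mass over cells, a column-stochastic matrix redistributes stored mass, and the output gate releases a fraction as streamflow, so total mass is conserved by construction.

    import torch
    import torch.nn as nn

    class MCCell(nn.Module):
        """Mass-conserving cell: every unit of input mass x ends up either in
        the cell states (storage) or in the output h (streamflow)."""
        def __init__(self, n_aux, n_cells):
            super().__init__()
            self.i_gate = nn.Linear(n_aux, n_cells)            # where new mass goes
            self.r_gate = nn.Linear(n_aux, n_cells * n_cells)  # how stored mass moves
            self.o_gate = nn.Linear(n_aux, n_cells)            # fraction released
            self.n = n_cells

        def forward(self, x, a, c):
            # x: (batch, 1) mass input (rain); a: (batch, n_aux) auxiliary inputs
            i = torch.softmax(self.i_gate(a), dim=-1)          # sums to 1: conserves x
            R = torch.softmax(self.r_gate(a).view(-1, self.n, self.n), dim=1)
            o = torch.sigmoid(self.o_gate(a))
            m = torch.einsum('bij,bj->bi', R, c) + i * x       # columns of R sum to 1
            return o * m, (1 - o) * m                          # outflow, new storage

    cell = MCCell(n_aux=3, n_cells=8)
    h, c = cell(torch.rand(4, 1), torch.randn(4, 3), torch.zeros(4, 8))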

 

References:

[1]: Kratzert et al., Rainfall-runoff modelling using Long Short-Term Memory (LSTM) networks, 2018.

[2]: Lees et al., Hydrological concept formation inside long short-term memory (LSTM) networks, 2022.

[3]: Quinones et al., Fast-Slow Streamflow Model Using Mass-Conserving LSTM, 2021.

[4]: Hoedt et al., MC-LSTM: Mass-Conserving LSTM, 2021.

[5]: Viviroli et al., An introduction to the hydrological modelling system PREVAH and its pre- and post-processing-tools, 2009.

How to cite: Lott, C., Martins, L., Weiss, J., Brunschwiler, T., and Molnar, P.: LSTMs for Hydrological Modelling in Swiss Catchments, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-14399, https://doi.org/10.5194/egusphere-egu23-14399, 2023.

Improving the understanding of processes is vital to hydrological modeling. One key challenge is how to extract interpretable information that can describe the complex hydrological system from the growing number of observations to advance our understanding of processes and modeling. To address this problem, we propose a data-driven framework to discover coordinate transformations, which transfer the original observations to a reduced-dimension system. The framework combines deep learning methods with sparse regression to approximate a specific hydrological process: deep learning methods have rich representations that promote generalization, and sparse regression can identify parsimonious models that promote interpretability. By doing so, we can identify the essential latent variables in a physically meaningful coordinate system where the hydrological processes are linearly and sparsely represented, capturing the behavior of the system from observations. To demonstrate the framework, we focus on the evaporation process. The relationships between potential evaporation and climate variables including long/short wave radiation, air temperature, air pressure, relative humidity, and wind speed are quantified. The connections between the climate variables and the extracted coordinate components are evaluated to capture the pattern of the climate variables in the component space. The robustness and statistical stability of the framework are examined based on distributed observations from FluxNet towers over North America. The resulting modeling framework shows the potential of deep learning methods for improving our knowledge of the hydrological system.
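
A minimal sketch of the two ingredients (a PyTorch autoencoder plus a scikit-learn Lasso; sizes and the random stand-in data are assumptions, and in the actual framework the two parts are trained jointly rather than sequentially):

    import numpy as np
    import torch
    import torch.nn as nn
    from sklearn.linear_model import Lasso

    enc = nn.Sequential(nn.Linear(6, 16), nn.Tanh(), nn.Linear(16, 2))  # 6 climate vars -> 2 coords
    dec = nn.Sequential(nn.Linear(2, 16), nn.Tanh(), nn.Linear(16, 6))
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)

    X = torch.randn(5000, 6)   # radiation, temperature, pressure, humidity, wind, ...
    for _ in range(200):       # learn the coordinate transformation
        opt.zero_grad()
        loss = nn.functional.mse_loss(dec(enc(X)), X)
        loss.backward()
        opt.step()

    Z = enc(X).detach().numpy()              # latent coordinates
    pet = np.random.rand(5000)               # stand-in for potential evaporation
    sparse = Lasso(alpha=0.01).fit(Z, pet)   # parsimonious linear form in the new coordinates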

How to cite: Hu, X., Tuo, Y., and Disse, M.: Deep learning based coordinates transformations for improving process understanding in hydrological modeling system, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-14631, https://doi.org/10.5194/egusphere-egu23-14631, 2023.

EGU23-15575 | Orals | HS3.3

Application of deep convolutional neural networks for precipitation estimation through both top-down and bottom-up approaches 

Hamidreza Mosaffa, Paolo Filippucci, Luca Ciabatta, Christian Massari, and Luca Brocca

Reliable and accurate precipitation estimates are crucial for various hydrological applications, including water resources management, drought monitoring and natural hazard prediction. The two main approaches for estimating precipitation from satellite data are top-down and bottom-up. The top-down approach uses data from geostationary and low Earth orbiting satellites to infer precipitation from atmosphere and cloud information, while the bottom-up approach estimates precipitation from soil moisture observations, e.g. with the SM2RAIN algorithm. The main difference between the approaches is that the top-down approach measures precipitation more directly but instantaneously, which may lead to underestimation, while the bottom-up approach measures the rainfall accumulated between two consecutive soil moisture (SM) measurements, giving more reliable estimates. In this study, we develop a deep convolutional neural network (CNN) algorithm that combines the top-down and bottom-up approaches, estimating precipitation from satellite level-1 products, including satellite backscatter from the Advanced SCATterometer (ASCAT) and infrared (IR) and water vapor (WV) channels from geostationary satellites. The algorithm is assessed at 0.1° spatial and daily temporal resolution over Italy for the period 2019-2021. The results show that the developed model improves the accuracy of precipitation estimation, and they indicate significant potential for global precipitation estimation with this model.

How to cite: Mosaffa, H., Filippucci, P., Ciabatta, L., Massari, C., and Brocca, L.: Application of deep convolutional neural networks for precipitation estimation through both top-down and bottom-up approaches, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-15575, https://doi.org/10.5194/egusphere-egu23-15575, 2023.

EGU23-15604 | ECS | Posters on site | HS3.3

Forecasting discharges through explainable machine learning approaches at an alpine karst spring 

Anna Pölz, Julia Derx, Andreas Farnleitner, and Alfred Paul Blaschke

Karst springs provide drinking water for approximately 700 million people worldwide. Complex subsurface flow processes lead to challenges for modelling spring discharges. Machine learning (ML) models possess the ability to learn non-linear patterns and show promising results in forecasting dynamic spring discharge. We compare the performance of three ML models of varying complexity in forecasting karst spring discharges: the multivariate adaptive regression spline model (MARS), a feed-forward neural network (ANN) and a long short-term memory model (LSTM). The well-studied alpine karst spring LKAS2 in Austria is used as a test case. We provide model explanations including feature attribution through Shapley additive explanations (SHAP), a method based on Shapley values. Our results show that the higher the model complexity, the higher the accuracy, based on the evaluated symmetric mean absolute percentage error of the three investigated models. With SHAP, every prediction can be explained through each feature in each input time step. We found seasonal model differences. For example, snow influenced the model mostly in winter and spring. Analyzing the combinations of input time steps and features provided further insights into the model performance. For instance, the SHAP results showed that a high electrical conductivity in recent time steps, which indicates that the karst water is less diluted with precipitation, leads to a reduced discharge forecast. These feature attribution results coincide with physical processes within karst systems. Therefore, the introduced SHAP method can increase the confidence in ML model forecasts and emphasizes the raison d’être of complex and accurate deep learning models in hydrology. This allows the operator to better understand and evaluate the model’s output, which is essential for drinking water management.
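
A minimal sketch of the attribution step, assuming the shap package's GradientExplainer can be applied to the trained sequence model (the stand-in model and tensors below are illustrative, not the study's configuration):

    import shap
    import torch
    import torch.nn as nn

    class TinyForecaster(nn.Module):
        """Stand-in for a trained LSTM discharge-forecast model."""
        def __init__(self, n_feat=4, hidden=16):
            super().__init__()
            self.lstm = nn.LSTM(n_feat, hidden, batch_first=True)
            self.head = nn.Linear(hidden, 1)
        def forward(self, x):                  # x: (samples, time steps, features)
            out, _ = self.lstm(x)
            return self.head(out[:, -1, :])

    model = TinyForecaster()
    X = torch.randn(200, 30, 4)                          # e.g. precip, conductivity, ...
    explainer = shap.GradientExplainer(model, X[:100])   # background sample
    sv = explainer.shap_values(X[100:110])
    # sv mirrors the input shape: one attribution per feature and per input
    # time step, which is what enables the seasonal analyses described above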

How to cite: Pölz, A., Derx, J., Farnleitner, A., and Blaschke, A. P.: Forecasting discharges through explainable machine learning approaches at an alpine karst spring, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-15604, https://doi.org/10.5194/egusphere-egu23-15604, 2023.

EGU23-15629 | ECS | Posters virtual | HS3.3

Peak Hydrological Event Simulation with Deep Learning Algorithm 

Nicole Tatjana Scherer, Muhammad Nabeel Usmann, Markus Disse, and Jingshui Huang

Most floods are caused by heavy rainfall events, including the disaster in the Simbach catchment in 2016. For the Simbach catchment, a study was already carried out using the conceptual Hydrologiska Byråns Vattenbalansavdelning (HBV) model to simulate the extreme event of 2016. While the calibration performance of the model is classified as very good, the overall validation is classified as unsatisfactory. Recent studies showed that data-driven models outperform benchmark rainfall-runoff models. A widely used data-driven model is the Long Short-Term Memory (LSTM) algorithm. The main advantage of this algorithm is its ability to learn short-term as well as long-term dependencies.

The objective of this work is to determine whether a data-driven model outperforms the conceptual model. For this purpose, in a first step an LSTM model is set up and its results are compared with those of the HBV model. It is assumed that the LSTM model outperforms the HBV model in training and validation but is not able to simulate the extreme event, because the extrapolation capabilities of neural networks are poor when they operate outside of their training range. In a second step, it is studied whether the model performance can be improved by providing more features to the model; for this, different feature combinations are provided. In a third step, it is assumed that providing more data will improve performance, so more events are used for training and validation.

We conclude that the LSTM model is able to simulate the rainfall-runoff process. A satisfactory overall model performance can be achieved using only precipitation as input data and a small training dataset of four events. But, like the HBV model, the LSTM model is not able to simulate the extreme event, because no extreme event is present within the training dataset. The LSTM model nevertheless outperforms the HBV model because it generalizes better. Furthermore, the performance of the LSTM model trained on six events can be improved by additionally providing the soil moisture class as input data, whereas providing further features results in worse performance. Providing more events to the model does not significantly improve its overall performance, although the model improved notably for the event in June 2015: once the training set contains events of higher magnitude than the 2015 event, that event is no longer out-of-sample, resulting in better model performance.

The results show the potential and limitations of using the LSTM model for modeling extreme events.

How to cite: Scherer, N. T., Usmann, M. N., Disse, M., and Huang, J.: Peak Hydrological Event Simulation with Deep Learning Algorithm, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-15629, https://doi.org/10.5194/egusphere-egu23-15629, 2023.

EGU23-16658 | ECS | Orals | HS3.3

Improving large-basin streamflow simulation using a modular, differentiable, learnable graph model for routing 

Tadd Bindas, Wen-Ping Tsai, Jiangtao Liu, Farshid Rahmani, Dapeng Feng, Yuchen Bian, Kathryn Lawson, and Chaopeng Shen

Differentiable modeling has been introduced recently as a method to learn relationships from a combination of data and structural priors. This method uses end-to-end gradient tracking inside a process-based model to tune internal states and parameters along with neural networks, allowing us to learn underlying processes and spatial patterns. Hydrologic routing modules are typically needed to simulate flows in stem rivers downstream of large, heterogeneous basins, but obtaining suitable parameterizations for them has previously been difficult. In this work, we apply differentiable modeling to streamflow prediction by coupling a physically based routing model (which computes flow velocity and discharge in the river network given upstream inflow conditions) to neural networks that provide parameterizations for Manning’s river roughness parameter (n). The method consists of an embedded neural network (NN), which takes (imperfect) DL-simulated runoffs as forcings and reach-scale attributes as inputs, feeds into the Muskingum-Cunge routing method, and is trained solely on downstream discharge. Our initial results show that while we cannot identify channel geometries, we can learn a parameterization scheme for roughness that follows observed trends in n. Training on a short sample of observed data showed that we could obtain highly accurate routing results for the training gage and inner, untrained gages. This general framework can be applied at small and large scales to learn channel roughness and predict streamflow with heightened interpretability.
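
A minimal sketch of the end-to-end idea (PyTorch; the attribute-to-n mapping, the K(n) relation and the plain Muskingum recursion are simplifying assumptions relative to the full Muskingum-Cunge scheme):

    import torch
    import torch.nn as nn

    n_net = nn.Sequential(nn.Linear(5, 32), nn.ReLU(),
                          nn.Linear(32, 1), nn.Softplus())  # attributes -> Manning's n

    def muskingum_route(inflow, K, X, dt=1.0):
        """Classic Muskingum recursion; differentiable w.r.t. K (and hence n)."""
        denom = K * (1 - X) + 0.5 * dt
        c0 = (0.5 * dt - K * X) / denom
        c1 = (0.5 * dt + K * X) / denom
        c2 = (K * (1 - X) - 0.5 * dt) / denom
        out = [inflow[0]]
        for t in range(1, len(inflow)):
            out.append(c0 * inflow[t] + c1 * inflow[t - 1] + c2 * out[-1])
        return torch.stack(out)

    attrs = torch.randn(5)                 # reach-scale attributes (slope, width, ...)
    inflow = torch.rand(100) + 1.0         # (imperfect) DL-simulated upstream runoff
    n = n_net(attrs).squeeze()             # learned roughness
    K = 10.0 * n                           # illustrative: travel time grows with n
    q = muskingum_route(inflow, K, X=torch.tensor(0.2))
    loss = ((q - (torch.rand(100) + 1.0)) ** 2).mean()   # misfit to downstream gage
    loss.backward()                        # gradients reach n_net: end-to-end training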

 

How to cite: Bindas, T., Tsai, W.-P., Liu, J., Rahmani, F., Feng, D., Bian, Y., Lawson, K., and Shen, C.: Improving large-basin streamflow simulation using a modular, differentiable, learnable graph model for routing, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-16658, https://doi.org/10.5194/egusphere-egu23-16658, 2023.

Although deep learning (DL) models have shown extraordinary performance in hydrologic modeling, they are still hard to interpret and unable to predict untrained hydrologic variables due to their lack of physical meaning and constraints. This study established hybrid differentiable models (namely, the delta models) with regionalized parameterization and learnable structures based on a DL-based differentiable parameter learning (dPL) framework. Simulation experiments on both US and global basins demonstrate that the delta models can approach the performance of the state-of-the-art long short-term memory (LSTM) network on discharge prediction. Unlike the purely data-driven LSTM model, the delta models can output a full set of hydrologic variables not used as training targets. Evaluation with independent data sources showed that the delta models, trained only on discharge observations, can also give decent predictions for ET and baseflow. Spatial extrapolation experiments showed that the delta models can surpass the performance of the LSTM model for predictions in large ungauged regions in terms of daily hydrograph metrics and multi-year trend prediction. The spatial patterns of the parameters learned by the delta models remain remarkably stable from in-sample to spatially out-of-sample predictions, which explains the robustness of the delta models for spatial extrapolation. More importantly, the proposed modeling framework enables directly learning new relations between intermediate variables from large observational datasets. This study shows that model performance and physical meaning can be balanced with the differentiable modeling approach, which is promising for large-scale hydrologic prediction and knowledge discovery.

How to cite: Feng, D. and Shen, C.: A differentiable modeling approach to systematically integrating deep learning and physical models for large-scale hydrologic prediction and knowledge discovery, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-16947, https://doi.org/10.5194/egusphere-egu23-16947, 2023.

EGU23-16974 | Orals | HS3.3

From Hindcast to Forecast with Deep Learning Streamflow Models 

Grey Nearing, Martin Gauch, Daniel Klotz, Frederik Kratzert, Asher Metzger, Guy Shalev, Shlomo Shenzis, Tadele Tekalign, Dana Weitzner, and Oren Gilon

Deep learning has become the de facto standard for streamflow simulation. While there are examples of deep learning based streamflow forecast models (e.g., 1-5), the majority of the development and research has been done with hindcast models. The primary challenge in using deep learning models for forecasting (e.g., flood forecasting) is that the meteorological input data are drawn from different distributions in hindcast vs. forecast. The (relatively small) amount of research that has been done on deep learning streamflow forecasting has largely used an encoder-decoder approach to account for forecast distribution shifts. This is, for example, what Google’s operational flood forecasting model uses [4]. 

In this work we show that the encoder-decoder approach results in artifacts in forecast trajectories that are not detectable with standard hydrological metrics, but which can cause forecasts to have incorrect trends (e.g., rising when they should be falling and vice-versa).  We solve this problem using regularized embeddings, which remove forecast artifacts without harming overall accuracy. 

Perhaps more importantly, input embeddings allow for training models on spatially and/or temporally incomplete meteorological inputs, meaning that a single model can be trained using input data that does not exist everywhere or does not exist during the entire training or forecast period. This allows models to learn from a significantly larger training data set, which is important for high-accuracy predictions. It also allows large (e.g., global) models to learn from local weather data. We demonstrate how and why this is critical for state-of-the-art global-scale streamflow forecasting. 

 

  1. Franken, Tim, et al. An operational framework for data driven low flow forecasts in Flanders. No. EGU22-6191. Copernicus Meetings, 2022.
  2. Kao, I-Feng, et al. "Exploring a Long Short-Term Memory based Encoder-Decoder framework for multi-step-ahead flood forecasting." Journal of Hydrology 583 (2020): 124631.
  3. Liu, Darong, et al. "Streamflow prediction using deep learning neural network: case study of Yangtze River." IEEE Access 8 (2020): 90069-90086.
  4. Nevo, Sella, et al. "Flood forecasting with machine learning models in an operational framework." Hydrology and Earth System Sciences 26.15 (2022): 4013-4032.
  5. Girihagama, Lakshika, et al. "Streamflow modelling and forecasting for Canadian watersheds using LSTM networks with attention mechanism." Neural Computing and Applications 34.22 (2022): 19995-20015.

 

How to cite: Nearing, G., Gauch, M., Klotz, D., Kratzert, F., Metzger, A., Shalev, G., Shenzis, S., Tekalign, T., Weitzner, D., and Gilon, O.: From Hindcast to Forecast with Deep Learning Streamflow Models, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-16974, https://doi.org/10.5194/egusphere-egu23-16974, 2023.

EGU23-582 | ECS | Posters on site | ITS1.13/AS5.2

Modeling the Variability of Terrestrial Carbon Fluxes using Transformers 

Swarnalee Mazumder and Ayush Prasad

The terrestrial carbon cycle is one of the largest sources of uncertainty in climate projections. The terrestrial carbon sink, which removes a quarter of anthropogenic CO2 emissions, is highly variable in time and space depending on climate. Previous studies have found that data-driven models such as random forests, artificial neural networks and long short-term memory networks can be used to accurately model Net Ecosystem Exchange (NEE) and Gross Primary Productivity (GPP), two important metrics to quantify the direction and magnitude of CO2 transfer between the land surface and the atmosphere. Recently, a new class of machine learning models called transformers has gained widespread attention in natural language processing tasks due to their ability to learn from large volumes of sequential data. In this work, we use Transformers to model NEE and GPP from 1996-2022 at 39 flux stations in the ICOS Europe network using ERA5 reanalysis data. We compare our results with traditional machine learning approaches to evaluate the generalisability and predictive performance of transformers for carbon flux modelling.
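
A minimal sketch of a Transformer-encoder regressor for this setting (PyTorch; feature counts and window length are illustrative assumptions, and positional encoding is omitted for brevity):

    import torch
    import torch.nn as nn

    class FluxTransformer(nn.Module):
        """Encode a meteorological sequence and regress NEE and GPP."""
        def __init__(self, n_feat=8, d_model=64, n_targets=2):
            super().__init__()
            self.proj = nn.Linear(n_feat, d_model)
            layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=2)
            self.head = nn.Linear(d_model, n_targets)   # -> (NEE, GPP)

        def forward(self, x):                # x: (batch, time, n_feat) ERA5 predictors
            z = self.encoder(self.proj(x))   # positional encoding omitted for brevity
            return self.head(z.mean(dim=1))  # pool over time, then regress

    pred = FluxTransformer()(torch.randn(16, 48, 8))   # -> (16, 2)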

How to cite: Mazumder, S. and Prasad, A.: Modeling the Variability of Terrestrial Carbon Fluxes using Transformers, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-582, https://doi.org/10.5194/egusphere-egu23-582, 2023.

EGU23-1825 | ECS | Orals | ITS1.13/AS5.2

Spatial representation learning for ensemble weather simulations using invariant variational autoencoders 

Jieyu Chen, Kevin Höhlein, and Sebastian Lerch

Weather forecasts today are typically issued in the form of ensemble simulations based on multiple runs of numerical weather prediction models with different perturbations in the initial states and the model physics. In light of the continuously increasing spatial resolutions of operational weather models, this results in large, high-dimensional datasets that nonetheless contain relevant spatial and temporal structure, as well as information about the predictive uncertainty. We propose invariant variational autoencoder (iVAE) models based on convolutional neural network architectures to learn low-dimensional representations of the spatial forecast fields. We specifically aim to account for the ensemble character of the input data and discuss methodological questions about the optimal design of suitable dimensionality reduction methods in this setting. Thereby, our iVAE models extend previous work in which low-dimensional representations of single, deterministic forecast fields were learned and utilized to incorporate spatial information into localized, neural-network-based ensemble post-processing methods [1], which improved upon models utilizing location-specific inputs only [2]. By additionally incorporating the ensemble dimension and learning representations of probability distributions of spatial fields, we aim to enable more flexible modeling of the relevant predictive information contained in the full forecast ensemble. Additional potential applications include data compression and the generation of forecast ensembles of arbitrary size.

We illustrate our methodological developments based on a 10-year dataset of gridded ensemble forecasts from the European Centre for Medium-Range Weather Forecasts of several meteorological variables over Europe. Specifically, we investigate alternative model architectures and highlight the importance of tailoring the loss function to the specific problem at hand.

References:

[1] Lerch, S. & Polsterer, K.L. (2022). Convolutional autoencoders for spatially-informed ensemble post-processing. ICLR 2022 AI for Earth and Space Science Workshop, https://arxiv.org/abs/2204.05102.

[2] Rasp, S. & Lerch, S. (2018). Neural networks for post-processing ensemble weather forecasts. Monthly Weather Review, 146, 3885-3900.

How to cite: Chen, J., Höhlein, K., and Lerch, S.: Spatial representation learning for ensemble weather simulations using invariant variational autoencoders, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-1825, https://doi.org/10.5194/egusphere-egu23-1825, 2023.

EGU23-3117 | Orals | ITS1.13/AS5.2

AtmoRep: Large Scale Representation Learning for Atmospheric Data 

Christian Lessig, Ilaria Luise, and Martin Schultz

The AtmoRep project asks whether one can train a single neural network that represents and describes all atmospheric dynamics. AtmoRep’s ambition is hence to demonstrate that the concept of large-scale representation learning, whose feasibility and potential were established in principle by large language models such as GPT-3, is also applicable to scientific data and in particular to atmospheric dynamics. The project is enabled by the large amounts of atmospheric observations that have been made in the past as well as advances in neural network architectures and self-supervised learning that allow for effective training on petabytes of data. Eventually, we aim to train on all of the ERA5 reanalysis and, furthermore, fine-tune on observational data such as satellite measurements to move beyond the limits of reanalyses.

We will present the theoretical formulation of AtmoRep as an approximate representation of the atmosphere as a stochastic dynamical system. We will also detail our transformer-based network architecture and the training protocol for self-supervised learning, so that unlabelled data such as reanalyses, simulation outputs and observations can be employed for training and refining the network. Results will be presented on the performance of AtmoRep for downscaling, precipitation forecasting, the prediction of tropical convection initialization, and model correction. Furthermore, we also demonstrate that AtmoRep has substantial zero-shot skill, i.e., it is capable of performing well on tasks it was not trained for. Zero- and few-shot performance (or in-context learning) is one of the hallmarks of large-scale representation learning and to our knowledge has never been demonstrated in the geosciences.

How to cite: Lessig, C., Luise, I., and Schultz, M.: AtmoRep: Large Scale Representation Learning for Atmospheric Data, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-3117, https://doi.org/10.5194/egusphere-egu23-3117, 2023.

Numerical Earth system models (ESMs) are our primary tool for projecting future climate scenarios. Their simulation output is used by impact models that assess the effect of anthropogenic global warming, e.g., on flood events, vegetation changes or crop yields. Precipitation, an atmospheric variable with arguably one of the largest socio-economic impacts, involves various processes on a wide range of spatio-temporal scales. However, these cannot be completely resolved in ESMs due to the limited discretization of the numerical model.
This can lead to biases in the ESM output that need to be corrected in a post-processing step prior to feeding ESM output into impact models, which are calibrated with observations [1]. While established post-processing methods successfully improve the modelled temporal statistics for each grid cell individually, unrealistic spatial features that require a larger spatial context are not addressed.
Here, we apply a physically constrained cycle-consistent generative adversarial network (CycleGAN) [2] to the precipitation output of Coupled Model Intercomparison Project phase 6 (CMIP6) ESMs to correct both temporal distributions and spatial patterns. The CycleGAN can be trained naturally on daily ESM and reanalysis fields that are unpaired due to the deviating trajectories of the ESM and the observation-based ground truth.
We evaluate our method against a state-of-the-art bias adjustment framework (ISIMIP3BASD) [3] and find that it outperforms this framework in correcting spatial patterns while achieving comparable results on temporal distributions. We further discuss the representation of extreme events and suitable metrics for quantifying the realism of unpaired precipitation fields.
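For readers unfamiliar with the unpaired training idea, a minimal sketch of the cycle-consistency term is given below (our illustration; G and F denote the two generators, and the adversarial and physical-constraint terms of the full method are omitted):

```python
# Minimal sketch of the cycle-consistency term that lets unpaired ESM and
# reanalysis fields be used for training; G maps ESM -> reanalysis, F the reverse.
import torch

def cycle_consistency_loss(G, F, esm_batch, reanalysis_batch, lam=10.0):
    # round trips should reconstruct each domain, even without paired samples
    loss_esm = (F(G(esm_batch)) - esm_batch).abs().mean()            # ESM -> obs -> ESM
    loss_obs = (G(F(reanalysis_batch)) - reanalysis_batch).abs().mean()
    return lam * (loss_esm + loss_obs)  # added to the usual adversarial losses
```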

 [1] Cannon, A.J., et al. "Bias correction of GCM precipitation by quantile mapping: How well do methods preserve changes in quantiles and extremes?." Journal of Climate 28.17 (2015): 6938-6959.

[2] Zhu, J.-Y., et al. "Unpaired image-to-image translation using cycle-consistent adversarial networks." Proceedings of the IEEE international conference on computer vision. 2017.

[3] Lange, S. "Trend-preserving bias adjustment and statistical downscaling with ISIMIP3BASD (v1.0)." Geoscientific Model Development 12.7 (2019): 3055-3070.

How to cite: Hess, P., Lange, S., and Boers, N.: Improving global CMIP6 Earth system model precipitation output with generative adversarial networks for unpaired image-to-image translation, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-3128, https://doi.org/10.5194/egusphere-egu23-3128, 2023.

EGU23-3256 | Orals | ITS1.13/AS5.2

Emulating radiative transfer in a numerical weather prediction model 

Matthew Chantry, Peter Ukkonen, Robin Hogan, and Peter Dueben

Machine learning, and particularly neural networks, have been touted as a valuable accelerator for physical processes. By training on data generated from an existing algorithm, a network may theoretically learn a more efficient representation and accelerate the computations via emulation. For many parameterized physical processes in weather and climate models, this is being actively pursued. Here, we examine the value of this approach for radiative transfer within the IFS, an operational numerical weather prediction model where both accuracy and speed are vital. By designing custom, physics-informed neural networks we achieve outstanding offline accuracy for both longwave and shortwave processes. In coupled testing we find minimal changes to forecast scores at near-operational resolutions. We carry out coupled inference on GPUs to maximise the speed benefits of the emulator approach.

How to cite: Chantry, M., Ukkonen, P., Hogan, R., and Dueben, P.: Emulating radiative transfer in a numerical weather prediction model, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-3256, https://doi.org/10.5194/egusphere-egu23-3256, 2023.

EGU23-3321 | ECS | Orals | ITS1.13/AS5.2

Using machine learning to improve dynamical predictions in a coupled model 

Zikang He, Julien Brajard, Yiguo Wang, Xidong Wang, and Zheqi Shen

Dynamical models used in climate prediction often have systematic errors that can bias the predictions. In this study, we utilized machine learning to address this issue: machine learning was applied to learn the error corrections made by data assimilation, thus building a data-driven model that emulates the dynamical model error. A hybrid model was constructed by combining the dynamical and data-driven models. We tested the hybrid model using synthetic observations generated by a simplified high-resolution coupled ocean-atmosphere model (MAOOAM, De Cruz et al., 2016) and compared its performance to that of a low-resolution version of the same model used as a standalone dynamical model.

To evaluate the forecast skill of the hybrid model, we produced ensemble predictions based on initial conditions determined through data assimilation. The results show that the hybrid model significantly improves the forecast skill for both atmospheric and oceanic variables compared to the dynamical model alone. To explore what affects short-term and long-term forecast skill, we built two further hybrid models that correct errors in only the atmospheric or only the oceanic variables. For short-term atmospheric forecasts, correcting only oceanic errors has no effect on forecasts of atmospheric variables, whereas correcting only atmospheric errors yields forecast skill similar to correcting both. For long-term forecasts of oceanic variables, correcting the oceanic error alone improves the forecast skill, but correcting both atmospheric and oceanic errors yields the best skill. These results indicate that, for long-term forecasts of oceanic variables, bias correction of both the oceanic and atmospheric components has a significant effect.
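Schematically, the hybrid construction described above can be read as adding a learned error correction to each dynamical model step; the sketch below is our illustration with placeholder interfaces, not the authors' code:

```python
# Schematic of a hybrid dynamical/data-driven step: the data-driven component
# adds a learned error correction to each low-resolution model step.
import numpy as np

def hybrid_step(x, dynamical_step, error_model):
    """x: model state vector; dynamical_step: low-res model integrator;
    error_model: regressor trained on data assimilation corrections."""
    x_dyn = dynamical_step(x)                          # imperfect dynamical forecast
    return x_dyn + error_model.predict(x[None, :])[0]  # add emulated model error

def forecast(x0, n_steps, dynamical_step, error_model):
    states = [x0]
    for _ in range(n_steps):
        states.append(hybrid_step(states[-1], dynamical_step, error_model))
    return np.stack(states)
```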

How to cite: He, Z., Brajard, J., Wang, Y., Wang, X., and Shen, Z.: Using machine learning to improve dynamical predictions in a coupled model, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-3321, https://doi.org/10.5194/egusphere-egu23-3321, 2023.

EGU23-3340 | ECS | Orals | ITS1.13/AS5.2

An iterative data-driven emulator of an ocean general circulation model 

Rachel Furner, Peter Haynes, Dan(i) Jones, Dave Munday, Brooks Paige, and Emily Shuckburgh

Data-driven models are becoming increasingly competent at tasks fundamental to weather and climate prediction. Relative to machine learning (ML) based atmospheric models, which have shown promise in short-term forecasting, ML-based ocean forecasting remains somewhat unexplored. In this work, we present a data-driven emulator of an ocean GCM and show that performance over a single predictive step is skilful across all variables under consideration. Iterating such data-driven models poses additional challenges, with many models suffering from over-smoothing of fields or instabilities in the predictions. We compare a variety of methods for iterating our data-driven emulator and assess them by looking at how well they agree with the underlying GCM in the very short term and how realistic the fields remain for longer-term forecasts. Due to the chaotic nature of the system being forecast, we would not expect any model to agree with the GCM accurately over long time periods, but instead we expect fields to continue to exhibit physically realistic behaviour at ever increasing lead times. Specifically, we expect well-represented fields to remain stable whilst also maintaining the presence and sharpness of features seen in both reality and in GCM predictions, with reduced emphasis on accurately representing the location and timing of these features. This nuanced and temporally changing definition of what constitutes a ‘good’ forecast at increasing lead times generates questions over both (1) how one defines suitable metrics for assessing data-driven models, and perhaps more importantly, (2) identifying the most promising loss functions to use to optimise these models.

How to cite: Furner, R., Haynes, P., Jones, D., Munday, D., Paige, B., and Shuckburgh, E.: An iterative data-driven emulator of an ocean general circulation model, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-3340, https://doi.org/10.5194/egusphere-egu23-3340, 2023.

EGU23-4337 | Orals | ITS1.13/AS5.2 | Highlight

Towards a new surrogate model for predicting short-term NOx-O3 effects from aviation using Gaussian processes 

Pratik Rao, Richard Dwight, Deepali Singh, Jin Maruhashi, Irene Dedoussi, Volker Grewe, and Christine Frömming

While efforts have been made to curb CO2 emissions from aviation, the more uncertain non-CO2 effects, which contribute about two-thirds of the warming in terms of radiative forcing (RF), still require attention. The most important non-CO2 effects include persistent line-shaped contrails, contrail-induced cirrus clouds, nitrogen oxide (NOx) emissions that alter the ozone (O3) and methane (CH4) concentrations, both of which are greenhouse gases, and the emission of water vapour (H2O). The climate impact of these non-CO2 effects depends on the emission location and the prevailing weather situation; thus, it can potentially be reduced by advantageous re-routing of flights using Climate Change Functions (CCFs), which are a measure of the climate effect of a locally confined aviation emission. CCFs are calculated using a modelling chain starting from the instantaneous RF (iRF) measured at the tropopause that results from aviation emissions. However, the iRF is a product of computationally intensive chemistry-climate model (EMAC) simulations and is currently restricted to a limited number of days and to the North Atlantic Flight Corridor. This makes it impossible to run EMAC on an operational basis for global flight planning. A step in this direction led to a surrogate model called algorithmic Climate Change Functions (aCCFs), derived by regressing CCFs (training data) against 2 or 3 local atmospheric variables at the time of emission (features) with simple regression techniques; these aCCFs are applicable only in parts of the Northern Hemisphere. It was found that O3 aCCFs, which use temperature and geopotential as features to provide a reasonable first estimate of the short-term impact of aviation NOx on O3 warming, can be vastly improved [1]. There is aleatoric uncertainty in the full-order model (EMAC), stemming from unknown sources (missing features) and randomness in the known features, which can introduce heteroscedasticity in the data. Deterministic surrogates (e.g. aCCFs) only predict point estimates of the conditional average, thereby providing an incomplete picture of the stochastic response. Thus, the goal of this research is to build a new surrogate model for iRF, which is achieved by:

1. Expanding the geographical coverage of iRF (training data) by running EMAC simulations in more regions (North & South America, Eurasia, Africa and Australasia) at multiple cruise flight altitudes,

2. Following an objective approach to selecting atmospheric variables (feature selection) and considering the importance of local as well as non-local effects,

3. Regressing the iRF against selected atmospheric variables using supervised machine learning techniques such as homoscedastic and heteroscedastic Gaussian process regression (see the sketch below).

We present a new surrogate model that predicts iRF of aviation NOx-O3 effects on a regular basis with confidence levels, which not only improves our scientific understanding of NOx-O3 effects, but also increases the potential of global climate-optimised flight planning.
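As an illustration of item 3 above, a homoscedastic Gaussian process regression can be set up in a few lines with scikit-learn (the heteroscedastic case requires a custom likelihood); the features and targets below are placeholders, not the EMAC training data:

```python
# Sketch of the homoscedastic variant with scikit-learn; data are placeholders.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

X = np.random.rand(500, 3)   # e.g. temperature, geopotential, a third feature
y = np.random.rand(500)      # iRF training targets (placeholder)

# anisotropic RBF kernel plus a homoscedastic noise term
kernel = 1.0 * RBF(length_scale=np.ones(3)) + WhiteKernel(noise_level=0.1)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

mean, std = gp.predict(X[:5], return_std=True)  # predictions with confidence levels
```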

References

[1] Rao, P.; et al. Case Study for Testing the Validity of NOx-Ozone Algorithmic Climate Change Functions for Optimising Flight Trajectories. Aerospace 2022, 9, 231. https://doi.org/10.3390/aerospace9050231

How to cite: Rao, P., Dwight, R., Singh, D., Maruhashi, J., Dedoussi, I., Grewe, V., and Frömming, C.: Towards a new surrogate model for predicting short-term NOx-O3 effects from aviation using Gaussian processes, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-4337, https://doi.org/10.5194/egusphere-egu23-4337, 2023.

Time transfer functions describe the change of state variables over time in geoscientific numerical simulation models. The identification of these functions is an essential but challenging step in model building. While traditional methods rely on qualitative understanding or first-order principles, the availability of large spatio-temporal datasets from direct measurements or extremely detailed physics-based system modelling has enabled the use of machine learning methods to discover time transfer functions directly from data. In this study we explore the feasibility of this data-driven approach for numerical simulation of the co-evolution of soil, hydrology, vegetation, and grazing at landscape scale and geological timescales. Empirical observation and hyper-resolution (1 m, 1 week) modelling (Karssenberg et al., 2017) have shown that a hillslope system exhibits complex behaviour with two stable states: high biomass on deep soils (healthy state) and low biomass on thin soils (degraded or desertic state). A catastrophic shift from the healthy to the degraded state occurs under changes in external forcing (climate, grazing pressure), with a transient between states that is rapid or slow depending on system characteristics.

To identify and use the time transfer functions of this system at hillslope scale we follow four procedural steps. First, an extremely large dataset of hillslope-average soil and vegetation state is generated by a mechanistic hyper-resolution (1 m, 1 week) system model, forced with different variations in grazing pressure over time. Second, a machine learning model predicting the rate of change in soil and vegetation as a function of soil, vegetation, and grazing pressure is trained on this dataset. Third, we explore the ability of this trained model to predict the rate of system change (soil and vegetation) on untrained data. Finally, we use the trained model as the time transfer function in a forward numerical simulation of a hillslope to determine whether it can represent the known complex behaviour of the system (see the sketch below).

Our findings are that the approach is in principle feasible. We compared a deep neural network and a random forest. Both achieve high fitting precision, although the latter runs much faster and requires less training data. Even though the machine-learning-based time transfer function shows differences in the rates of change of system state from those calculated using expert knowledge in Karssenberg et al. (2017), forward simulation proved possible, with system behaviour generally in line with that observed in the data from the hyper-resolution model. Our findings indicate that discovery of time transfer functions from data is possible. Next steps should involve the use of observational data (e.g., from remote sensing) to test the approach on real-world systems.
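The sketch referred to above illustrates steps two and four under strong simplifications (placeholder data, two state variables, forward Euler time stepping); it is our reading of the approach, not the study's code:

```python
# Illustrative sketch: a random forest learns rates of change (step two) and
# then serves as the time transfer function in forward simulation (step four).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# training pairs from the hyper-resolution model (placeholder arrays):
# inputs are hillslope-average soil depth, biomass and grazing pressure;
# targets are the corresponding rates of change of soil depth and biomass
inputs = np.random.rand(10000, 3)
rates = np.random.rand(10000, 2)
transfer = RandomForestRegressor(n_estimators=200).fit(inputs, rates)

def simulate(soil0, biomass0, grazing, dt=1.0):
    """Forward Euler with the learned time transfer function."""
    state = np.array([soil0, biomass0], dtype=float)
    trajectory = [state.copy()]
    for g in grazing:  # external forcing series (grazing pressure over time)
        dstate = transfer.predict([[state[0], state[1], g]])[0]
        state += dt * dstate
        trajectory.append(state.copy())
    return np.array(trajectory)
```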


Karssenberg, D., Bierkens, M.F.P., & Rietkerk, M. (2017). Catastrophic shifts in semiarid vegetation-soil systems may unfold rapidly or slowly. The American Naturalist, 190, E145–E155.

How to cite: Pomarol Moya, O. and Karssenberg, D.: Machine learning for data driven discovery of time transfer functions in numerical modelling: simulating catastrophic shifts in vegetation-soil systems, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-4454, https://doi.org/10.5194/egusphere-egu23-4454, 2023.

EGU23-4695 | Posters on site | ITS1.13/AS5.2

Development of PBL Parameterization Emulator using Neural Networks 

Jiyeon Jang, Tae-Jin Oh, Sojung An, Wooyeon Park, Inchae Na, and Junghan Kim

Physical parameterization is one of the major components of a Numerical Weather Prediction system. In the Korean Integrated Model (KIM), physical parameterizations account for about 30% of the total computation time. There have been many studies on developing neural-network-based emulators to replace and accelerate physics-based parameterizations. In this study, we develop a planetary boundary layer (PBL) emulator based on the Shin-Hong scheme (Hong et al., 2006, 2010; Shin and Hong, 2013, 2015), which computes the parameterized effects of vertical turbulent eddy diffusion of momentum, water vapor, and sensible heat fluxes. We compare the emulator performance of Multi-Layer Perceptron (MLP) based architectures: a simple MLP, an MLP application version, and an MLP-Mixer (Tolstikhin et al., 2021). The MLP application version divides the data into several vertical groups to better approximate each group of layers. The MLP-Mixer is an MLP-based architecture that performs well in computer vision without using convolution or self-attention. Evaluating the resulting emulators, the MLP application version and the MLP-Mixer show significant performance improvements over the simple MLP.

How to cite: Jang, J., Oh, T.-J., An, S., Park, W., Na, I., and Kim, J.: Development of PBL Parameterization Emulator using Neural Networks, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-4695, https://doi.org/10.5194/egusphere-egu23-4695, 2023.

EGU23-4817 | ECS | Posters on site | ITS1.13/AS5.2

Algorithmic optimisation of key parameters of OpenIFS 

Lauri Tuppi, Madeleine Ekblom, Pirkka Ollinaho, and Heikki Järvinen

Numerical weather prediction models contain parameters that are inherently uncertain and cannot be determined exactly. Traditionally, parameter tuning has been done manually, which can be an extremely laborious task. Tuning the entire model usually requires adjusting a relatively large number of parameters, and the need to balance several requirements at the same time can turn manual tuning into a maze of subjective choices. It is therefore desirable to have reliable, objective approaches for estimating the optimal values and uncertainties of these parameters. In this presentation we show how to optimise 20 key physical parameters that have a strong impact on forecast quality. These parameters belong to the Stochastically Perturbed Parameters Scheme in the atmospheric model Open Integrated Forecasting System (OpenIFS).

The results show that simultaneous optimisation of O(20) parameters is possible with O(100) algorithm steps using an ensemble of O(20) members, and that the optimised parameters lead to substantial enhancement of predictive skill. The enhanced predictive skill can be attributed to reduced biases in low-level winds and upper-tropospheric humidity in the optimised model. We find that the optimisation process is dependent on the starting values of the parameters that are optimised (starting from better suited values results in a better model). The results also show that the applicability of the tuned parameter values across different model resolutions is somewhat questionable since the model biases seem to be resolution-specific. Moreover, our optimisation algorithm tends to treat the parameter covariances poorly limiting its ability to converge to the global optimum.
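The abstract does not specify the optimisation algorithm, so the following generic ensemble-based loop is purely illustrative of the O(20)-member, O(100)-step setting; score() is a hypothetical function that would run forecasts with candidate parameters and return an error measure:

```python
# Generic ensemble-based optimisation loop (illustrative stand-in, not the
# algorithm used in the study): perturb parameters, score each member, and
# step against the ensemble-estimated gradient of the error.
import numpy as np

def optimise(score, theta0, sigma=0.05, members=20, steps=100, lr=0.2):
    theta = np.array(theta0, dtype=float)   # ~20 parameter values
    for _ in range(steps):
        perturbs = sigma * np.random.randn(members, theta.size)
        scores = np.array([score(theta + p) for p in perturbs])
        weights = (scores - scores.mean()) / (scores.std() + 1e-12)
        grad = (weights[:, None] * perturbs).mean(axis=0) / sigma
        theta -= lr * grad                   # move toward lower forecast error
    return theta
```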

How to cite: Tuppi, L., Ekblom, M., Ollinaho, P., and Järvinen, H.: Algorithmic optimisation of key parameters of OpenIFS, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-4817, https://doi.org/10.5194/egusphere-egu23-4817, 2023.

EGU23-5003 | ECS | Posters on site | ITS1.13/AS5.2

Towards machine-learning calibration of cloud parameters in the kilometre-resolution ICON atmosphere model 

Hannah Marie Eichholz, Jan Kretzschmar, Duncan Watson-Parris, Josefine Umlauft, and Johannes Quaas

In preparation for the global kilometre-resolution coupled ICON climate model, it is necessary to calibrate cloud microphysical parameters. Here we explore an avenue towards optimally calibrating such parameters using machine learning. The emulator developed by Watson-Parris et al. (2021) is employed in combination with a perturbed-parameter ensemble of limited-area, atmosphere-only ICON simulations over the North Atlantic Ocean, in which different cloud microphysical parameters are varied in order to evaluate their influence on the simulated radiation fluxes. In a first step, the autoconversion scaling parameter is calibrated using satellite-retrieved top-of-atmosphere and bottom-of-atmosphere radiation fluxes.

How to cite: Eichholz, H. M., Kretzschmar, J., Watson-Parris, D., Umlauft, J., and Quaas, J.: Towards machine-learning calibration of cloud parameters in the kilometre-resolution ICON atmosphere model, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-5003, https://doi.org/10.5194/egusphere-egu23-5003, 2023.

EGU23-5149 | ECS | Posters on site | ITS1.13/AS5.2

Machine Learning Parameterization for Super-droplet Cloud Microphysics Scheme 

Shivani Sharma and David Greenberg

Machine learning approaches have been widely used for improving the representation of subgrid scale parameterizations in Earth System Models. In our study we target the Cloud Microphysics parameterization, in particular the two-moment bulk scheme of the ICON (Icosahedral Non-hydrostatic) Model. 


Cloud microphysics parameterization schemes suffer from an accuracy/speed tradeoff. The simplest schemes, often heavy with assumptions (such as the bulk moment schemes), are most common in operational weather prediction models. Conversely, more complex schemes with fewer assumptions, e.g. Lagrangian schemes such as the super-droplet method (SDM), are computationally expensive and used only within research and development. SDM allows easy representation of complex scenarios with multiple hydrometeors and can also be used for simulating cloud-aerosol interactions. To bridge this gap and make the use of more complex microphysical schemes feasible within operational models, we use a data-driven approach.


Here we train a neural network to mimic the behavior of SDM simulations in a warm-rain scenario in a dimensionless control volume. The network behaves like a dynamical system that converts cloud droplets to rain droplets, represented as bulk moments, with only the current system state as input. We use a multi-step training loss to stabilize the network over long integration periods, especially in cases starting with extremely low cloud water. We find that the network is stable across various initial conditions and, in many cases, emulates the SDM simulations better than the traditional bulk moment schemes. Our network also performs better than previous ML-based attempts to learn from SDM. This opens the possibility of using the trained network as a proxy for the computationally expensive SDM within operational weather prediction models with minimal computational overhead.
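A minimal sketch of the multi-step training loss idea follows (our illustration; the network interface and tensor shapes are assumptions):

```python
# Sketch of a multi-step loss: unroll the emulator for several steps and
# penalise the whole trajectory, which stabilises long integrations.
import torch

def multistep_loss(net, state0, targets):
    """state0: (batch, n_moments); targets: (steps, batch, n_moments) from SDM."""
    loss, state = 0.0, state0
    for target in targets:           # unroll the emulator through time
        state = net(state)           # network maps current moments to the next
        loss = loss + ((state - target) ** 2).mean()
    return loss / len(targets)
```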

How to cite: Sharma, S. and Greenberg, D.: Machine Learning Parameterization for Super-droplet Cloud Microphysics Scheme, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-5149, https://doi.org/10.5194/egusphere-egu23-5149, 2023.

EGU23-5523 | ECS | Orals | ITS1.13/AS5.2

Using weak constrained neural networks to improve simulations in the gray zone 

Yvonne Ruckstuhl, Raphael Kriegmair, Stephan Rasp, and George Craig

Machine learning represents a potential method to cope with the gray zone problem of representing motions in dynamical systems on scales comparable to the model resolution. Here we explore the possibility of using a neural network to directly learn the error caused by unresolved scales. We use a modified shallow water model which includes highly nonlinear processes mimicking atmospheric convection. To create the training dataset, we run the model in a high- and a low-resolution setup and compare the difference after one low-resolution time step, starting from the same initial conditions, thereby obtaining an exact target. The neural network is able to learn a large portion of the difference when evaluated on single time step predictions on a validation dataset. When coupled to the low-resolution model, we find large forecast improvements of up to 1 day on average. After this, the accumulated error due to the network's violation of mass conservation starts to dominate and deteriorates the forecast. This deterioration can effectively be delayed by adding a penalty term to the training loss function that enforces mass conservation in a weak sense. This study reinforces the need to include physical constraints in neural network parameterizations.
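The weak constraint can be sketched as a penalty on the net mass introduced by the network's correction; the channel layout and weighting below are illustrative assumptions:

```python
# Sketch of a weak mass-conservation constraint: penalise the net mass added
# by the learned correction, weighted against the prediction error.
import torch

def constrained_loss(correction, target, h_index=0, alpha=0.1):
    """correction/target: (batch, channels, ny, nx); h_index: fluid depth channel."""
    mse = ((correction - target) ** 2).mean()
    mass_violation = correction[:, h_index].sum(dim=(-2, -1))  # net mass per sample
    return mse + alpha * (mass_violation ** 2).mean()
```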

How to cite: Ruckstuhl, Y., Kriegmair, R., Rasp, S., and Craig, G.: Using weak constrained neural networks to improve simulations in the gray zone, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-5523, https://doi.org/10.5194/egusphere-egu23-5523, 2023.

EGU23-5766 | ECS | Orals | ITS1.13/AS5.2

Best Practices for Fortran-Python Bridges to Integrate Neural Networks in Earth System Models 

Caroline Arnold, Shivani Sharma, Tobias Weigel, and David Greenberg

In recent years, machine learning (ML) based parameterizations have become increasingly common in Earth System Models (ESM). Sub-grid scale physical processes that would be computationally too expensive, e.g., atmospheric chemistry and cloud microphysics, can be emulated by ML algorithms such as neural networks.

Neural networks are trained first on simulations of the sub-grid scale process that is to be emulated. They are then used in so-called inference mode to make predictions during the ESM run, replacing the original parameterization. Training usually requires GPUs, while inference may be done on CPU architectures.

At first, neural networks are evaluated offline, i.e., independently of the ESM on appropriate datasets. However, their performance can ultimately only be evaluated in an online setting, where the ML algorithm is coupled to the ESM, including nonlinear interactions.

We want to shorten the time spent in neural network development and offline testing and move quickly to online evaluation of ML components in our ESM of choice, ICON (Icosahedral Nonhydrostatic Weather and Climate Model). Since ICON is written in Fortran, and modern ML algorithms are developed in the Python ecosystem, this requires efficient bridges between the two programming languages. The Fortran-Python bridge must be flexible to allow for iterative development of the neural network. Changes to the ESM codebase should be as few as possible, and the runtime overhead should not limit development.

In our contribution we explore three strategies to call the neural network inference from within Fortran using (i) embedded Python code compiled in a dynamic library, (ii) pipes, and (iii) MPI using the ICON coupler YAC. We provide quantitative benchmarks for the proposed Fortran-Python bridges and assess their overall suitability in a qualitative way to derive best practices. The Fortran-Python bridge enables scientists and developers to evaluate ML components in an online setting, and can be extended to other parameterizations and ESMs.
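As an illustration of strategy (ii), the Python side of a pipe-based bridge could look like the following sketch, assuming a simple fixed-size float64 exchange protocol (pipe names and sizes are hypothetical):

```python
# Sketch of the Python side of a pipe-based Fortran-Python bridge; the Fortran
# side would write raw bytes to 'to_python' and read results from 'to_fortran'.
import numpy as np

N_IN, N_OUT = 256, 128                      # illustrative field sizes

def serve(model, in_pipe="to_python", out_pipe="to_fortran"):
    with open(in_pipe, "rb") as fin, open(out_pipe, "wb") as fout:
        while True:
            raw = fin.read(N_IN * 8)        # one float64 input vector per call
            if len(raw) < N_IN * 8:
                break                       # Fortran side closed the pipe
            x = np.frombuffer(raw, dtype=np.float64)
            y = model(x).astype(np.float64) # model: any Python inference callable
            fout.write(y.tobytes())
            fout.flush()
```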

How to cite: Arnold, C., Sharma, S., Weigel, T., and Greenberg, D.: Best Practices for Fortran-Python Bridges to Integrate Neural Networks in Earth System Models, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-5766, https://doi.org/10.5194/egusphere-egu23-5766, 2023.

EGU23-6287 | Orals | ITS1.13/AS5.2

Approximation and Optimization of Atmospheric Simulations in High Spatio-Temporal Resolution with Neural Networks 

Elnaz Azmi, Jörg Meyer, Marcus Strobl, Michael Weimer, and Achim Streit

Accurate forecasts of the atmosphere demand large-scale simulations with high spatio-temporal resolution. Atmospheric chemistry modeling, for example, usually requires solving a system of hundreds of coupled ordinary differential equations. Due to this computational complexity, large high-performance computing resources are required, which becomes challenging as the spatio-temporal resolution increases. Machine learning methods, and especially deep learning, can offer an approximation of the simulations with some factor of speed-up while using fewer compute resources. The goal of this study is to investigate the feasibility and opportunities, but also the challenges and pitfalls, of replacing the compute-intensive chemistry of a state-of-the-art atmospheric chemistry model with a trained neural network model that forecasts the concentration of trace gases at each grid cell while reducing the computational complexity of the simulation. In this work, we introduce a neural network model (ICONET) to forecast trace gas concentrations without executing the traditional compute-intensive atmospheric simulations. ICONET is equipped with a multifeature Long Short-Term Memory (LSTM) model to forecast atmospheric chemicals iteratively in time. We generated the training and test dataset, our ground truth for ICONET, by executing an atmospheric chemistry simulation in ICON-ART. Applying the trained ICONET model to a test dataset yields forecasts that fit our ground truth well. We discuss appropriate metrics to evaluate the quality of such models and present the quality of the ICONET forecasts in terms of RMSE and KGE. The variety in the nature of trace gases limits the model's learning and forecast skill, depending on the variable. In addition to the quality of the ICONET forecasts, we describe its computational efficiency as the run-time speed-up relative to the ICON-ART simulation. ICONET showed a speed-up factor of 3.1 over the run time of the atmospheric chemistry simulation in ICON-ART, a significant achievement, especially considering the importance of ensemble simulations.
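A minimal sketch of iterative LSTM forecasting of this kind follows (our illustration; the species count, hidden size and single-feature interface are assumptions, not the ICONET configuration):

```python
# Sketch of iterative trace-gas forecasting with an LSTM; sizes are placeholders.
import torch
import torch.nn as nn

class TraceGasLSTM(nn.Module):
    def __init__(self, n_species=10, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_species, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_species)

    def forward(self, history):
        # history: (batch, time, n_species) -> next concentrations
        h, _ = self.lstm(history)
        return self.out(h[:, -1])

def roll_forward(model, history, n_steps):
    preds = []
    for _ in range(n_steps):      # feed predictions back in, step by step
        nxt = model(history)
        preds.append(nxt)
        history = torch.cat([history[:, 1:], nxt[:, None]], dim=1)
    return torch.stack(preds, dim=1)
```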

How to cite: Azmi, E., Meyer, J., Strobl, M., Weimer, M., and Streit, A.: Approximation and Optimization of Atmospheric Simulations in High Spatio-Temporal Resolution with Neural Networks, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-6287, https://doi.org/10.5194/egusphere-egu23-6287, 2023.

EGU23-6836 | ECS | Posters on site | ITS1.13/AS5.2

Parameterising melt at the base of Antarctic ice shelves with a feedforward neural network 

Clara Burgard, Nicolas C. Jourdain, Pierre Mathiot, and Robin Smith

One of the largest sources of uncertainty when projecting the Antarctic contribution to sea-level rise is the ocean-induced melt at the base of Antarctic ice shelves. This is because resolving the ocean circulation and the ice-ocean interactions occurring in the cavity below the ice shelves is computationally expensive.

Instead, for large ensembles and long-term projections of the ice-sheet evolution, ice-sheet models currently rely on parameterisations to link the ocean temperature and salinity in front of ice shelves to the melt at their base. However, current physics-based parameterisations struggle to accurately simulate basal melt patterns.

As an alternative approach, we explore the potential use of a deep feedforward neural network as a basal melt parameterisation. To do so, we train a neural network to emulate basal melt rates simulated by highly-resolved circum-Antarctic ocean simulations. We explore the influence of different input variables and show that the neural network struggles to generalise to ice-shelf geometries unseen during training, while it generalises better on timesteps unseen during training. We also test the parameterisation on separate coupled ocean-ice simulations to assess the neural network’s performance on independent data.  

How to cite: Burgard, C., Jourdain, N. C., Mathiot, P., and Smith, R.: Parameterising melt at the base of Antarctic ice shelves with a feedforward neural network, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-6836, https://doi.org/10.5194/egusphere-egu23-6836, 2023.

EGU23-7281 | ECS | Posters on site | ITS1.13/AS5.2

Neural network surrogate models for multiple scattering: Application to OMPS LP simulations 

Michael Himes, Natalya Kramarova, Tong Zhu, Jungbin Mok, Matthew Bandel, Zachary Fasnacht, and Robert Loughman

Retrieving ozone from limb measurements necessitates the modeling of scattered light through the atmosphere.  However, accurately modeling multiple scattering (MS) during retrieval requires excessive computational resources; consequently, operational retrieval models employ approximations in lieu of the full MS calculation.  Here we consider an alternative MS approximation method, where we use radiative transfer (RT) simulations to train neural network models to predict the MS radiances.  We present our findings regarding the best-performing network hyperparameters, normalization schemes, and input/output data structures.  Using RT calculations based on measurements by the Ozone Mapping and Profiling Suite's Limb Profiler (OMPS/LP), we compare the accuracy of these neural-network models with both the full MS calculation as well as the current MS approximation methods utilized during OMPS/LP retrievals.

How to cite: Himes, M., Kramarova, N., Zhu, T., Mok, J., Bandel, M., Fasnacht, Z., and Loughman, R.: Neural network surrogate models for multiple scattering: Application to OMPS LP simulations, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-7281, https://doi.org/10.5194/egusphere-egu23-7281, 2023.

EGU23-7368 | ECS | Posters on site | ITS1.13/AS5.2

Comparison of Methods for Learning Differential Equations from Data 

Christof Schötz

Some results from the DEEB (Differential Equation Estimation Benchmark) are presented. In DEEB, we compare different machine learning approaches and statistical methods for estimating nonlinear dynamics from data. Such methods constitute an important building block for purely data-driven earth system models as well as hybrid models which combine physical knowledge with past observations.

Specifically, we examine approaches for solving the following problem: Given time-state observations of a deterministic ordinary differential equation (ODE) with measurement noise in the state, predict the future evolution of the system. Of particular interest are systems with chaotic behavior, like Lorenz 63, and nonparametric settings in which the functional form of the ODE is completely unknown (in particular, not restricted to a polynomial of low order). To allow a fair comparison of methods, a benchmark database was created that includes datasets of simulated observations from dynamical systems of different complexity and with varying noise levels. The methods we compare include echo state networks, Gaussian processes, neural ODEs, SINDy, thin plate splines, and more.
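One simple nonparametric baseline of the kind compared here can be sketched as follows (our illustration, not a DEEB reference implementation): estimate derivatives from the noisy trajectory with smoothing splines, regress them on the state, and integrate the learned right-hand side:

```python
# Sketch: derivative estimation via smoothing splines, nonparametric regression
# of dx/dt on x, then integration of the learned ODE to predict the future.
import numpy as np
from scipy.interpolate import UnivariateSpline
from scipy.integrate import solve_ivp
from sklearn.neighbors import KNeighborsRegressor

def learn_ode(t, X):
    """t: (n,) sample times; X: (n, d) noisy state observations."""
    dX = np.column_stack([
        UnivariateSpline(t, X[:, j], k=4).derivative()(t) for j in range(X.shape[1])
    ])
    return KNeighborsRegressor(n_neighbors=5).fit(X, dX)  # f(x) ~ dx/dt

def predict(rhs, x0, t_end):
    # integrate the learned right-hand side forward from x0
    return solve_ivp(lambda t, x: rhs.predict(x[None, :])[0],
                     (0.0, t_end), x0, dense_output=True)
```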

Although some methods consistently perform better than others throughout different datasets, there seems to be no silver bullet.

How to cite: Schötz, C.: Comparison of Methods for Learning Differential Equations from Data, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-7368, https://doi.org/10.5194/egusphere-egu23-7368, 2023.

EGU23-7391 | ECS | Posters on site | ITS1.13/AS5.2

Learning fluid dynamical statistics using stochastic neural networks 

Martin Brolly

Many practical problems in fluid dynamics demand an empirical approach, where statistics estimated from data inform understanding and modelling. In this context data-driven probabilistic modelling offers an elegant alternative to ad hoc estimation procedures. Probabilistic models are useful as emulators, but also offer an attractive means of estimating particular statistics of interest. In this paradigm one can rely on proper scoring rules for model comparison and validation, and invoke Bayesian statistics to obtain rigorous uncertainty quantification. Stochastic neural networks provide a particularly rich class of probabilistic models, which, when paired with modern optimisation algorithms and GPUs, can be remarkably efficient. We demonstrate this approach by learning the single particle transition density of ocean surface drifters from decades of Global Drifter Program observations using a Bayesian mixture density network. From this we derive maps of various displacement statistics and corresponding uncertainty maps. Our model also offers a means of simulating drifter trajectories as a discrete-time Markov process, which could be used to study the transport of plankton or plastic in the upper ocean.
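A minimal, non-Bayesian sketch of the mixture density network component follows (our illustration; layer sizes and the number of components are assumptions, and the diagonal-Gaussian mixture is a simplification):

```python
# Sketch of a mixture density network for a 2-D displacement density; a
# Bayesian treatment would additionally place priors over the weights.
import math
import torch
import torch.nn as nn

class MDN(nn.Module):
    """Predicts a 2-D Gaussian mixture over displacements given a position."""
    def __init__(self, n_components=8):
        super().__init__()
        self.k = n_components
        # outputs per component: mixture logit, 2 means, 2 log-stddevs
        self.net = nn.Sequential(nn.Linear(2, 64), nn.Tanh(),
                                 nn.Linear(64, self.k * 5))

    def loss(self, pos, displacement):
        p = self.net(pos).view(-1, self.k, 5)
        log_pi = torch.log_softmax(p[..., 0], dim=-1)
        mu, log_sig = p[..., 1:3], p[..., 3:5]
        z = (displacement[:, None, :] - mu) / log_sig.exp()
        # diagonal-Gaussian log-density of each mixture component
        log_n = -0.5 * (z ** 2).sum(-1) - log_sig.sum(-1) - math.log(2 * math.pi)
        return -torch.logsumexp(log_pi + log_n, dim=-1).mean()  # negative log-likelihood
```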

How to cite: Brolly, M.: Learning fluid dynamical statistics using stochastic neural networks, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-7391, https://doi.org/10.5194/egusphere-egu23-7391, 2023.

EGU23-7492 | Posters on site | ITS1.13/AS5.2

Machine Learning and Microseism as a Tool for Sea Wave Monitoring 

Flavio Cannavo', Vittorio Minio, Susanna Saitta, Salvatore Alparone, Alfio Marco Borzì, Andrea Cannata, Giuseppe Ciraolo, Danilo Contrafatto, Sebastiano D’Amico, Giuseppe Di Grazia, and Graziano Larocca

Monitoring the state of the sea is a fundamental task for economic activities in the coastal zone, such as transport, tourism and infrastructure design. In recent years, regular wave height monitoring for marine risk assessment and mitigation has become unavoidable as global warming results in more intense and frequent swells.
In particular, the Mediterranean Sea has been identified as one of the regions most responsive to global warming, which may promote the intensification of hazardous natural phenomena such as strong winds, heavy precipitation and high sea waves. Because of the high population density along the Mediterranean coastlines, heavy swells can have major socio-economic consequences. To reduce the impacts of such scenarios, the development of more advanced monitoring systems of the sea state becomes necessary.
In the last decade, it has been demonstrated that seismometers can be used to measure sea conditions by exploiting the characteristics of a part of the seismic signal called microseism. Microseism is the continuous seismic signal recorded in the frequency band between 0.05 and 0.4 Hz that is likely generated by interactions of sea waves with each other and with the seafloor or shorelines.
In this work, in the framework of the i-WaveNET INTERREG project, we performed a regression analysis to develop a model capable of predicting the sea state in the Sicily Channel (Italy) using microseism acquired by onshore instruments installed in Sicily and Malta. Considering the complexity of the relationship between spatial sea wave height data and seismic data measured at individual stations, we used supervised machine learning (ML) techniques to develop the prediction model. As input data we used the hourly Root Mean Squared (RMS) amplitude of the seismic signal recorded by 14 broadband stations, along the three components and in different frequency bands, during 2018–2021. These stations, belonging to the permanent seismic networks managed by the National Institute of Geophysics and Volcanology (INGV) and the Department of Geosciences of the University of Malta, consist of three-component broadband seismometers recording at a sampling frequency of 100 Hz.
As the target, the significant wave height data from the Copernicus Marine Environment Monitoring Service (CMEMS) for the same period were used. These data are the hindcast product of the Mediterranean Sea Waves forecasting system, with hourly temporal resolution and 1/24° spatial resolution. After a feature selection step, we compared three different kinds of ML algorithms for regression: K-Nearest-Neighbors (KNN), Random Forest (RF) and Light Gradient Boosting (LGB). The hyperparameters were tuned using a grid-search algorithm, and the best models were selected by cross-validation. Different metrics, such as MAE, R2 and RMSE, were considered to evaluate the generalization capabilities of the models, and special attention was paid to the models' predictive ability for extreme wave height values.
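As an illustration of the tuning step, a grid search with cross-validation over one of the three regressors can be set up as follows (the feature and target arrays are placeholders for the RMS amplitudes and CMEMS wave heights):

```python
# Sketch of hyperparameter tuning by grid search with cross-validation;
# X and y are placeholders for the actual features and targets.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

X = np.random.rand(2000, 42)   # RMS amplitudes: stations x components x bands
y = np.random.rand(2000)       # significant wave height at one grid point

search = GridSearchCV(
    RandomForestRegressor(),
    param_grid={"n_estimators": [100, 300], "max_depth": [None, 10, 20]},
    scoring="neg_root_mean_squared_error", cv=5)
search.fit(X, y)
print(search.best_params_, -search.best_score_)   # best model and its RMSE
```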
Results show model predictive capabilities good enough to develop a sea monitoring system to complement the systems currently in use.

How to cite: Cannavo', F., Minio, V., Saitta, S., Alparone, S., Borzì, A. M., Cannata, A., Ciraolo, G., Contrafatto, D., D’Amico, S., Di Grazia, G., and Larocca, G.: Machine Learning and Microseism as a Tool for Sea Wave Monitoring, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-7492, https://doi.org/10.5194/egusphere-egu23-7492, 2023.

EGU23-7561 | ECS | Posters on site | ITS1.13/AS5.2

Deep Learning guided statistical downscaling of climate projections for use in hydrological impact modeling in Danish peatlands 

Thea Quistgaard, Peter L. Langen, Tanja Denager, Raphael Schneider, and Simon Stisen

A course of action to combat the emission of greenhouse gases (GHG) in a Danish context is to re-wet previously drained peatlands and thereby return them to their natural hydrological state, in which they act as GHG sinks. GHG emissions from peatlands are known to be closely coupled to the hydrological dynamics through the groundwater table depth (WTD). To understand the effect of a changing and variable climate on the spatio-temporal dynamics of hydrological processes and the associated uncertainties, we aim to produce a high-resolution, local-scale climate projection ensemble from the global-scale CMIP6 projections.

With a focus on hydrological impacts, uncertainties and possible extreme endmembers, this study aims to span the full ensemble of local-scale climate projections in the Danish geographical area corresponding to the CMIP6 ensemble of Global Climate Models (GCMs). Deep-learning-based statistical downscaling methods are applied to bridge the gap from GCMs to local-scale climate change and variability, which in turn will be used in field-scale hydrological modeling. The approach is developed to specifically accommodate the resolutions, event types and conditions relevant for assessing the impacts on peatland GHG emissions through their relationship with WTD dynamics, by applying stacked conditional generative adversarial networks (CGANs) to downscale precipitation, temperature, and evaporation. In the future, the approach is anticipated to be extended to directly assess the impacts of climate change and ensemble uncertainty on peatland hydrology variability and extremes.

How to cite: Quistgaard, T., Langen, P. L., Denager, T., Schneider, R., and Stisen, S.: Deep Learning guided statistical downscaling of climate projections for use in hydrological impact modeling in Danish peatlands, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-7561, https://doi.org/10.5194/egusphere-egu23-7561, 2023.

EGU23-8288 | Orals | ITS1.13/AS5.2

Learning operational altimetry mapping from ocean models 

Quentin Febvre, Ronan Fablet, Julien Le Sommer, Clément Ubelmann, and Simon Benaïchouche

In oceanography, altimetry products are used to measure the height of the ocean surface, and ocean modeling is used to understand and predict the behavior of the ocean. There are two main types of gridded altimetry products: operational sea level products, such as DUACS, which are used for forecasting and reconstruction, and ocean model reanalyses, such as Glorys 12, which are used to forecast seasonal trends and assess physical characteristics. However, advances in ocean modeling do not always directly benefit operational forecast or reconstruction products.

In this study, we investigate the potential for deep learning methods, which have been successfully applied in simulated setups, to leverage ocean modeling efforts for improving operational altimetry products. Specifically, we ask under what conditions the knowledge learned from ocean simulations can be applied to real-world operational altimetry mapping. We consider the impact of simulation grid resolution, observation data reanalysis, and physical processes modeled on the performance of a deep learning model.

Our results show that the deep learning model outperforms current operational methods on a regional domain around the Gulf Stream, with a 50 km improvement in resolved scale. This improvement has the potential to enhance the accuracy of operational altimetry products, which are used for a range of important applications, such as climate monitoring and understanding mesoscale ocean dynamics.

How to cite: Febvre, Q., Fablet, R., Le Sommer, J., Ubelmann, C., and Benaïchouche, S.: Learning operational altimetry mapping from ocean models, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-8288, https://doi.org/10.5194/egusphere-egu23-8288, 2023.

EGU23-9285 | ECS | Orals | ITS1.13/AS5.2

Stabilized Neural Differential Equations for Hybrid Modeling with Conservation Laws 

Alistair White and Niklas Boers

Neural Differential Equations (NDEs) provide a powerful framework for hybrid modeling. Unfortunately, the flexibility of the neural network component of the model comes at the expense of potentially violating known physical invariants, such as conservation laws, during inference. This shortcoming is especially critical for applications requiring long simulations, such as climate modeling, where significant deviations from the physical invariants can develop over time. It is hoped that enforcing physical invariants will help address two of the main barriers to adoption for hybrid models in climate modeling: (1) long-term numerical stability, and (2) generalization to out-of-sample conditions unseen during training, such as climate change scenarios. We introduce Stabilized Neural Differential Equations, which augment an NDE model with compensating terms that ensure physical invariants remain approximately satisfied during numerical simulations. We apply Stabilized NDEs to the double pendulum and Hénon–Heiles systems, both of which are conservative, chaotic dynamical systems possessing a time-independent Hamiltonian. We evaluate Stabilized NDEs using both short-term and long-term prediction tasks, analogous to weather and climate prediction, respectively. Stabilized NDEs perform at least as well as unstabilized models at the “weather prediction” task, that is, predicting the exact near-term state of the system given initial conditions. On the other hand, Stabilized NDEs significantly outperform unstabilized models at the “climate prediction” task, that is, predicting long-term statistical properties of the system. In particular, Stabilized NDEs conserve energy during long simulations and consequently reproduce the long-term dynamics of the target system with far higher accuracy than non-energy conserving models. Stabilized NDEs also remain numerically stable for significantly longer than unstabilized models. As well as providing a new and lightweight method for combining physical invariants with NDEs, our results highlight the relevance of enforcing conservation laws for the long-term numerical stability and physical accuracy of hybrid models.
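One way such a compensating term can be realized is to relax the state back toward the level set of the invariant along its gradient; the sketch below is our illustration of the general idea, not the authors' formulation:

```python
# Illustrative sketch: wrap an NDE right-hand side f so that violations of the
# invariant H(x) = H0 (e.g. the Hamiltonian) are relaxed back to zero.
import torch

def stabilized_rhs(f, H, H0, gamma=1.0):
    def rhs(x):
        x = x.detach().requires_grad_(True)
        h = H(x)
        (grad_h,) = torch.autograd.grad(h, x)
        # subtract a correction along grad H proportional to the violation
        correction = gamma * (h - H0) * grad_h / (grad_h @ grad_h + 1e-12)
        return (f(x) - correction).detach()  # detach before a plain integrator
    return rhs
```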

How to cite: White, A. and Boers, N.: Stabilized Neural Differential Equations for Hybrid Modeling with Conservation Laws, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-9285, https://doi.org/10.5194/egusphere-egu23-9285, 2023.

EGU23-10135 | ECS | Orals | ITS1.13/AS5.2

Exploring physics-informed machine learning for accelerated simulation of permafrost processes 

Brian Groenke, Moritz Langer, Guillermo Gallego, and Julia Boike

Permafrost, i.e. ground material that remains perennially frozen, plays a key role in Arctic ecosystems. Monitoring the response of permafrost to rapid climate change remains difficult due to the sparse availability of long-term, high-quality measurements of the subsurface. Numerical models are therefore an indispensable tool for understanding the evolution of Arctic permafrost. However, large-scale simulation of the hydrothermal processes affecting permafrost is challenging due to the highly nonlinear effects of phase change in porous media. The resulting computational cost of such simulations is especially prohibitive for sensitivity analysis and parameter estimation tasks, where a large number of simulations may be necessary for robust inference of quantities such as temperature, water fluxes, and soil properties. In this work, we explore the applicability of recently developed physics-informed machine learning (PIML) methods for accelerating numerical models of permafrost hydrothermal dynamics. We present a preliminary assessment of two possible applications of PIML in this context: (1) linearization of the nonlinear PDE system according to Koopman operator theory in order to reduce the computational burden of large-scale simulations, and (2) efficient parameterization of the effects of the surface energy balance and snow dynamics on the subsurface hydrothermal regime. By combining the predictive power of machine learning with the underlying conservation laws, PIML can potentially enable researchers and practitioners interested in permafrost to explore complex process interactions at larger spatiotemporal scales.
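As an illustration of application (1), the simplest data-driven Koopman-style surrogate is a linear propagator fitted to snapshots by least squares (dynamic mode decomposition in its most basic form); the sketch below is illustrative only:

```python
# Sketch of a linear (Koopman-style) surrogate fitted to time-ordered snapshots
# by least squares; X holds raw or lifted states as columns.
import numpy as np

def fit_linear_propagator(X):
    """X: (state_dim, n_snapshots); returns A with x_{k+1} ~ A x_k."""
    X0, X1 = X[:, :-1], X[:, 1:]
    return X1 @ np.linalg.pinv(X0)      # minimises ||X1 - A X0||_F

def roll(A, x0, n_steps):
    states = [x0]
    for _ in range(n_steps):
        states.append(A @ states[-1])   # cheap linear time stepping
    return np.stack(states, axis=1)
```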

How to cite: Groenke, B., Langer, M., Gallego, G., and Boike, J.: Exploring physics-informed machine learning for accelerated simulation of permafrost processes, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-10135, https://doi.org/10.5194/egusphere-egu23-10135, 2023.

EGU23-10256 | ECS | Posters on site | ITS1.13/AS5.2

Foehn Wind Analysis using Unsupervised Deep Anomaly Detection 

Tobias Milz, Marte Hofsteenge, Marwan Katurji, and Varvara Vetrova

Foehn winds are accelerated, warm and dry winds that can have significant environmental impacts as they descend into the lee of a mountain range. For example, in the McMurdo Dry Valleys in Antarctica, foehn events can cause ice and glacial melt and destabilise ice shelves which, if lost, would contribute to sea-level rise. Consequently, there is a strong interest in a deeper understanding of foehn winds and their meteorological signatures. Most current automatic detection methods rely on rule-based methodologies that require static thresholds of meteorological parameters. However, the patterns of foehn winds are hard to define and differ between alpine valleys around the world. Consequently, data-driven solutions might help create more accurate detection and prediction methodologies.

State-of-the-art machine learning approaches to this problem have shown promising results but follow a supervised learning paradigm. As such, these approaches require accurate labels, which for the most part, are being created by imprecise static rule-based algorithms. Consequently, the resulting machine-learning models are trained to recognise the same static definitions of the foehn wind signatures. 

In this paper, we introduce and compare the first unsupervised machine-learning approaches for detecting foehn wind events. We focus on data from the McMurdo Dry Valleys as an example; however, due to their unsupervised nature, our solutions can recognise a more dynamic definition of foehn wind events and are therefore independent of the location. The first approach is based on multivariate time-series clustering, while the second utilises a deep autoencoder-based anomaly detection method to identify foehn wind events. Our best model achieves an F1-score of 88%, matching or surpassing previous machine-learning methods while providing a more flexible and inclusive definition of foehn events.

How to cite: Milz, T., Hofsteenge, M., Katurji, M., and Vetrova, V.: Foehn Wind Analysis using Unsupervised Deep Anomaly Detection, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-10256, https://doi.org/10.5194/egusphere-egu23-10256, 2023.

EGU23-10351 | ECS | Orals | ITS1.13/AS5.2

Deep learning of systematic sea ice model errors from data assimilation increments 

William Gregory, Mitchell Bushuk, Alistair Adcroft, and Yongfei Zhang

Data assimilation is often viewed as a framework for correcting short-term error growth in dynamical climate model forecasts. When viewed on the time scales of climate however, these short-term corrections, or analysis increments, closely mirror the systematic bias patterns of the dynamical model. In this work, we show that Convolutional Neural Networks (CNNs) can be used to learn a mapping from model state variables to analysis increments, thus promoting the feasibility of a data-driven model parameterization which predicts state-dependent model errors. We showcase this problem using an ice-ocean data assimilation system within the fully coupled Seamless system for Prediction and EArth system Research (SPEAR) model at the Geophysical Fluid Dynamics Laboratory (GFDL), which assimilates satellite observations of sea ice concentration. The CNN then takes inputs of data assimilation forecast states and tendencies, and makes predictions of the corresponding sea ice concentration increments. Specifically, the inputs are sea ice concentration, sea-surface temperature, ice velocities, ice thickness, net shortwave radiation, ice-surface skin temperature, and sea-surface salinity. We show that the CNN is able to make skilful predictions of the increments, particularly between December and February in both the Arctic and Antarctic, with average daily spatial pattern correlations of 0.72 and 0.79, respectively. Initial investigation of implementation of the CNN into the fully coupled SPEAR model shows that the CNN can reduce biases in retrospective seasonal sea ice forecasts by emulating a data assimilation system, further suggesting that systematic sea ice biases could be reduced in a free-running climate simulation.
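Schematically, the mapping described above can be read as a convolutional network from multi-channel state/tendency fields to an increment field; the architecture and grid sizes below are illustrative assumptions, not the study's configuration:

```python
# Illustrative sketch: a CNN mapping eight state/tendency channels to a
# sea ice concentration (SIC) increment field.
import torch
import torch.nn as nn

increment_cnn = nn.Sequential(
    nn.Conv2d(8, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, kernel_size=3, padding=1),   # predicted SIC increment
)

x = torch.randn(4, 8, 80, 90)    # batch of polar grids: SIC, SST, ice velocities, ...
dsic = increment_cnn(x)          # (4, 1, 80, 90) analysis-increment prediction
```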

How to cite: Gregory, W., Bushuk, M., Adcroft, A., and Zhang, Y.: Deep learning of systematic sea ice model errors from data assimilation increments, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-10351, https://doi.org/10.5194/egusphere-egu23-10351, 2023.

Current numerical weather prediction models contain significant systematic errors, due in part to indeterminate ground forcing (GF). This study considers an optimal virtual GF (GFo) derived by training observed and simulated datasets of 10-m wind speeds (WS10) for summer and winter. The GFo is added to an offline surface multilayer model (SMM) to revise predictions of WS10 in China by the Weather Research and Forecasting model (WRF). This revision is a data-based optimization under physical constraints. It reduces WS10 errors and offers wide applicability. The resulting model outperforms two purely physical forecasts (the original WRF forecast and the SMM with physical GF parameterized using urban, vegetation, and subgrid topography) and two purely data-based revisions (i.e., multilinear regression and multilayer perceptron). Compared with original WRF forecasting, using the GFo scheme reduces the Root Mean Square Error (RMSE) in WS10 across China by 25% in summer and 32% in winter. The frontal area index of GFo indicates that it includes both the effects of indeterminate GF and other possible complex physical processes associated with WS10.

How to cite: Feng, J.: Mitigate forecast error in surface wind speed using an offline single-column model with optimal ground forcing, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-10394, https://doi.org/10.5194/egusphere-egu23-10394, 2023.

EGU23-10726 | Posters virtual | ITS1.13/AS5.2

A hybrid VMD-WT-InceptionTime model for multi-horizon short-term air temperature forecasting in Alaska 

Jaakko Putkonen, M. Aymane Ahajjam, Timothy Pasch, and Robert Chance

The lack of ground-level observation stations outside of settlements makes monitoring and forecasting local weather and permafrost challenging in the Arctic. Such predictive information is essential to help prepare for potentially hazardous weather conditions, especially during winter. In this study, we aim at enhancing predictive analytics of permafrost and temperature in Alaska using a hybrid forecasting technique. In particular, we propose the VMD-WT-InceptionTime model for short-term air temperature forecasting.

The proposed technique incorporates data preprocessing techniques and deep learning to enhance the accuracy of air temperature forecasts for the next seven days. Initially, the Spearman correlation coefficient is utilized to examine the relationship between different inputs and the forecast target temperature. Following this, Variational Mode Decomposition (VMD) is used to decompose the most output-correlated input variables (i.e., temperature and relative humidity) to extract intrinsic and non-stationary time-frequency features from the original sequences. The Wavelet Transform (WT) is then employed to further extract intrinsic multi-resolution patterns from these decomposed input variables. Finally, a deep InceptionTime model is used for multi-step air temperature forecasting using these processed sequences. The forecasting technique was developed using an open dataset holding more than 20 years of data from three locations in Alaska: the North Slope, the Arctic National Wildlife Refuge, and the Diomede Island region in the Bering Strait. Model performance has been rigorously evaluated using metrics including RMSE, MAPE and error.
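The preprocessing chain can be sketched as follows, assuming the third-party vmdpy package (with signature VMD(f, alpha, tau, K, DC, init, tol)) and PyWavelets; all parameter values are illustrative, not those used in the study:

```python
# Preprocessing sketch assuming the third-party vmdpy package and PyWavelets;
# parameter values are illustrative placeholders.
import numpy as np
import pywt
from vmdpy import VMD

temperature = np.random.rand(1024)            # placeholder hourly series

# 1) variational mode decomposition into K intrinsic modes
modes, _, _ = VMD(temperature, alpha=2000, tau=0.0, K=5, DC=0, init=1, tol=1e-7)

# 2) wavelet transform of each mode for multi-resolution features
features = [pywt.wavedec(mode, "db4", level=3) for mode in modes]
# 'features' (per mode: [cA3, cD3, cD2, cD1]) then feeds the InceptionTime model
```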

Results highlight the effectiveness of the proposed hybrid model in providing more accurate short-term forecasts than several baselines (GBDT, SVR, ExtraTrees, RF, ARIMA, LSTM, GRU, and Transformer). More specifically, the technique achieved average RMSE and MAPE improvement rates of 11.21% and 16.13% for the North Slope, 30.01% and 34.97% for the Arctic National Wildlife Refuge, and 16.39% and 23.46% for the Diomede Islands region. In addition, the proposed technique produces forecasts over all seven horizons with a maximum error below 1.5 K, a minimum error above -1.2 K, and an average error lower than 0.18 K for the North Slope; a maximum error below 1 K, a minimum error above -0.9 K, and an average below 0.1 K for the Arctic National Wildlife Refuge; and a maximum error below 0.9 K, a minimum error above -0.8 K, and an average below 0.13 K for the Diomede Islands region. The worst performances were errors of around 6 K at the third horizon (i.e., the 3rd day) for the North Slope and the Arctic National Wildlife Refuge and at the last horizon (i.e., the 7th day) for the Diomede Islands region. Most of the worst performances in all three locations can be attributed to having to produce forecasts of higher variation and wider temperature ranges than the averages.

Overall, this research highlights the potential of decomposition techniques and deep learning to: 1) reveal and effectively learn the underlying cyclicity of air temperatures at varying resolutions, allowing accurate predictions without any knowledge of the governing physics, and 2) produce accurate multi-step temperature forecasts in Arctic climates.

How to cite: Putkonen, J., Ahajjam, M. A., Pasch, T., and Chance, R.: A hybrid VMD-WT-InceptionTime model for multi-horizon short-term air temperature forecasting in Alaska, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-10726, https://doi.org/10.5194/egusphere-egu23-10726, 2023.

EGU23-10810 | ECS | Orals | ITS1.13/AS5.2

Oceanfourcast: Emulating Ocean Models with Transformers for Adjoint-based Data Assimilation 

Suyash Bire, Björn Lütjens, Dava Newman, and Chris Hill

Adjoints have become a staple of the oceanic and atmospheric numerical modeling community over the past couple of decades, as they are useful for tuning dynamical models, sensitivity analyses, and data assimilation. One such application is the generation of reanalysis datasets, which provide an optimal record of our past weather, climate, and ocean. For example, the state-of-the-art ocean-ice reanalysis dataset ECCO is created by optimally combining a numerical ocean model with heterogeneous observations through a technique called data assimilation. Data assimilation in ECCO minimizes the distance between model and observations by calculating adjoints, i.e., gradients of the loss with respect to the simulation forcing fields (wind and surface heat fluxes). The forcing fields are iteratively updated and the model is rerun until the loss is minimized, ensuring that the numerical model does not drastically deviate from the observations. Calculating adjoints, however, requires either disproportionately high computational resources or rewriting the dynamical model code to be autodifferentiable.

Therefore, we ask if deep learning-based emulators can provide fast and accurate adjoints. Ocean data are smooth, high-dimensional, and exhibit complex spatiotemporal correlations. As an initial foray into ocean emulators, we therefore leverage a combination of neural operators and transformers. Specifically, we have adapted the FourCastNet architecture, which has successfully emulated ERA5 weather data in seconds rather than hours, to emulate an idealized ocean simulation.

We generated a ground-truth dataset by simulating a double gyre, an idealized representation of the North Atlantic Ocean, using MITgcm, a state-of-the-art dynamical model. The model was forced by zonal wind at the surface and by relaxation to a meridional temperature profile (warm at low latitudes, cold at high latitudes). This simulation produced turbulent western boundary currents embedded in the large-scale gyre circulation. We performed four additional simulations with modified magnitudes of the sea surface temperature (SST) relaxation and wind forcing to introduce diversity into the dataset. From these simulations, we used four state variables (meridional and zonal surface velocities, pressure, and temperature) as well as the forcing fields (zonal wind velocity and relaxation SST profile), sampled in 10-day steps. The data were split into training, validation, and test sets such that the validation and test sets were unseen during training. These datasets provide an ideal testbed for evaluating and comparing the performance of data-driven ocean emulators.

We used these data to train and evaluate Oceanfourcast. Our initial results show that Oceanfourcast can successfully predict the streamfunction and pressure for a lead time of 1 month.

We are currently working on generating adjoints from Oceanfourcast. We expect the adjoint calculation to require significantly less compute time than that from a full-scale dynamical model like MITgcm. Our work shows a promising path towards deep-learning-augmented data assimilation and uncertainty quantification.
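
As a hedged illustration of why a differentiable emulator makes adjoints cheap (a toy stand-in network, not the actual Oceanfourcast code), the gradient of a model-observation misfit with respect to the forcing fields can be obtained in a single backward pass:

    # Sketch: adjoint-like sensitivities from a differentiable emulator.
    import torch

    class Emulator(torch.nn.Module):           # placeholder for Oceanfourcast
        def __init__(self):
            super().__init__()
            self.net = torch.nn.Conv2d(6, 4, kernel_size=3, padding=1)

        def forward(self, state_and_forcing):  # 4 state vars + 2 forcing fields in
            return self.net(state_and_forcing)

    model = Emulator()
    state = torch.randn(1, 4, 64, 64)          # u, v, pressure, temperature
    forcing = torch.randn(1, 2, 64, 64, requires_grad=True)  # wind, SST relaxation
    obs = torch.randn(1, 4, 64, 64)

    pred = model(torch.cat([state, forcing], dim=1))
    loss = ((pred - obs) ** 2).mean()          # model-observation misfit

    # One backward pass yields dLoss/dForcing: the emulator "adjoint".
    (adjoint,) = torch.autograd.grad(loss, forcing)
    print(adjoint.shape)                       # torch.Size([1, 2, 64, 64])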

How to cite: Bire, S., Lütjens, B., Newman, D., and Hill, C.: Oceanfourcast: Emulating Ocean Models with Transformers for Adjoint-based Data Assimilation, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-10810, https://doi.org/10.5194/egusphere-egu23-10810, 2023.

EGU23-10904 | ECS | Posters on site | ITS1.13/AS5.2

On the choice of turbulence eddy fluxes to learn from in data-driven methods 

Feier Yan, Julian Mak, and Yan Wang

Recent works have demonstrated the viability of employing data-driven / machine learning methods for learning more about ocean turbulence, with applications to turbulence parameterisations in ocean general circulation models. Focusing on mesoscale geostrophic turbulence in the ocean context, works thus far have mostly focused on the choice of algorithms and the testing of trained models. Here we focus instead on the choice of eddy flux data to learn from. We argue that, for mesoscale geostrophic turbulence, it may be beneficial from a theoretical as well as practical point of view to learn from eddy fluxes with the dynamically inert rotational fluxes removed (ideally in a gauge-invariant fashion), instead of the divergence of the eddy fluxes as has been considered thus far. Outlooks for physically constrained and interpretable machine learning will be given in light of the results.
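
As a hedged illustration of the decomposition referred to above (our notation, not necessarily the authors'), a two-dimensional eddy flux can be split via a Helmholtz decomposition into divergent and rotational parts:

    \mathbf{F} = \nabla\phi \;+\; \hat{\mathbf{z}}\times\nabla\psi,
    \qquad
    \nabla\cdot\mathbf{F} = \nabla^{2}\phi .

The rotational part \hat{\mathbf{z}}\times\nabla\psi is divergence-free and therefore dynamically inert in the mean budget, so learning the divergent component \nabla\phi alone removes this contribution; a gauge ambiguity arises because \phi and \psi are only defined up to contributions that leave \mathbf{F} unchanged.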

How to cite: Yan, F., Mak, J., and Wang, Y.: On the choice of turbulence eddy fluxes to learn from in data-driven methods, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-10904, https://doi.org/10.5194/egusphere-egu23-10904, 2023.

EGU23-10959 | Orals | ITS1.13/AS5.2

Deep learning parameterization of small-scale vertical velocity variability for atmospheric models 

Donifan Barahona, Katherine Breen, and Heike Kalesse-Los

Small-scale fluctuations in vertical wind velocity, unresolved by climate and weather forecast models, play a particularly important role in determining vapor and tracer fluxes, turbulence, and cloud formation. Fluctuations in vertical wind velocity are challenging to represent since they depend on orography, large-scale circulation features, convection, and wind shear. Parameterizations developed using data retrieved at specific locations typically lack generalization and may introduce errors when applied across a wide range of conditions. Retrievals of vertical wind velocity are also difficult and subject to large uncertainty. This work develops a new data-driven, neural network representation of subgrid-scale variability in vertical wind velocity. Using a novel deep learning technique, the new parameterization merges data from high-resolution global cloud-resolving model simulations with high-frequency radar and lidar retrievals. Our method aims to reproduce observed statistics rather than fit individual measurements; hence it is resilient to experimental uncertainty and generalizes robustly. The neural network parameterization can be driven by weather forecast and reanalysis products to make real-time estimations. It is shown that the new parameterization generalizes well outside of the training data and reproduces the statistics of vertical wind velocity much better than purely data-driven models.

How to cite: Barahona, D., Breen, K., and Kalesse-Los, H.: Deep learning parameterization of small-scale vertical velocity variability for atmospheric models, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-10959, https://doi.org/10.5194/egusphere-egu23-10959, 2023.

EGU23-11293 | ECS | Posters on site | ITS1.13/AS5.2

National scale agricultural development dynamics under socio-political drivers in Saudi Arabia since 1990 

Ting Li, Oliver López Valencia, Kasper Johansen, and Matthew McCabe

Driven in large part by policy initiatives designed to increase food security, and realized via the construction of thousands of center-pivot irrigation fields since the 1970s, agricultural development in Saudi Arabia has undergone tremendous changes. However, little is known about the accurate number, acreage, and changing dynamics of the fields. To bridge the knowledge gap between the political drivers and the in-field response, we leveraged a hybrid machine learning framework implementing Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Convolutional Neural Networks, and Spectral Clustering in a stepwise manner to delineate center-pivot fields on a national scale in Saudi Arabia using historical Landsat imagery since 1990. The framework achieved producer's and user's accuracies larger than 83.7% and 90.2%, respectively, when assessed against 28,000 manually delineated fields collected from different regions and periods. We explored the multi-decadal dynamics of agricultural development in Saudi Arabia by quantifying the number, acreage, and size distribution of center-pivot fields, along with the first and last detection year of the fields since 1990. Agricultural development in Saudi Arabia has passed through four stages: an initialization stage before 1990, a contraction stage from 1990 to 2010, an expansion stage from 2010 to 2016, and an ongoing contraction stage since 2016. Most of the fields predated 1990, representing over 8,800 km2 in that year, as a result of the policy initiatives to stimulate wheat production, which made Saudi Arabia the sixth-largest exporter of wheat in the 1980s. A decreasing trend was observed from 1990 to 2010, with an average of 8,011 km2 of fields detected during those two decades, in response to the policy initiative implemented to phase out wheat after 1990. As a consequence of planting fodder crops to promote the dairy industry, the number and extent of fields increased rapidly from 2010 to 2015 and reached a peak in 2016, with 33,961 fields representing 9,400 km2. Agricultural extent has seen a continuous decline since 2016, falling below 1990 values in 2020. This decline has been related to sustainable policy initiatives implemented for the Saudi Vision 2030. There is some evidence of an uptick in 2021, also observed in an ongoing analysis for 2022, which might be a response to global influences such as the COVID-19 pandemic and the more recent conflict in Ukraine, which has disrupted the international supply of agricultural products. The results provide a historical account of agricultural activity throughout the Kingdom and a basis for informed decision-making on sustainable irrigation and agricultural practices, helping to better protect and manage the nation's threatened groundwater resources, and providing insights into the resilience and elasticity of the Saudi Arabian food system to global perturbations.
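
As a hedged sketch of the first, density-based step of such a framework (not the authors' implementation; the coordinates below are synthetic), DBSCAN from scikit-learn groups candidate field pixels into spatial clusters:

    # Sketch: density-based clustering of candidate center-pivot pixels.
    import numpy as np
    from sklearn.cluster import DBSCAN

    rng = np.random.default_rng(0)
    # two synthetic fields (~400 m scale) plus scattered noise, in metres
    field_a = rng.normal(loc=(0.0, 0.0), scale=400.0, size=(200, 2))
    field_b = rng.normal(loc=(3000.0, 1000.0), scale=400.0, size=(200, 2))
    noise = rng.uniform(-5000, 5000, size=(40, 2))
    pixels = np.vstack([field_a, field_b, noise])

    labels = DBSCAN(eps=300, min_samples=10).fit_predict(pixels)
    print("clusters:", labels.max() + 1, "| noise pixels:", (labels == -1).sum())

In the described framework, such clusters would then be passed to the CNN and spectral clustering stages for field delineation.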

How to cite: Li, T., López Valencia, O., Johansen, K., and McCabe, M.: National scale agricultural development dynamics under socio-political drivers in Saudi Arabia since 1990, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-11293, https://doi.org/10.5194/egusphere-egu23-11293, 2023.

EGU23-11687 | ECS | Orals | ITS1.13/AS5.2

Objectively Determining the Number of Similar Hydrographic Clusters with Unsupervised Machine Learning 

Carola Trahms, Yannick Wölker, and Arne Biastoch

Determining the number of existing water masses and defining their boundaries is subject to ongoing discussion in physical oceanography. Traditionally, water masses are defined manually by experts setting constraints based on experience and previous knowledge about the hydrographic properties describing them. In recent years, clustering, an unsupervised machine learning approach, has been introduced as a tool to determine clusters, i.e., volumes with similar hydrographic properties, without explicitly defining their hydrographic constraints. However, the exact number of clusters to look for has so far been set manually by an expert.

We propose a method that determines a fitting number of hydrographic clusters in a data-driven way. In a first step, the method averages the data in slices of different sizes along the time or depth axis, as the structure of the hydrographic space changes strongly in either time or depth. The method then applies clustering algorithms to the averaged data and calculates off-the-shelf evaluation scores (Davies-Bouldin, Calinski-Harabasz, Silhouette Coefficient) for several predefined numbers of clusters. In the last step, the optimal number of clusters is determined by analyzing the cluster evaluation scores across different numbers of clusters for optima or relevant changes in trend.
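
A minimal sketch of this score-based selection, using the three named off-the-shelf scores as implemented in scikit-learn on synthetic stand-in data (KMeans is used purely for illustration; the study's clustering algorithm is not specified here):

    # Sketch: scan candidate cluster numbers and record evaluation scores.
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs
    from sklearn.metrics import (calinski_harabasz_score,
                                 davies_bouldin_score, silhouette_score)

    X, _ = make_blobs(n_samples=500, centers=4, random_state=0)  # stand-in T/S data

    for k in range(2, 9):
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        print(k,
              round(davies_bouldin_score(X, labels), 2),     # lower is better
              round(calinski_harabasz_score(X, labels), 1),  # higher is better
              round(silhouette_score(X, labels), 2))         # higher is better

The optimal number of clusters is then read off from the behaviour of these scores across k, as described above.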

For validation, we applied this method to the output of the high-resolution Atlantic Ocean model VIKING20X for the subpolar North Atlantic between 1993 and 1997, in direct exchange with domain experts to discuss the resulting clusters. Due to the change from strong to weak deep convection in these years, the hydrographic properties vary strongly in the time and depth dimensions, providing a specific challenge for our methodology.

Our findings suggest that it is possible to identify an optimal number of clusters using off-the-shelf cluster evaluation scores that capture the underlying structure of the hydrographic space. The optimal number of clusters identified by our data-driven method agrees with the optimal number found in expert interviews. These findings contribute to aiding and objectifying water mass definitions across multiple expert decisions, and demonstrate the benefit of introducing data science methods to analyses in physical oceanography.

How to cite: Trahms, C., Wölker, Y., and Biastoch, A.: Objectively Determining the Number of Similar Hydrographic Clusters with Unsupervised Machine Learning, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-11687, https://doi.org/10.5194/egusphere-egu23-11687, 2023.

EGU23-11906 | ECS | Orals | ITS1.13/AS5.2

Untapping the potential of geostationary EO data to understand drought impacts with XAI 

Basil Kraft, Gregory Duveiller, Markus Reichstein, and Martin Jung

Ecosystems worldwide are affected by extreme climate conditions such as droughts, but we still lack understanding of the dynamics involved. Which factors render an ecosystem more resilient, and on which temporal scales do weather patterns affect vegetation state and physiology? Traditional approaches to such questions involve assumption-based land surface modeling or inversions. Machine learning (ML) methods can provide a complementary perspective on how ecosystems respond to climate in a more data-driven and assumption-free manner. However, ML depends heavily on data, and commonly used observations of vegetation contain at best one observation per day, with most products provided at 16-daily to monthly temporal resolution. This masks important processes at sub-monthly time scales. In addition, ML models are inherently difficult to interpret, which still limits their applicability for process understanding.

In the present study, we combine modern deep learning models in the time domain with observations from the geostationary Meteosat Second Generation (MSG) satellite, centered over Africa. We model fractional vegetation cover (representing vegetation state) and land surface temperature (as a proxy for water stress) from MSG as a function of meteorology and static geofactors. MSG collects observations at sub-daily frequency, rendering it an excellent tool to study short- to mid-term land surface processes. Furthermore, we use methods from explainable ML for post-hoc model interpretation to identify meteorological drivers of vegetation dynamics and their interaction with key geofactors.
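
As one hedged example of the kind of post-hoc interpretation meant here (permutation importance is only one option among explainable-ML tools, and the data and driver names below are synthetic placeholders):

    # Sketch: permutation importance of meteorological drivers for a fitted model.
    import numpy as np
    from sklearn.inspection import permutation_importance
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)
    drivers = ["precip", "temperature", "radiation", "vpd"]   # illustrative names
    X = rng.normal(size=(3000, len(drivers)))
    fvc = 0.8 * X[:, 0] + 0.3 * X[:, 3] + rng.normal(0, 0.1, 3000)  # toy target

    model = MLPRegressor(hidden_layer_sizes=(64,), max_iter=500,
                         random_state=0).fit(X, fvc)
    result = permutation_importance(model, X, fvc, n_repeats=10, random_state=0)
    for name, val in zip(drivers, result.importances_mean):
        print(f"{name:12s} {val:.3f}")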

From the analysis, we expect to gain novel insights into ecosystem responses to droughts with high temporal fidelity. The drought response of vegetation can be highly diverse and complex, especially in the arid to semi-arid regions prevalent in Africa. We also assess the potential of explainable machine learning to discover new linkages and knowledge, and discuss potential pitfalls of the approach. Explainable machine learning, combined with potent deep learning approaches and modern Earth observation products, offers the opportunity to complement assumption-based modeling in predicting and understanding ecosystem responses to extreme climate.

How to cite: Kraft, B., Duveiller, G., Reichstein, M., and Jung, M.: Untapping the potential of geostationary EO data to understand drought impacts with XAI, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-11906, https://doi.org/10.5194/egusphere-egu23-11906, 2023.

EGU23-11958 | ECS | Posters on site | ITS1.13/AS5.2

Modelling Soil Temperature and Soil Moisture in Space, Depth, and Time with Machine Learning Techniques 

Maiken Baumberger, Linda Adorf, Bettina Haas, Nele Meyer, and Hanna Meyer

Soil temperature and soil moisture variations have large effects on ecological processes in the soil. To investigate and understand these processes, high-resolution data of soil temperature and soil moisture are required. Here, we present an approach to generate soil temperature and soil moisture data continuously in space, depth, and time for a 400 km2 study area in the Fichtel Mountains (Germany). As reference data, measurements with 1-m-long soil probes were taken. To cover many different locations, the 15 available soil probes were relocated regularly over the course of one year. With this approach, around 250 different locations in forests, meadows, and agricultural fields were captured under a variety of meteorological conditions. These measurements are combined with readily available meteorological data, satellite data, and soil maps in a machine learning approach to learn the complex relations between these variables. We aim for a model that can predict soil temperature and soil moisture continuously for our study area in the Fichtel Mountains, at a spatial resolution of 10 m x 10 m, down to 1 m depth in segments of 10 cm, and at hourly resolution in time. Here, we present the results of a pilot study focused on temperature and moisture changes down to 1 m depth at a single location. To take temporal lags into account, we construct a Long Short-Term Memory network with meteorological data as predictors to make temperature and moisture predictions in time and depth. The results indicate a high ability of the model to reproduce the time series at the single location and highlight the potential of the approach for space-time-depth mapping of soil temperature and soil moisture.
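
A minimal sketch of such a depth-resolved LSTM in PyTorch, with illustrative input/output dimensions rather than the study's actual configuration:

    # Sketch: meteorological drivers in, two targets at ten 10-cm depths out.
    import torch

    class SoilLSTM(torch.nn.Module):
        def __init__(self, n_met=8, n_depths=10, n_targets=2, hidden=64):
            super().__init__()
            self.lstm = torch.nn.LSTM(n_met, hidden, batch_first=True)
            self.head = torch.nn.Linear(hidden, n_depths * n_targets)
            self.n_depths, self.n_targets = n_depths, n_targets

        def forward(self, met):                 # met: (batch, time, n_met)
            h, _ = self.lstm(met)
            out = self.head(h)                  # one prediction per time step
            return out.view(*out.shape[:2], self.n_depths, self.n_targets)

    model = SoilLSTM()
    met = torch.randn(4, 24, 8)                 # 4 samples, 24 hourly steps
    pred = model(met)                           # (4, 24, 10, 2): T and moisture per depth
    print(pred.shape)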

How to cite: Baumberger, M., Adorf, L., Haas, B., Meyer, N., and Meyer, H.: Modelling Soil Temperature and Soil Moisture in Space, Depth, and Time with Machine Learning Techniques, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-11958, https://doi.org/10.5194/egusphere-egu23-11958, 2023.

EGU23-12218 | Posters on site | ITS1.13/AS5.2

Bias correction of aircraft temperature observations in the Korean Integrated Model based on a deep learning approach 

Hui-nae Kwon, Hyeon-ju Jeon, Jeon-ho Kang, In-hyuk Kwon, and Seon Ki Park

Aircraft-based observations are among the important anchor data used in numerical weather prediction (NWP) models. Nevertheless, a bias in the temperature observations has been noted in several previous studies. As the performance of the hybrid four-dimensional ensemble variational (hybrid-4DEnVar) data assimilation (DA) system of the Korean Integrated Model (KIM), the operational model of the Korea Meteorological Administration (KMA), has advanced, the need for aircraft temperature bias correction (BC) has been confirmed. Accordingly, as a preliminary study, a static BC method based on linear regression was applied to the KIM Package for Observation Processing (KPOP) system. However, the results showed two limitations: a spatial discontinuity and a dependency on the calculation period of the BC coefficients.

In this study, we developed a machine learning-based bias estimation model to overcome these limitations. A MultiLayer Perceptron (MLP) was trained to consider the vertical, spatial, and temporal characteristics of each observation by flight ID and phase, and at the same time to consider the correlations among observation variables. After removing the bias predicted by the estimation model, the mean of the background innovation (O-B) decreases from 0.2217 K to 0.0136 K over a given test period. To verify the impact of the BC on the analysis field, the bias estimation model will next be grafted onto the KPOP system and several DA cycle experiments will be conducted in the KIM.
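
A hedged sketch of such a bias-estimation model, using scikit-learn's MLPRegressor on synthetic data (the operational model's predictors and architecture are not reproduced here):

    # Sketch: learn the O-B innovation from flight predictors, then subtract it.
    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)
    n = 5000
    X = np.column_stack([
        rng.uniform(0, 12000, n),     # altitude [m]
        rng.integers(0, 3, n),        # phase: ascent / cruise / descent
        rng.uniform(-60, 60, n),      # latitude
    ])
    true_bias = 0.3 + 1e-5 * X[:, 0]  # synthetic altitude-dependent warm bias
    o_minus_b = true_bias + rng.normal(0, 0.5, n)

    mlp = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=500, random_state=0)
    mlp.fit(X[:4000], o_minus_b[:4000])
    corrected = o_minus_b[4000:] - mlp.predict(X[4000:])
    print("mean O-B before:", o_minus_b[4000:].mean().round(3),
          "after:", corrected.mean().round(3))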

How to cite: Kwon, H., Jeon, H., Kang, J., Kwon, I., and Park, S. K.: Bias correction of aircraft temperature observations in the Korean Integrated Model based on a deep learning approach, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-12218, https://doi.org/10.5194/egusphere-egu23-12218, 2023.

EGU23-12355 | ECS | Orals | ITS1.13/AS5.2

Comparison of NWP Models Used in Training Surrogate Wave Models 

Ajit Pillai, Ian Ashton, Jiaxin Chen, and Edward Steele

Machine learning is increasingly being applied to ocean wave modelling. Surrogate modelling has the potential to reduce or bypass the large computational requirements, creating a low-computational-cost model that offers a high level of accuracy. One approach integrates in-situ measurements and historical model runs to achieve the spatial coverage of the model and the accuracy of the in-situ measurements. Once operational, such a system requires very little computational power, meaning that it could be deployed to a mobile phone, operational vessel, or autonomous vessel to give continuous data. As such, it represents a significant change to the availability of met-ocean data, with the potential to revolutionise data provision and use in marine and coastal settings.

This presentation explores the impact that the underlying physics-based model can have in such a machine learning-driven framework, comparing training on a bespoke regional SWAN wave model developed for wave energy developments in the South West of the UK against training on the larger North-West European Shelf long-term hindcast wave model run by the UK Met Office. The presentation discusses the differences in the underlying NWP models and the impacts these have on the surrogate wave models' accuracy in both nowcasting and forecasting wave conditions at areas of interest for renewable energy developments. The results identify the importance of having a high-quality, validated NWP model for training such a system, and the way in which machine learning methods can propagate and exaggerate the underlying model uncertainties.

How to cite: Pillai, A., Ashton, I., Chen, J., and Steele, E.: Comparison of NWP Models Used in Training Surrogate Wave Models, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-12355, https://doi.org/10.5194/egusphere-egu23-12355, 2023.

EGU23-12403 | ECS | Orals | ITS1.13/AS5.2

PseudoSpectralNet: A hybrid neural differential equation for atmosphere models 

Maximilian Gelbrecht and Niklas Boers

When predicting complex systems such as parts of the Earth system, one typically relies on differential equations, which can be incomplete, miss unknown influences, or include errors through their discretization. To remedy these effects, we present PseudoSpectralNet (PSN): a hybrid model that incorporates both a knowledge-based part, an atmosphere model, and a data-driven part, an artificial neural network (ANN). PSN is a neural differential equation (NDE): it defines the right-hand side of a differential equation combining a physical model with ANNs, and its parameters are trained inside this NDE. Similar to the approach of many atmosphere models, part of the model is computed in the spherical harmonics domain and other parts in the grid domain. The model consists of ANN layers in each domain, information about derivatives, and parameters such as the orography. We demonstrate the capabilities of PSN on the well-studied Marshall-Molteni quasigeostrophic model.
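
A minimal sketch of the hybrid neural-differential-equation idea, assuming a toy physical core and explicit Euler integration (the actual PSN uses spectral transforms and a quasigeostrophic core; everything below is illustrative):

    # Sketch: right-hand side = known physics + neural correction, trained
    # end-to-end through the integrator.
    import torch

    nn_part = torch.nn.Sequential(torch.nn.Linear(3, 32), torch.nn.Tanh(),
                                  torch.nn.Linear(32, 3))

    def f_phys(x):                    # placeholder "known physics"
        return -0.1 * x

    def rhs(x):
        return f_phys(x) + nn_part(x)

    def integrate(x0, dt=0.01, steps=100):
        x = x0
        for _ in range(steps):        # explicit Euler; fully differentiable
            x = x + dt * rhs(x)
        return x

    x0 = torch.randn(16, 3)
    target = torch.randn(16, 3)
    loss = ((integrate(x0) - target) ** 2).mean()
    loss.backward()                   # gradients reach nn_part's parameters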

How to cite: Gelbrecht, M. and Boers, N.: PseudoSpectralNet: A hybrid neural differential equation for atmosphere models, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-12403, https://doi.org/10.5194/egusphere-egu23-12403, 2023.

EGU23-12458 | ECS | Posters on site | ITS1.13/AS5.2

Training Deep Data Assimilation Networks on Sparse and Noisy Observations 

Vadim Zinchenko and David Greenberg

Data Assimilation (DA) is a challenging and computationally expensive problem targeting hidden variables in high-dimensional spaces. 4DVar methods are widely used in weather forecasting to fit simulations to sparse observations by optimizing over numerical model inputs. The complexity of this inverse problem and the sequential nature of common 4DVar approaches lead to long computation times with limited opportunity for parallelization. Here we propose using machine learning (ML) algorithms to replace the entire 4DVar optimization problem with a single forward pass through a neural network that maps from noisy and incomplete observations at multiple time points to a complete system state estimate at a single time point. We train the neural network using a loss function derived from the weak-constraint 4DVar objective, including terms incorporating errors in both model and data. In contrast to standard 4DVar approaches, our method amortizes the computational investment of training to avoid solving an optimization problem for each assimilation window, and its non-sequential nature allows for easy parallelization along the time axis for both training and inference. In contrast to most previous ML-based data assimilation methods, our approach does not require access to complete, noise-free simulations for supervised learning or gradient-free approximations such as ensemble Kalman filtering. To demonstrate the potential of our approach, we show a proof of concept on the chaotic Lorenz'96 system, using a novel "1.5D Unet" architecture combining 1D and 2D convolutions.
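
A hedged sketch of a weak-constraint 4DVar-style training loss of this kind, with placeholder dynamics M, observation operator H, and scalar weights standing in for the model- and observation-error covariances:

    # Sketch: observation misfit + model-error penalty over a window.
    import torch

    def weak_4dvar_loss(states, obs, obs_mask, M, H, r_inv=1.0, q_inv=1.0):
        """states: (T, D) network output; obs: (T, D_obs) with validity mask."""
        obs_term = (obs_mask * (H(states) - obs) ** 2).sum() * r_inv
        model_term = ((states[1:] - M(states[:-1])) ** 2).sum() * q_inv
        return obs_term + model_term

    # Toy example: observe 3 of 5 variables, linear damping dynamics.
    T, D = 10, 5
    H = lambda x: x[..., :3]
    M = lambda x: 0.95 * x
    states = torch.randn(T, D, requires_grad=True)
    obs = torch.randn(T, 3)
    mask = (torch.rand(T, 3) > 0.5).float()   # sparse, noisy observations
    loss = weak_4dvar_loss(states, obs, mask, M, H)
    loss.backward()

In the described approach, `states` would be the output of the network mapping observations to a state estimate, and the same loss would drive its training.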

How to cite: Zinchenko, V. and Greenberg, D.: Training Deep Data Assimilation Networks on Sparse and Noisy Observations, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-12458, https://doi.org/10.5194/egusphere-egu23-12458, 2023.

EGU23-12566 | Posters on site | ITS1.13/AS5.2

Comparison of PM2.5 concentrations prediction model performance using Artificial Intelligence 

Kyung-Hui Wang, Chae-Yeon Lee, Ju-Yong Lee, Min-Woo Jung, Dong-Geon Kim, Seung-Hee Han, Dae-Ryun Choi, and Hui-young Yun

Since PM2.5 (particulate matter with an aerodynamic diameter of less than 2.5 µm) directly threatens public health, and in order to allow appropriate preventive measures to be taken in advance, the Korea Ministry of Environment (MOE) has implemented a nationwide PM10 forecast since February 2014 and a nationwide PM2.5 forecast since January 2015. The current PM forecast by the MOE subdivides the country into 19 regions and forecasts the level of PM in 4 stages: "Good", "Moderate", "Unhealthy", and "Very unhealthy".

The PM air quality forecasting system currently operated by the MOE is based on a numerical forecast model together with weather and emission models. Numerical forecasting models have fundamental limitations, such as uncertainty in input data (e.g., emissions and meteorological data) and in the numerical model itself. Recently, many studies have applied artificial intelligence methods such as DNNs, RNNs, LSTMs, and CNNs to PM prediction to overcome the limitations of numerical models.

In this study, to improve on the prediction performance of the numerical model, past observational data (air quality and meteorological data) and numerical forecasting model data (from a chemical transport model) are used as input. The machine learning model consists of a DNN and a Seq2Seq model, and predicts 3 days (D+0, D+1, D+2) using 6-hour and 1-hour average input data, respectively. The PM2.5 concentrations predicted by the machine learning model and the numerical model were compared with PM2.5 measurements.
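
A minimal sketch of a Seq2Seq forecaster of this kind in PyTorch, with illustrative dimensions (the operational inputs and architecture are not reproduced here):

    # Sketch: encode a window of past predictors, decode a 3-day hourly series.
    import torch

    class Seq2Seq(torch.nn.Module):
        def __init__(self, n_in=12, hidden=64, horizon=72):
            super().__init__()
            self.encoder = torch.nn.GRU(n_in, hidden, batch_first=True)
            self.decoder = torch.nn.GRU(1, hidden, batch_first=True)
            self.proj = torch.nn.Linear(hidden, 1)
            self.horizon = horizon

        def forward(self, past):                    # past: (batch, t_in, n_in)
            _, h = self.encoder(past)
            y = past.new_zeros(past.size(0), 1, 1)  # start token
            outs = []
            for _ in range(self.horizon):           # autoregressive decoding
                o, h = self.decoder(y, h)
                y = self.proj(o)
                outs.append(y)
            return torch.cat(outs, dim=1)           # (batch, horizon, 1)

    model = Seq2Seq()
    print(model(torch.randn(8, 28, 12)).shape)      # torch.Size([8, 72, 1])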

The machine learning models were trained on input data from 2015 to 2020, and their PM forecasting performance was tested on 2021 data. Compared to the numerical model, the machine learning model tended to achieve a higher ACC and a similar or lower FAR and POD.

The time series show that the machine learning PM forecasts follow the PM measurements more closely than the numerical model. In particular, the machine learning model can appropriately predict the low and high PM concentrations that the numerical model tends to overestimate.

The machine learning forecasting model based on the DNN and Seq2Seq thus shows improved PM forecasting performance compared with the numerical forecasting model. However, the machine learning model has the limitation that it cannot consider external inflow effects.

To overcome this drawback, the models should be extended with additional machine learning modules, such as a CNN capturing the spatial features of PM concentrations.


Acknowledgements

This study was supported in part by the ‘Experts Training Graduate Program for Particulate Matter Management’ from the Ministry of Environment, Korea and by a grant from the National Institute of Environmental Research (NIER), funded by the Ministry of Environment (ME) of the Republic of Korea (NIER-2022-04-02-068).


How to cite: Wang, K.-H., Lee, C.-Y., Lee, J.-Y., Jung, M.-W., Kim, D.-G., Han, S.-H., Choi, D.-R., and Yun, H.: Comparison of PM2.5 concentrations prediction model performance using Artificial Intelligence, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-12566, https://doi.org/10.5194/egusphere-egu23-12566, 2023.

EGU23-13013 | ECS | Posters on site | ITS1.13/AS5.2

Using cGAN for cloud classification from RGB pictures 

Markus Rosenberger, Manfred Dorninger, and Martin Weißmann

Clouds of all kinds play a large role in many atmospheric processes including, e.g., radiation and moisture transport, and their type offers insight into the dynamics of the atmosphere. Hence, the observation of clouds from the Earth's surface has always been important for analysing the current weather and its evolution during the day. However, cloud observations by human observers are labour-intensive and hence also costly. In addition, cloud classifications done by human observers are always subjective to some extent. Finding an efficient method for automated observations would solve both problems. Although clouds have been observed operationally from satellites for decades, observations from the surface shed light on a different set of characteristics. Moreover, the WMO defined its cloud classification standards according to visual cloud properties as observed at the Earth's surface. Thus, in this work we propose to use machine learning methods to classify clouds from RGB pictures taken at the surface. Explicitly, a conditional Generative Adversarial Network (cGAN) is trained to discriminate between 30 different categories, 10 for each cloud level (low, medium, and high). Besides showing robust results in different image classification problems, an additional advantage of using a GAN instead of a classical convolutional neural network is that its output can artificially enhance the size of the training data set. This is especially useful if the number of available pictures is unevenly distributed among the different classes. Additional background observations, such as cloud cover and cloud base height, can also be used to further improve the performance of the cGAN. Together with a cloud camera, a properly trained cGAN can observe and classify clouds with a high temporal resolution of the order of seconds, which can be used, e.g., for model verification or to efficiently monitor the current state of the weather and its short-term evolution. First results will also be presented.
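
A hedged sketch of a class-conditional GAN pair for such a 30-category setup (toy-scale fully connected networks; the study's architecture is not specified here):

    # Sketch: the class label enters generator and discriminator as an embedding.
    import torch

    n_classes, z_dim, img = 30, 64, 32 * 32 * 3     # 10 categories per cloud level

    G = torch.nn.Sequential(torch.nn.Linear(z_dim + n_classes, 256), torch.nn.ReLU(),
                            torch.nn.Linear(256, img), torch.nn.Tanh())
    D = torch.nn.Sequential(torch.nn.Linear(img + n_classes, 256),
                            torch.nn.LeakyReLU(0.2), torch.nn.Linear(256, 1))
    embed = torch.nn.Embedding(n_classes, n_classes)

    labels = torch.randint(0, n_classes, (16,))
    z = torch.randn(16, z_dim)
    fake = G(torch.cat([z, embed(labels)], dim=1))      # label-conditioned samples
    logit = D(torch.cat([fake, embed(labels)], dim=1))  # real/fake score given label

Such label-conditioned samples are what would allow the training set to be artificially enlarged for under-represented classes.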

How to cite: Rosenberger, M., Dorninger, M., and Weißmann, M.: Using cGAN for cloud classification from RGB pictures, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-13013, https://doi.org/10.5194/egusphere-egu23-13013, 2023.

EGU23-13143 | ECS | Posters on site | ITS1.13/AS5.2

Comparison of LSTM, GraphNN, and IrradPhyDNet based Approaches for High-resolution Solar Irradiance Nowcasting 

Petrina Papazek, Irene Schicker, and Pascal Gfähler

With fast parallel computing hardware, particularly GPUs, becoming more accessible in the geosciences, deep learning techniques can now efficiently handle large amounts of recorded observations and satellite-derived data and are able to learn complex structures across time series. A suitable deep learning setup can thus generate highly resolved weather forecasts in real time and on demand. Forecasts of irradiance and radiation are challenging for machine learning as they embrace a high degree of diurnal and seasonal variation.

Continuously expanding PV/solar power production is becoming one of our most important fossil-fuel-free energy sources. Unlike the only recently emerging PV power observations, solar irradiance offers long time series from automated weather station networks. Since solar irradiance is directly linked to PV output, highly resolved solar irradiance forecasts, from nowcasting to the short range, play a crucial role in decision support and in managing PV.

In this study, we investigate the suitability of several deep learning techniques adapted and developed for a set of heterogeneous data sources at selected locations. We compare the forecast results to traditional, however computationally expensive, numerical weather prediction (NWP) models and rapid update cycle models. Relevant input features include 3D fields from NWP models (e.g., AROME), satellite data and products (e.g., CAMS), radiation time series from remote sensing, and observation time series (site observations and nearby sites). The amount of time series data can be extended by a synthetic data generator, a part of our deep learning framework. The main models investigated include a sequence-to-sequence LSTM (long short-term memory) model using a climatological background model or NWP for post-processing, a graph neural network model, and an analogs-based deep learning method. Furthermore, a novel neural network model based on two other ideas, IrradianceNet and PhyDNet, was developed. IrradPhyDNet combines the skills of IrradianceNet and PhyDNet and showed improved performance in comparison to the original models.

The developed methods yield, in general, high forecast skill. For selected case studies of extreme events (e.g., Saharan dust), all novel methods outperformed the traditional methods. Different combinations of inputs and processing steps are part of the analysis.

How to cite: Papazek, P., Schicker, I., and Gfähler, P.: Comparison of LSTM, GraphNN, and IrradPhyDNet based Approaches for High-resolution Solar Irradiance Nowcasting, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-13143, https://doi.org/10.5194/egusphere-egu23-13143, 2023.

EGU23-13322 | ECS | Posters on site | ITS1.13/AS5.2

Nodal Ambient Noise Tomography and automatic picking of dispersion curves with convolutional neural network: case study at Vulcano-Lipari, Italy 

Douglas Stumpp, Elliot Amir Jiwani-Brown, Célia Barat, Matteo Lupi, Francisco Muñoz, Thomas Planes, and Geneviève Savard

The ambient noise tomography (ANT) method is widely adopted to reconstruct shear-wave velocity anomalies and to generate high-resolution images of the crust and upper mantle. A critical step in this process is the extraction of surface-wave dispersion curves from cross-correlation functions of continuous ambient noise recordings, traditionally performed manually on dispersion spectrograms through human-machine interfaces. Picking of dispersion curves is sometimes prone to bias due to human interpretation. Furthermore, it is a laborious and time-consuming task that needs to be automated, especially when dealing with dense seismic networks of nodal geophones, where the large amount of generated data severely hinders manual picking approaches. In the last decade, several studies have successfully employed machine learning methods in the Earth sciences and across many seismological applications. Early studies have shown versatile and reliable solutions by treating dispersion curve extraction as a visual recognition problem.

We review and adapt a specific machine learning approach, deep convolutional neural networks, for use on dispersion spectrograms generated with the usual frequency-time analysis (FTAN) processing of ambient noise cross-correlations. To train and calibrate the algorithm, we use several available datasets acquired in previous experiments across different geological settings. The main dataset consists of records acquired with a dense local geophone network (150 short-period stations sampling at 250 Hz) deployed for one month in October 2021, during the volcanic unrest of the Vulcano-Lipari complex, Italy. The network also includes an additional 17 permanent broadband stations (sampling at 100 Hz) maintained by the National Institute of Geophysics and Volcanology (INGV) in Italy. We evaluate the performance of the dispersion curve extraction algorithm. The automatically picked dispersion curves will be used to construct a shear-wave velocity model of the Vulcano-Lipari magmatic plumbing system and the surrounding area of the Aeolian archipelago.
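
As a hedged illustration of treating the pick as a visual recognition problem (a toy CNN with a soft-argmax readout, not the study's architecture), one can regress one group-velocity pick per period column of an FTAN image:

    # Sketch: FTAN spectrogram in, one picked velocity per period out.
    import torch

    n_periods, n_vels = 64, 128

    net = torch.nn.Sequential(
        torch.nn.Conv2d(1, 16, 3, padding=1), torch.nn.ReLU(),
        torch.nn.Conv2d(16, 16, 3, padding=1), torch.nn.ReLU(),
        torch.nn.Conv2d(16, 1, 1),
    )

    spec = torch.randn(8, 1, n_vels, n_periods)     # batch of FTAN images
    logits = net(spec).squeeze(1)                   # (8, n_vels, n_periods)
    prob = torch.softmax(logits, dim=1)             # distribution over velocity axis
    vel_axis = torch.linspace(1.0, 5.0, n_vels).view(1, n_vels, 1)  # km/s grid
    picked = (prob * vel_axis).sum(dim=1)           # (8, n_periods) soft picks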


How to cite: Stumpp, D., Amir Jiwani-Brown, E., Barat, C., Lupi, M., Muñoz, F., Planes, T., and Savard, G.: Nodal Ambient Noise Tomography and automatic picking of dispersion curves with convolutional neural network: case study at Vulcano-Lipari, Italy, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-13322, https://doi.org/10.5194/egusphere-egu23-13322, 2023.

EGU23-13367 | ECS | Posters on site | ITS1.13/AS5.2

Framework for creating daily semantic segmentation maps of classified eddies using SLA along-track altimetry data 

Eike Bolmer, Adili Abulaitijiang, Luciana Fenoglio-Marc, Jürgen Kusche, and Ribana Roscher

Mesoscale eddies are gyrating currents in the ocean with horizontal scales from 10 km up to 100 km and above. They transport water mass, heat, and nutrients, and are therefore of interest to, among others, marine biologists, oceanographers, and geodesists. Usually, gridded sea level anomaly (SLA) maps, processed from several radar altimetry missions, are used to detect eddies. However, operational processors create multi-mission (processing level 4) SLA grid maps with an effective spatiotemporal resolution far lower than their grid spacing and temporal resolution.

This drawback leads to erroneous eddy detection. We therefore investigate whether the higher-resolution along-track data could instead be used to classify the SLA observations into cyclonic, anticyclonic, or no eddies more accurately than with processed SLA grid map products. With our framework, we aim to infer a daily two-dimensional segmentation map of classified eddies. Due to repeat cycles between 10 and 35 days and cross-track spacings of a few tens to a few hundreds of km, ocean eddies are clearly visible in altimeter observations but are typically covered only by a few ground tracks, so the spatiotemporal context within the input data varies strongly from day to day. Conventional convolutional neural networks (CNNs), however, rely on data without varying gaps or jumps in time and space in order to use the intrinsic spatial or temporal context of the observations. This challenge therefore needs to be addressed with a deep neural network that, on the one hand, utilizes the spatiotemporal context information within the along-track data and, on the other hand, can output a two-dimensional segmentation map from data of varying sparsity. Our architecture, Teddy, uses a transformer module to encode and process the spatiotemporal information along the ground track's sea level anomaly data, producing a sparse feature map. This map is then fed into a sparsity-invariant convolutional neural network to infer a two-dimensional segmentation map of classified eddies. The reference data used to train Teddy are produced by an open-source geometry-based approach (py-eddy-tracker [1]).

The focus of this presentation is on how we implemented this approach to derive two-dimensional segmentation maps of classified eddies with our deep neural network architecture Teddy from along-track altimetry. We show results and limitations for the classification of eddies using only along-track SLA data from the multi-mission level 3 product of the Copernicus Marine Environment Monitoring Service (CMEMS) for the 2017-2019 period in the Gulf Stream region. We find that, using our methodology, we can create two-dimensional maps of classified eddies from along-track data without using preprocessed SLA grid maps.

[1] Mason, E., Pascual, A., and McWilliams, J. C.: A new sea surface height-based code for oceanic mesoscale eddy tracking, Journal of Atmospheric and Oceanic Technology, 31(5), 1181-1188, 2014.

How to cite: Bolmer, E., Abulaitijiang, A., Fenoglio-Marc, L., Kusche, J., and Roscher, R.: Framework for creating daily semantic segmentation maps of classified eddies using SLA along-track altimetry data, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-13367, https://doi.org/10.5194/egusphere-egu23-13367, 2023.

EGU23-13771 | Orals | ITS1.13/AS5.2

Machine Learning Emulation of 3D Shortwave Radiative Transfer for Shallow Cumulus Cloud Fields 

Jui-Yuan Christine Chiu, Chen-Kuang Kevin Yang, Jake J. Gristey, Graham Feingold, and William I. Gustafson

Clouds play an important role in determining the Earth's radiation budget. Despite their complex and three-dimensional (3D) structures, their interactions with radiation in models are often simplified to one dimension (1D), given the time required to compute radiative transfer. Such a simplification ignores cloud inhomogeneity and horizontal photon transport in radiative processes, which may be an acceptable approximation for low-resolution models but can lead to significant errors and impact cloud evolution predictions in high-resolution simulations. Since model development and operations are heading toward higher resolutions that are more susceptible to radiation errors, a fast and accurate 3D radiative transfer scheme becomes important and necessary. To address this need, we develop a machine-learning-based 3D radiative transfer emulator that provides surface radiation, shortwave fluxes at all layers, and heating rate profiles. The emulators are trained for highly heterogeneous shallow cumulus fields under different solar positions. We will assess the accuracy and efficiency of the emulators and discuss their potential applications.

How to cite: Chiu, J.-Y. C., Yang, C.-K. K., Gristey, J. J., Feingold, G., and Gustafson, W. I.: Machine Learning Emulation of 3D Shortwave Radiative Transfer for Shallow Cumulus Cloud Fields, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-13771, https://doi.org/10.5194/egusphere-egu23-13771, 2023.

EGU23-14051 | ECS | Posters on site | ITS1.13/AS5.2

Multi-modal data assimilation of sea surface currents from AIS data streams and satellite altimetry using 4DVARNet 

Simon Benaïchouche, Clément Le Goff, Brahim Boussidi, François Rousseau, and Ronan Fablet

Over the last decades, space oceanography missions, particularly altimeter missions, have greatly advanced our ability to observe sea surface dynamics. However, they still struggle to resolve spatial scales below ~100 km. On a global scale, sea surface currents are derived from sea surface height under a geostrophic assumption. While future altimeter missions should improve the observation of sea surface height, the observation of sea surface currents using altimetry techniques will remain indirect. On the other hand, recent works have considered the use of AIS (automatic identification system) data as a new means to reconstruct sea surface currents: AIS data streams provide an indirect observational model of total currents, including ageostrophic phenomena. In this work, we use the supervised learning framework 4DVARNet, a data-driven approach that allows us to perform multi-modal experiments. We focus on an Observing System Simulation Experiment (OSSE) in a region of the Gulf Stream and show that the joint use of AIS and sea surface height (SSH) measurements could improve the reconstruction of sea surface currents, in terms of the physical and time scales resolved, with respect to products derived solely from AIS or SSH observations.

How to cite: Benaïchouche, S., Le Goff, C., Boussidi, B., Rousseau, F., and Fablet, R.: Multi-modal data assimilation of sea surface currents from AIS data streams and satellite altimetry using 4DVARNet, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-14051, https://doi.org/10.5194/egusphere-egu23-14051, 2023.

EGU23-15183 | ECS | Orals | ITS1.13/AS5.2

Deep learning approximations of a CFD model for operational wind and turbulence forecasting 

Margrethe Kvale Loe and John Bjørnar Bremnes

The Norwegian Meteorological Institute has for many years applied a CFD model to downscale operational NWP forecasts to 100-200 m spatial resolution for wind and turbulence forecasting at about 20 Norwegian airports. Due to high computational costs, however, the CFD model can only be run twice per day, each time producing a 12-hour forecast. An approximate approach using deep learning, requiring far less compute resources, has therefore been developed. In this approach, the relation between relevant NWP forecast variables on grids of 2.5 km spatial resolution and wind and turbulence from the CFD model is approximated using neural networks with basic convolutional and dense layers. The deep learning models were trained on approximately two years of data separately for each airport. The results show that the models are to a large extent able to capture the characteristics of their corresponding CFD simulations, and the method is intended in due time to fully replace the current operational solution.

How to cite: Loe, M. K. and Bremnes, J. B.: Deep learning approximations of a CFD model for operational wind and turbulence forecasting, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-15183, https://doi.org/10.5194/egusphere-egu23-15183, 2023.

EGU23-15684 | ECS | Posters on site | ITS1.13/AS5.2

Semi-supervised feature-based learning for prediction of Mass Accumulation Rate of sediments 

Naveenkumar Parameswaran, Everardo Gonzalez, Ewa Burwicz-Galerne, David Greenberg, Klaus Wallmann, and Malte Braack

Mass accumulation rates of sediments [g/cm2/yr] and sedimentation rates [cm/yr] on the seafloor are important for understanding various benthic properties, such as the rate of carbon sequestration in the seafloor and seafloor geomechanical stability. Several machine learning models, such as random forests and k-nearest neighbours, have been proposed for the prediction of geospatial data in the marine geosciences, but they face significant challenges, such as the limited number of labels for training, skewed data distributions, and a large number of features. Previous model predictions show deviations in the global sediment budget, a parameter used to assess a model's predictive validity, revealing that state-of-the-art models do not represent sedimentation rates accurately.

Here we present a semi-supervised deep learning methodology to improve the prediction of sedimentation rates, making use of around 9x10^6 unlabelled data points. The semi-supervised neural network implementation has two parts: unsupervised pretraining using an encoder-decoder network, followed by supervised fine-tuning. The encoder, with the weights optimized during unsupervised training, is extracted and fitted with layers that map to the target dimension. This network is then fine-tuned with 2782 labelled data points, which are observed sedimentation rates from peer-reviewed sources. The fine-tuned model then predicts the rate and quantity of sediment accumulating on the ocean floor globally.
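
A minimal sketch of this two-stage scheme in PyTorch, with illustrative feature counts and training loops (architectures and hyperparameters are placeholders):

    # Sketch: unsupervised encoder-decoder pretraining, then supervised fine-tuning.
    import torch

    n_features, latent = 20, 8
    encoder = torch.nn.Sequential(torch.nn.Linear(n_features, 64), torch.nn.ReLU(),
                                  torch.nn.Linear(64, latent))
    decoder = torch.nn.Sequential(torch.nn.Linear(latent, 64), torch.nn.ReLU(),
                                  torch.nn.Linear(64, n_features))

    # Stage 1: reconstruction on abundant unlabelled points.
    X_unlab = torch.randn(4096, n_features)
    opt = torch.optim.Adam([*encoder.parameters(), *decoder.parameters()], lr=1e-3)
    for _ in range(100):
        opt.zero_grad()
        loss = ((decoder(encoder(X_unlab)) - X_unlab) ** 2).mean()
        loss.backward()
        opt.step()

    # Stage 2: regression head on the pretrained encoder, tuned on labels.
    head = torch.nn.Linear(latent, 1)
    model = torch.nn.Sequential(encoder, head)
    X_lab, y_lab = torch.randn(2782, n_features), torch.randn(2782, 1)
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    for _ in range(200):
        opt.zero_grad()
        loss = ((model(X_lab) - y_lab) ** 2).mean()
        loss.backward()
        opt.step()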

The developed semi-supervised neural network provides better predictions than supervised models trained only on labelled data. The predictions of the semi-supervised neural network are compared with those of supervised neural networks with and without dimensionality reduction (using principal component analysis).

How to cite: Parameswaran, N., Gonzalez, E., Burwicz-Galerne, E., Greenberg, D., Wallmann, K., and Braack, M.: Semi-supervised feature-based learning for prediction of Mass Accumulation Rate of sediments, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-15684, https://doi.org/10.5194/egusphere-egu23-15684, 2023.

EGU23-15756 | ECS | Posters on site | ITS1.13/AS5.2

Physiography improvements in numerical weather prediction digital twin engines 

Thomas Rieutord, Geoffrey Bessardon, and Emily Gleeson

The next generation of numerical weather prediction models (so-called digital twin engines) will reach the hectometric scale, for which existing physiography databases are insufficient. Our work leverages machine learning and open-access data to produce a more accurate, higher-resolution physiography database. One component to improve is the land cover map. The reference data gather multiple high-resolution thematic maps using an agreement-based decision tree. The input data are taken from the Sentinel-2 satellite, and the land cover map is then generated by image segmentation. This work implements and compares several algorithms of different families to study their suitability for the land cover classification problem; the sensitivity to data quality will also be studied. Compared to existing work, this work is innovative both in the construction of the reference map (leveraging existing maps and fit for end-user purposes) and in the diversity of algorithms compared to produce our land cover map.

How to cite: Rieutord, T., Bessardon, G., and Gleeson, E.: Physiography improvements in numerical weather prediction digital twin engines, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-15756, https://doi.org/10.5194/egusphere-egu23-15756, 2023.

EGU23-15892 | ECS | Posters on site | ITS1.13/AS5.2

Towards emulated Lagrangian particle dispersion model footprints for satellite observations 

Elena Fillola, Raul Santos-Rodriguez, and Matt Rigby

Lagrangian particle dispersion models (LPDMs) have been used extensively to calculate source-receptor relationships ("footprints") for use in greenhouse gas (GHG) flux inversions. However, because a backward-running model simulation is required for each data point, LPDMs do not scale well to very large datasets, which makes them unsuitable for GHG inversions using high-resolution satellite instruments such as TROPOMI. In this work, we demonstrate how machine learning (ML) can be used to accelerate footprint production, first presenting a proof-of-concept emulator for ground-based site observations and then discussing work in progress on an emulator suitable for satellite observations. In Fillola et al. (2023), we presented an ML emulator for NAME, the Met Office's LPDM, which outputs footprints for a small region around an observation point using purely meteorological variables as inputs. The footprint magnitude at each grid cell in the domain is modelled independently using gradient-boosted regression trees. The model is evaluated for seven sites, producing a footprint in 10 ms, compared to around 10 minutes for the 3D simulator, and achieving R2 values between 0.6 and 0.8 for CH4 concentrations simulated at the sites when compared to the time series generated by NAME. Following on from this work, we demonstrate how the same emulator can be applied to satellite data to reproduce footprints immediately around any measurement point in the domain, evaluating this application with data for Brazil and North Africa and obtaining R2 values of around 0.5 for simulated CH4 concentrations. Furthermore, we propose new emulator architectures for LPDMs applied to satellite observations. These new architectures should tackle some of the weaknesses of the existing approach, for example by propagating information more flexibly in space and time, potentially improving the accuracy of the derived footprints and extending the prediction capabilities to bigger domains.
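
A hedged sketch of per-cell footprint emulation with gradient-boosted trees, using scikit-learn on synthetic data (the multi-output wrapper and sizes are illustrative choices, not necessarily the published setup):

    # Sketch: one boosted regressor per footprint grid cell.
    import numpy as np
    from sklearn.ensemble import HistGradientBoostingRegressor
    from sklearn.multioutput import MultiOutputRegressor

    rng = np.random.default_rng(0)
    n_obs, n_met, n_cells = 2000, 10, 16 * 16     # 16x16 footprint around the site
    X = rng.normal(size=(n_obs, n_met))           # meteorological predictors
    Y = rng.normal(size=(n_obs, n_cells))         # footprint magnitude per cell

    emulator = MultiOutputRegressor(HistGradientBoostingRegressor(max_iter=100))
    emulator.fit(X[:1500], Y[:1500])
    pred = emulator.predict(X[1500:])             # milliseconds per footprint
    print(pred.shape)                             # (500, 256)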

How to cite: Fillola, E., Santos-Rodriguez, R., and Rigby, M.: Towards emulated Lagrangian particle dispersion model footprints for satellite observations, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-15892, https://doi.org/10.5194/egusphere-egu23-15892, 2023.

EGU23-15994 | ECS | Posters on site | ITS1.13/AS5.2

Uncertainty quantification in variational data assimilation with deep learning 

Nicolas Lafon, Philippe Naveau, and Ronan Fablet

The spatio-temporal reconstruction of a dynamical process from observational data is at the core of a wide range of applications in the geosciences. This is particularly true for weather forecasting, operational oceanography, and climate studies. However, the reconstruction of a given dynamic and the prediction of future states must take into account the uncertainties that affect the system. The available observational measurements are only provided with limited accuracy. Besides, the encoded physical equations that model the evolution of the system do not capture the full complexity of the real system. Finally, the numerical approximation generates a non-negligible error. For these reasons, it seems relevant to calculate a probability distribution of the system state rather than only the most probable state. Using recent advances in machine learning techniques for inverse problems, we propose an algorithm that jointly learns a parametric distribution of the state, the dynamics governing the evolution of the parameters, and a solver. Experiments conducted on synthetic reference datasets, as well as on datasets describing environmental systems, validate our approach.

How to cite: Lafon, N., Naveau, P., and Fablet, R.: Uncertainty quantification in variational data assimilation with deep learning, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-15994, https://doi.org/10.5194/egusphere-egu23-15994, 2023.

EGU23-16287 | ECS | Posters on site | ITS1.13/AS5.2

A machine learning emulator for forest carbon stocks and fluxes 

Carolina Natel de Moura, David Martin Belda, Peter Antoni, and Almut Arneth

Forests are a significant sink of the carbon dioxide (CO2) emitted by humans. Climate change is expected to impact forest systems and their role in the terrestrial carbon cycle in several ways: for example, the fertilization effect of increased atmospheric CO2 and the lengthening of the growing season in northern temperate and boreal areas may increase forest productivity, while more frequent extreme climate events such as storms and windthrows, drought spells, and wildfires might shorten disturbance return periods, increasing forest land loss and reducing the carbon stored in vegetation and soils. In addition, forest management in response to an increased demand for wood products and fuel can affect carbon storage in ecosystems and wood products. State-of-the-art Dynamic Global Vegetation Models (DGVMs) simulate forest responses to environmental and human processes; however, running these models globally for many climate and management scenarios becomes challenging due to computational constraints. Integrating process-based models and machine learning methods through emulation allows us to speed up computationally expensive simulations. In this work, we explore the use of machine learning to surrogate the LPJ-GUESS DGVM. The emulator is spatially aware, representing forests across the globe at a flexible spatial resolution, and considers past climate and forest management practices to account for legacy effects. The training data for the emulator are derived from dedicated runs of the DGVM sampled across four dimensions relevant to forest carbon and yield: atmospheric CO2 concentration, air Temperature, Precipitation, and forest Management (CTPM). The emulator captures relevant forest responses to climate and management in a lightweight form and will support the development of the coupled socio-economic/ecologic land system model LandSyMM (landsymm.earth). Other relevant scientific applications include the analysis of optimal forestry protocols under climate change and the forest potential in climate change mitigation.

 

How to cite: Natel de Moura, C., Belda, D. M., Antoni, P., and Arneth, A.: A machine learning emulator for forest carbon stocks and fluxes, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-16287, https://doi.org/10.5194/egusphere-egu23-16287, 2023.

EGU23-16597 | Posters on site | ITS1.13/AS5.2 | Highlight

Global Decadal Sea Surface Height Forecast with Conformal Prediction 

Nils Lehmann, Jonathan Bamber, and Xiaoxiang Zhu

One of the many ways in which anthropogenic climate change impacts our planet is
rising sea levels. The rate of sea level rise (SLR) across the oceans is,
however, not uniform in space or time and is influenced by a complex interplay
of ocean dynamics, heat uptake, and surface forcing. As a consequence,
short-term (years to a decade) regional SLR patterns are difficult to model
using conventional deterministic approaches. For example, the latest climate
model projections (called CMIP6) show some agreement in the globally integrated
rate of SLR but poor agreement when it comes to spatially-resolved
patterns. However, such forecasts are valuable for adaptation planning in
coastal areas and for protecting low-lying assets.
Rather than a deterministic modeling approach, here we explore the possibility
of exploiting the high quality satellite altimeter derived record of sea surface
height variations, which cover the global oceans outside of ice-infested waters
over a period of 30 years. Alongside this rich and unique satellite record,
several data-driven models have shown tremendous potential for various
applications in Earth System science. We explore several data-driven deep
learning approaches for sea surface height forecasts over multi-annual to
decadal time frames. A limitation of some machine learning approaches is the
lack of any kind of uncertainty quantification, which is problematic for
applications where actionable evidence is sought. As a consequence, we equip
our models with a rigorous measure of uncertainty, namely conformal prediction, which
is a model and dataset agnostic method that provides calibrated predictive
uncertainty with proven coverage guarantees. Based on a 30-year satellite
altimetry record and auxiliary climate forcing data from reanalysis such as
ERA5, we demonstrate that our methodology is a viable and attractive alternative
for decadal sea surface height forecasts.
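
For readers unfamiliar with the method, the following is a minimal split-conformal sketch for regression, using a toy dataset and a generic scikit-learn model; it is illustrative only and not the authors' forecasting pipeline:

```python
# Split conformal prediction: calibrated intervals with coverage guarantees,
# agnostic to the underlying model and dataset.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = X @ rng.normal(size=5) + 0.3 * rng.normal(size=1000)
X_fit, X_cal, X_new = X[:600], X[600:900], X[900:]
y_fit, y_cal = y[:600], y[600:900]

model = Ridge().fit(X_fit, y_fit)
scores = np.abs(y_cal - model.predict(X_cal))   # nonconformity scores
alpha = 0.1                                     # target 90% coverage
q = np.quantile(scores,
                np.ceil((len(scores) + 1) * (1 - alpha)) / len(scores))
pred = model.predict(X_new)
lower, upper = pred - q, pred + q               # calibrated intervals
```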

How to cite: Lehmann, N., Bamber, J., and Zhu, X.: Global Decadal Sea Surface Height Forecast with Conformal Prediction, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-16597, https://doi.org/10.5194/egusphere-egu23-16597, 2023.

EGU23-16936 | ECS | Orals | ITS1.13/AS5.2

Analysis of marine heat waves using machine learning 

Said Ouala, Bertrand Chapron, Fabrice Collard, Lucile Gaultier, and Ronan Fablet

Sea surface temperature (SST) is a critical parameter in the global climate system and plays a vital role in many marine processes, including ocean circulation, evaporation, and the exchange of heat and moisture between the ocean and atmosphere. As such, understanding the variability of SST is important for a range of applications, including weather and climate prediction, ocean circulation modeling, and marine resource management.

The dynamics of SST are the compound of multiple degrees of freedom that interact across a continuum of spatio-temporal scales. A first-order approximation of such a system was initially introduced by Hasselmann. In his pioneering work, Hasselmann (1976) discussed the interest in using a two-scale stochastic model to represent the interactions between slow and fast variables of the global ocean, climate, and atmosphere system. In this work, we examine the potential of machine learning techniques to derive relevant dynamical models of Sea Surface Temperature Anomaly (SSTA) data in the Mediterranean Sea. We focus on the seasonal modulation of the SSTA and aim to understand the factors that influence the temporal variability of SSTA extremes. Our analysis shows that the variability of the SSTA can indeed be decomposed into slow and fast components. The dynamics of the slow variables are associated with the seasonal cycle, while the dynamics of the fast variables are linked to the SSTA response to rapid underlying processes such as local wind variability. Based on these observations, we approximate the probability density function of the SSTA data using a stochastic differential equation parameterized by a neural network. In this model, the drift function represents the seasonal cycle and the diffusion function represents the envelope of the fast SSTA response.
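
As a minimal illustration of such a model (a sketch under our own assumptions about shapes and architecture, not the authors' implementation), the PyTorch snippet below defines an SDE whose drift and diffusion are small neural networks and simulates it with the Euler-Maruyama scheme:

```python
# Neural SDE sketch: drift models the seasonal cycle, diffusion the envelope
# of the fast response; simulated with Euler-Maruyama.
import torch
import torch.nn as nn

class NeuralSDE(nn.Module):
    def __init__(self):
        super().__init__()
        self.drift = nn.Sequential(nn.Linear(2, 32), nn.Tanh(),
                                   nn.Linear(32, 1))
        self.diffusion = nn.Sequential(nn.Linear(2, 32), nn.Tanh(),
                                       nn.Linear(32, 1), nn.Softplus())

    def step(self, x, t, dt):
        inp = torch.cat([x, t.expand_as(x)], dim=-1)  # state and time of year
        noise = torch.randn_like(x) * dt.sqrt()
        return x + self.drift(inp) * dt + self.diffusion(inp) * noise

sde = NeuralSDE()
x = torch.zeros(16, 1)                 # 16 sample paths of the SSTA
dt = torch.tensor(1.0 / 365)
for day in range(365):                 # one simulated year
    t = torch.tensor([[day / 365.0]])
    x = sde.step(x, t, dt)
```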

 

How to cite: Ouala, S., Chapron, B., Collard, F., Gaultier, L., and Fablet, R.: Analysis of marine heat waves using machine learning, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-16936, https://doi.org/10.5194/egusphere-egu23-16936, 2023.

EGU23-338 | ECS | Posters virtual | ITS1.1/NH0.1

A Stacking Ensemble Deep Learning Approach for Post Disaster Building Assessment using UAV Imagery 

Leon Sim, Fang-Jung Tsai, and Szu-Yun Lin

Traditional post-disaster building damage assessments were performed manually by response teams, which was risky and time-consuming. Advanced remote sensing technology, such as unmanned aerial vehicles (UAVs), makes it possible to acquire high-quality aerial videos while operating at a variety of altitudes and angles. The collected data is then fed into a neural network for training and validation. In this study, the object detection model YOLO was utilized, which is capable of predicting both bounding boxes and damage levels. The network was trained using the ISBDA dataset, which was created from aerial videos of the aftermath of Hurricane Harvey in 2017, Hurricane Michael and Hurricane Florence in 2018, and three tornadoes in 2017, 2018, and 2019 in the United States. The Joint Damage Scale was used to classify the buildings in this dataset into four categories: no damage, minor damage, major damage, and destroyed. However, the numbers of major damage and destroyed instances are significantly lower than the numbers of no damage and minor damage instances in the dataset. Also, the damage characteristics of the minor and major damage classes are similar under this type of disaster. These factors made the YOLO model prone to misclassifying the intermediate damage levels, i.e., minor and major damage, in our earlier experiments. This study aimed to improve the YOLO model using a stacking ensemble deep learning approach with an image classification model called MobileNet. First, the ISBDA dataset was refined and used to train the YOLO network and the MobileNet network separately, with the latter providing two-class predictions (0 for no damage or minor damage, 1 for major damage or destroyed) rather than the four classes of the former. In the inference phase, the initial predictions from the trained YOLO network, including bounding box coordinates, confidence scores for the four damage classes, and the predicted class, were extracted and passed to the trained MobileNet to generate secondary predictions for each building. Based on the secondary predictions, two hyperparameters were utilized to refine the initial predictions by modifying the confidence scores of each class, and these hyperparameters were tuned during this phase. Lastly, the tuned hyperparameters were applied to the testing dataset to evaluate the performance of the proposed method. The results show that our stacking ensemble method obtains more reliable predictions for the intermediate classes.
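
The refinement step can be pictured with the toy sketch below: YOLO's four-class confidence scores are adjusted using the binary classifier's coarse prediction and two tunable hyperparameters. All values and the exact reweighting rule are illustrative assumptions, not the trained system:

```python
# Toy two-stage score refinement for the stacking ensemble described above.
import numpy as np

def refine(yolo_scores, binary_pred, alpha=1.5, beta=0.5):
    """yolo_scores: (4,) confidences for [none, minor, major, destroyed].
    binary_pred: 0 (none/minor) or 1 (major/destroyed) from the classifier."""
    scores = yolo_scores.copy()
    group = [2, 3] if binary_pred == 1 else [0, 1]
    other = [0, 1] if binary_pred == 1 else [2, 3]
    scores[group] *= alpha        # boost the group the classifier agrees with
    scores[other] *= beta         # down-weight the other group
    return scores / scores.sum()  # renormalize to a distribution

scores = np.array([0.10, 0.45, 0.35, 0.10])
print(refine(scores, binary_pred=1))  # shifts mass toward major/destroyed
```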

 

How to cite: Sim, L., Tsai, F.-J., and Lin, S.-Y.: A Stacking Ensemble Deep Learning Approach for Post Disaster Building Assessment using UAV Imagery, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-338, https://doi.org/10.5194/egusphere-egu23-338, 2023.

Since Taiwan is located on the Pacific Ring of Fire, seismic activity of varying magnitudes occurs almost every day. Some of these seismic events have caused severe disasters, resulting in loss of property, casualties, and damage to important public facilities. Investigating the long-term spatiotemporal pattern of seismic activity is therefore a crucial task for understanding its causes and predicting future activity, so that disaster prevention measures can be carried out in advance. Previous studies mostly focused on the causes of single seismic events at small spatiotemporal scales. In this study, data from 1987 to 2020 are used, including seismic events from the United States Geological Survey (USGS), ambient environmental factors such as daily air temperature from the Taiwan Central Weather Bureau (CWB), and daily sea surface temperature data from the National Oceanic and Atmospheric Administration (NOAA). The difference between land air temperature and sea surface temperature (SST) is then compared against the occurrence of seismic activity to examine the correlation between seismic events and anomalous temperature differences. The results show that many seismic events are accompanied by positive and negative anomalies of the temperature difference from 21 days before to 7 days after the event. Moreover, there is a specific trend of temperature-difference anomalies across magnitude intervals. In the magnitude ranges of 2.5 to 4 and greater than 6, almost all seismic events show significant anomalies in the temperature difference between land air temperature and SST compared with periods without seismic events. This study uncovers anomalous frequency signatures linking seismic activity and land-sea temperature differences. The significance of the difference between seismic and non-seismic periods was assessed using statistical analysis. Additionally, a deep neural network (DNN), as well as logistic regression and random forest machine learning models, were used to identify whether a seismic event will occur in different magnitude intervals. It is hoped that this can provide relevant information for the prediction of future seismic activity, to more accurately prevent the disasters it may cause.

How to cite: Chen, Y.-H. and Lin, Y.-C.: Investigating the Correlation between the Characteristics of Seismic Activity and Environmental Variables in Taiwan, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-2564, https://doi.org/10.5194/egusphere-egu23-2564, 2023.

The 2010-2011 Canterbury Earthquake sequence (CES) led to unprecedented building damage in the Canterbury region, New Zealand. Commercial and residential buildings were significantly affected. Due to New Zealand’s unique insurance setting, around 80% of the losses were covered by insurance (Bevere & Balz, 2012; King et al., 2014). The Insurance Council of New Zealand (ICNZ) estimated the total economic losses to be more than NZ$40 billion, with the Earthquake Commission (EQC) and private insurers covering NZ$10 billion and NZ$21 billion of the losses, respectively (ICNZ, 2021). As a result of the CES and the 2016 Kaikoura earthquake, EQC’s Natural Disaster Fund was depleted (EQC, 2022). This highlighted the need for improved tools enabling damage and loss analysis for natural hazards.
This research project used residential building claims collected by EQC following the CES to develop a rapid seismic loss prediction model for residential buildings in Christchurch. Geographic information systems (GIS) tools, data science techniques, and machine learning (ML) were used for the model development. Before the training of the ML model, the claims data was enriched with additional information from external data sources. The seismic demand, building characteristics, soil conditions, and information about the liquefaction occurrence were added to the claims data. Once merged and pre-processed, the aggregated data was used to train ML models based on the main events in the CES. Emphasis was put on the interpretability and explainability of the model. The ML model delivered valuable insights related to the most important features contributing to losses. Those insights are aligned with engineering knowledge and observations from previous studies, confirming the potential of using ML for disaster loss prediction and management. Care was also put into the retrainability of the model to ensure that any new data from future earthquake events can rapidly be added to the model. 

How to cite: Roeslin, S.: Development of a Rapid Seismic Loss Prediction Model for Residential Buildings using Machine Learning - Christchurch, New Zealand, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-2996, https://doi.org/10.5194/egusphere-egu23-2996, 2023.

EGU23-3928 | Orals | ITS1.1/NH0.1

Comparison of deep learning approaches to monitor trash screen blockage from CCTV cameras 

Remy Vandaele, Sarah L Dance, and Varun Ojha

We investigate the use of CCTV cameras and deep learning to automatically monitor trash screen blockage. 

Trash screens are installed to prevent debris from entering critical parts of river networks (pipes, tunnels, locks, ...). When debris piles up at a trash screen, it may block the waterway and can cause flooding. It is thus crucial to clean blocked trash screens to avoid flooding and consequent damage. Currently, maintenance crews must manually check a camera or river level data, or go on site to inspect the screen, to know if it needs cleaning. This wastes valuable time in emergency situations where blocked screens must be urgently cleaned (e.g., in case of forecast heavy rainfall). Some initial attempts at predicting trash screen blockage exist. However, these have not been widely adopted in practice. CCTV cameras can be easily installed at any location and can thus be used to monitor the state of trash screens, but the images need to be processed by an automated algorithm to determine whether the screen is blocked.

With the help of UK-based practitioners (Environment Agency and local councils), we have created a dataset of 40,000 CCTV trash screen images coming from 36 cameras, each labelled with blockage information. Using this database, we have compared three deep learning approaches to automate the detection of trash screen blockage: 

  • A binary image classifier, which takes as input a single image, and outputs a binary label that estimates whether the trash screen is blocked.
  • An approach based on anomaly detection which tries to reconstruct the input image with an auto-encoder trained on clean trash screen images.  In consequence, blocked trash screens are detected as anomalies by the auto-encoder.
  • An image similarity estimation approach based on the use of a siamese network, which takes as input two images and outputs a similarity index related, in our case, to whether both images contain trash. 

Using performance criteria chosen in discussion with practitioners (overall accuracy, false alarm rate, resilience to luminosity / moving fields of view, computing capabilities), we show that deep learning can be used in practice to automate the identification of blocked trash screens. We also analyse the strengths and weaknesses of each of these approaches and provide guidelines for their application.
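
The third approach above can be sketched as follows; the architecture, input sizes, and the use of a clean reference frame are illustrative assumptions, not the trained system:

```python
# Siamese similarity sketch: compare a clean-screen reference frame with the
# latest CCTV frame; a low similarity index suggests trash/blockage.
import torch
import torch.nn as nn

class SiameseNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 64))

    def forward(self, a, b):
        za, zb = self.encoder(a), self.encoder(b)
        return nn.functional.cosine_similarity(za, zb)  # similarity index

net = SiameseNet()
clean = torch.randn(1, 3, 224, 224)    # reference image of a clean screen
current = torch.randn(1, 3, 224, 224)  # latest CCTV frame
print(net(clean, current).item())
```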

How to cite: Vandaele, R., Dance, S. L., and Ojha, V.: Comparison of deep learning approaches to monitor trash screen blockage from CCTV cameras, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-3928, https://doi.org/10.5194/egusphere-egu23-3928, 2023.

EGU23-4455 | ECS | Posters virtual | ITS1.1/NH0.1

Traffic Monitoring System Design considering Multi-Hazard Disaster Risks 

Michele Gazzea, Reza Arghandeh, and Amir Miraki

Roadways are critical infrastructure in our society, providing services for people through and between cities. However, they are prone to closures and disruptions, especially after extreme weather events like hurricanes.

At the same time, traffic flow data are a fundamental type of information for any transportation system.

We tackle the problem of traffic sensor placement on roadways to address two tasks at the same time. The first task is traffic data estimation in ordinary situations, which is vital for traffic monitoring and city planning. We design a graph-based method to estimate traffic flow on roads where sensors are not present. The second one is enhanced observability of roadways in case of extreme weather events. We propose a satellite-based multi-domain risk assessment to locate roads at high risk of closures. Vegetation and flood hazards are taken into account. We formalize the problem as a search method over the network to suggest the minimum number and location of traffic sensors to place while maximizing the traffic estimation capabilities and observability of the risky areas of a city.
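
A toy greedy sketch of this kind of search is shown below (an assumption for illustration, not the authors' algorithm): sensor locations are chosen to maximize a combined score of traffic-estimation coverage and observability of high-risk roads:

```python
# Greedy sensor placement over a synthetic road network.
import numpy as np

rng = np.random.default_rng(0)
n_roads, budget = 50, 5
coverage = rng.random((n_roads, n_roads))  # coverage[i, j]: estimation gain of a sensor at i for road j
risk = rng.random(n_roads)                 # per-road closure risk from the satellite assessment

selected, covered = [], np.zeros(n_roads)
for _ in range(budget):
    gains = np.full(n_roads, -np.inf)
    for i in range(n_roads):
        if i not in selected:
            # risk-weighted total coverage if a sensor were added at i
            gains[i] = (np.maximum(covered, coverage[i]) * (1 + risk)).sum()
    best = int(np.argmax(gains))
    selected.append(best)
    covered = np.maximum(covered, coverage[best])
print("chosen sensor locations:", selected)
```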

How to cite: Gazzea, M., Arghandeh, R., and Miraki, A.: Traffic Monitoring System Design considering Multi-Hazard Disaster Risks, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-4455, https://doi.org/10.5194/egusphere-egu23-4455, 2023.

Earthquake-induced land deformation and structural failure are more severe over soft soils than over firm soils and rocks, owing to seismic site effects and liquefaction. The site-specific seismic site effects related to the amplification of ground motion, liquefaction, and landslides have spatial uncertainty depending on the local subsurface, surface geological, and topographic conditions. When the 2017 Pohang earthquake (M 5.4), South Korea's second strongest earthquake in decades, occurred, severe damage influenced by variable site response and vulnerability indicators was observed, concentrated in basin and basin-edge regions overlain by unconsolidated Quaternary sediments. Thus, nationwide site characterization is essential, considering empirical correlations between geotechnical site response and hazard parameters and surface proxies. Furthermore, with so many variables and tenuous correlations, machine learning classification models can prove more precise than parametric methods. This study established a multivariate seismic site classification system using machine learning techniques based on a geospatial big data platform.

Supervised machine learning classification techniques, specifically random forest, support vector machine (SVM), and artificial neural network (ANN) algorithms, were adopted. Supervised machine learning algorithms analyze a set of labeled training data consisting of a group of input data and desired output values. They produce an inferred function that can be used for predictions from given input data. To optimize the classification criteria while considering geotechnical uncertainty and local site effects, the training datasets, transformed by principal component analysis (PCA), were verified with k-fold cross-validation. The best training algorithm, assessed by loss estimators (the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC)) based on the confusion matrix, was then selected.

For the southeastern region of South Korea, boring log information (strata, standard penetration tests, etc.), a geological map (1:50k scale), a digital terrain model (5 m × 5 m resolution), and a soil map (1:250k scale) were collected and assembled as geospatial big data. As a preliminary step, to build spatially coincident datasets of geotechnical response parameters and surface proxies, mesh-type geospatial information was built by advanced geostatistical interpolation and simulation methods.

Site classification systems use seismic hazard parameters related to the geotechnical characteristics of the study area as the classification criteria. The current site classification systems in South Korea and the United States recommend Vs30, the average shear wave velocity (Vs) in the top 30 m of the ground. This criterion uses only the dynamic characteristics of the site without considering its geometric distribution characteristics. Thus, the geospatial information included the geo-layer thickness, surface proxies (elevation, slope, geological category, soil category), and Vs30. For liquefaction and landslide hazard estimation, liquefaction vulnerability indices (i.e., liquefaction potential or severity index) and landslide vulnerability indices (i.e., factor of safety or displacement) were also included as input features in the classifier modeling. Finally, the composite status against seismic site effects, liquefaction, and landslides was predicted as a hazard class (i.e., safe, slight, moderate, or extreme failure) based on the best-fitting classifier.
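
The workflow (PCA, k-fold cross-validation, ROC-AUC-based model selection) can be sketched as below; the features and labels are toy stand-ins for the geospatial inputs listed above, not the study's data:

```python
# Illustrative PCA + random forest pipeline with cross-validated ROC-AUC.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 8))     # e.g. Vs30, geo-layer thickness, slope, ...
y = rng.integers(0, 4, size=2000)  # hazard class: safe ... extreme failure

clf = make_pipeline(PCA(n_components=5),
                    RandomForestClassifier(n_estimators=200, random_state=0))
auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc_ovr")
print("5-fold ROC-AUC (one-vs-rest):", auc.mean())
```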

How to cite: Kim, H.: Machine Learning-based Site Classification System for Earthquake-Induced Multi-Hazard in South Korea, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-4757, https://doi.org/10.5194/egusphere-egu23-4757, 2023.

EGU23-4816 | ECS | Posters on site | ITS1.1/NH0.1

XAIDA4Detection: A Toolbox for the Detection and Characterization of Spatio-Temporal Extreme Events 

Jordi Cortés-Andrés, Maria Gonzalez-Calabuig, Mengxue Zhang, Tristan Williams, Miguel-Ángel Fernández-Torres, Oscar J. Pellicer-Valero, and Gustau Camps-Valls

The automatic anticipation and detection of extreme events constitute a major challenge in the current context of climate change, which has changed their likelihood and intensity. One of the main objectives within the EXtreme Events: Artificial Intelligence for Detection and Attribution (XAIDA) project (https://xaida.eu/) is related to developing novel approaches for the detection and localization of extreme events, such as tropical cyclones and severe convective storms, heat waves and droughts, as well as persistent winter extremes, among others. Here we introduce the XAIDA4Detection toolbox that allows for tackling generic problems of detection and characterization. The open-source toolbox integrates a set of advanced ML models, ranging in complexity, assumptions, and sophistication, and yields spatio-temporal explicit detection maps with probabilistic heatmap estimates. We included supervised and unsupervised methods, deterministic and probabilistic, neural networks based on convolutional and recurrent nets, and density-based methods. The toolbox is intended for scientists, engineers, and students with basic knowledge of extreme events, outlier detection techniques, and Deep Learning (DL), as well as Python programming with basic packages (Numpy, Scikit-learn, Matplotlib) and DL packages (PyTorch, PyTorch Lightning). This presentation will summarize the available features and their potential to be adapted to multiple extreme event problems and use cases.
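
In the spirit of the toolbox's unsupervised, density-based methods, the snippet below flags unusual cells in a gridded field and produces a heatmap-like score; this is NOT the XAIDA4Detection API, just a minimal scikit-learn sketch on toy data:

```python
# Density-based extreme flagging: low-density values get high anomaly scores.
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
field = rng.normal(size=(64, 64))        # toy climate field (lat, lon)
field[40:44, 10:14] += 5.0               # injected "extreme" patch

kde = KernelDensity(bandwidth=0.5).fit(field.reshape(-1, 1))
log_density = kde.score_samples(field.reshape(-1, 1))
heatmap = (-log_density).reshape(64, 64)  # high value = unusual = extreme
print("most anomalous cell:",
      np.unravel_index(heatmap.argmax(), heatmap.shape))
```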

How to cite: Cortés-Andrés, J., Gonzalez-Calabuig, M., Zhang, M., Williams, T., Fernández-Torres, M.-Á., Pellicer-Valero, O. J., and Camps-Valls, G.: XAIDA4Detection: A Toolbox for the Detection and Characterization of Spatio-Temporal Extreme Events, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-4816, https://doi.org/10.5194/egusphere-egu23-4816, 2023.

EGU23-5581 | Posters on site | ITS1.1/NH0.1

Vision Transformers for building damage assessment after natural disasters 

Adrien Lagrange, Nicolas Dublé, François De Vieilleville, Aurore Dupuis, Stéphane May, and Aymeric Walker-Deemin

Damage assessment is a critical step in crisis management. It must be fast and accurate in order to organize and scale the emergency response in a manner adapted to the real needs on the ground. The speed requirements motivate an automation of the analysis, at least in support of photo-interpretation. Deep Learning (DL) seems to be the most suitable methodology for this problem: on the one hand for the speed in obtaining the answer, and on the other hand for the high performance of these methods in extracting information from images. Following previous studies evaluating the potential contribution of DL methods for building damage assessment after a disaster, several conventional Deep Neural Network (DNN) and Transformer (TF) architectures were compared.

Made available at the end of 2019, the xView2 database appears to be the most interesting database for this study. It gathers images of disasters between 2011 and 2018 with 6 types of disasters: earthquakes, tsunamis, floods, volcanic eruptions, fires and hurricanes. For each of these disasters, pre- and post-disaster images are available with a ground truth containing the building footprint as well as the evaluation of the type of damage divided into 4 classes (no damage, minor damage, major damage, destroyed) similar to those considered in the study.

This study compares a wide range of DNN architectures, all based on an encoder-decoder structure. Two encoder families were implemented: EfficientNet (B0 to B7 configurations) and Swin TF (Tiny, Small, and Base configurations). Three adaptable decoders were implemented: UNet, DeepLabV3+, and FPN. Finally, to benefit from both pre- and post-disaster images, the trained models were designed to process images with a Siamese approach: both images are processed independently by the encoder, and the extracted features are then concatenated by the decoder.

Taking advantage of global information present in the image (such as the type of disaster), the Swin TF associated with the FPN decoder reaches better performance than all other encoder-decoder architectures. The shifted-windows mechanism enables the pipeline to process large images in a reasonable time, comparable to the processing time of EfficientNet-based architectures. An interesting additional result is that the models trained during this study do not seem to benefit much from extra-large configurations: both the Small and Tiny configurations reach the highest scores.
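
The Siamese encoder-decoder strategy can be pictured with the minimal PyTorch sketch below; layer sizes and depths are illustrative assumptions, not the study's Swin/FPN configuration:

```python
# Siamese change network: shared encoder, features concatenated for decoding.
import torch
import torch.nn as nn

class SiameseChangeNet(nn.Module):
    def __init__(self, n_classes=5):        # 4 damage levels + background
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 2, stride=2), nn.ReLU(),
            nn.Conv2d(64, n_classes, 1))

    def forward(self, pre, post):
        f = torch.cat([self.encoder(pre), self.encoder(post)], dim=1)
        return self.decoder(f)               # per-pixel damage logits

net = SiameseChangeNet()
pre = torch.randn(1, 3, 256, 256)
post = torch.randn(1, 3, 256, 256)
print(net(pre, post).shape)                  # torch.Size([1, 5, 256, 256])
```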

How to cite: Lagrange, A., Dublé, N., De Vieilleville, F., Dupuis, A., May, S., and Walker-Deemin, A.: Vision Transformers for building damage assessment after natural disasters, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-5581, https://doi.org/10.5194/egusphere-egu23-5581, 2023.

Natural and man-made disasters pose a threat to human life, flora and fauna, and infrastructure. It is critical to detect infrastructure damage quickly and accurately right after the occurrence of any disaster. The detection and assessment of infrastructure damage also help manage financial strategy. Recently, many researchers and agencies have made efforts to create high-resolution satellite imagery databases related to pre- and post-disaster events. Advanced remote sensing satellites can image the surface of the Earth accurately at up to 30 cm spatial resolution on a daily basis. These high spatial resolution (HSR) images can help assess the damage of any natural hazard by comparing pre- and post-disaster data. Such remote sensing imagery has limitations, however, such as cloud occlusion: buildings under thick cloud cannot be recognised in optical images. Manual assessment of the severity of damage to buildings and infrastructure by comparing bi-temporal HSR or airborne imagery is a tedious and subjective job. On the other hand, the emerging use of unmanned aerial vehicles (UAVs) makes it possible to assess the situation precisely. High-resolution UAV imagery and HSR satellite imagery can complement each other for critical infrastructure damage assessment. In this study, a novel approach is used to integrate UAV data with HSR satellite imagery for building damage assessment using a convolutional neural network (CNN) based deep learning model. The work is divided into two fundamental sub-tasks: first, building localisation in the pre-event images; and second, damage classification, assigning each building instance in the post-disaster images a unique label reflecting its degree of damage. For the study, HSR satellite imagery of 36 pairs of pre- and post-event scenes of natural hazards was acquired for 2021-22, and similarly available UAV-based data for these events was collected from open data sources. The data was then pre-processed, and building damage was assessed using a deep object-based semantic change detection framework (ChangeOS). This model was trained on the xView2 building damage assessment dataset, comprising ~20,000 images with ~730,000 building polygons of pre- and post-disaster events across the globe from 2011-2018. The experimental setup in this study includes training on the global dataset, testing on regional-scale building damage assessment using HSR satellite imagery, and local-scale testing using UAV imagery. The bi-temporal assessment of HSR images for the 2022 Indonesia earthquake yields an F1 score of ~67%, while the 2021 Uttarakhand flooding event shows an F1 score of ~64%. The UAV imagery from the Haiti earthquake event in 2011 also shows a lower but promising F1 score of ~54%. It is inferred that merging satellite and UAV HSR imagery for building damage assessment using the ChangeOS framework represents a robust tool to further promote future research in infrastructure maintenance strategy and policy management in disaster response.

How to cite: Gupta, S. and Nair, S.: A novel approach for infrastructural disaster damage assessment using high spatial resolution satellite and UAV imageries using deep learning algorithms., EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-5778, https://doi.org/10.5194/egusphere-egu23-5778, 2023.

EGU23-5913 | ECS | Orals | ITS1.1/NH0.1

Pluto: A global volcanic activity early warning system powered by large scale self-supervised deep learning on InSAR data 

Nikolaos Ioannis Bountos, Dimitrios Michail, Themistocles Herekakis, Angeliki Thanasou, and Ioannis Papoutsis

Artificial intelligence (AI) methods have emerged as a powerful tool to study and in some cases forecast natural disasters [1,2]. Recent works have successfully combined deep learning modeling with scientific knowledge stemming from the SAR Interferometry domain, propelling research on tasks like volcanic activity monitoring [3], associated with ground deformation. A milestone in this interdisciplinary field has been the release of the Hephaestus [4] InSAR dataset, facilitating automatic InSAR interpretation, volcanic activity localization, and the detection and categorization of atmospheric contributions in wrapped interferograms. Hephaestus contains annotations for approximately 20,000 InSAR frames, covering the 44 most active volcanoes in the world. The annotation was performed by a team of InSAR experts who manually examined each InSAR frame individually. However, even with such a large dataset, class imbalance remains a challenge: InSAR samples containing volcano deformation fringes are orders of magnitude fewer than those that do not. This is expected, since natural hazards are in principle rare in nature. To counter that, the authors of Hephaestus provide more than 100,000 unlabeled InSAR frames to be used for global large-scale self-supervised learning, which is more robust to class imbalance than supervised learning [5]. 

Motivated by the Hephaestus dataset and the insights provided by [2], we train global, task-agnostic models in a self-supervised learning fashion that can handle distribution shifts caused by spatio-temporal variability as well as major class imbalances. By finetuning such a model to the labeled part of Hephaestus we obtain the backbone for a global volcanic activity alerting system, namely Pluto. Pluto is a novel end-to-end AI based system that provides early warnings of volcanic unrest on a global scale.

Pluto automatically synchronizes its database with the Comet-LiCS [6] portal to receive newly generated Sentinel-1 InSAR data acquired over volcanic areas. The new samples are fed to our volcanic activity detection model. If volcanic activity is detected, an automatic email is sent to the service users containing information about the intensity, the exact location, and the type (Mogi, sill, dyke) of the event. To ensure a robust and ever-improving service, we augment Pluto with an iterative pipeline that collects samples that were misclassified in production and uses them to further improve the existing model. 
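
Schematically, the alerting loop looks like the sketch below; the function names, model interface, and threshold are invented for illustration and are not Pluto's actual implementation:

```python
# Schematic alerting loop over newly synchronized InSAR frames.
def process_new_frames(frames, model, notify, threshold=0.5):
    """Classify new InSAR frames and alert users on detected unrest."""
    for frame in frames:
        prob, location, kind = model.predict(frame)  # kind: Mogi, sill, dyke
        if prob >= threshold:
            notify(f"Volcanic unrest ({kind}) at {location}, "
                   f"confidence {prob:.2f}")
```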

 

[1] Kondylatos et al. "Wildfire danger prediction and understanding with Deep Learning." Geophysical Research Letters 49.17 (2022): e2022GL099368.

[2] Bountos et al. "Self-supervised contrastive learning for volcanic unrest detection." IEEE Geoscience and Remote Sensing Letters 19 (2021): 1-5.

[3] Bountos et al. "Learning from Synthetic InSAR with Vision Transformers: The case of volcanic unrest detection." IEEE Transactions on Geoscience and Remote Sensing (2022).

[4] Bountos et al. "Hephaestus: A large scale multitask dataset towards InSAR understanding." Proceedings of the IEEE/CVF CVPR. 2022.

[5] Liu et al. "Self-supervised learning is more robust to dataset imbalance." arXiv preprint arXiv:2110.05025 (2021).

[6] Lazecký et al. "LiCSAR: An automatic InSAR tool for measuring and monitoring tectonic and volcanic activity." Remote Sensing 12.15 (2020): 2430.

How to cite: Bountos, N. I., Michail, D., Herekakis, T., Thanasou, A., and Papoutsis, I.: Pluto: A global volcanic activity early warning system powered by large scale self-supervised deep learning on InSAR data, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-5913, https://doi.org/10.5194/egusphere-egu23-5913, 2023.

It has become increasingly apparent over the past few decades that environmental degradation is a common concern for humanity, and it is difficult to deny that present environmental problems are caused primarily by anthropogenic activities rather than natural causes.

To minimize disaster risk, geospatial science and technology are helpful and necessary tools for hazard zone mapping during emergency conditions. 

This approach can help predict harmful events, but also mitigate environmental damage from events that cannot be efficiently predicted.

With detailed information obtained from various datasets, decision making becomes simpler, which is crucial for a quick and effective response to any disaster. Remote sensing, in particular radar/SAR data, helps in managing a disaster at various stages. 

Prevention, for example, refers to the outright avoidance of adverse impacts of hazards and related disasters; preparedness refers to the knowledge and capacities needed to effectively anticipate, respond to, and recover from the impacts of likely, imminent or current hazard events or conditions.

Finally, relief is the provision of emergency services after a disaster in order to reduce damage to the environment and people.

Thanks to the opportunity proposed by ASI (Italian Space Agency) to use COSMO-SkyMed data, at NeMeA Sistemi srl we developed three projects: “Ventimiglia Legalità”, “Edilizia Spontanea”, and 3xA.

Their main objective is to detect illegal buildings not present in the legal land registry.

We developed new and innovative technologies using integrated data for the monitoring and protection of environmental and anthropogenic health, in coastal and nearby areas. 

The 3xA project addresses the highly challenging problem of automatically detecting changes from a time series of high-resolution synthetic aperture radar (SAR) images. In this context, to fully leverage the potential of such data, an innovative machine learning based approach has been developed. 

The project is characterized by an end-to-end training and inference system which takes as input two raw images and produces a vectorized change map without any human supervision.

In more detail, it takes as input two SAR acquisitions at times t1 and t2; the acquisitions are first pre-processed and homogenised, and finally undergo a completely self-supervised algorithm that takes advantage of DNNs to classify changed/unchanged areas. This method shows promising results in automatically producing a change map from two input SAR images (Stripmap or Spotlight COSMO-SkyMed data), with 98% accuracy.

Being the process automated, results are produced faster than similar products generated by human operators.

A similar approach has been followed to create an algorithm which performs semantic segmentation from the same kind of data.

This time, only one of the two SAR acquisitions is taken as input for pre-processing steps and then for a supervised neural network. The result is a single image where each pixel is labelled with the class predicted by the algorithm. 

Also in this case, results are promising, reaching around 90% accuracy. 

How to cite: Pennino, I.: A new approach for hazard and disaster prevention: deep learning algorithms for change detection and classification RADAR/SAR, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-6522, https://doi.org/10.5194/egusphere-egu23-6522, 2023.

EGU23-6790 | ECS | Posters on site | ITS1.1/NH0.1

Deep learning for automatic flood mapping from high resolution SAR images 

Arnaud Dupeyrat, abdullah Almaksour, Joao Vinholi, and tapio friberg

With the gradual warming of the global climate, natural catastrophes have caused billions of dollars in damage to ecosystems, economies and properties. Along with the damage, the loss of life is a very serious possibility. With the unprecedented growth of the human population, large-scale development activities and changes to the natural environment, the frequency and intensity of extreme natural events and consequent impacts are expected to increase in the future. 

To be able to mitigate and reduce the potential damage of natural catastrophes, continuous monitoring is required. The collection of data using earth observation (EO) systems has been valuable for tracking the effects of natural hazards, especially with their near real-time capabilities for tracking extreme natural events. Remote sensing systems from different platforms also serve as an important decision support tool for devising response strategies, coordinating rescue operations, and making damage and loss estimations.

 Synthetic aperture radar (SAR) imagery provides highly valuable information about our planet that no other technology is capable of. SAR sensors emit their own energy to illuminate objects or areas on Earth and record what’s reflected back from the surface to the sensor. This allows data acquisition day and night since no sunlight is needed. SAR also uses longer wavelengths than optical systems, which gives it the unsurpassed advantage of being able to penetrate clouds, rain, fog and smoke. All of this makes SAR imagery unprecedentedly valuable in sudden events and crisis situations requiring a rapid response.

In this talk we will focus on flood monitoring using our ICEYE SAR images, taking into account the multiple satellites, angles, and resolutions inherent to our constellation and capabilities. We will present the different steps that have allowed us to improve the consistency of our generated flood maps.

How to cite: Dupeyrat, A., Almaksour, A., Vinholi, J., and friberg, T.: Deep learning for automatic flood mapping from high resolution SAR images, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-6790, https://doi.org/10.5194/egusphere-egu23-6790, 2023.

Increasing climatic extremes have raised the frequency and severity of urban flood events during the last several decades. Significant economic losses point out the urgency of flood response. In recent years, the government has gradually expanded the deployment of CCTV water level monitoring facilities to support decision-making during flood events. However, it is difficult for decision makers to interpret multiple images at the same time. Therefore, this study attempts to establish an automatic water level recognition method for a given closed-circuit television (CCTV) system.

In recent years, many advances have been made in automatic image recognition with artificial intelligence methods, yet little literature has been published on real-time water level recognition from CCTV systems for disaster management. The purpose of this study is to examine the practical possibilities of artificial intelligence for real-time water level recognition with deep convolutional neural networks. The proposed methodology will be demonstrated with several case studies in Taichung. To address the potential issue that AI models may lack training targets, a generative adversarial network (GAN) may be adopted. The results of this study could be useful to decision makers responsible for organizing response assignments during flood events.

How to cite: Chen, B. and Li, C.-Y.: A study on the establishment of computer vision for disaster identification based on existing closed-circuit television system, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-7435, https://doi.org/10.5194/egusphere-egu23-7435, 2023.

EGU23-8419 | ECS | Orals | ITS1.1/NH0.1

Synthetic Generation of Extra-Tropical Cyclones’ fields with Generative Adversarial Networks 

Filippo Dainelli, Riccardo Taormina, Guido Ascenso, Enrico Scoccimarro, Matteo Giuliani, and Andrea Castelletti

Extra-Tropical Cyclones (ETCs) are major systems ruling and influencing the atmospheric structure at mid-latitudes. They are characterised by strong winds and heavy precipitation, and can cause considerable storm surges potentially devastating for coastal regions. The availability of historical observations of the extreme events caused by intense ETCs is rather limited, hampering risk evaluation. Increasing the amount of significant data available would substantially help several fields of analysis influenced by these events, such as coastal management, agricultural production, energy distribution, air and maritime transportation, and risk assessment and management.

Here, we address the possibility of generating synthetic ETC atmospheric fields of mean sea level pressure, wind speed, and precipitation in the North Atlantic by training a Generative Adversarial Network (GAN). The purpose of GANs is to learn the distribution of a training set based on a game theoretic scenario where two networks compete against each other, the generator and the discriminator. The former is trained to generate synthetic examples that are plausible and resemble the real ones. The input of the generator is a vector of random Gaussian values, whose domain is known as the “latent space”. The discriminator learns to distinguish whether an example comes from the dataset distribution. The competition set by the game-theoretic approach improves the network until the counterfeits are indistinguishable from the originals.

To train the GAN, we use atmospheric fields extracted from the ERA5 reanalysis dataset in the geographic domain with boundaries 0°-90°N, 70°W-20°E, for the period 1 January 1979 to 1 January 2020. We analyse the generated samples' histograms, the samples' average fields, and the Wasserstein distance and Kullback-Leibler divergence between the generated samples and the test set distributions. Results show that the generative model has learned the distribution of the values of the atmospheric fields and the general spatial trends of the atmosphere in the domain. To better evaluate the atmospheric structure learned by the network, we perform linear and spherical interpolations in the latent space. Specifically, we consider four cyclones and compare the frames of their tracks to those of the synthetic tracks generated by interpolation. The interpolated tracks show interesting features consistent with the original tracks. These findings suggest that GANs can learn meaningful representations of the ETCs' fields, encouraging further investigations to model the tracks' temporal evolution.
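
The spherical interpolation (slerp) used above can be sketched as follows; the latent dimension is an illustrative assumption, and each interpolated vector would be passed through the trained generator to obtain a synthetic field:

```python
# Spherical interpolation between two GAN latent vectors.
import numpy as np

def slerp(z0, z1, t):
    """Spherical interpolation between latent vectors z0, z1, with t in [0, 1]."""
    omega = np.arccos(np.clip(
        np.dot(z0 / np.linalg.norm(z0), z1 / np.linalg.norm(z1)), -1.0, 1.0))
    if np.isclose(omega, 0.0):
        return (1 - t) * z0 + t * z1  # vectors nearly parallel
    return (np.sin((1 - t) * omega) * z0
            + np.sin(t * omega) * z1) / np.sin(omega)

rng = np.random.default_rng(0)
z_start, z_end = rng.normal(size=128), rng.normal(size=128)
frames = [slerp(z_start, z_end, t) for t in np.linspace(0, 1, 10)]
# each frame would then be decoded by the trained generator: G(frame)
```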

How to cite: Dainelli, F., Taormina, R., Ascenso, G., Scoccimarro, E., Giuliani, M., and Castelletti, A.: Synthetic Generation of Extra-Tropical Cyclones’ fields with Generative Adversarial Networks, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-8419, https://doi.org/10.5194/egusphere-egu23-8419, 2023.

EGU23-8944 | ECS | Orals | ITS1.1/NH0.1

Towards probabilistic impact-based drought risk analysis – a case study on the Volta Basin 

Marthe Wens, Raed Hamed, Hans de Moel, Marco Massabo, and Anna Mapelli

Understanding the relationships between different drought drivers and observed drought impacts can provide important information for early warning systems and drought management planning. Moreover, this relationship can help inform the definition and delineation of drought events. However, drought hazards are currently often characterized by their frequency of occurrence rather than by the impacts they cause. A more data-driven depiction of "impactful drought events", whereby droughts are defined by the hydrometeorological conditions that, in the past, have led to observable impacts, has the potential to be more meaningful for drought risk assessments.

In our research, we apply a data-mining method based on association rules, namely fast and frugal decision trees, to link different drought hazard indices to agricultural impacts. This machine learning technique is able to select the most relevant drought hazard drivers (among both hydrological and meteorological indices) and their thresholds associated with “impactful drought events”. The technique can be used to assess the likelihood of occurrence of several impact severities, hence it supports the creation of a loss exceedance curve and estimates of average annual loss. An additional advantage is that such data-driven relations in essence reflect varying local drought vulnerabilities which are difficult to quantify in data-scarce regions.

This contribution exemplifies the use of fast and frugal decision trees to estimate (agricultural) drought risk in the Volta basin and its riparian countries. We find that some agriculture-dependent regions in Ghana, Togo and Côte d’Ivoire face annual average drought-induced maize production losses of up to 3M USD, while losses can reach on average 50 USD/ha per year in Burkina Faso. In general, there is a clear north-south gradient in drought risk, which we find amplified under projected climate conditions. Climate change is estimated to worsen drought impacts in the Volta Basin, with 11 regions facing increases in annual average losses of more than 50%.

We show that the proposed multi-variate, impact-based, non-parametric machine learning approach can improve the evaluation of droughts, as it directly leverages observed drought impact information to demarcate impactful drought events. We demonstrate that the proposed technique can support quantitative drought risk assessments, which can be used for geographic comparison of disaster losses at a sub-national scale.
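
A fast-and-frugal tree of the kind mentioned above can be pictured with the toy sketch below; the cues, thresholds, and impact classes are invented for illustration, not the study's fitted values:

```python
# Fast-and-frugal tree: each node checks one cue and can exit immediately.
def impactful_drought(spi3, soil_moisture_pct, streamflow_pct):
    """Classify drought impact from three illustrative hazard indices."""
    if spi3 > -1.0:              # meteorological drought index not severe
        return "no impact"
    if soil_moisture_pct > 20:   # root-zone soil moisture percentile
        return "minor losses"
    if streamflow_pct > 10:      # hydrological drought percentile
        return "moderate losses"
    return "severe losses"

print(impactful_drought(spi3=-1.6, soil_moisture_pct=12, streamflow_pct=5))
```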

How to cite: Wens, M., Hamed, R., de Moel, H., Massabo, M., and Mapelli, A.: Towards probabilistic impact-based drought risk analysis – a case study on the Volta Basin, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-8944, https://doi.org/10.5194/egusphere-egu23-8944, 2023.

EGU23-9091 | Orals | ITS1.1/NH0.1

Improving near real-time flood extraction pipeline from SAR data using deep learning 

Mathieu Turgeon-Pelchat, Heather McGrath, Fatemeh Esfahani, Simon Tolszczuk-Leclerc, Thomas Rainville, Nicolas Svacina, Lingjun Zhou, Zarrin Langari, and Hospice Houngbo

The Canada Centre for Mapping and Earth Observation (CCMEO) uses Radarsat Constellation Mission (RCM) data for near real-time flood mapping. One of the many advantages of SAR sensors is that they are less affected by cloud coverage and atmospheric conditions than optical sensors. RCM has been used operationally since 2020 and employs three satellites, enabling lower revisit times and increased imagery coverage. The team responsible for producing flood maps in the context of emergency response is able to produce maps within four hours of data acquisition. Although the results from their automated system are good, there are some limitations, mainly in urban and vegetated areas, requiring manual intervention to correct the data before publication. Work started in 2021 to use deep learning algorithms, namely convolutional neural networks (CNNs), to improve the automated production of flood inundation maps. The training dataset makes use of former maps created by the emergency response team and comprises over 80 SAR images and corresponding digital elevation models (DEMs) at multiple locations in Canada. The training and test images were split into smaller tiles of 256 x 256 pixels, for a total of 22,469 training tiles and 6,821 test tiles. The current implementation uses a U-Net architecture from the NRCan geo-deep-learning pipeline (https://github.com/NRCan/geo-deep-learning). To measure model performance, the intersection over union (IoU) metric is used. The model achieves 83% IoU for extracting water and flood from background areas over the test tiles. Next steps include increasing the number of geographical contexts in the training set, towards integrating the model into production.
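
For reference, the IoU metric quoted above reduces to a few lines for a binary water mask; the arrays below are toy stand-ins for model output and ground truth:

```python
# Intersection over union for binary segmentation masks.
import numpy as np

def iou(pred, target):
    """pred, target: boolean arrays of the same shape."""
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return intersection / union if union else 1.0

pred = np.zeros((256, 256), dtype=bool); pred[50:150, 50:150] = True
target = np.zeros((256, 256), dtype=bool); target[60:160, 60:160] = True
print(f"IoU = {iou(pred, target):.3f}")
```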

How to cite: Turgeon-Pelchat, M., McGrath, H., Esfahani, F., Tolszczuk-Leclerc, S., Rainville, T., Svacina, N., Zhou, L., Langari, Z., and Houngbo, H.: Improving near real-time flood extraction pipeline from SAR data using deep learning, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-9091, https://doi.org/10.5194/egusphere-egu23-9091, 2023.

EGU23-9426 | ECS | Orals | ITS1.1/NH0.1

Fire hazard modelling with remote sensing data for South America 

Johanna Strebl, Julia Gottfriedsen, Dominik Laux, Max Helleis, and Volker Tresp

Throughout the past couple of years, changes in global climate have been turning wildfires into an increasingly unpredictable phenomenon. Many environmental parameters that have been linked to wildfires, such as the number of consecutive hot days, are becoming increasingly unstable. This leads to a twofold problem: adequate fire risk assessment is at once more important and more difficult than ever. 

In the past, physical models were the prevalent approach to most questions in the domain of wildfire science. While they tend to provide accurate and transparent results, they require domain expertise and often tedious manual data collection.

In recent years, increased computational capabilities and the improved availability of remote sensing data associated with the new space movement have made deep learning a beneficial approach. Data-driven approaches often yield state-of-the-art performance without requiring expert knowledge, at a fraction of the complexity of physical models. The downside, however, is that they are often opaque and offer no insights into their inner algorithmic workings. 

We want to shed some light on this interpretability/performance tradeoff and compare different approaches for predicting wildfire hazard. We evaluate their strengths and weaknesses with a special focus on explainability. We built a wildfire hazard model for South America based on a spatiotemporal CNN architecture that infers fire susceptibility from environmental conditions that led to fire in the past. The training data used contains selected ECMWF ERA5 Land variables and ESA world cover information. This means that our model is able to learn from actual fire conditions instead of relying on theoretical frameworks. Unlike many other models, we do not make simplifying assumptions such as a standard fuel type, but calculate hazard ratings based on actual environmental conditions. Compared to classical fire hazard models, this approach allows us to account for regional and atypical fire behavior and makes our model readily adaptable and trainable for other ecosystems, too.

The ground truth labels are derived from fusing active fire remote sensing data from 20 different satellites into one active wildfire cluster data set. The problem itself is highly imbalanced with non-fire pixels making up 99.78% of the training data. Therefore we evaluate the ability of our model to correctly predict wildfire hazard using metrics for imbalanced data such as PR-AUC and F1 score. We also compare the results against selected standard fire hazard models such as the Canadian Fire Weather Index (FWI). 

In addition, we assess the computational complexity and speed of calculating the respective models and consider the accuracy/complexity/speed tradeoff of the different approaches. Furthermore, we aim to provide insights why and how our model makes its predictions by leveraging common explainability methods. This allows for insights into which factors tend to influence wildfire hazard the most and to optimize for relatively lightweight, yet performant and transparent architectures.
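
The imbalance-aware evaluation mentioned above can be sketched in a few lines; the class ratio mirrors the one quoted in the abstract, while the scores are synthetic stand-ins:

```python
# PR-AUC and F1 on a heavily imbalanced toy problem (~0.22% positives).
import numpy as np
from sklearn.metrics import average_precision_score, f1_score

rng = np.random.default_rng(0)
n = 100_000
y_true = (rng.random(n) < 0.0022).astype(int)   # rare fire pixels
# toy hazard scores: fire pixels score higher on average
y_score = np.clip(rng.normal(0.2 + 0.5 * y_true, 0.2), 0, 1)

print("PR-AUC:", average_precision_score(y_true, y_score))
print("F1 @ 0.5:", f1_score(y_true, (y_score > 0.5).astype(int)))
```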

How to cite: Strebl, J., Gottfriedsen, J., Laux, D., Helleis, M., and Tresp, V.: Fire hazard modelling with remote sensing data for South America, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-9426, https://doi.org/10.5194/egusphere-egu23-9426, 2023.

In recent years, Machine Learning (ML) models have proven useful for solving problems in a wide variety of fields, such as medicine, economics, manufacturing, transportation, energy, and education. With increased interest in ML models and advances in sensor technologies, ML models are being widely applied in the civil engineering domain as well. ML models enable the analysis of large amounts of data, automation, and improved decision making, and provide more accurate predictions. While several state-of-the-art reviews have been conducted in individual sub-domains of civil engineering (e.g., geotechnical engineering, structural engineering) or on specific application problems (e.g., structural damage detection, water quality evaluation), little effort has been devoted to a comprehensive review of ML models applied across civil engineering and to comparing them across sub-domains. A systematic but domain-specific literature review framework is needed to effectively classify and compare the models. To that end, this study proposes a novel review approach based on the hierarchical classification tree “D-A-M-I-E (Domain-Application problem-ML models-Input data-Example case)”. The “D-A-M-I-E” classification tree classifies ML studies in civil engineering based on (1) the civil engineering domain, (2) the application problem, (3) the applied ML models, and (4) the data used in the problem. Moreover, the data used for the ML models in each application example are examined based on the specific characteristics of the domain and the application problem. For a comprehensive review, five domains (structural engineering, geotechnical engineering, water engineering, transportation engineering, and energy engineering) are considered, and the ML application problem is divided into five types (prediction, classification, detection, generation, optimization). Based on the “D-A-M-I-E” classification tree, about 300 ML studies in civil engineering are reviewed. For each domain, the following questions are analyzed and compared: (1) which problems are mainly solved with ML models, (2) which ML models are mainly applied in each domain and problem, (3) how advanced the ML models are, and (4) what kind of data are used and what data processing is performed for the application of ML models. This paper also assesses the extensibility and applicability of the proposed methodology to other areas (e.g., Earth system modeling, climate science). Furthermore, based on the identification of research gaps for ML models in each domain, this paper provides future directions for ML in civil engineering based on approaches to dealing with data (e.g., collection, handling, storage, and transmission) and hopes to facilitate the application of ML models in other fields.

How to cite: Kim, J. and Jung, D.: State-of-the-Art Review of Machine Learning Models in Civil Engineering: Based on DAMIE Classification Tree, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-11636, https://doi.org/10.5194/egusphere-egu23-11636, 2023.

EGU23-11756 * | Orals | ITS1.1/NH0.1 | Highlight

Digital twin computing for enhancing resilience of disaster response system 

Shunichi Koshimura and Erick Mas

Digital twins are now recognized as digital copies of physical-world objects stored in digital space and utilized to simulate the sequences and consequences of target phenomena. By incorporating the physical world's data into the digital twin, developers and users have a full view of the target through real-time feedback. Recent advances in high-performance computing and large-scale data fusion of sensing and observations of both natural and social phenomena are enhancing the applicability of the digital twin paradigm to natural disaster research. Artificial intelligence (AI) and machine learning are also being applied ever more widely across the world and are contributing as essential elements of digital twins. These advances have significant implications for disaster response and recovery, holding out the promise of dramatically improving our understanding of disaster-affected areas and responses in real time.

A project is underway to enhance the resilience of disaster response systems by constructing a "Disaster Digital Twin" to support disaster response teams in an anticipated tsunami disaster. The "Disaster Digital Twin" platform consists of a fusion of real-time hazard simulation (e.g., tsunami inundation forecasting), social sensing to identify the dynamically exposed population, and multi-agent simulation of disaster response activities to find the optimal allocation or strategy of response efforts, thereby enhancing disaster resilience.

To achieve the goal of innovating digital twin computing for enhancing disaster resilience, four preliminary results are shown:

(1) Developing a nation-wide real-time tsunami inundation and damage forecast system. The priority target for forecasting is the Pacific coast of Japan, a region where a Nankai Trough earthquake is likely to occur.

(2) Establishing a real-time estimation of the exposed population in the inundation zone and clarifying the relationship between the exposed population and medical demand.

(3) Developing a reinforcement learning-based multi-agent simulation of medical activities in the affected areas, using damage information, medical demands, and the resources of medical facilities to find the optimal allocation of medical response.

(4) Developing a digital twin computing platform to support disaster medical response activities and to find the optimal allocation of disaster medical services through what-if analysis of multi-agent simulation.

How to cite: Koshimura, S. and Mas, E.: Digital twin computing for enhancing resilience of disaster response system, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-11756, https://doi.org/10.5194/egusphere-egu23-11756, 2023.

EGU23-12240 | ECS | Posters on site | ITS1.1/NH0.1

Classification Seismic Spectrograms from Deep Neural Network: Application to Alarm System of Post-failure Landslides 

Jui-Ming Chang, Wei-An Chao, and Wei-Kai Huang

The Daman Landslide blocked one of the three cross-island roads in Taiwan, and the affected road section has been under traffic control since last October. During this period, thousands of small-scale post-failure events occurred, whose irregular patterns threatened the safety of engineering workers carrying out slope protection construction as well as road users. We therefore installed one time-lapse camera and two geophones, at the crown and close to the toe of the Daman landslide respectively, to train a classification model that provides in-situ alarms. According to the time-lapse photos, the post-failures can be categorized into two types: rock/debris moving and stopping on the upper slope or road (type I), and rock/debris crossing the road to the downslope side (type II). Type I was recorded almost exclusively by the crown station, whereas type II appeared at both stations with different arrival times, with the toe station's high-frequency signals gradually rising (up to 100 Hz). These distinct features are clearly exhibited in spectrograms. To retain the characteristics of both stations simultaneously, we merge the two stations' spectrograms into a single image that indicates the different types of post-failures. However, frequent earthquakes affect the discrimination of landslides and therefore must be included in the classification model. A dataset with three labels (type I, type II, and earthquake), containing more than 15,000 spectrogram images, was used to train a deep neural network (DNN) as a two-station-based automatic classifier. In addition, user-defined parameters for specific frequency bands within fixed time-span windows, including the sum of power spectral density, the arrival time of peak amplitude, the cross-correlation coefficient, and the signal-to-noise ratio, were utilized in a decision tree algorithm. Both model results benefit the automatic classifier for post-failure alarms and can readily be extended, via transfer learning, to monitoring other landslides with frequent post-failures.
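
As a minimal sketch of the two-station spectrogram merging step, the snippet below computes a spectrogram for each station with SciPy and stacks the two along the frequency axis into a single classifier input; the sampling rate, window length, and synthetic waveforms are placeholder assumptions.

import numpy as np
from scipy.signal import spectrogram

fs = 200.0  # assumed geophone sampling rate (Hz)
t = np.arange(0, int(60 * fs)) / fs
crown = np.random.randn(t.size)  # placeholder for crown-station waveform
toe = np.random.randn(t.size)    # placeholder for toe-station waveform

# One spectrogram per station (frequencies x time bins).
f, tt, S_crown = spectrogram(crown, fs=fs, nperseg=256, noverlap=128)
_, _, S_toe = spectrogram(toe, fs=fs, nperseg=256, noverlap=128)

# Merge the two stations into a single image so that one input preserves
# both stations' time-frequency characteristics (e.g., arrival-time offsets).
merged = np.vstack([10 * np.log10(S_crown + 1e-12),
                    10 * np.log10(S_toe + 1e-12)])
print(merged.shape)  # (2 * n_frequencies, n_time_bins)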

How to cite: Chang, J.-M., Chao, W.-A., and Huang, W.-K.: Classification Seismic Spectrograms from Deep Neural Network: Application to Alarm System of Post-failure Landslides, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-12240, https://doi.org/10.5194/egusphere-egu23-12240, 2023.

EGU23-12716 | ECS | Posters on site | ITS1.1/NH0.1

Investigating causal effects of anthropogenic factors on global fire modeling 

Nirlipta Pande and Wouter Dorigo

Humans significantly control the natural environment and natural processes. Global fire ignitions are a prime example of how human actions change the frequency of occurrence of otherwise rare events like wildfires. However, human controls on fire ignition are insufficiently characterised by global fire models because impacts are often indirect, complex, and collinear. Hence, modelling fire activity while considering the complex relationships amongst the input variables and their effect on global ignitions is crucial to developing fire models reflecting the real world. 

This presentation leverages causal inference and machine learning frameworks applied to global datasets of fire ignitions from Earth observations and potential drivers to uncover anthropogenic pathways of fire ignition. Potential fire controls include human predictors from Earth observations and statistical data, combined with variables traditionally associated with fire activity, such as weather and vegetation abundance and state, derived from Earth observations and models.

Our research models causal relationships between fire control variables and global ignitions using Directed Acyclic Graphs (DAGs). Here, every edge symbolises a relation between two variables: the edge weight indicates the strength of the relationship, and its orientation signifies the direction of cause and effect. However, defining a fire ignition distribution using DAGs is challenging owing to the large combinatorial sample space and the acyclicity constraint. We use Bayesian structure learning to make these approximations and to infer the extent of human intervention when combined with climate variables and vegetation properties. Our research demonstrates the need for causal modelling and the inclusion of anthropogenic factors in global fire modelling.
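
A minimal sketch of the DAG representation described above is given below using networkx; the variables, edge weights, and orientations are illustrative assumptions, not learned results.

import networkx as nx

# Each directed edge encodes a hypothesised cause-effect relation; the
# 'weight' attribute stands in for the strength of the relationship.
G = nx.DiGraph()
G.add_edge("population density", "ignitions", weight=0.6)
G.add_edge("road density", "ignitions", weight=0.4)
G.add_edge("fuel moisture", "ignitions", weight=-0.7)
G.add_edge("precipitation", "fuel moisture", weight=0.8)

# Any learned structure must respect the acyclicity constraint noted above.
assert nx.is_directed_acyclic_graph(G)

# Direct causes of ignitions and their assumed strengths.
for parent in G.predecessors("ignitions"):
    print(parent, G[parent]["ignitions"]["weight"])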

How to cite: Pande, N. and Dorigo, W.: Investigating causal effects of anthropogenic factors on global fire modeling, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-12716, https://doi.org/10.5194/egusphere-egu23-12716, 2023.

EGU23-13083 | Orals | ITS1.1/NH0.1

Machine learning modelling of compound flood events 

Agnieszka Indiana Olbert, Sogol Moradian, and Galal Uddin

Flood early warning systems are vital for preventing flood damage and for reducing disaster risks. Such systems are particularly important for forecasting compound events, where multiple, often dependent flood drivers co-occur and interact. In this research, an early warning system for the prediction of coastal-fluvial floods is developed to provide a robust, cost-effective and time-efficient framework for the management of flood risks and impacts. This three-step method combines a cascade of three linked models: (1) a statistical model that determines the probabilities of multiple-driver flood events, (2) a hydrodynamic model forced by outputs from the statistical model, and (3) a machine learning (ML) model that uses hydrodynamic outputs from various probability flood events to train the ML algorithm to predict the spatially and temporally variable inundation patterns resulting from a combination of coastal and fluvial flood drivers occurring simultaneously.

The method has been applied to the case of Cork City, located in the south-west of Ireland, which has a long history of fluvial-coastal flooding. The Lee River, channelling through the city centre, may generate a substantial flood when the downstream river flow draining to the estuary coincides with sea water propagating upstream on a flood tide. For this hydrological domain, the statistical model employs univariate extreme value analysis and copula functions to calculate joint probabilities of river discharges and sea water levels (astronomical tides and surge residuals) occurring simultaneously. The return levels of these two components along a return level curve produced by the copula function are used to generate synthetic time series, which serve as water level boundary conditions for a hydrodynamic flood model. The multi-scale nested flood model (MSN_Flood) was configured for Cork City at 2 m resolution to simulate unsteady, non-uniform flow in the Lee River and flood wave propagation over urban floodplains. The ensemble hydrodynamic model outputs are ultimately used to train and test a range of machine learning models for the prediction of flood extents and water depths. In total, 23 machine learning algorithms, including Artificial Neural Network, Decision Tree, Gaussian Process Regression, Linear Regression, Radial Basis Function, Support Vector Machine, and Support Vector Regression, were employed to confirm that ML algorithms can successfully predict flood inundation depths over urban floodplains for a given set of compound flood drivers. Here, the flood conditioning factors taken into account are limited to the upstream flood hydrographs and the downstream sea water level time series. To evaluate model performance, different statistical skill scores were computed. Results indicate that, in most pixels, the Gaussian Process Regression model performs better than the other models.

The main contribution of this research is to demonstrate that ML models can be used in early warning systems for flood prediction and to give insight into the most suitable models in terms of robustness, accuracy, effectiveness, and speed. The findings demonstrate that ML models do help in flood water propagation mapping and in the assessment of flood risk under various compound flood scenarios.
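
As an illustration of the final step, the sketch below trains a Gaussian Process Regression surrogate on hydrodynamic ensemble outputs, with boundary-condition drivers as inputs and inundation depth at one pixel as the target; the feature definitions and synthetic data are assumptions, not the study's actual setup.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

rng = np.random.default_rng(0)

# Synthetic ensemble: each row is one simulated compound event,
# [peak river discharge (m3/s), peak sea water level (m)].
X = rng.uniform([50.0, 0.5], [600.0, 3.5], size=(200, 2))
# Placeholder hydrodynamic output: flood depth (m) at a single pixel.
y = 0.002 * X[:, 0] + 0.8 * X[:, 1] + 0.1 * rng.standard_normal(200)

kernel = ConstantKernel(1.0) * RBF(length_scale=[100.0, 1.0])
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

# Predict depth (with uncertainty) for a new compound driver combination.
mean, std = gpr.predict([[400.0, 2.5]], return_std=True)
print(mean[0], std[0])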

How to cite: Olbert, A. I., Moradian, S., and Uddin, G.: Machine learning modelling of compound flood events, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-13083, https://doi.org/10.5194/egusphere-egu23-13083, 2023.

EGU23-14126 | ECS | Orals | ITS1.1/NH0.1

ML-based fire spread model and data pipeline optimization 

Tobias Bauer, Julia Miller, Julia Gottfriedsen, Christian Mollière, Juan Durillo Barrionuevo, and Nicolay Hammer

Climate change is one of the most pressing challenges to humankind today. The number and severity of wildfires are increasing in many parts of the world, with record-breaking temperatures, prolonged heat waves, and droughts. We can minimize the risks and consequences of these natural disasters by providing accurate and timely wildfire progression predictions through fire spread modeling. Knowing the direction and rate of spread of wildfires over the next hours can help deploy firefighting resources more efficiently and warn nearby populations hours in advance to allow safe evacuation.
Physics-based spread models have proven their applicability at the regional scale but often require detailed spatial input data. Additionally, running them in real-time scenarios can be slow, which inhibits fast output generation. Deep learning-based models have shown success in specific fire spread scenarios in recent years, but they are limited by their transferability to other regions, their explainability, and their longer training times. Accurate active fire data products and a fast data pipeline are additional essential requirements of a wildfire spread early-warning system.
In this study, physics-based models are compared to a deep learning-based CNN approach in terms of computational speed, area accuracy, and spread direction. We use a dataset of the 30 largest wildfires in the US in 2021 to evaluate the performance of the models' predictions.
This work focuses in particular on the optimization of a cloud-based fire spread modeling data pipeline for near-real-time fire progression over the next 2 to 24 hours. We describe our data pipeline, including the collection and pre-processing of ignition points derived from remote sensing-based active fire detections. Furthermore, we use data from SRTM-1 for topography, ESA Land Cover and Corine Land Cover for fuel composition, and ERA-5 reanalysis products for weather data inputs. The physics-based models are implemented with the open-source library ForeFire, which allows creating and executing physical wildfire spread models from single fire ignition points as well as fire fronts. The predictions of the ForeFire model serve as a benchmark for evaluating the performance of our Convolutional Neural Network (CNN). The CNN forecasts the fire outline based on a spatiotemporal U-Net architecture.
The scaling of the algorithms to a global setting is enabled by the Leibniz Supercomputing Centre, which makes large-scale cloud-based machine learning possible and provides a time-sensitive solution for operational fire spread modeling in emergency management based on real-time remote sensing information.
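
A minimal sketch of a U-Net-style network for fire-front prediction is shown below, with past fire masks and static drivers stacked as input channels; the channel counts and depth are illustrative assumptions and far smaller than an operational model.

import torch
import torch.nn as nn

def block(c_in, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU())

class TinyUNet(nn.Module):
    """Two-level U-Net: input channels stack past fire masks with
    topography, fuel, and weather rasters; output is the next fire mask."""
    def __init__(self, c_in=8):
        super().__init__()
        self.enc1 = block(c_in, 16)
        self.enc2 = block(16, 32)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec = block(32, 16)
        self.head = nn.Conv2d(16, 1, 1)

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        d = self.dec(torch.cat([self.up(e2), e1], dim=1))
        return torch.sigmoid(self.head(d))  # burn probability per pixel

net = TinyUNet()
x = torch.randn(1, 8, 64, 64)  # e.g., 4 past fire masks + 4 driver rasters
print(net(x).shape)            # torch.Size([1, 1, 64, 64])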

How to cite: Bauer, T., Miller, J., Gottfriedsen, J., Mollière, C., Durillo Barrionuevo, J., and Hammer, N.: ML-based fire spread model and data pipeline optimization, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-14126, https://doi.org/10.5194/egusphere-egu23-14126, 2023.

EGU23-15711 | Orals | ITS1.1/NH0.1

A globally distributed dataset using generalized DL for rapid landslide mapping on HR satellite imagery 

Filippo Catani, Sansar Raj Meena, Lorenzo Nava, Kushanav Bhuyan, Silvia Puliero, Lucas Pedrosa Soares, Helen Cristina Dias, and Mario Floris

Multiple landslide events occur frequently across the world and have the potential to cause significant harm to both human life and property. Although a substantial amount of research has been conducted on mapping landslides using Earth Observation (EO) data, several gaps and uncertainties remain when developing models intended to be operational at the global scale. To address this issue, we present HR-GLDD, a high-resolution (HR) dataset for landslide mapping composed of landslide instances from ten different physiographical regions globally, in South and South-East Asia, East Asia, South America, and Central America. The dataset contains five rainfall-triggered and five earthquake-triggered multiple landslide events that occurred in varying geomorphological and topographical regions. HR-GLDD is one of the first landslide detection datasets generated from high-resolution satellite imagery and can be useful for artificial intelligence applications in landslide segmentation and detection studies. Five state-of-the-art deep learning models were used to test the transferability and robustness of HR-GLDD. Moreover, two recent landslide events were used to test the performance and usability of the dataset for the detection of newly occurring significant landslide events. The deep learning models showed similar results when testing HR-GLDD at individual test sites, indicating the robustness of the dataset for such purposes. HR-GLDD is openly accessible and has the potential to support the calibration and development of models that produce reliable inventories from high-resolution satellite imagery after the occurrence of new significant landslide events. HR-GLDD will be updated regularly by integrating data from new landslide events.

How to cite: Catani, F., Meena, S. R., Nava, L., Bhuyan, K., Puliero, S., Pedrosa Soares, L., Dias, H. C., and Floris, M.: A globally distributed dataset using generalized DL for rapid landslide mapping on HR satellite imagery, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-15711, https://doi.org/10.5194/egusphere-egu23-15711, 2023.

EGU23-16626 | ECS | Posters on site | ITS1.1/NH0.1

Danish national early warning system for flash floods based on a gradient boosting machine learning framework 

Grith Martinsen, Yann Sweeney, Jonas Wied Pedersen, Roxana Alexandru, Sergi Capape, Charlotte Harris, Michael Butts, and Maria Diaz

Fluvial and flash floods can have devastating effects if they occur without warning. In Denmark, the management of flood risk and the execution of preventative emergency service actions have been the sole responsibility of local municipalities. However, motivated by the disastrous 2021 floods in Central Europe, the Danish government has recently appointed the Danish Meteorological Institute (DMI) as the national authority for flood warnings in Denmark, and DMI is in the process of building capacity to fulfill this role.

One of the most cost-effective ways to mitigate flood damages is a well-functioning early warning system. Flood warning systems can rely on various methods ranging from human interpretation of meteorological and hydrological data to advanced hydrological modelling. The aim of this study is to generate short-range streamflow predictions in Danish river systems with lead times of 4-12 hours. To do so, we train and test models with hourly data on 172 catchments.

Machine learning (ML) models have in many cases been shown to outperform traditional hydrological models and offer efficient ways to learn patterns in historical data. Here, we investigate streamflow predictions with LightGBM, which is a gradient boosting framework that employs tree-based ML algorithms and is developed and maintained by Microsoft (Ke et al., 2017). The main argument for choosing a tree-based algorithm is its inherent ability to represent rapid dynamics often observed during flash floods. The main advantages of LightGBM over other tree-based algorithms are efficiency in training and lower memory consumption. We benchmark LightGBM’s performance against persistence, linear regression and various LSTM setups from the Neural Hydrology library (Kratzert et al., 2022).

We evaluate the algorithm trained with different input features. This analysis includes model explainability techniques, such as SHAP, and the results indicate that simply using lagged real-time observations of streamflow together with precipitation leads to the best-performing and most parsimonious models. The results show that the LightGBM setup outperforms the benchmarks and is able to generate predictions with high Kling-Gupta Efficiency scores (> 0.9) in most catchments. Compared to the persistence benchmark, it shows especially strong improvements in peak timing errors.
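
A minimal sketch of this feature setup, assuming hypothetical hourly series, is given below: lagged real-time streamflow observations and precipitation feed a LightGBM regressor predicting streamflow several hours ahead (the lags, lead time, and hyperparameters are illustrative, not the operational configuration).

import numpy as np
import pandas as pd
import lightgbm as lgb

rng = np.random.default_rng(1)
n = 5000  # hourly records for one catchment (synthetic placeholder)
df = pd.DataFrame({
    "q": rng.gamma(2.0, 1.0, n),       # streamflow (m3/s)
    "precip": rng.gamma(0.5, 2.0, n),  # precipitation (mm/h)
})

lead = 6  # predict streamflow 6 hours ahead
for lag in range(1, 13):  # lagged real-time observations as features
    df[f"q_lag{lag}"] = df["q"].shift(lag)
    df[f"p_lag{lag}"] = df["precip"].shift(lag)
df["target"] = df["q"].shift(-lead)
df = df.dropna()

features = [c for c in df.columns if "lag" in c]
split = int(0.8 * len(df))  # chronological split, no shuffling
model = lgb.LGBMRegressor(n_estimators=300, learning_rate=0.05)
model.fit(df[features][:split], df["target"][:split])
pred = model.predict(df[features][split:])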

How to cite: Martinsen, G., Sweeney, Y., Pedersen, J. W., Alexandru, R., Capape, S., Harris, C., Butts, M., and Diaz, M.: Danish national early warning system for flash floods based on a gradient boosting machine learning framework, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-16626, https://doi.org/10.5194/egusphere-egu23-16626, 2023.

The purpose of this abstract is to describe a coupled CFD-MPM model that combines soil mechanics (saturated sediments) with fluid mechanics (seawater or air) as well as solid mechanics (structures) to consider interactions between soil, fluid, and structures. In this formulation, the Material Point Method, which models large deformations in porous media and structures, is coupled with the Implicit Continuous-fluid Eulerian method, which models complex fluid flows. The model has been validated against various benchmarks and is then used to simulate earthquake-induced submarine landslides. It is shown that the model captures the complicated interactions between saturated sediment, seawater, and offshore structures, which allows us to estimate the impact of potential submarine landslides on offshore structures.

How to cite: Tran, Q. A.: A hybrid MPM-CFD model for simulating earthquake-induced submarine landslides, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-112, https://doi.org/10.5194/egusphere-egu23-112, 2023.

EGU23-1131 | ECS | Orals | NH3.11

Landsifier: A python library to estimate likely triggers and types of landslides 

Ugur Ozturk, Kamal Rana, Kushanav Bhuyan, and Nishant Malik

The accuracy of landslide hazard models depends on landslide databases for model training and testing. Landslide databases frequently lack information on the underlying triggering mechanism (i.e., earthquake, rainfall), rendering them nearly useless in hazard models.

We created Landsifier, a unique Python-based library with three different machine-learning frameworks for assessing the likely triggering mechanisms of individual landslides or entire inventories, based on landslide 2D planforms and 3D shapes derived from an underlying digital elevation model (DEM). The base method extracts landslide planform properties as a feature space for a shallow learner, the random forest (RF). An alternative approach uses 2D landslide images as input to a convolutional neural network (CNN) deep learning algorithm. The final framework uses topological data analysis (TDA) to extract features from 3D landslide surfaces, which are then fed into the random forest classifier as a feature space. We tested the developed methods on six inventories spread over Japan and achieved mean accuracies ranging from 70% to 98%.

Building on this trigger classifier, we are working on the next generation, which will similarly classify landslide types (i.e., flows, slides, falls, complex).
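
A minimal sketch of the base method, under assumed feature definitions, is shown below: simple planform geometry descriptors computed with shapely feed a random forest trigger classifier. The descriptor set, polygons, and labels are illustrative placeholders, not Landsifier's actual feature space.

import numpy as np
from shapely.geometry import Polygon
from sklearn.ensemble import RandomForestClassifier

def planform_features(poly):
    """Simple 2D shape descriptors of a landslide polygon (illustrative set)."""
    hull = poly.convex_hull
    return [
        poly.area,
        poly.length,                               # perimeter
        poly.length**2 / (4 * np.pi * poly.area),  # compactness (1 = circle)
        poly.area / hull.area,                     # convexity
    ]

# Placeholder polygons and trigger labels (0 = rainfall, 1 = earthquake).
polys = [Polygon([(0, 0), (4, 0), (4, 1), (0, 1)]),
         Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])]
labels = [0, 1]

X = np.array([planform_features(p) for p in polys])
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, labels)
print(clf.predict(X))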

How to cite: Ozturk, U., Rana, K., Bhuyan, K., and Malik, N.: Landsifier: A python library to estimate likely triggers and types of landslides, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-1131, https://doi.org/10.5194/egusphere-egu23-1131, 2023.

EGU23-1600 | ECS | Orals | NH3.11

Nonsmooth simulations of 3D Drucker-Prager granular flows and validation against experimental column collapses 

Gauthier Rousseau, Thibaut Métivet, Hugo Rousseau, Gilles Daviet, and Florence Bertails-Descoubes

Testing advanced numerical hydro-mechanical models against well-controlled experiments is a critical step in improving our understanding of unsteady granular mass flows, and is necessary to establish domains of validity for any further risk assessment.
To this end, experimental granular collapses were performed to evaluate the sand6 numerical simulator introduced by Daviet & Bertails-Descoubes (2016), which represents the granular medium as an inelastic and dilatable continuum subject to the Drucker-Prager yield criterion in the dense regime, and computes its dynamics using a 3D material point method (MPM). A specificity of this numerical model is that it solves the Drucker-Prager nonsmooth rheology without any regularisation, by leveraging tools from nonsmooth optimisation.
This nonsmooth simulator, which relies on a constant friction coefficient, is able to reproduce with high fidelity various experimental granular collapses over inclined erodible beds, provided the friction coefficient is set to the avalanche angle rather than to the stop angle, as is generally done. The results, obtained for two different granular materials and for bed inclinations ranging from 0° to 20°, suggest that a simple constant-friction rheology remains a reasonable choice for capturing a large variety of granular collapses up to aspect ratios of the order of 10.
Investigating the precise role of the frictional walls by performing experimental and simulated collapses with various channel widths, we find that, contrary to some assumptions previously made in the literature, the channel width has less influence than expected on the granular flow and deposit.
The constant-coefficient model is extended with a hysteresis model, thereby improving the predictions of the early-stage dynamics of the collapse. This illustrates the potential effects of such phenomenology on transient granular flows, paving the way to more elaborate analyses.
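
For reference, one common cohesionless form of the Drucker-Prager yield criterion used for dense granular flow can be written as follows (our notation, added for illustration; the simulator's exact formulation is given in Daviet & Bertails-Descoubes, 2016):

\[
f(\boldsymbol{\sigma}) = \left\lVert \operatorname{dev}\boldsymbol{\sigma} \right\rVert - \mu\, p \le 0,
\qquad p = -\tfrac{1}{3}\operatorname{tr}\boldsymbol{\sigma},
\]

where \(\mu\) is the constant friction coefficient discussed above; the material deforms plastically only when \(f = 0\) and behaves rigidly when \(f < 0\).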

How to cite: Rousseau, G., Métivet, T., Rousseau, H., Daviet, G., and Bertails-Descoubes, F.: Nonsmooth simulations of 3D Drucker-Prager granular flows and validation against experimental column collapses, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-1600, https://doi.org/10.5194/egusphere-egu23-1600, 2023.

The record of occurrence times in investigated landslide inventories is often incomplete, which introduces errors into the statistical relationship between rainfall and landslides, lowers the accuracy of the critical rainfall threshold models built from it, and in turn increases the false-positive rate of meteorological early warnings. This study takes rainfall-induced landslides in the Wanzhou District of Chongqing from 1995 to 2015 as its research object, with Henghe Township, where historical disaster data are seriously incomplete, as the verification area. We propose a model for predicting the daily temporal probability of landslide occurrence based on Long Short-Term Memory (LSTM) and Temporal Convolutional Network (TCN) architectures. The method reconstructs the temporal information of rainfall-induced landslide events by simulating the nonlinear relationship between landslide occurrence times and rainfall. The landslide events with reconstructed temporal information were verified and selected, and then applied to the division of the E-D effective rainfall threshold curve in order to establish a landslide meteorological warning model. The average temporal probability of rainfall-induced landslide occurrence predicted by the proposed method reached 90.33%, higher than that of an ANN (71.17%), LSTM alone (72.75%), and TCN alone (86.91%). Using a 90% threshold on the predicted daily occurrence probability, the occurrence-time records in the Henghe Township verification area, originally 18 records covering 42 landslides, were expanded to 201. Compared with using only the historical landslide events, the meteorological warning model based on the expanded temporal information has a more reasonable warning classification, and the effective warning rate at the severe warning level increased by 42.86%. The proposed method is of constructive significance for the daily temporal probability prediction of rainfall-induced landslides at the regional scale and can help governments make accurate risk decisions in landslide meteorological warning.
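
As a minimal sketch of the temporal-convolution component only (the study combines it with an LSTM), a causal dilated 1-D convolution stack of the kind used in TCNs can be written as follows; the layer sizes and the synthetic rainfall input are illustrative assumptions.

import torch
import torch.nn as nn

class CausalConv1d(nn.Module):
    """Dilated 1-D convolution that never looks into the future."""
    def __init__(self, c_in, c_out, kernel_size=3, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(c_in, c_out, kernel_size,
                              padding=self.pad, dilation=dilation)

    def forward(self, x):
        out = self.conv(x)
        return out[:, :, :-self.pad]  # trim the look-ahead padding

tcn = nn.Sequential(
    CausalConv1d(1, 16, dilation=1), nn.ReLU(),
    CausalConv1d(16, 16, dilation=2), nn.ReLU(),
    CausalConv1d(16, 16, dilation=4), nn.ReLU(),
    nn.Conv1d(16, 1, 1), nn.Sigmoid())  # daily landslide probability

rain = torch.randn(8, 1, 90)  # batch of 90-day rainfall windows (synthetic)
print(tcn(rain).shape)        # torch.Size([8, 1, 90])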

How to cite: Zhao, Y. and Chen, L.: Rainfall-induced Landslide temporal probability prediction and meteorological early warning modeling based on LSTM_TCN model, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-1702, https://doi.org/10.5194/egusphere-egu23-1702, 2023.

The Jurassic red strata of the Three Gorges Reservoir Area in China consist of interbedded thick siltstone and thin sandy mudstone and contain many clay minerals, such as montmorillonite and illite, which are water-sensitive, weak, expansive, and easily decomposed by water weathering. In particular, owing to seasonal rainfall, the development of settlements, and large-scale reservoir impoundment, many slow-moving landslides (e.g., deep rotational and planar landslides) occur. Nevertheless, the reconnaissance, updating, and mapping of the kinematic features of township-area landslides have not received appropriate attention from governments and researchers. Landslide susceptibility mapping is a necessary prerequisite for landslide hazard and risk assessment, but a certain degree of unpredictability is always associated with the modeling. The main objective of this work is to introduce deep ensemble learning into landslide susceptibility assessment to improve the performance of maximum likelihood models. Model construction therefore focused on three basic classifiers (decision tree, support vector machine, and multi-layer perceptron neural network) and two homogeneous ensemble models (random forest and extreme gradient boosting). Two prominent ensemble techniques, homogeneous/heterogeneous model ensembles and the bagging, boosting, and stacking strategies, were applied to implement the deep ensemble learning. Thirteen influencing factors were prepared as predictor variables. The landslide susceptibility maps were validated by the area under the receiver operating characteristic curve (AUC). The validation results showed that the ensemble models achieve AUC values higher than 0.9, an improvement over the basic classifiers. Deep ensemble learning focuses more on detecting the landslide susceptibility areas with the highest probability of occurrence. The stacking-based RF-XGBoost model obtained the best verification score (AUC = 0.955). The comparison between the susceptibility map and landslide inventory data is encouraging, as most of the recorded landslide pixels (about 83.3%) fall in high susceptibility levels. In addition, the information gain ratio showed that the Yangtze River and human engineering activities mainly affect the results, which is consistent with the current situation in the study area. The township-level landslide susceptibility mapping approach can also be extended to other urban and rural areas affected by landslides to reduce landslide disaster risk and to inform further development strategies.
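
A minimal sketch of the stacking strategy with RF and XGBoost base learners, assuming a generic factor matrix, could look as follows; the logistic-regression meta-learner is a common default, not necessarily the study's choice.

import numpy as np
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 13))  # 13 influencing factors (synthetic)
y = rng.integers(0, 2, 500)         # landslide / non-landslide labels

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("xgb", XGBClassifier(n_estimators=200, eval_metric="logloss")),
    ],
    final_estimator=LogisticRegression(),
    cv=5)  # out-of-fold base predictions feed the meta-learner

stack.fit(X, y)
print(stack.predict_proba(X[:3])[:, 1])  # susceptibility scores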

How to cite: Zeng, T., Yin, K., and Wu, L.: Uncertainty research of landslide susceptibility mapping based deep ensemble learning: different basic classifier and ensemble strategy, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-2445, https://doi.org/10.5194/egusphere-egu23-2445, 2023.

EGU23-4428 | Orals | NH3.11

From Dense Flows to Powder Cloud Simulations: The OpenFOAM Avalanche Module 

Matthias Rauter, Julia Kowalski, and Wolfgang Fellin

OpenFOAM [1] is a well-known and widely used framework for physical simulations. Its Finite Area Framework allows the depth-integrated simulation of flows on nearly arbitrary surfaces. It was shown that this framework can be applied to snow avalanche simulations in natural terrain [2].

We will present the latest updates to the framework and the implementation of the avalanche module. The module not only provides a model for dense flow avalanches [2], but was recently extended to simulate powder snow avalanches and mixed snow avalanches. Various well-known friction and snow entrainment models are available, as well as unique models for deposition and for the coupling of the dense flow and powder cloud layers in mixed snow avalanches. For practical applications, the module provides interfaces and methods for integration with geographic information systems (GIS) and is fully capable of using raster and shape files for input and output.

The avalanche module is built to integrate well into the OpenFOAM structure and follows the common user concepts of OpenFOAM. Therefore, users familiar with OpenFOAM should be able to adapt quickly to the module and run simulations after a short time. The module is provided as open source, and its structure enables and encourages implementing and experimenting with new ideas. One major goal of the module is to reduce the time from model development to model evaluation and application.

The module is hosted and developed collaboratively at develop.openfoam.com/Community/avalanche. We will provide an introduction to the framework and the development process, and give interested people pointers on how to get started with the module and how to implement their own ideas.

[1] Weller, H. G., Tabor, G., Jasak, H., & Fureby, C. (1998). A tensorial approach to computational continuum mechanics using object-oriented techniques. Computers in physics, 12(6), 620-631.

[2] Rauter, M., Kofler, A., Huber, A., & Fellin, W. (2018). faSavageHutterFOAM 1.0: depth-integrated simulation of dense snow avalanches on natural terrain with OpenFOAM. Geoscientific Model Development, 11(7), 2923-2939.

How to cite: Rauter, M., Kowalski, J., and Fellin, W.: From Dense Flows to Powder Cloud Simulations: The OpenFOAM Avalanche Module, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-4428, https://doi.org/10.5194/egusphere-egu23-4428, 2023.

EGU23-4715 | Posters on site | NH3.11

MultiResUNet, VGG16, and U-Net applications for landslide detection 

Saro Lee, Fatemeh Rezaie, and Mahdi Panahi

The frequent occurrence of disastrous landslides can lead to significant infrastructure damage, loss of life, and the relocation of populations. Early detection of landslides is crucial for mitigating their consequences. Today, deep learning algorithms, particularly fully convolutional networks (FCNs) and their variants such as ResU-Net, are utilized to rapidly and automatically detect landslides. In the current study, three deep learning models, MultiResUNet, VGG16, and U-Net, were used to detect landslides on Hokkaido Island, Japan. Our dataset comprises Sentinel-2 images and a mask layer with "landslide" or "non-landslide" labels. The suggested framework is based on the analysis of satellite images of landslide-prone locations using Sentinel-2 bands 2 (blue), 3 (green), 4 (red), and 5 (visible and near-infrared), together with slope and elevation factors. We trained each model on the dataset and evaluated their performance using a variety of statistical indices, including precision, recall, and F1 score. The results showed that the MultiResUNet model outperformed the other two models, achieving an accuracy of 82.7%; the VGG16 and U-Net models achieved accuracies of 65.5% and 67.2%, respectively. The results indicate the capability of deep learning algorithms to process satellite images for early landslide detection and offer the opportunity to implement efficient and effective disaster management strategies.

How to cite: Lee, S., Rezaie, F., and Panahi, M.: MultiResUNet, VGG16, and U-Net applications for landslide detection, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-4715, https://doi.org/10.5194/egusphere-egu23-4715, 2023.

Gravity-driven geophysical granular flows, such as rock avalanches, landslides, and debris flows, interact with obstacles (e.g., bridge piers and buildings) as they flow down the slope, causing rapid changes in flow velocity and height in the vicinity of the obstacle and forming a granular shock wave in front of it. The interaction between shock waves affects the granular-flow field near the obstacles. However, the complex physical processes involved make it challenging to understand how the granular material behaves in the region influenced by shock-shock interaction.

In this study, systematic chute experiments were performed with glass particles to investigate the dynamic interaction between granular flow and two circular cylinders with variable spacing distances. Pressure sensors were used to measure the impact pressure of the granular flow on the upstream cylindrical surfaces and on a plate mounted flush with the chute bed. Accelerometers were mounted at the bottom of the plate to record the seismic signals generated by the granular flow impacting the bed and the cylinders. Flow velocities and depths were determined using an image processing method. The discrete element method (DEM) was utilized to construct a virtual model of the chute system and particles and to simulate the dynamic processes of the granular flow interacting with the cylinders. The experimental and DEM-simulated results showed that bow shock waves were generated just upstream of the two cylinders and that a granular vacuum zone formed on the lee side of each cylinder, with the incoming flow velocity significantly reduced in the granular-shock influencing area. As the spacing decreases, the two shock waves change from being independent to mutually interfering. In addition, the effects of the spacing distance on the shapes of the granular vacuum and the bow shock waves were investigated experimentally and compared to the DEM results, showing a strong interaction between granular shocks. The pinch-off distance, which is determined by the length of the granular vacuum, also depends on the spacing distance of the cylinders, decreasing as the spacing decreases. The impact pressures and acoustic signals generated by the granular flow impacting the chute bed and the cylinder surfaces in the shock influencing area were also analyzed for varying Froude numbers.

In summary, the DEM simulations and the recorded signals are helpful for analyzing the interaction between granular shock waves. The findings of the present study may contribute to a better understanding of granular shock dynamics and may eventually improve the design of protective structures in hazard-prone areas.

How to cite: Wang, J., Chen, Z., and Wang, D.: Effects of Spacing Distance between Cylindrical Obstacles on Granular Shock Interactions in Gravity-Driven Experimental Flows, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-5177, https://doi.org/10.5194/egusphere-egu23-5177, 2023.

EGU23-5309 | ECS | Orals | NH3.11

Impacts of flow path water-saturation for debris-flow erosion modelling at Illgraben (Switzerland) 

Anna Lena Könz, Jacob Hirschberg, Brian McArdell, Perry Bartelt, and Peter Molnar

Debris flows can significantly grow along their flow path by entraining sediments stored in the channel bed and banks. This entrainment process is influenced by various factors such as flow properties (e.g., flow momentum, basal shear stress) and environmental conditions (e.g., soil water saturation, sediment availability). In recent years, different attempts to include the entrainment process in runout models have improved modelled flow properties and runout behavior by empirically linking entrainment volumes to individual modelled flow properties. Linking entrainment to environmental factors, however, has remained challenging.

Here, we aim to implement and test the influence of water-saturated flow path conditions in debris-flow runout modelling in a Swiss debris-flow basin (Illgraben). To this end, the modified RAMMS runout model, which includes an empirical algorithm describing entrainment as a function of basal shear stress (Frank et al., 2015), is coupled with a simple hydrological model that predicts soil water saturation. In a first step, the RAMMS model was calibrated for the Illgraben site for seven events with detailed data on erosion/deposition along the fan as well as flow properties at the outflow of the simulation domain (de Haas et al., 2022). In the calibration procedure, the focus was placed on the erosion proportionality factor dz/dtau [m/kPa], which links the maximum potential erosion depth to the basal shear stress, as it is assumed to drive the saturation-induced increase in entrained volume. Preliminary results show that in most cases including the entrainment process improves the reproduction of the flow properties, especially the ‘hydrograph’ front, and that the erosion proportionality factor dz/dtau varies significantly between events. In a second step, the relationship between soil moisture conditions and the maximum erosion depth expected along the flow path was investigated. The hydrologic conditions are simulated with a conceptual model solving the water balance for the basin’s headwaters. The headwater discharge serves as the water input for the channel on the fan, where an infiltration model is applied and entrainment is investigated. The presented framework, which could be incorporated into other runout models, is expected to be useful for debris-flow entrainment modelling, as well as for assessing climate change impacts on debris-flow runout.
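
For illustration, the entrainment relation calibrated here can be sketched as below, following the form of Frank et al. (2015), in which the maximum potential erosion depth grows linearly with basal shear stress above a critical value; the parameter values are placeholders, not calibrated results.

import numpy as np

def max_erosion_depth(tau_kpa, dz_dtau=0.1, tau_c_kpa=1.0):
    """Maximum potential erosion depth (m) as a linear function of basal
    shear stress above a critical threshold (after Frank et al., 2015).
    dz_dtau [m/kPa] is the erosion proportionality factor calibrated here;
    both parameter values are placeholder assumptions."""
    return np.where(tau_kpa > tau_c_kpa,
                    dz_dtau * (tau_kpa - tau_c_kpa), 0.0)

tau = np.array([0.5, 2.0, 5.0, 10.0])  # basal shear stress along path (kPa)
print(max_erosion_depth(tau))          # [0.   0.1  0.4  0.9]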

References

de Haas, T., McArdell, B.W., Nijland, W., Åberg, A.S., Hirschberg, J., Huguenin, P., 2022. Flow and Bed Conditions Jointly Control Debris‐Flow Erosion and Bulking. Geophysical Research Letters 49. https://doi.org/10.1029/2021GL097611

Frank, F., McArdell, B.W., Huggel, C., Vieli, A., 2015. The importance of entrainment and bulking on debris flow runout modeling: examples from the Swiss Alps. Nat. Hazards Earth Syst. Sci. 15, 2569–2583. https://doi.org/10.5194/nhess-15-2569-2015

How to cite: Könz, A. L., Hirschberg, J., McArdell, B., Bartelt, P., and Molnar, P.: Impacts of flow path water-saturation for debris-flow erosion modelling at Illgraben (Switzerland), EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-5309, https://doi.org/10.5194/egusphere-egu23-5309, 2023.

EGU23-6411 | ECS | Posters virtual | NH3.11 | Highlight

Importance of water and water producing processes in cascading events in mountainous regions 

Jessica Munch and Perry Bartelt

Over recent years, several multiphase avalanches have been observed, some of them leading to a cascade of events, such as the 2021 event at Chamoli, India, where a mixture of ice and rock fell from Ronti Peak and transitioned into a debris flow involving large amounts of water. Another example is the 2017 event at Pizzo Cengalo, Switzerland, where the rock face collapsed onto the underlying glacier, entraining part of it, and also transitioned into a debris flow. When such a mass movement occurs and leads to a cascade of events, the runout distances are much longer, and the consequences, for both humans and infrastructure, are much more severe.

When a multiphase avalanche turns into a cascade of events, the amount of water present in the flow appears to be a determining factor for the runout distance. The sources of water for both of the aforementioned events remain debated, and the amounts of water that can be generated by the melting of ice in the flow or by entrainment are poorly constrained. Indeed, from the moment that ice and snow are involved in a multi-material gravitational flow, they have the potential to melt due to friction between the different components of the flow and with the ground, and hence to generate water. Material entrainment along the way also has the potential to either directly incorporate water into the flow or bring in material with a high water content (i.e., hydrated sediments) or ice, which can melt while the flow propagates. Accurate modelling of the thermal state of the flow, as well as of its ability to entrain material along the way, is necessary to quantify the amount of water present in the flow.

Here, using a multiphase depth-averaged model specifically designed to handle gravitational flows made of rocks, ice, water, and snow, or any single one of these components, we assess 1) the impact of heat transfer between the materials and 2) the impact of entrainment of multiphase ground material on the flow behaviour, and more specifically on the water content of the flow and its consequences in terms of runout distances and potential for cascading events.

First results show that both entrainment and heat transfer within the flow play a major role in water production. Our experiments suggest that heat transfer between rocks and ice leads to the most efficient water production. Material entrainment also plays a major role by incorporating water into the flow or producing it through the melting of entrained ice. Better constraints on material thermal properties, ground composition, and entrainment potential are, however, necessary to accurately quantify the amounts of water that can join the flow and influence the runout distances.
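
As a back-of-the-envelope sketch of the heat-to-water pathway discussed above, frictional work dissipated in the flow can be converted to meltwater via the latent heat of fusion; the heat partitioning factor, ice fraction, and dissipation rate below are placeholder assumptions, not results from the model.

L_F = 3.34e5  # latent heat of fusion of ice (J/kg)

def melt_rate(friction_power_w, ice_fraction=0.3, to_ice_fraction=0.5):
    """Meltwater production rate (kg/s) from frictional dissipation.
    to_ice_fraction: assumed share of dissipated heat reaching the ice;
    ice_fraction: assumed mass fraction of ice in the flow (both illustrative)."""
    return friction_power_w * to_ice_fraction * min(ice_fraction, 1.0) / L_F

# Example: 50 MW of basal and internal frictional dissipation (hypothetical).
print(melt_rate(50e6))  # roughly 22 kg of meltwater per second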

How to cite: Munch, J. and Bartelt, P.: Importance of water and water producing processes in cascading events in mountainous regions, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-6411, https://doi.org/10.5194/egusphere-egu23-6411, 2023.

EGU23-6718 | ECS | Posters on site | NH3.11 | Highlight

Generating multi-temporal landslide inventories through a general deep transfer learning strategy using HR EO data 

Kushanav Bhuyan, Hakan Tanyas, Lorenzo Nava, Silvia Puliero, Sansar Raj Meena, Mario Floris, Cees van Westen, and Filippo Catani

Mapping landslides in space has gained considerable attention over the past decade, with good results. Current methods are primarily used to generate event inventories, but multi-temporal (MT) inventories are rare, even with manual landslide mapping. Here, we present an innovative deep learning strategy employing transfer learning, which allows our Attention Deep Supervision multi-scale U-Net model to be adapted to landslide detection tasks in new regions. The method also provides the flexibility to retrain a pretrained model to detect both rainfall- and earthquake-induced landslides in new regions of interest. For mapping, archived Planet Labs remote sensing imagery from 2009 to 2021 at spatial resolutions of 3-5 m was used to systematically generate MT landslide inventories. Examining all cases, our approach provided an average F1 value of 0.8, indicating that it successfully identified the spatiotemporal occurrence of landslides. To examine the size distribution of the mapped landslides, we compared the frequency-size distribution of predicted co-seismic landslides with manually mapped products from the literature. The results showed good agreement between the calculated power-law exponents, with differences ranging from 0.04 to 0.21. Overall, this study demonstrates that the proposed algorithm can be applied to large areas to construct polygon-based MT landslide inventories.
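
The exponent comparison mentioned above can be sketched with the standard maximum-likelihood estimator for a continuous power-law tail (Clauset et al., 2009); the landslide areas below are synthetic, and the cutoff value is an assumption for illustration.

import numpy as np

def powerlaw_exponent(areas, a_min):
    """MLE of the power-law exponent alpha for areas >= a_min
    (continuous case, Clauset et al., 2009)."""
    x = np.asarray(areas, dtype=float)
    x = x[x >= a_min]
    return 1.0 + x.size / np.sum(np.log(x / a_min))

rng = np.random.default_rng(0)
# Synthetic landslide areas drawn from a power law with alpha = 2.4.
u = rng.uniform(size=5000)
areas = 1e3 * (1.0 - u) ** (-1.0 / (2.4 - 1.0))
print(powerlaw_exponent(areas, a_min=1e3))  # close to 2.4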

How to cite: Bhuyan, K., Tanyas, H., Nava, L., Puliero, S., Meena, S. R., Floris, M., Westen, C. V., and Catani, F.: Generating multi-temporal landslide inventories through a general deep transfer learning strategy using HR EO data, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-6718, https://doi.org/10.5194/egusphere-egu23-6718, 2023.

EGU23-6884 | ECS | Orals | NH3.11

Using Deep Learning for Sentinel-1-based Landslide Mapping 

Aiym Orynbaikyzy, Frauke Albrecht, Wei Yao, Simon Plank, Andres Camero, and Sandro Martinis

Every year, landslides kill or injure thousands of people worldwide and substantially impact human livelihoods. With the increasing number of extreme weather events due to the changing climate, urban sprawl, and the intensification of human activities, the number of deadly landslide events is expected to grow. Landslides often occur unexpectedly because of the difficulty of predicting their location and timing. In such cases, providing information on the spatial extent of the landslide hazard is essential for organising and executing first-response actions on the ground.

This study explores the advantages and limitations of using high-resolution Synthetic Aperture Radar (SAR) data from Sentinel-1 within a deep learning framework for rapidly mapping landslide events. The objectives of the research are four-fold: 1) to investigate how Sentinel-1 landslide mapping can be improved using deep learning; 2) to explore whether the addition of up to three pre-event scenes improves SAR-based classification accuracies; 3) to test whether, and by how much, the addition of polarimetric decomposition features and interferometric coherence improves classification accuracies; and 4) to test whether data augmentation affects the final results.

We adopt a semantic segmentation model, U-Net, and a novel deep network, U2-Net, to map landslides based on limited but globally distributed landslide inventory data. In total, 306 image patches of 128x128 pixels were split into 80% for training/validation of the model and 20% for testing. We calculate radar backscatter information (gamma nought VV and VH), polarimetric decomposition features (alpha angle, entropy, anisotropy), and the interferometric coherence between temporally adjacent scenes. The features are calculated for three pre-event scenes and one post-event scene. Copernicus Digital Elevation Model (DEM) data are used to integrate land surface elevation and slope information into the classification process.

Using all Sentinel-1 features, the best deep learning model obtained a Dice coefficient of 0.96 on the validation data. Landslide detection based on U2-Net gave slightly better results than the U-Net-based approach. The accuracies of models based on one, two or three pre-event scenes did not differ substantially, indicating no added value from additional pre-event SAR features. Higher accuracies were reached when polarimetric decomposition features were combined with interferometric coherence, compared to runs with only radar backscatter. Increasing the sample size using image augmentation methods, such as four-directional rotation and flipping, helped improve the accuracy.
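
For reference, the Dice coefficient reported above can be computed on binary segmentation masks as in the short sketch below (the masks here are synthetic).

import numpy as np

def dice(pred, truth, eps=1e-7):
    """Dice coefficient between two binary masks (1 = landslide pixel)."""
    pred = np.asarray(pred, bool)
    truth = np.asarray(truth, bool)
    inter = np.logical_and(pred, truth).sum()
    return (2.0 * inter + eps) / (pred.sum() + truth.sum() + eps)

a = np.zeros((128, 128), int); a[30:60, 30:60] = 1
b = np.zeros((128, 128), int); b[35:65, 30:60] = 1
print(round(float(dice(a, b)), 3))  # 0.833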

Future research is directed towards (i) increasing and diversifying the landslide examples, (ii) performing landslide-event-based resampling, and (iii) adding pre- and post-event optical data from Sentinel-2.

How to cite: Orynbaikyzy, A., Albrecht, F., Yao, W., Plank, S., Camero, A., and Martinis, S.: Using Deep Learning for Sentinel-1-based Landslide Mapping, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-6884, https://doi.org/10.5194/egusphere-egu23-6884, 2023.

EGU23-8446 | ECS | Orals | NH3.11 | Highlight

Automatic detection of landslides from satellite images using a range of training events 

Kathryn Leeming, Itahisa Gonzalez Alvarez, Alessandro Novellino, and Sophie Taylor

Landslides in remote or uninhabited regions can go undocumented, leaving gaps in landslide inventories, which are a key input for hazard and risk assessments. This can lead to landslide events being missing from research studies and contributes to a bias in the events used for training machine learning models.

In this work we use satellite images, terrain information, and labelled examples of landslides to train a convolutional neural network (U-Net), for the purpose of adding previously undocumented and new landslides to inventories. This model segments the input images and highlights the pixels it labels as landslides.

Our work focusses on landslides with a range of types and triggers, so that the model is exposed to a variety of training data. We describe the key properties of the landslides in the training set, and discuss the implications for future uses of the trained model.

How to cite: Leeming, K., Gonzalez Alvarez, I., Novellino, A., and Taylor, S.: Automatic detection of landslides from satellite images using a range of training events, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-8446, https://doi.org/10.5194/egusphere-egu23-8446, 2023.

EGU23-8596 | ECS | Posters virtual | NH3.11 | Highlight

Evaluating effects of topographies on explicit hydromechanical solvers using procedural generation 

Saoirse Robin Goodwin

A key problem for landslide research is evaluating hydromechanical solvers on a suitable variety of terrain types. There currently exists a large gulf between studies using hydromechanical solvers on highly idealised terrain and those on real topographies. This makes it difficult to properly evaluate (i) the sensitivity of the solver output to specific terrain features and (ii) potential numerical artifacts. One way to bridge the gap is to use procedural generation, which has been used extensively in the videogame and animation industries for three decades, to generate hillsides with controlled properties. Indeed, the size and frequency of topographical features can be set by procedural generation algorithms, so the spatial distribution of topographical features can be varied in isolation. This study uses a depth-averaged SPH solver to model single-surge flows on a variety of procedurally generated terrains. We investigate the effects of the spatial distribution and magnitude of terrain features on the deposition patterns of the flows. We also discuss other potential applications of these approaches, including hazard mapping for cases where topographical uncertainty is likely (e.g., for modelling snow avalanches).
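
A minimal sketch of one such procedural technique, spectral synthesis (an assumed example; the study's exact algorithm is not specified here), generates random surfaces whose feature size is controlled by a single spectral exponent:

import numpy as np

def spectral_terrain(n=256, beta=3.0, seed=0):
    """Random surface with power spectrum ~ k^(-beta): larger beta gives
    smoother terrain with broader features, smaller beta rougher terrain."""
    rng = np.random.default_rng(seed)
    kx = np.fft.fftfreq(n)[:, None]
    ky = np.fft.fftfreq(n)[None, :]
    k = np.sqrt(kx**2 + ky**2)
    k[0, 0] = 1.0  # avoid division by zero at the mean component
    amplitude = k ** (-beta / 2.0)
    amplitude[0, 0] = 0.0  # zero-mean surface
    phase = np.exp(2j * np.pi * rng.uniform(size=(n, n)))
    z = np.fft.ifft2(amplitude * phase).real
    return z / z.std()  # normalised heights, to be scaled onto a hillside

rough = spectral_terrain(beta=2.0)   # small, frequent features
smooth = spectral_terrain(beta=4.0)  # large, gentle features
print(rough.shape, smooth.shape)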

How to cite: Goodwin, S. R.: Evaluating effects of topographies on explicit hydromechanical solvers using procedural generation, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-8596, https://doi.org/10.5194/egusphere-egu23-8596, 2023.

EGU23-8895 | ECS | Orals | NH3.11

Application of SOSlope to shallow landslide triggering in Rüdlingen (Switzerland) 

Ilenia Murgia, Filippo Giadrossich, Denis Cohen, Gian Franco Capra, and Massimiliano Schwarz

The development and application of deterministic models for vegetated slope stability analysis at the local scale is a pivotal issue in international research. Such tools help identify mitigation and risk management techniques during increasingly frequent critical rainfall events. In this sense, the SOSlope software, developed by the ecorisQ international association (www.ecorisq.org), allows the simulation of the hydro-mechanical dynamics that may influence the occurrence of shallow landslides, focusing on the progressive activation of root reinforcement in space and time to counteract soil movement.

This study presents a reconstruction of an artificially triggered landslide in Rüdlingen (Switzerland), carried out during the Triggering Rapid Mass Movements project, aiming at a back-analysis of the hydro-mechanical conditions leading to its triggering. This experiment makes it possible to compare real-scale data on the triggering dynamics of shallow landslides with modeling assumptions and results. Detailed measurements taken during the investigation and the subsequent slope failure were used to calibrate the hydro-mechanical input parameters of SOSlope and to evaluate the model's capability to reproduce the landslide-triggering conditions and behaviour.

Results show a reasonable reconstruction of the complex dynamics leading to the loss of soil stability, in particular with respect to the effect of water and the force redistribution dynamics during triggering. SOSlope can quantify the effect of the spatial distribution of root reinforcement and of passive earth pressure. In addition to quantifying the maximum root reinforcement mobilized to counteract soil movement, SOSlope makes it possible to observe its progressive activation in space and time. Pore water pressure dynamics show a distinctive trend related to preferential flow in soil fractures and macropores; the decrease of suction stress due to increased water content in the soil matrix was also observed. SOSlope allows a systemic analysis of the landslide event by evaluating the different phases of change in slope stability and identifying the causes that favoured failure. These results are valuable for understanding shallow landslide triggering dynamics on vegetated slopes, which simpler models can only address through simplified assumptions. This tool could support risk management strategies, including green-based solutions near structures and infrastructure, or reforestation activities for slope stabilization. In the latter case, the structure, composition, and efficiency of the plantation can be checked with the software.

Future developments in SOSlope will include the implementation of a triangulated grid mesh to overcome computational limitations associated with the square-grid resolution of the raster input data, and the inclusion of new tree species for root reinforcement estimation.

How to cite: Murgia, I., Giadrossich, F., Cohen, D., Capra, G. F., and Schwarz, M.: Application of SOSlope to shallow landslide triggering in Rüdlingen (Switzerland), EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-8895, https://doi.org/10.5194/egusphere-egu23-8895, 2023.

EGU23-9956 | ECS | Orals | NH3.11

A surrogate model for depth-averaged erosion and deposition closures using deep learning 

Mohammad Nikooei and Clarence Edward Choi

Geophysical mass flows are commonly modelled using depth-averaged (DA) numerical models, which rely on closure relations to account for erosion and deposition. While erosion and deposition are grain-scale phenomena, their physics is overlooked due to the simplifications required in DA models. In this study, a framework is proposed to transfer the grain-scale physics of erosion and deposition to the continuum scale of DA models. A long short-term memory (LSTM) neural network is coupled with a DA model to incorporate the grain-scale physics of erosion and deposition. As a surrogate for the closure relation, the LSTM model is trained on results from grain-scale Discrete Element Method (DEM) simulations. The surrogate model is evaluated by studying the deposition of an initially flowing granular mass over a slope. The effective flow depth h and DA velocity u calculated by the DA-LSTM model are compared with DEM simulation results. The DA-LSTM model is demonstrated to be more computationally efficient than DEM simulations. The newly proposed surrogate model offers a promising approach to computing more complex closures using deep learning techniques.
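
A minimal sketch of such a surrogate, under assumed input and output definitions (the actual state variables and targets are those of the study's DA model and DEM data), maps a short history of depth-averaged flow state to an erosion/deposition rate:

import torch
import torch.nn as nn

class ClosureLSTM(nn.Module):
    """Surrogate closure: a sequence of DA flow states (e.g., h, u, slope)
    -> basal erosion(+)/deposition(-) rate, to be queried by the DA solver."""
    def __init__(self, n_features=4, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):              # x: (batch, time, n_features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])   # rate at the latest time step

model = ClosureLSTM()
# One batch of 10-step state histories for 64 cells (synthetic placeholder;
# in practice the training targets come from grain-scale DEM simulations).
states = torch.randn(64, 10, 4)
print(model(states).shape)  # torch.Size([64, 1])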

How to cite: Nikooei, M. and Edward Choi, C.: A surrogate model for depth-averaged erosion and deposition closures using deep learning, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-9956, https://doi.org/10.5194/egusphere-egu23-9956, 2023.

EGU23-10159 | ECS | Posters on site | NH3.11

Unravelling the complex dynamic of slow-moving landslides in the Flysch zone region, Lower Austria. A case study of the Hofermühle catchment. 

Yenny Alejandra Jiménez Donato, Edoardo Carraro, Philipp Marr, Robert Kanta, and Thomas Glade

Slow-moving landslides are complex processes that represent a significant challenge for landslide dynamics analysis and disaster risk reduction. In some cases, they have been considered early signals of potentially destructive events, as they can accelerate under specific climatic conditions and cause significant damage. However, slow-moving landslides have often been neglected, as they require significant time, human resources, and specific numerical models to assess their non-uniformity. Considering the existing gaps and the lack of data on slow-moving landslides in Austria, a long-term monitoring project has been carried out by the ENGAGE group of the University of Vienna. Instrumentation for hydro-geotechnical monitoring has been installed on several landslides in Lower Austria, using them as living laboratories for multi-temporal landslide investigation. The present study therefore aims to integrate these valuable hydro-mechanical data to shed light on potential acceleration conditions of slow-moving landslides, frequency-intensity relationships, and cascading hazards initiated from within the slow-moving landslide mass.

The geographical and geological conditions of the province of Lower Austria make it a region highly susceptible to the occurrence of landslides. The predominant geology corresponds to the units of the Flysch Zone and the Klippen Zone, mechanically weak units composed of intercalated limestones and deeply weathered materials. These conditions, along with the hydrological setting, land use changes, and other anthropogenic impacts, contribute to the instability of the region. Consequently, to understand landslide processes and mechanisms, we integrate the hydro-mechanical data compiled from the monitoring sites to model a complex event triggered in 2013 in the Hofermühle catchment, district of Waidhofen an der Ybbs, with the aim of improving our understanding of landslide conditioning factors and the triggering mechanisms of potential cascading hazards in the region.

How to cite: Jiménez Donato, Y. A., Carraro, E., Marr, P., Kanta, R., and Glade, T.: Unravelling the complex dynamic of slow-moving landslides in the Flysch zone region, Lower Austria. A case study of the Hofermühle catchment., EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-10159, https://doi.org/10.5194/egusphere-egu23-10159, 2023.

EGU23-10269 | Posters on site | NH3.11

Detecting Landslide Affected Areas Using Deep Learning of Bi-Temporal Satellite Imagery Datasets 

Fuan Tsai, Elisabeth Dippold, Po-Jui Huang, and Chi-Chuan Lo

Landslides are among the most frequent and destructive natural hazards in Taiwan and many other places around the world. Using satellite images to help identify landslide-affected regions can be an effective and economical alternative to conventional ground-based measures. However, utilizing remotely sensed images for the investigation and analysis of landslides still faces challenges. In long-term monitoring of landslide-affected areas, it is common to observe landslides occurring repeatedly at or around the same location; identifying this type of recurrent landslide, and especially monitoring its expansion, therefore requires change-detection analysis of multi-temporal image datasets. In recent years, machine learning techniques have been extensively adopted for image analysis, including satellite images. Therefore, integrating change detection with machine learning algorithms should be helpful for identifying and mapping incremental landslides from multi-temporal satellite images. This research developed a systematic deep learning framework for detecting landslides with bi-temporal satellite image pairs as the training datasets. The training datasets were extracted and labelled from multi-temporal high-resolution multi-spectral satellite images covering two watershed regions where landslides occur frequently. Experimental results indicate that the developed machine learning algorithms achieve high accuracies and perform better than conventional methods for detecting landslide-affected areas from time-series satellite images, especially in places where landslides may occur repeatedly.

How to cite: Tsai, F., Dippold, E., Huang, P.-J., and Lo, C.-C.: Detecting Landslide Affected Areas Using Deep Learning of Bi-Temporal Satellite Imagery Datasets, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-10269, https://doi.org/10.5194/egusphere-egu23-10269, 2023.

The Western Ghats (WG) of India experience frequent landslides during every Indian summer monsoon. The unique blend of topography and tropical humid climate accelerates chemical weathering, forming a layer of unconsolidated soil that unconformably overlies the Precambrian crystalline rock. The lack of cohesion or bonding between these contrasting geologic materials makes the WG vulnerable to various forms of landslides during the peak of the Indian summer monsoon. Hence, detailed information on soil thickness plays a predominant role in identifying landslide-prone areas and understanding landslides in the WG. However, soil thickness maps are not available for the WG area, and the steep, rugged terrain makes it difficult to collect detailed soil thickness data. This study used a random forest (RF) machine-learning model to predict soil depth from a limited number of sparse samples in the Panniar river basin of the WG. The model combined 70 soil depth observations with eleven covariates: normalized difference vegetation index, topographic wetness index, valley depth, solar radiance, elevation, slope length, slope angle, slope aspect, convergence index, profile curvature, and plan curvature. The results show that the RF model has good predictive accuracy, with a coefficient of determination (R2) of 0.822 and a root mean square error (RMSE) of 2.968, i.e., about 80% of the soil depth variation is explained. The spatially predicted soil depth map clearly shows regional patterns with local details. Both geomorphological processes and vegetation contributed to shaping soil depth in the study area. The resulting map can be used for understanding soil characteristics and modelling landslide susceptibility in the study area.
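
As an illustration only (not the authors' code), the described regression can be sketched in Python with scikit-learn; the synthetic arrays below merely stand in for the 70 observations and eleven covariates.

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.random((70, 11))                   # NDVI, TWI, elevation, slope, ...
y = 10 * X[:, 0] + rng.normal(0, 1, 70)    # synthetic soil depth values

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
rf = RandomForestRegressor(n_estimators=500, random_state=0).fit(X_tr, y_tr)

pred = rf.predict(X_te)
print("R2:", r2_score(y_te, pred))                      # study reports 0.822
print("RMSE:", mean_squared_error(y_te, pred) ** 0.5)   # study reports 2.968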

How to cite: Asokan Laila, A. and Gopinath, G.: Soil depth Prediction in a landslide prone tropical river basin under data-sparse conditions using machine-learning technique, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-11135, https://doi.org/10.5194/egusphere-egu23-11135, 2023.

EGU23-13292 | Orals | NH3.11

Advances in landslide analysis by using remote sensing and artificial intelligence (AI): Results from MultiSat4SLOWS project 

Mahdi Motagh, Simon Plank, Wandi Wang, Aiym Orynbaikyzy, Magdalena Vassileva, and Mike Sips

Landslides are a major type of natural hazard that causes significant human and economic losses in mountainous regions worldwide. Optical and synthetic aperture radar (SAR) satellite data are increasingly being used to support landslide investigation due to their multi-spectral and textural characteristics, multi-temporal revisit rates, and large area coverage. Understanding landslide occurrence, kinematics, and correlation with external triggering factors is essential for landslide hazard assessment. Landslides are usually triggered by rainfall, and the associated cloud cover often limits the use of optical images alone. Exploiting SAR data, with their cloud penetration and all-weather measurement capability, provides a more precise temporal characterization of landslide kinematics and occurrence. However, except for a few research studies, the full potential of SAR data for operational landslide analysis has not yet been exploited. This is a very demanding task, considering the vast amount of Sentinel-1 data that has been globally available since October 2014.

In this presentation we summarise the achievements made within the framework of the MultiSat4SLOWS project (Multi-Satellite imaging for Space-based Landslide Occurrence and Warning Service), financed within the Helmholtz Imaging 2020 call. The project aims at developing a multi-sensor approach for detecting and analysing the occurrence time and spatial extent of landslides using freely available SAR data from Sentinel-1. Within this project, we generated a reference database based on Sentinel-1 and -2 data for training, testing, and validation of deep learning algorithms. The reference database contains various landslide examples that occurred worldwide and includes pre- and post-event polarimetric, coherence, and backscatter features. We also investigated the applicability of SAR/InSAR time-series data for detecting landslide timing. Finally, we introduce a prototype of a Visual Analytics platform for rapid analysis of spatial and temporal ground deformation patterns and their correlation with external triggering factors.

How to cite: Motagh, M., Plank, S., Wang, W., Orynbaikyzy, A., Vassileva, M., and Sips, M.: Advances in landslide analysis by using remote sensing and artificial intelligence (AI): Results from MultiSat4SLOWS project, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-13292, https://doi.org/10.5194/egusphere-egu23-13292, 2023.

EGU23-13333 | ECS | Orals | NH3.11

Geophysical mass flow over complex micro-topography: from grain-scale mechanics to continuum modeling 

Lu Jing, Shuocheng Yang, and Fiona C. Y. Kwok

Geophysical mass flows involve granular earth materials surging down natural slopes and are one of the major threats to mountainous regions worldwide. Accurate modeling of geophysical mass flows requires closure relations both within the flow (rheology) and at the flow-substrate interface (boundary conditions). However, although recent years have seen significant advances in the modeling of granular flow rheology, our understanding of how flowing granular materials interact with the substrate remains largely elusive. Here, we focus on micro-topography, i.e., geometric base roughness of about the same size as the grain size, and investigate its effects on granular flow dynamics as well as the associated closure relations. To systematically vary the base roughness from smooth to rough, we generate the base using immobile particles with varying particle size and spatial arrangement in laboratory experiments (with particle image velocimetry for extracting flow kinematics) and discrete element method simulations. Two granular flow scenarios are considered: steady-state flow down inclines and granular column collapse. In the first scenario, it is found that basal slip occurs when the base roughness is below a range of intermediate values, and a general slip law connecting the slip velocity, the mean flow velocity, and the base roughness is developed. In the second, transient flow scenario, basal slip inevitably occurs even for very rough bases due to inertial effects, and a transient basal slip law is proposed that correlates the slip velocity with local flow properties based on kinetic theory arguments. The basal slip laws developed in this work can be readily incorporated as a dynamic boundary condition in continuum modeling of granular flows. In future work, grain-scale mechanisms relevant to more realistic geophysical flows will be investigated, including the feedback effects of pore fluid pressure on flow mobility during basal sliding and the role of irregular particle shapes in hydro-mechanical modeling of geophysical mass flows.

How to cite: Jing, L., Yang, S., and Kwok, F. C. Y.: Geophysical mass flow over complex micro-topography: from grain-scale mechanics to continuum modeling, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-13333, https://doi.org/10.5194/egusphere-egu23-13333, 2023.

EGU23-13523 | ECS | Posters on site | NH3.11

Automatic landslide detection using Sentinel-1 and -2 images - a glacial case study 

Alexandra Jarna Ganerød, Erin Lindsay, Ola Fredin, Tor-Andre Myrvoll, Steinar Nordal, Martina Calovi, and Jan Ketil Rød

Although Norway is a country with rough terrain and a high frequency of unstable steep slopes, there is a scarcity of landslide data available. This limits the accuracy of thresholds for early warning systems and of hazard maps, both of which rely on historic event data. There is great potential to supplement existing ground-based observations with automated landslide detection using satellite imagery and deep learning. Working towards an automated system for landslide detection in Norway, we investigated which imagery types and machine-learning models performed best for detecting landslides in a formerly glaciated landscape.

We trained a deep learning model locally using Keras, TensorFlow 2, and a U-Net architecture. As input data, we used multi-temporal composites of Sentinel-1 and -2 image stacks of all available images from one month pre- and post-event. Processed bands included dNDVI (difference in maximum normalised difference vegetation index) from Sentinel-2, and pre- and post-event Synthetic Aperture Radar (SAR) data (terrain-corrected, mean of multi-temporal ascending/descending images, in VV polarisation) from Sentinel-1. Training and evaluation were performed with a well-verified landslide inventory of 120 manually mapped rainfall-triggered landslides from Jølster (30 July 2019), in Western Norway. We tested the model with four input data settings using different bands and polarisations for the pre- and post-event SAR data: 1) full version (all 13 bands); 2) dNDVI (Sentinel-2), preVV, postVV (Sentinel-1); 3) preVV, postVV (Sentinel-1); and 4) post-R, post-G, post-B, post-NIR, dNDVI (Sentinel-2). The results were compared to those of a pixel-based conventional machine learning model (Classification and Regression Tree) using the same input data. The second input data setting provided the best results. The precision for all four input data settings lies between 80-85%, with Matthews correlation coefficient values from 51-89%. Moreover, the deep-learning model significantly outperforms the conventional machine learning model in input data setting #3. The patch-based classification method far outperforms pixel classification owing to its ability to differentiate the landslide signal from the random noise produced by speckle in undisturbed areas. In addition, this represents one of the first attempts to fuse SAR and optical data for landslide detection, and we show that there is an advantage in doing so in this case.
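
For illustration only, a minimal U-Net of the kind described can be sketched in Keras/TensorFlow 2. The 13-band input follows the "full version" setting above; the tile size, network depth, and filter counts are assumptions, not the authors' exact architecture.

import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters):
    # two successive 3x3 convolutions, the basic U-Net building block
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return layers.Conv2D(filters, 3, padding="same", activation="relu")(x)

inputs = layers.Input(shape=(128, 128, 13))          # multi-temporal S1/S2 stack
c1 = conv_block(inputs, 32)                          # encoder, level 1
p1 = layers.MaxPooling2D()(c1)
c2 = conv_block(p1, 64)                              # encoder, level 2
p2 = layers.MaxPooling2D()(c2)
b = conv_block(p2, 128)                              # bottleneck
u2 = layers.Conv2DTranspose(64, 2, strides=2, padding="same")(b)
c3 = conv_block(layers.concatenate([u2, c2]), 64)    # decoder with skip connection
u1 = layers.Conv2DTranspose(32, 2, strides=2, padding="same")(c3)
c4 = conv_block(layers.concatenate([u1, c1]), 32)    # decoder with skip connection
outputs = layers.Conv2D(1, 1, activation="sigmoid")(c4)  # per-pixel landslide probability

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.Precision()])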

How to cite: Ganerød, A. J., Lindsay, E., Fredin, O., Myrvoll, T.-A., Nordal, S., Calovi, M., and Rød, J. K.: Automatic landslide detection using Sentinel-1 and -2 images - a glacial case study, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-13523, https://doi.org/10.5194/egusphere-egu23-13523, 2023.

Geophysical mass flows typically consist of a granular solid phase with a broad grain size distribution and an interstitial fluid phase. During the flow, larger particles tend to segregate and thereby accumulate at the flow surface and front, resulting in dramatic changes in the flow and deposition characteristics, such as enhanced runout distances and stratified deposit patterns. However, current hydro-mechanical modeling of geophysical mass flows often does not consider grain size segregation and the resulting internal heterogeneity of the flow, which can largely compromise the predictability of existing hydro-mechanical models. A major challenge lies in the multiscale nature of grain segregation and its effects on flow mobility, which requires detailed characterization of segregation mechanics at both the particle and flow levels. Here, we first review recent advances in a multiscale framework in which the driving and resistive forces of segregation on a single intruder particle or a collection of large particles have been formulated based on discrete element method simulations and theoretical analysis. Then, we discuss how these particle-scale forces can be carried over into a continuum formulation for segregation flux modeling and connected with the flow dynamics in a two-way coupled manner. These physics-based force formulations reflect the micromechanics of segregation and lead to enhanced predictive modeling of particle size dynamics in granular flows. Finally, we discuss the potential of extending the proposed framework to consider the effects of interstitial fluids and other mechanisms in upscaled hydro-mechanical modelling of more realistic geophysical mass flows.

How to cite: Liu, M. and Jing, L.: Modelling grain size segregation in geophysical mass flows: bridging particle-level forces and continuum models, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-14147, https://doi.org/10.5194/egusphere-egu23-14147, 2023.

EGU23-14199 | Orals | NH3.11

“Fusion network with attention for landslide detection. Application to Bijie landslide open dataset” 

Candide Lissak, Thomas Corpetti, and Mathilde Letard

Remote sensing techniques are now widespread for the early detection of ground deformation, the implementation of warning systems in case of imminent landslide triggering, and medium- and long-term slope instability monitoring. The large breadth of data available to the scientific community, together with processing techniques that improved as data volumes grew, has led to noticeable developments in remote sensing data processing using machine learning algorithms and, more particularly, deep neural networks.

This arsenal of data and techniques is necessary to meet the scientific challenges that the landslide research community still faces. As landslides can be complex, risk management and disaster mitigation strategies require a precise idea of their location, shape, and size so that they can be studied and monitored. The challenge is to automate landslide detection and mapping, especially through learning methods. Machine learning methods based on deep neural networks have recently been employed for landslide studies and provide promising, efficient results for landslide detection [1].

In this study, we propose an original neural network for landslide detection. More precisely, we exploit a fusion network [2] dealing with optical images on the one hand and Digital Elevation Models on the other. To improve the results, attention layers [3] (which stabilize training and yield more precise results) as well as mixup techniques [4] (which improve generalization) are exploited.

The model was trained and tested on the open Bijie landslide dataset.
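
As a brief illustration of the mixup technique [4] mentioned above, the Python sketch below blends pairs of training samples and labels; the alpha value is an arbitrary placeholder.

import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2):
    # draw a mixing coefficient from a Beta distribution
    lam = np.random.beta(alpha, alpha)
    x = lam * x1 + (1.0 - lam) * x2   # blended input (e.g., image + DEM stack)
    y = lam * y1 + (1.0 - lam) * y2   # blended (soft) label
    return x, y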

Keywords: Remote sensing for landslide monitoring and detection, landslide detection, deep neural networks, attention

[1] Ji, S., Yu, D., Shen, C., Li, W., & Xu, Q. (2020). Landslide detection from an open satellite imagery and digital elevation model dataset using attention-boosted convolutional neural networks. Landslides, 17(6), 1337-1352.

[2] Song, W., Li, S., Fang, L., & Lu, T. (2018). Hyperspectral image classification with deep feature fusion network. IEEE Transactions on Geoscience and Remote Sensing, 56(6), 3173-3184.

[3] Niu, Z., Zhong, G., & Yu, H. (2021). A review on the attention mechanism of deep learning. Neurocomputing, 452, 48-62.

[4] Thulasidasan, S., Chennupati, G., Bilmes, J. A., Bhattacharya, T., & Michalak, S. (2019). On mixup training: Improved calibration and predictive uncertainty for deep neural networks. Advances in Neural Information Processing Systems, 32.

How to cite: Lissak, C., Corpetti, T., and Letard, M.: “Fusion network with attention for landslide detection. Application to Bijie landslide open dataset”, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-14199, https://doi.org/10.5194/egusphere-egu23-14199, 2023.

EGU23-14546 | ECS | Orals | NH3.11

ML-based characterization of PS-InSAR multi-mission point clouds for ground deformation classification 

Claudia Masciulli, Michele Gaeta, Giorgia Berardo, Gianmarco Pantozzi, Carlo Alberto Stefanini, and Paolo Mazzanti

Persistent Scatterer Interferometry (PSI) is a powerful multitemporal A-DInSAR (Advanced Differential Synthetic Aperture Radar Interferometry) technique widely used for monitoring and measuring Earth's surface displacements over large areas with sub-centimetric precision. The capability to detect ground deformation processes relies on the available PSI spatial density, which is strictly related to the resolution of the considered sensor and the presence of stable natural and artificial reflectors. A new data fusion approach, developed as part of the “MUSAR” project funded by ASI (Italian Space Agency), integrates multi-band SAR sensors to improve the coverage of PSI data by synthesizing multi-sensor displacement information. The integration of multi-mission PSI generates synthetic measurement points, named Ground Deformation Markers (GD-Markers), featuring vertical (Up-Down) and horizontal (East-West) components of the displacements. The fusion of PSI data extracted from C-band Sentinel-1 images from the Copernicus initiative and from the COSMO-SkyMed constellation in the X-band from ASI contributed to creating a dataset with high information content.

Each cluster of GD-Markers with displacement measurements identifies a specific deformation process in the region of interest. After selecting the relevant clusters of points, the deformation processes were classified into different categories (e.g., landslide, subsidence) to improve their understanding and evaluation for mitigating natural hazards. This study aimed to develop a machine learning-based classification system, starting from GD-Marker point clouds, that supports the automation of ground displacement identification and characterization. The synthetic points were characterized either as individual entities or as point clouds formed by discrete clusters of points in space, to evaluate the advantage of treating each point independently versus incorporating local neighborhood information. The structured point data were analyzed using a supervised Random Forest (RF) approach to evaluate the performance of point cloud classification and categorization and to identify the best initial setting. In point cloud classification, each point was assigned a label representing a deformation process, while in categorization one label is provided for the entire point cloud.
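
The two settings can be illustrated with a minimal scikit-learn sketch (not the authors' implementation); the feature set, the aggregation of clusters by mean and standard deviation, and the majority-vote labels are assumptions for demonstration.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X_pts = rng.random((5000, 6))          # per-point GD-Marker features (placeholder)
y_pts = rng.integers(0, 2, 5000)       # per-point process label (placeholder)
cluster = rng.integers(0, 100, 5000)   # cluster id of each point

# (a) point cloud classification: each point is an independent sample
rf_points = RandomForestClassifier(n_estimators=300).fit(X_pts, y_pts)

# (b) categorization: one aggregated feature vector and one label per cluster
ids = np.unique(cluster)
X_cl = np.array([np.hstack([X_pts[cluster == c].mean(0),
                            X_pts[cluster == c].std(0)]) for c in ids])
y_cl = np.array([np.bincount(y_pts[cluster == c]).argmax() for c in ids])
rf_clusters = RandomForestClassifier(n_estimators=300).fit(X_cl, y_cl)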

Comparing models’ performances allowed the definition of the best possible approach for classifying the deformation processes observed by GD-Markers point clouds. The analysis assessed the effectiveness of the classification of single points or clusters to identify the optimal setup that achieves an accurate segmentation between adjacent deformation processes. Identifying this initial setting was essential for selecting and developing advanced deep-learning approaches.

How to cite: Masciulli, C., Gaeta, M., Berardo, G., Pantozzi, G., Stefanini, C. A., and Mazzanti, P.: ML-based characterization of PS-InSAR multi-mission point clouds for ground deformation classification, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-14546, https://doi.org/10.5194/egusphere-egu23-14546, 2023.

EGU23-14639 | ECS | Posters on site | NH3.11

Performance analysis of a U-Net landslide detection model 

Itahisa Gonzalez Alvarez, Kathryn Leeming, Alessandro Novellino, and Sophie Taylor

Image segmentation algorithms are a type of image classifier that assigns a label to each individual pixel in an image. U-Nets, initially developed for the analysis of biomedical images and now widely used in a variety of fields, are an example of such algorithms. It has been shown that U-Nets are especially interesting when working with small training datasets and combined with data augmentation techniques.

In this study, we used satellite images with labelled landslide masks from known events to train a U-Net to identify areas of potential landslides. These landslide masks are time-consuming to create, resulting in a small initial training set. Even when working with U-Nets, the success of machine learning and AI tools depends on the availability and quality of training data, as well as on the algorithm settings during the training process. Tuning machine learning models to achieve the best possible performance from limited amounts of data is important to generate trustworthy results that can be used to advance knowledge of landslide events around the world.

Here, we show the differences in algorithm performance as we use different types of data augmentation and model parameters. We also explore and assess the effects on performance of options such as including different satellite bands, terrain information and alternative colour band representations.
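
One common way to augment image/mask pairs consistently is to transform them jointly; the TensorFlow sketch below illustrates this idea with random flips and rotations, which may differ from the augmentations actually tested in this study.

import tensorflow as tf

def augment(image, mask):
    # image: (H, W, C) float tensor; mask: (H, W, 1) binary tensor.
    # Stacking on the channel axis guarantees identical geometric transforms.
    stacked = tf.concat([image, tf.cast(mask, image.dtype)], axis=-1)
    stacked = tf.image.random_flip_left_right(stacked)
    stacked = tf.image.random_flip_up_down(stacked)
    stacked = tf.image.rot90(stacked, k=tf.random.uniform([], 0, 4, tf.int32))
    return stacked[..., :-1], stacked[..., -1:]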

How to cite: Gonzalez Alvarez, I., Leeming, K., Novellino, A., and Taylor, S.: Performance analysis of a U-Net landslide detection model, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-14639, https://doi.org/10.5194/egusphere-egu23-14639, 2023.

Steep slopes, deforestation, unconsolidated deposits, high annual rainfall, and a highly dissected landscape facilitate the occurrence of landslides along one of the most important Colombian highways, the “Via al Llano”, frequently causing traffic interruptions. Prior to a susceptibility assessment of the area, a multitemporal inventory is required. Usually, landslides are identified and mapped by visual interpretation of optical satellite and/or aerial images. However, in study areas located in tropical regions such as that of the Via al Llano, the frequent presence of clouds means that many images are needed to identify landslides and estimate the period of their occurrence. Therefore, an automatic detection procedure is indispensable for large tropical areas and multitemporal event inventories. The cloud-based Google Earth Engine (GEE) allows geospatial processing of freely available multi-temporal data. In this work, we perform automatic detection of landslides using the Normalized Difference Vegetation Index (NDVI) from Sentinel-2 (optical images) and the SAR-backscatter change from Sentinel-1 (radar images) over a sector of the Buenavista area, extending for 53 km2 in the southern portion of the Via al Llano. Considering a period during which the occurrence of some landslides blocked the highway, images before and after this event were selected for automatic detection, and the results were compared with a landslide inventory previously prepared by an expert operator through visual analysis of images available on Google Earth (optical natural-colour images). To assess the ability of each method to discriminate between landslides and stable slopes, confusion matrices were calculated. The NDVI-based approach demonstrated an acceptable ability to identify the landslides, although it generated a high number of false positives. The SAR-based method exhibited a lower ability to correctly detect the landslide polygons, while generating fewer false positives. This may be due to the pattern of predicted positives, which mostly consists of isolated pixels; conversely, the NDVI-based approach provides groups of adjacent pixels predicted as positives, which better reproduce the shapes of the landslide polygons. Finally, by combining the two approaches and using topographic masks, better accuracy in the automatic mapping of our multitemporal landslide inventories was achieved.
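
The NDVI-based part of such a workflow can be sketched with the Earth Engine Python API as follows; the region, date windows, cloud filter, and change threshold are placeholders, not the values used in this study.

import ee
ee.Initialize()

region = ee.Geometry.Rectangle([-73.75, 4.15, -73.65, 4.25])  # placeholder extent

def ndvi_composite(start, end):
    # maximum-NDVI composite from Sentinel-2 surface reflectance
    s2 = (ee.ImageCollection("COPERNICUS/S2_SR")
            .filterBounds(region)
            .filterDate(start, end)
            .filter(ee.Filter.lt("CLOUDY_PIXEL_PERCENTAGE", 40)))
    return s2.map(lambda img: img.normalizedDifference(["B8", "B4"])).max()

pre = ndvi_composite("2019-04-01", "2019-06-01")    # before the blocking event
post = ndvi_composite("2019-06-15", "2019-08-15")   # after the blocking event

# A strong NDVI drop flags candidate landslide pixels
candidates = post.subtract(pre).lt(-0.2)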

How to cite: Calderon-Cucunuba, L. P. and Conoscenti, C.: Automatic mapping of multitemporal landslide inventories by using open-access Synthetic Aperture Radar and NDVI imagery in Google Earth Engine: a case study of the “Via al Llano” highway (Colombia), EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-15954, https://doi.org/10.5194/egusphere-egu23-15954, 2023.

EGU23-16166 | Posters on site | NH3.11 | Highlight

Numerical modelling of mudflows impacting settlements: a case study 

Alessandro Leonardi, Giulia La Porta, and Marina Pirulli

Mudflows are common natural hazards, often originating from the liquefaction of shallow landslides triggered by rainfall. The numerical back-analysis of past events is key to projecting the application of numerical models towards forward analysis. However, the complex multi-physics nature of the problem hampers the development of comprehensive frameworks. Notwithstanding, calibrated numerical models able to simulate all aspects of the problem (triggering and runout) can still be valuable tools for aiding the design of countermeasures. Currently, this can only happen if calibration is performed on the specific site, or on sites with very similar geomorphological and geological characteristics.

In this presentation, the application of a coupled triggering and runout model is explored. Two case studies of well-known events that occurred in Southern Italy are presented. A pseudo-plastic model is used for the post-triggering rheology. The resolution of the runout simulation goes down to the level of the specific exposed element (houses, roads), allowing an ad-hoc assessment of risk on key pieces of infrastructure. The results reveal interesting aspects of how the complex topographic features of settlements challenge the traditional workflow for back-analysis. In particular, the channelization of flows within the settlement itself leads to an overestimation of hazard, unless care is taken to resolve the triggering phase down to the sub-basin scale.

How to cite: Leonardi, A., La Porta, G., and Pirulli, M.: Numerical modelling of mudflows impacting settlements: a case study, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-16166, https://doi.org/10.5194/egusphere-egu23-16166, 2023.

EGU23-16501 | Posters on site | NH3.11

Assessment of landslide susceptibility in the rocky coast subsystem of Essaouira, Morocco 

Sergio C. Oliveira, Abdellah Khouz, Jorge Trindade, Fatima ElBchari, Blaid Bougadir, Ricardo A. C. Garcia, and Mourad Jadoud

Several researchers have developed landslide susceptibility maps in recent years using a variety of methods and models. The Information Value method has frequently been used to assess landslide susceptibility in a variety of coastal environments. In this study, we used this bivariate statistical technique to assess the susceptibility to landslides of the coastal region of Essaouira. A total of 588 landslides were identified, classified, and mapped along this rocky coastal stretch. The observation and interpretation of many data sources, such as high-resolution satellite images, aerial photographs, topographic maps, and extensive field surveys, were employed to understand terrain predisposing conditions and to predict landslides. Essaouira's rocky coastal system is situated in the centre of Morocco's Atlantic coast. The study region was divided into 1534 cliff terrain units (50 m wide). The landslide inventory was randomly split into two separate groups for training and validation purposes: 70% of the landslides were used for training the susceptibility model and 30% for independent validation. Twenty-two layers of landslide conditioning factors were prepared: elevation, slope angle, slope aspect, plan curvature, profile curvature, cliff height, topographic wetness index, topographic position index, slope-over-area ratio, solar radiation, presence of faulting, lithological units, toe lithology, presence and type of cliff toe protection, layer tilt, rainfall, streams, land-use patterns, normalized difference vegetation index, and granulometry of the lithological material. Using a pixel-based model (12.5 m x 12.5 m) and an elementary terrain-unit-based model, the bivariate Information Value approach was used to determine the statistical link between the conditioning factors and the various landslide types and to produce the coastal landslide susceptibility maps. The multiple coastal landslide susceptibility models were evaluated for accuracy and predictive power using the receiver operating characteristic curve and the area under the curve. The findings allowed 38% of the rocky coast subsystem to be designated as highly susceptible to landslides, with the majority of these areas found in the southern part of the coastal region of Essaouira. Both future planned development operations and environmental conservation can benefit from these susceptibility maps.
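
For reference, the Information Value of a class i of a conditioning factor is commonly computed as

IV_{i} = \ln \frac{N_{land,i} / N_{i}}{N_{land} / N},

where N_{i} is the number of terrain units in class i, N_{land,i} the number of those containing landslides, and N and N_{land} the corresponding totals; the susceptibility score of a terrain unit is then the sum of the IV_{i} of the classes it belongs to. The exact variant implemented in this study may differ in detail.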

Acknowledgements: The work has been financed by national funds through FCT (Foundation for Science and Technology, I. P.), in the framework of the project “HighWaters – Assessing sea level rise exposure and social vulnerability scenarios for sustainable land use planning” (EXPL/GES-AMB/1246/2021).

How to cite: Oliveira, S. C., Khouz, A., Trindade, J., ElBchari, F., Bougadir, B., Garcia, R. A. C., and Jadoud, M.: Assessment of landslide susceptibility in the rocky coast subsystem of Essaouira, Morocco, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-16501, https://doi.org/10.5194/egusphere-egu23-16501, 2023.

EGU23-17563 | Orals | NH3.11 | Highlight

Impact of a debris flow surge on a vertical wall oblique with respect to flow direction 

Aronne Armanini, Alessia Fontanari, and Fabio Sartori

Debris flows are rapid to very rapid flows made up of a highly concentrated mixture of water and sediments. These flows are catastrophic natural phenomena affecting mountain areas and causing extensive property damage and loss of life every year. The mitigation of these phenomena is therefore fundamental: check dams and longitudinal protection walls are among the main structural passive countermeasures. A crucial aspect in the definition of the design criteria for these structures is the analysis of the impact force exerted by a debris flow on them.
From a scientific point of view, the state of the art in this field is quite lacking, despite the relevance of the topic. In the case of the impact of a debris surge on a vertical plane normal to the flow direction, according to Armanini and Scotton (1992), two main types of impact may occur. The first type consists of a complete deviation of the flow along the vertical obstacle, assuming a jet-like behavior (Figure 1). The second type is characterized by the formation of a reflected wave after the impact, which propagates upstream (Figure 2). The analytical solution based on momentum and mass balances is already known in both cases (see Armanini 2009 and Armanini et al. 2020), and the comparison between theoretical results and experimental data is quite satisfactory. 
Much less studied is the case of the impact of a debris flow surge on a vertical wall arranged obliquely with respect to the flow direction, as in the case of lateral protection walls. 
To better understand its kinematic characteristics, the phenomenon has been studied in the Hydraulic Laboratory of the University of Trento. It has been reproduced in a channel of variable slope by releasing a given volume of fluid and measuring its impact force on a gate situated at the end of the channel at different oblique orientations with respect to the flow direction. Several channel slopes and solid-fraction concentrations have been investigated. 
When the flow crashes into the gate, it is deflected vertically along the obstacle and initially forms a vertical jet, which is soon deviated in the direction parallel to the gate.
The phenomenon has been theoretically investigated both in the light of the one-dimensional theory of fluid impacts, already adopted for the case of impact on a vertical wall arranged orthogonally to the flow, and using a simplified approach derived from the classical two-dimensional theory of Ippen (1951) for the deviation of supercritical currents. The comparison between the predictions of the theory and the experimental data turns out to be quite good.
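
For context, a classical one-dimensional momentum balance of the kind invoked above gives the peak thrust per unit width on a wall normal to the flow as

F = \beta \rho h u^{2} + \tfrac{1}{2} k \rho g h^{2},

where \rho is the mixture density, h and u the depth and velocity of the incoming surge, \beta a momentum-correction coefficient, and k an earth-pressure-like coefficient. This generic form is given only as an illustration of the balance; it is not necessarily the specific formulation of Armanini (2009) or Armanini et al. (2020).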

How to cite: Armanini, A., Fontanari, A., and Sartori, F.: Impact of a debris flow surge on a vertical wall oblique with respect to flow direction, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-17563, https://doi.org/10.5194/egusphere-egu23-17563, 2023.

EGU23-295 | ECS | Posters on site | GM3.3

GIS-FSLAM-FORM: A QGIS plugin for fast probabilistic susceptibility assessment of rainfall-induced landslides at regional scale 

Hongzhi Cui, Marcel Hürlimann, Vicente Medina, and Jian Ji

Landslide susceptibility analysis is a necessary procedure for the timely discovery and localization of potential sources of slope instability in natural terrain. The infinite slope model is broadly applied for evaluating shallow landslide susceptibility, coupling geotechnical and geological parameters with a hydrological model. Because rainfall is one of the major factors inducing landslides, the calculation of the water table and pore water pressure is an important task in our approach. To appropriately assess the most susceptible areas, we propose a new framework for regional slope stability based on probabilistic analysis, combining a hydro-mechanical model, the Fast Shallow Landslide Assessment Model (FSLAM), with a reliability method. A user-friendly plugin for the open-source geographic information system QGIS, called GIS-FSLAM-FORM, was designed and developed in the Python programming language. The plugin accounts for the potential uncertainties of geotechnical parameters (in particular the effective cohesion and friction of soil, and root strength), horizontal hydraulic conductivity, and soil depth. Our new approach is notable for its simple hydrologic model and high computational efficiency. To propagate the probabilistic information through the FSLAM infinite-slope formulation, the first-order reliability method (FORM) is applied during the analysis, although it inevitably involves iterative computation. The developed plugin, using physically-based modelling, can directly provide several regional hazard index distribution maps, such as the factor of safety (FoS), reliability index (RI), and failure probability (Pf).
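
For reference, a generic form of the infinite-slope factor of safety with root reinforcement, of the kind such models evaluate, is

FS = \frac{c' + c_{r} + (\gamma z - \gamma_{w} h_{w}) \cos^{2}\beta \tan\varphi'}{\gamma z \sin\beta \cos\beta},

where c' is the effective soil cohesion, c_{r} the root cohesion, \gamma and \gamma_{w} the unit weights of soil and water, z the soil depth, h_{w} the water table height above the failure surface, \beta the slope angle, and \varphi' the effective friction angle. FSLAM's exact parameterization may differ.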

How to cite: Cui, H., Hürlimann, M., Medina, V., and Ji, J.: GIS-FSLAM-FORM: A QGIS plugin for fast probabilistic susceptibility assessment of rainfall-induced landslides at regional scale, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-295, https://doi.org/10.5194/egusphere-egu23-295, 2023.

Random Forest (RF) is a classification algorithm that has been used successfully in geomorphological and hazard mapping (Sîrbu et al., 2019). It performs a defined number of decision-tree classifications on random samples drawn with replacement from the original training data. Because of this, the algorithm is especially robust to errors and outliers in the training data, and it is also very good at producing uncertainty estimates for the variability of results on each of the classified features. Its outputs can also be used, with different methods, to produce a ranking of the independent variables used in the classification.

The present study was performed on a given dataset, in central Italy, containing 7,360 slope units covering an area of 4,095 km2. The slope units are classified twice, based on different methodologies, into units with or without landslides. Each slope unit also has 26 attributes assigned, which were used as independent variables (Alvioli et al., 2022). The slope units are treated as spatially independent from each other and have been randomly split 70%-30% into training and validation data, respectively.

The model was set up as computer code in the R software environment. It uses different libraries to integrate the input data, run the algorithm, validate and measure the performance of the model, and finally produce the output data. Most of the model settings were used with their default values; the number of classification trees (ntree) was the only important setting that was fine-tuned, to a value of 1501, based on different model runs.

The results of the two classifications (one for each version of the dependent variable) are relatively similar, proving once again the robustness of the RF algorithm with respect to minor to medium changes in the input data. The first classification had an AUC (area under the curve) value of 0.829, compared with an AUC of 0.817 for the second classification. For each classification, a ranking of the independent variables was produced, with the standard deviation of slope being the most important predictor. Other predictors with relatively high importance were elevation and curvatures.

The results show that RF is an important classifier, which can be used with relatively little customization and on almost any dataset to produce a reliable susceptibility map. Its integration with the R software makes it easy to run the whole process virtually automatically. The computer code for the model will be made freely available.
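
The study is implemented in R; purely as an illustration of the described setup (1,501 trees, a 70%-30% split, AUC validation, and a variable ranking), a minimal Python analogue with scikit-learn is sketched below, with random arrays standing in for the slope-unit data.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.random((7360, 26))     # 26 attributes per slope unit (placeholder)
y = rng.integers(0, 2, 7360)   # landslide presence/absence (placeholder)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)
rf = RandomForestClassifier(n_estimators=1501, random_state=1).fit(X_tr, y_tr)
print("AUC:", roc_auc_score(y_te, rf.predict_proba(X_te)[:, 1]))

# Ranking of independent variables (the study found the standard
# deviation of slope to be the most important predictor).
ranking = np.argsort(rf.feature_importances_)[::-1]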

How to cite: Sirbu, F.: Landslide Susceptibility Model based on Random Forest classification, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-733, https://doi.org/10.5194/egusphere-egu23-733, 2023.

EGU23-2283 | ECS | Orals | GM3.3

Application of the LAND-SUITE software with a benchmark dataset for landslide susceptibility zonation 

Txomin Bornaetxea, Mina Yazdani, and Mauro Rossi

We propose the usage of the LAND-SUITE software to carry out 16 landslide susceptibility models exploiting the benchmark dataset provided by the session organizers. The software allows the application of Linear Discriminant Analysis (LDA), Logistic Regression (LR), and Quadratic Discriminant Analysis (QDA) as statistical methods, together with the Combination Forecast Model (CFM), which combines the outputs of the former three methods. Each of the mentioned models has been applied considering the two provided landslide presence variables (presence1 and presence2), resulting in 8 susceptibility maps that take into account the complete set of explanatory variables. We have then taken advantage of the variable-analysis outputs provided by LAND-SUITE, and the process has been repeated with a reduced set of 10 explanatory variables. The variable selection followed the principle of independence between the explanatory variables while trying to optimize the contribution of each of them to model performance, for which leave-one-out tests and the significance p-values of the LR outputs were consulted. Results show a slight but generalized improvement of the model performances when the presence2 dataset is used instead of presence1. The model performance is also maintained, or only very slightly decreased, when the number of explanatory variables is reduced from 26 to 10. However, the Area Under the ROC Curve (AUC) ranges between 0.75 and 0.82 in all tests. In addition, 9 out of the 10 selected variables are the same for both the presence1 and presence2 tests. The uncertainty associated with each of the models has also been computed by means of the bootstrap resampling method.
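
LAND-SUITE itself provides these methods; purely as an illustration, the three statistical models and a simple output combination can be sketched with scikit-learn as below. The data arrays are placeholders, and the plain average is an assumption, not LAND-SUITE's actual CFM weighting.

import numpy as np
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.random((1000, 10))     # reduced set of 10 explanatory variables
y = rng.integers(0, 2, 1000)   # landslide presence flag

models = {"LDA": LinearDiscriminantAnalysis(),
          "QDA": QuadraticDiscriminantAnalysis(),
          "LR": LogisticRegression(max_iter=1000)}
probs = {name: m.fit(X, y).predict_proba(X)[:, 1] for name, m in models.items()}

# simple combination forecast: average of the three model outputs
cfm = np.mean(list(probs.values()), axis=0)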

How to cite: Bornaetxea, T., Yazdani, M., and Rossi, M.: Application of the LAND-SUITE software with a benchmark dataset for landslide susceptibility zonation, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-2283, https://doi.org/10.5194/egusphere-egu23-2283, 2023.

EGU23-3566 | ECS | Posters on site | GM3.3 | Highlight

Landslide Susceptibility within the binomial Generalized Additive Model 

Marco Loche, Massimiliano Alvioli, Ivan Marchesini, and Luigi Lombardo

We develop a slope-unit based landslide susceptibility model using the benchmark dataset proposed in the session, located in Central Italy. As a result, we produce two susceptibility maps based on the two different landslide presence attribute fields included in the dataset.

The proposed dataset is a subset of a much larger one, recently used to obtain landslide susceptibility over the whole of Italy. We further explore the differences between the results obtained from the proposed dataset and the landslide susceptibility obtained at the national scale. The national-scale results were obtained with a Bayesian version of a binomial Generalized Additive Model (GAM) in R-INLA, an R implementation of the integrated nested Laplace approximation for approximate Bayesian inference. The method explains the spatial distribution of landslides using a Bernoulli likelihood from the exponential family.

This allows us to estimate fixed effects and random effects, and to assess their associated uncertainty. The residual susceptibility maps and the most common correlation measures permit us to quantify the strength and direction of the relationships between models and to capture differences in susceptibility values across the study area. On this basis, we offer a convenient approach to evaluating the similarities between the two represented landslide distributions.

We propose this model comparison for any pair of susceptibility maps, to evaluate the interpretability of the covariates and the performances where a large dataset may influence the susceptibility pattern over space.

How to cite: Loche, M., Alvioli, M., Marchesini, I., and Lombardo, L.: Landslide Susceptibility within the binomial Generalized Additive Model, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-3566, https://doi.org/10.5194/egusphere-egu23-3566, 2023.

EGU23-4851 | Posters virtual | GM3.3

Resolution of data, type of inventory and data splitting in machine learning-based landslide susceptibility mapping 

Neelima Satyam, Minu Treesa Abraham, and Kunal Gupta

The use of machine learning (ML) approaches for developing landslide susceptibility maps (LSMs) has gained wide popularity in the recent past. The choice of ML algorithm, the spatial resolution, the train-to-test data ratio, and the landslide conditioning factors are some of the crucial factors that decide the performance of the developed LSM. However, there are no formal guidelines on the selection of any of these factors, as the choice depends strongly on the study area. In most cases, site-specific comparative analyses are required to find the best-suited combination. Two case studies were conducted for parts of the Western Ghats in India to develop pixel-based LSMs for the Idukki and Wayanad districts. Five different ML algorithms, two different spatial resolutions, multiple train-to-test ratios, and two different types of landslide inventory data were used for developing the best-suited LSM. After detailed analysis, it was observed that the random forest (RF) algorithm resulted in the best-performing LSM for both regions. The effects of spatial resolution and data splitting were found to differ between algorithms and, among all the factors considered, data splitting was the least influential. 

How to cite: Satyam, N., Abraham, M. T., and Gupta, K.: Resolution of data, type of inventory and data splitting in machine learning-based landslide susceptibility mapping, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-4851, https://doi.org/10.5194/egusphere-egu23-4851, 2023.

EGU23-5755 | ECS | Orals | GM3.3

A slope units based landslide susceptibility analyses using Weight of Evidence and Random Forest 

Marko Sinčić, Sanja Bernat Gazibara, Martin Krkač, Hrvoje Lukačić, and Snježana Mihalić Arbanas

As identified by previous work, landslides present a significant hazard in the Umbria Region, Central Italy. We present a Weight of Evidence (WoE) and Random Forest (RF) approach for deriving landslide susceptibility maps (LSMs) for slope units (SU) as the cartographic unit. The input data in this study include a layer containing 7360 SU with 26 landslide conditioning factors (LCFs) and two landslide presence flags, "presence1" (P1) and "presence2" (P2), which label 3594 and 2271 SU as unstable, respectively. The LCFs were reclassified into 10 classes using Natural Breaks, followed by collinearity testing, which resulted in selecting 11 LCFs for further analysis. Unstable SU were randomly split into two equal sets, one for deriving the LSMs and the other for validation. Whereas WoE used only unstable SU, the landslide dataset applied in RF additionally included an equal number of stable SU. The stable SU were randomly selected from the area remaining after excluding the previously selected unstable SU, simulating a temporal inventory for landslide validation. This ensured that the model was applied to unseen data and that the training dataset was unbiased. Model evaluation and LSM validation included determining the Area Under the Curve (AUC) of the curve defined by the cumulative percentage of study area in susceptibility classes against the cumulative percentage of landslide area in susceptibility classes: 50% of the unstable SU were examined for model evaluation, and the remaining 50% for validation. For the model classification parameters, all SU were used to compute the Overall Accuracy (OA) and the AUC of the Hit Rate versus False Alarm Rate curve. The RF model performed excellently, with AUC values of 86.16 (P1) and 90.00 (P2). The WoE model performed significantly worse, with AUC values of 62.09 (P1) and 69.41 (P2). LSM validation on unseen data favors WoE, with AUC values of 60.46 (P1) and 66.17 (P2), compared to 45.06 (P1) and 56.68 (P2) for RF, the latter indicating near-random prediction. Considering OA and AUC as classification parameters, RF reached OA values of 74.36 (P1) and 77.60 (P2) and AUC values of 81.65 (P1) and 84.61 (P2), whereas WoE reached significantly lower OA values of 66.03 (P1) and 69.14 (P2) and AUC values of 74.09 (P1) and 77.07 (P2). Since the P2 scenario showed better results in all four studied parameters for both methods, we point to it as the better option for defining landslide datasets with respect to the proportion of unstable and stable SU. Because a relatively large portion of the input SU is unstable, we argue that classification parameters should be prioritized when choosing the optimal method and scenario, as they consider both unstable and stable SU across the entire study area. Based on the conducted research, we suggest RF, owing to its better classification performance, as the approach for landslide susceptibility analyses and future zonation in the study area.
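
For reference, the WoE positive and negative weights for a conditioning-factor class B with respect to landslide presence L are commonly written as

W^{+} = \ln \frac{P(B \mid L)}{P(B \mid \bar{L})}, \qquad W^{-} = \ln \frac{P(\bar{B} \mid L)}{P(\bar{B} \mid \bar{L})},

with the contrast C = W^{+} - W^{-} summarizing the predictive strength of the class; the susceptibility of an SU follows from summing the weights of the classes it falls into. The exact variant applied in this study may differ in detail.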

How to cite: Sinčić, M., Bernat Gazibara, S., Krkač, M., Lukačić, H., and Mihalić Arbanas, S.: A slope units based landslide susceptibility analyses using Weight of Evidence and Random Forest, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-5755, https://doi.org/10.5194/egusphere-egu23-5755, 2023.

EGU23-6053 | ECS | Posters on site | GM3.3

Landslide Susceptibility Mapping via binomial Generalized Additive Model 

Gianvito Scaringi and Marco Loche

Developments of geostatistical models for landslide susceptibility mapping often disregard interpretability, although this element is of fundamental importance for risk assessment. Recent trends in machine learning show that gains in performance tend to come at the expense of the interpretability of the mechanical processes represented in geostatistical models, in which geomorphic causation is lost.

We took the benchmark dataset in central Italy as our study case, for which a complete inventory of landslides is available. We built two landslide susceptibility models using a Generalized Additive Model (GAM) with a slope-unit partitioning of the area (~4,100 km2, comprising 7,360 slope units) and a set of 26 independent variables, with the aim of classifying the presence/absence of landslides.

We tested the capability of a binomial GAM with nonparametric smoothing functions to evaluate the interpretability of the covariates, and we obtained satisfactory results in terms of performance with a reasonable compromise on interpretability.
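
In its standard form, such a binomial GAM models the landslide presence Y_{i} of slope unit i as a Bernoulli variable with

\mathrm{logit}\, P(Y_{i} = 1) = \beta_{0} + \sum_{j} f_{j}(x_{ij}),

where the f_{j} are nonparametric smooth functions of the covariates x_{ij}; it is these fitted smoothers that provide the covariate interpretability discussed above.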

GAMs are very popular classifiers in landslide susceptibility studies and, even though other methods yield better performance, we suggest that interpretability in geostatistical analyses should advance in tandem with improvements in model performance.

How to cite: Scaringi, G. and Loche, M.: Landslide Susceptibility Mapping via binomial Generalized Additive Model, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-6053, https://doi.org/10.5194/egusphere-egu23-6053, 2023.

Rain-induced natural terrain landslides are the most frequent geo-hazard in many regions of the world. As an essential tool for addressing rising landslide challenges due to climate change, landslide susceptibility assessment has been widely investigated in Hong Kong for over twenty years. However, a public dataset for Hong Kong landslide susceptibility assessment is currently absent from the geoscience research community, which makes it difficult to establish consistent evaluation criteria for testing any new method or theory. Thus, to facilitate the development of new statistical and/or artificial intelligence-based methods for landslide susceptibility assessment, here we compile the first version of The Hong Kong University of Science and Technology – Landslide Susceptibility Dataset (HKUST-LSD) based on multiple sources of open data. Aiming to comprehensively describe the conditioning factors of rain-induced natural terrain landslides in Hong Kong, HKUST-LSD v1.0 comprises (a) a landslide inventory; (b) a high-resolution digital terrain model (DTM) and its topographical derivatives; (c) superficial geology and distance to faults and rivers/sea; (d) historical maximum rolling rainfall; and (e) ground vegetation condition. HKUST-LSD v1.0 provides a ready-to-use dataset that includes processed landslide and non-landslide samples, together with reference code that applies representative machine learning techniques to assess landslide susceptibility in Hong Kong with satisfactory performance. The dataset will be updated on a regular basis to fulfil the latest research needs arising in the research community and to support global sustainable development.

Download the dataset at: https://github.com/cehjwang/HKUST-LSD

How to cite: Wang, H., Zhang, L., and Wang, L.: HKUST-Landslide Susceptibility Dataset (HKUST-LSD): A benchmark dataset for landslide susceptibility assessment in Hong Kong, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-6259, https://doi.org/10.5194/egusphere-egu23-6259, 2023.

High-magnitude earthquakes in seismic zones often initiate a cascading chain of hazards such as co-seismic landslides, soil liquefaction, snow avalanches, surface faulting, devastating rock avalanches, and ground shaking. In the present study, a co-seismic landslide susceptibility analysis was carried out for the Bhagirathi valley of the Uttarakhand Himalayan region using machine learning techniques based on a slope-unit mapping approach. The study area falls in seismic zone IV; rocks along the fault zone are fragile, and the area is seismically very active. The region previously experienced the 1991 Uttarkashi earthquake of magnitude 6.6. Assessment of seismically induced landslides is a complex process, as it considers both static parameters (causative factors) and a dynamic parameter (the triggering factor) in the form of ground-shaking effects. In this study, co-seismic landslide susceptibility maps were produced at the slope-unit level using the Extreme Gradient Boosting (XGBoost) and Naïve Bayes (NB) machine learning techniques. The landslide inventory, with 3,000 delineated polygons, was split into training (80%) and testing (20%) data to calibrate and validate the models. Static causative factors were considered, such as slope, aspect, curvature, lineament buffer, drainage buffer, geology, topographic wetness index, and normalized difference vegetation index (NDVI); these parameters were generated using CartoDEM and satellite data. As the triggering factor, Arias Intensity (AI) was considered to represent ground shaking for co-seismic landslide susceptibility mapping. Arias Intensity was derived using the classical Cornell approach, considering the earthquake catalogue between the years 1700 and 2022. Finally, the XGBoost and NB techniques were used to compute a static landslide susceptibility map and a dynamic co-seismic landslide susceptibility map for a 475-year return period, with XGBoost predicting better results at the slope-unit level. The results were validated using the seismic relative index (SRI) and the landslide density method. The prepared maps can be effectively helpful for local and regional planning.
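
For illustration only (not the authors' code), a slope-unit classification of this kind can be sketched in Python with the xgboost library; the feature set, hyperparameters, and random placeholder data are assumptions.

import numpy as np
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

rng = np.random.default_rng(7)
X = rng.random((3000, 9))      # slope, aspect, ..., NDVI, Arias Intensity
y = rng.integers(0, 2, 3000)   # landslide presence per slope unit

X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.8, random_state=7)
clf = XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.1)
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))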

Keywords: Co-seismic landslide, Slope Unit, Landslide mapping, Machine learning.

How to cite: Gupta, N., Kanungo, D. P., and Das, J.: Co-seismic landslide susceptibility analysis for the Bhagirathi valley of Uttarakhand Himalayan region using machine learning algorithms based on Slope unit techniques, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-6937, https://doi.org/10.5194/egusphere-egu23-6937, 2023.

The aim of this study is to contribute to the introduction of a benchmark dataset for landslide susceptibility. The contribution consists of the application of Generalized Additive Models (GAMs) to the test area proposed by Alvioli et al. (2022), located in Central Italy (Umbria Region, 4,095 km²), and to the Mountain Communities of Mont Cervin and Mont Emilius (670 km²), located in the central part of the Valle d’Aosta Region. In the latter area, previous landslide susceptibility studies were carried out by Camera et al. (2021) and Bajni (2022).

The susceptibility analysis is based on slope units for both areas and uses the open-source dataset available for Italy (https://geomorphology.irpi.cnr.it/tools/slope-units, Alvioli et al., 2020). For Central Italy, the predictors and response variable are those made available by Alvioli et al. (2022). For consistency, the morphometric variables for Valle d’Aosta were calculated from the EU-DEM digital elevation model (Copernicus Land Monitoring Service, 25 m horizontal resolution), while the soil-related variables (namely soil depth, soil bulk density and particle size fractions) were derived from the SoilGrids global dataset (Hengl et al., 2017). In addition, consistently with Alvioli et al. (2022), two presence/absence landslide response variables (‘1’/’0’) were defined. For the first one, ‘presence1’, a slope unit was considered impacted by landslides (‘1’) if at least one event was recorded within its limits. For the second one, ‘presence2’, a slope unit was considered impacted by landslides (‘1’) if two or more landslides occurred within its limits. For Valle d’Aosta, landslide events were accessed through the regional inventory (http://catastodissesti.partout.it/), which is updated continuously by the Regional Civil Protection Department and the Forest Corps through regular surveys or following warnings from citizens.

Two landslide susceptibility maps were calculated for each area (‘presence1’, ‘presence2’). GAMs were applied through the mgcv library of R, with and without the option of variable selection through shrinkage. In addition, the behavior of the predictors was analyzed through the associated Component Smoothing Functions (CSF) to check for physical plausibility. Finally, to evaluate uncertainties, a non-spatial k-fold cross-validation was carried out, and the models were evaluated based on contingency tables, the area under the receiver operating characteristic curve (AUROC) and variable importance (decrease in explained variance).
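
The authors implement the GAMs in R's mgcv; purely as an illustration of the non-spatial k-fold evaluation step, a Python analogue might replace the penalized smooths with an additive B-spline logistic model (a rough approximation on synthetic stand-in data, not the study's code):

```python
# Rough analogue of the non-spatial k-fold evaluation described above: an additive
# B-spline logistic model stands in for mgcv's penalized GAM smooths (approximation only).
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer, StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(7360, 8))          # stand-in predictors, one row per slope unit
y = (X[:, 0] + np.sin(X[:, 1]) + rng.normal(size=7360) > 0).astype(int)  # stand-in 'presence1'

gam_like = make_pipeline(
    StandardScaler(),
    SplineTransformer(n_knots=6, degree=3),                   # per-predictor spline basis
    LogisticRegression(penalty="l2", C=1.0, max_iter=2000),   # ridge penalty ~ shrinkage
)
auc = cross_val_score(gam_like, X, y, cv=10, scoring="roc_auc")
print(f"10-fold AUROC: {auc.mean():.3f} +/- {auc.std():.3f}")
```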

By applying the same modelling algorithm (GAM) to input datasets derived from the same data sources, the study is expected to verify the consistency of the obtained landslide susceptibility results in terms of both model performance and the main driving processes (predictors).

References

Alvioli et al., 2020. Parameter-free delineation of slope units and terrain subdivision of Italy. Geomorphology 258, 107124. https://doi.org/10.1016/j.geomorph.2020.107124

Alvioli et al., 2022. Call for collaboration: Benchmark datasets for landslide susceptibility zonation. https://doi.org/10.31223/X52S9C

Bajni, 2022. Statistical methods to assess rockfall susceptibility in an Alpine environment: a focus on climatic forcing and geomechanical variables. https://doi.org/10.13130/bajni-greta_phd2022-03-23

Camera et al., 2021. Introducing intense rainfall and snowmelt variables to implement a process-related non-stationary shallow landslide susceptibility analysis. Science of The Total Environment 147360. https://doi.org/10.1016/j.scitotenv.2021.147360

Hengl et al., 2017. SoilGrids250m: Global gridded soil information based on machine learning. PLoS one 12, e0169748. https://doi.org/10.1371/journal.pone.0169748

How to cite: Camera, C. and Bajni, G.: Comparison of the effectiveness of application of GAMs for landslide susceptibility modelling in Apennine and Alpine areas, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-7907, https://doi.org/10.5194/egusphere-egu23-7907, 2023.

EGU23-9623 | ECS | Posters on site | GM3.3

Can AI-generated landslide inventories replace humans' cognitive abilities in hazard and risk scenarios? 

Sansar Raj Meena, Mario Floris, and Filippo Catani

Landslide inventories are quintessential for landslide susceptibility mapping, hazard modeling, and risk management. For decades, experts and organizations across the world have preferred manual visual interpretation of satellite and aerial imagery. However, manual inventories suffer from several issues, such as the subjectivity of manually extracting landslide boundaries, the limited sharing of landslide polygons within the geoscientific community, and the time and effort that the generation process demands of expert interpreters. To address these challenges, a large amount of research on semi-automated and automatic mapping of landslide inventories has been conducted in recent years. The automatic generation of landslide inventories using Artificial Intelligence (AI) approaches is still in its early stages, as there is currently no published study that can generate a ground-truth representation of a landslide situation following a landslide-triggering event. For landslide boundary delineation with AI-based models, recent studies report F1-scores in the range of 50–80%. With the exception of studies that evaluate models within the same area used for training, very few claim to have attained F1-scores above 80%, and even fewer at larger scales of investigation. As a result, there is currently a research gap between the generation of AI-based landslide inventories and their applicability to landslide hazard and risk assessments. There is a need for the geoscientific community to check the reliability of AI-generated landslide data in terms of their usage in the succeeding phases of landslide response and mitigation in impacted areas.

How to cite: Meena, S. R., Floris, M., and Catani, F.: Can AI-generated landslide inventories replace humans' cognitive abilities in hazard and risk scenarios?, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-9623, https://doi.org/10.5194/egusphere-egu23-9623, 2023.

EGU23-9988 | Posters virtual | GM3.3

Comparing the performance of Machine Learning Methods in landslide susceptibility modelling 

Paraskevas Tsangaratos, Ioanna Ilia, and Aikaterini-Alexandra Chrysafi

Landslide phenomena are considered one of the most significant geohazards, with a great impact on the man-made and natural environment. A search of the scientific literature shows that the most studied topic in landslide assessment is the identification of areas that may potentially exhibit instability, by modelling the influence of landslide-related variables with methods and techniques from the domains of knowledge-driven and data-driven approaches. This is not an easy task, since the complex, and in most cases unknown, processes responsible for the evolution of landslide phenomena, whether triggered by natural or man-made activities, limit model performance. Landslide susceptibility assessment, which models the spatial component of landslide evolution, is the most reliable investigation tool for predicting the spatial dimension of the phenomenon with high accuracy. During the past two decades, artificial intelligence methods, and specifically machine learning algorithms, have dominated landslide susceptibility assessments as the main sophisticated methods of analysis. Fuzzy logic algorithms, decision trees, artificial neural networks, ensemble methods and evolutionary population-based algorithms have been among the most advanced methods that proved reliable and accurate.

In this context, the main objective of the present study was to compare the performance of various machine learning models (MLm) in landslide susceptibility assessment. The methodology followed a five-phase procedure: (i) creating the inventory map; (ii) selecting, classifying, and weighting the landslide-related variables; (iii) performing multicollinearity and importance analyses; (iv) implementing the developed methodology and testing the produced models; and (v) comparing the predictive performance of the various models. The computational process was carried out in R and Python, whereas ArcGIS 10.5 was used for compiling the data and producing the landslide susceptibility maps.

In more detail, Logistic Regression, Support Vector Machines, Random Forest, and Artificial Neural Network models were implemented, and their predictive performance was compared. The efficiency of the MLm was estimated for an area of the northwestern Peloponnese region, Greece, characterized by the presence of numerous landslide phenomena. Twelve landslide-related variables (elevation, slope angle, aspect, plan and profile curvature, topographic wetness index, lithology, silt, sand and clay content, distance to faults, and distance to the river network) and 128 landslide locations were used to produce the training and test datasets. The Certainty Factor was implemented to calculate the correlation among the landslide-related variables and to assign a weight value to each variable class. Multicollinearity analysis was used to test for collinearity among the landslide-related variables. Learning Vector Quantization (LVQ) was used for ranking features by importance, whereas the evaluation process involved estimating the predictive ability of the MLm via classification accuracy, sensitivity, specificity and the area under the success and predictive rate curves (AUC). Overall, the outcome of the study indicates that all MLm provided highly accurate results, with the Artificial Neural Network approach being the most accurate, followed by Random Forest, Support Vector Machines and Logistic Regression.
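
As a schematic illustration of such a comparison (not the authors' code; the Peloponnese dataset is replaced by synthetic stand-in data), the four classifier families can be benchmarked with cross-validated AUC as follows:

```python
# Schematic comparison of the four classifier families named above on synthetic
# stand-in data (the study's actual dataset is not reproduced here).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=256, n_features=12, n_informative=8, random_state=7)
models = {
    "LogisticRegression": LogisticRegression(max_iter=2000),
    "SVM": SVC(probability=True),
    "RandomForest": RandomForestClassifier(n_estimators=300, random_state=7),
    "ANN": MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=3000, random_state=7),
}
for name, clf in models.items():
    auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: AUC = {auc.mean():.3f}")
```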

How to cite: Tsangaratos, P., Ilia, I., and Chrysafi, A.-A.: Comparing the performance of Machine Learning Methods in landslide susceptibility modelling, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-9988, https://doi.org/10.5194/egusphere-egu23-9988, 2023.

Numerous advanced techniques, including machine learning models, are widely used in landslide susceptibility zoning and often result in very high accuracy. In some cases, very high accuracy reflects overfitting, where a model adapts very well to the training data but performs poorly on test or new data. Cross-validation (CV) strategies are often employed to reduce overfitting in machine learning models, and several CV techniques have recently been developed as part of the machine learning workflow. However, the grounds for preferring one CV method over another are still unclear in landslide susceptibility zoning. To illustrate this issue, the authors compare no CV, standard V-fold CV, and several spatial CV techniques on a benchmark dataset in Italy, training, validating and testing an XGBoost model with 26 landslide controlling factors. The variation of validation ROC, testing ROC, and confusion matrices was used to detect model overfitting. The preference for a particular CV technique on the Italian benchmark data will be discussed further. The results are expected to provide guidance for choosing a CV technique in landslide susceptibility zoning based on slope units and a machine learning workflow.
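
As a crude illustration of why the choice matters, the sketch below contrasts standard k-fold CV with a spatially blocked CV for an XGBoost model on synthetic, spatially structured data; coordinate-binned groups serve here as a simple proxy for spatial CV, not necessarily the authors' exact schemes:

```python
# Crude illustration of standard vs. spatially blocked CV for an XGBoost model.
# Spatial blocks are formed by binning slope-unit coordinates (a simple proxy).
import numpy as np
from sklearn.model_selection import KFold, GroupKFold, cross_val_score
from xgboost import XGBClassifier

rng = np.random.default_rng(3)
n = 2000
xy = rng.uniform(0, 100, size=(n, 2))                      # slope-unit centroids (synthetic)
X = np.c_[xy, rng.normal(size=(n, 24))]                    # 26 controlling factors (synthetic)
y = (xy[:, 0] + rng.normal(scale=20, size=n) > 50).astype(int)  # spatially structured labels
blocks = (xy[:, 0] // 20).astype(int)                      # five spatial strips as CV groups

clf = XGBClassifier(n_estimators=200, max_depth=3, eval_metric="logloss")
auc_std = cross_val_score(clf, X, y, cv=KFold(5, shuffle=True, random_state=3), scoring="roc_auc")
auc_spat = cross_val_score(clf, X, y, cv=GroupKFold(5), groups=blocks, scoring="roc_auc")
print("standard 5-fold AUROC:   ", auc_std.mean().round(3))
print("spatially blocked AUROC: ", auc_spat.mean().round(3))  # typically lower = less optimistic
```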

How to cite: Samodra, G., Wahyudi, E. E., and Susyanto, N.: Cross validation technique preference for landslide susceptibility zoning based on slope unit and machine learning workflow, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-11051, https://doi.org/10.5194/egusphere-egu23-11051, 2023.

Bayesian logistic regression with vague priors and optimized XGBoost models are two contrasting and commonly used approaches for modeling landslide susceptibility. Logistic regression calculates the log odds of a binary outcome (i.e., landslide or no landslide) given some predictor data (e.g., slope, elevation, and geology) that describes the terrain of each mapping unit used to divide the terrain for susceptibility evaluation. The Bayesian implementation incorporates uncertainty into the model by using probability distributions of the model parameters. Weakly informative priors ensure that the likelihood function (i.e., the observational data) dominates the posterior distributions, which can be estimated using the statistical software Stan. Like logistic regression, the gradient-boosted decision tree machine learning algorithm XGBoost requires the predictor data of each mapping unit to output a probability of an event. Decision trees are a non-parametric learning tool that uses a set of if-then-else decision rules to predict the expected model outcome. Gradient boosting sequentially adds decision trees to improve the model output until the lowest model residual levels are reached, while penalizing for the level of complexity added to the model. We optimize the model parameters using a Bayesian cross-validation procedure on a portion of the training data. To obtain distributions of the level of susceptibility from XGBoost, a 10-fold cross-validation procedure with ten iterations is implemented. Both the Bayesian logistic regression and XGBoost models are evaluated using the area under the receiver operating characteristic curve and the Brier score, although any other common evaluation metric is possible. Model development and evaluation are carried out in the computational environment R. These methods have been applied with success to many diverse regions of the United States and would benefit from testing with the benchmark datasets proposed by the conveners.
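
The authors' implementation is in R (with Stan for the Bayesian model); as a minimal language-agnostic illustration, the two evaluation metrics named above can be computed on predicted probabilities as follows (synthetic placeholder data):

```python
# Minimal sketch of the two evaluation metrics named above (AUROC and Brier score),
# applied to predicted landslide probabilities; the data are synthetic placeholders.
import numpy as np
from sklearn.metrics import roc_auc_score, brier_score_loss

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=500)                                # observed outcomes
y_prob = np.clip(y_true * 0.6 + rng.uniform(size=500) * 0.5, 0, 1)   # model probabilities

print("AUROC:", round(roc_auc_score(y_true, y_prob), 3))     # discrimination
print("Brier:", round(brier_score_loss(y_true, y_prob), 3))  # calibration + sharpness
```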

How to cite: Mirus, B. and Woodard, J.: Bayesian logistic regression and optimized XGBoost models for landslide susceptibility assessment, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-11586, https://doi.org/10.5194/egusphere-egu23-11586, 2023.

Grid cells (GC) and slope units (SU) are the most common mapping units in landslide susceptibility modeling. SU-based models have recently gained popularity in the field because of the availability of user-friendly software and certain advantages over GC approaches. For example, SUs are often described as more geomorphologically meaningful, less sensitive to positionally inaccurate landslide data and more flexible in representing specific variables (e.g., binary vs. count responses). In contrast to GCs, SU sizes can vary considerably within a study area. Spatially varying mapping-unit sizes may be accompanied by a spatially varying likelihood of a SU being affected by a landslide. We assume that larger SUs are more likely to be labeled as "landslide-affected" than equally susceptible smaller SUs, simply because of their larger spatial extent. In other words, the larger the area investigated, the more likely it is that a landslide will be found within it. This may have relevant effects on subsequent landslide susceptibility models, especially if certain predictor variables correlate with SU size.

To our knowledge, the effects of different SU sizes on landslide susceptibility models have rarely been investigated, and no approaches to explicitly consider SU size have yet been presented. In this contribution, we use Generalized Additive Mixed Models (GAMM) to compare four different strategies for dealing with spatially varying SU sizes in landslide susceptibility modeling. The analyses focus on the provided SU-based dataset related to a part of the Umbria region in Central Italy (~4,100 km²). In the first strategy, all predisposing factors, including those directly related to SU size (i.e., SU area and distance/SU area), are used for model fitting and spatial prediction. The second strategy builds upon strategy 1, but does not consider the size of the SUs for model fitting and spatial prediction. The third strategy demonstrates the ability of SU size to discriminate SUs with landslides from those without, and consists of a single-variable model with the area of the SUs as its only predictor. Finally, in the fourth strategy, all predictors are used for model fitting, but the effect of SU size is averaged out of the spatial prediction (i.e., the size effect is not predicted into space, but its potentially confounding effect is isolated during model fitting).
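
As a minimal illustration of strategy 3, the discriminatory power of SU size alone can be demonstrated with a single-predictor logistic model; the sketch below uses synthetic data in which the labeling probability grows with SU area (an assumption for illustration, not the study's data):

```python
# Minimal sketch of strategy 3: a single-predictor model using only slope-unit area
# to discriminate landslide-affected units. Synthetic data for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(5)
area = rng.lognormal(mean=0.0, sigma=1.0, size=4100)   # slope-unit areas (synthetic)
p = 1 / (1 + np.exp(-(np.log(area) - 0.5)))            # larger SU -> higher label probability
y = rng.binomial(1, p)

log_area = np.log(area).reshape(-1, 1)
m = LogisticRegression().fit(log_area, y)
auc = roc_auc_score(y, m.predict_proba(log_area)[:, 1])
print(f"AUROC of the area-only model: {auc:.3f}")      # >0.5 reveals the size effect
```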

The first tests support the assumption that larger SUs are more likely to be labeled as landslide-affected and that the associated confounding effects should be considered in landslide susceptibility modeling. We present the four strategies in terms of modeled relationships, relative variable importance, spatial prediction patterns and quantitative validation results.

How to cite: Moreno, M. and Steger, S.: Slope unit size matters - why should the areal extent of slope units be considered in data-driven landslide susceptibility models?, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-12943, https://doi.org/10.5194/egusphere-egu23-12943, 2023.

EGU23-13362 | Orals | GM3.3

Exploring the benchmark dataset for tasks related to landslide susceptibility assessment 

Jewgenij Torizin and Nick Schüßler

In the presented study, we investigate the possibilities of performing tasks related to landslide susceptibility assessment (LSA) on the provided benchmark dataset. The slope unit-based dataset consists of aggregated predisposing factors and two label sets. Although initially introduced as a dataset for binary classification tasks, it is also suitable for zoning and regression analysis in combination with the underlying landslide inventory. Zoning ranks slope units to delineate the study area into susceptibility zones. In the regression analysis, we try to predict a numeric target value (e.g., landslide count) from the slope unit's attributes.

We explored the benchmark dataset using bivariate and multivariate statistical visualization techniques to better understand the data relations. We found the dataset, at this stage, insufficient for achieving a well-explainable, high-performance classification using linear models. Most attributes are not specific enough to linearly separate the given labels, and the chosen central tendency statistics (mean and standard deviation) may not sufficiently characterize the parameter distributions inside the slope units.

We propose a theoretical concept for zonation analysis to assess the best possible performance on the given discrete dataset, using the success-rate curve as the model evaluation metric. Because no applied algorithm can modify the geometry of the discrete slope units, the evaluation metric depends only on the relative ranking of the slope units, and the best possible performance is obtainable without computing a predictive model. For frequency-related models (weighting of factors with landslide count statistics), directly computing conditional probabilities or the frequency ratio on the slope units as a ranking factor provides the best possible ranking; for binary labels, combining the label with the slope unit's area does.
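
As an illustration of this ranking argument, the sketch below (synthetic data, not the benchmark itself) ranks slope units by landslide density and integrates the resulting success-rate curve:

```python
# Compact sketch of the ranking idea described above: rank slope units by landslide
# density (count per unit area) and trace the resulting success-rate curve.
# All data are synthetic stand-ins for illustration.
import numpy as np

rng = np.random.default_rng(2)
n = 1000
area = rng.lognormal(0.0, 0.8, size=n)                           # slope-unit areas
counts = rng.poisson(0.5 * area * rng.gamma(1.0, 1.0, size=n))   # landslides per unit

score = counts / area                                  # density-based ranking factor
order = np.argsort(score)[::-1]                        # best-ranked units first
cum_area = np.cumsum(area[order]) / area.sum()         # x: cumulative area fraction
cum_ls = np.cumsum(counts[order]) / counts.sum()       # y: landslide fraction captured
auc_src = np.sum(np.diff(cum_area) * (cum_ls[1:] + cum_ls[:-1]) / 2)  # trapezoid rule
print(f"Area under the success-rate curve: {auc_src:.3f}")
```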

We conducted a regression and classification analysis with artificial neural networks (ANN), testing different combinations of parameters (sensitivity analysis) and network architectures that allow for modeling nonlinear relations. In both analyses, initial results show that a complex network architecture can boost the model fit on the training dataset while losing predictive performance on the test data. The dataset pre-exploration also corresponds well with the ANN sensitivity analysis: the number of parameters can be reduced to a few effective predictors without losing much classification accuracy, which remains poor to moderate depending on the label set used.

While slope units as an aggregation for geomorphological analyses remain undisputed, the proposed aggregation of predisposing factors in slope units at the analysis's entry point needs further discussion. Aggregating the results of a raster-based LSA to overcome deviances in landslide susceptibility patterns caused by data uncertainties or different methods could be more suitable at this point. Slope units should be analyzed with regression analysis in LSA to consider their different spatial extents during the calculation.

We provide our scripts, visualizations, and results as a Jupyter Notebook on our GitHub: https://github.com/BGR-EGHA/EGU23_GM3.3_ls_benchmark.

How to cite: Torizin, J. and Schüßler, N.: Exploring the benchmark dataset for tasks related to landslide susceptibility assessment, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-13362, https://doi.org/10.5194/egusphere-egu23-13362, 2023.

EGU23-16251 | Orals | GM3.3

Ensemble learning on the benchmark dataset for landslide susceptibility zonation in Central Italy 

Héctor Aguilera, Jhonatan Steven Rivera Rivera, Carolina Guardiola-Albert, and Marta Béjar-Pizarro

In response to the call for collaboration, we aim to develop landslide susceptibility maps for the benchmark study area using Ensemble Machine Learning. Ensemble Learning has proven successful for landslide susceptibility mapping in highly susceptible Asian regions of South Korea (Kaavi et al., 2018) and China (Hu et al., 2020).

The benchmark dataset provided, encompassing 7360 slope units in the central region of Italy, has 26 morphometric and thematic attributes and two binary targets indicating the presence (1) or absence (0) of landslides. The first binary target is balanced with respect to the number of zeros and ones (target 1), and the second in terms of the area covered by slope units labeled with zero or one (target 2). For each of the two targets, we will compare the performance of individual classifiers such as logistic regression, naive Bayes, decision trees, k-nearest neighbors, support vector machines and neural networks, as well as bagging (e.g., random forest) and boosting (e.g., extreme gradient boosting, CatBoost) algorithms, using cross-validation. The best and most diverse models will then be selected based on typical performance metrics such as AUC and the Matthews Correlation Coefficient (MCC), fine-tuned, and combined using stacking and blending Ensemble Learning techniques.
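
A minimal sketch of the planned stacking step (scikit-learn's StackingClassifier on synthetic stand-in data; the actual model pool, tuning and metrics will differ) could read:

```python
# Hedged sketch of the planned workflow: cross-validated base learners combined by
# stacking. The data are a synthetic stand-in for the 7360-unit benchmark table.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=7360, n_features=26, n_informative=12, random_state=0)
base = [("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("nb", GaussianNB()),
        ("knn", KNeighborsClassifier())]
stack = StackingClassifier(estimators=base,
                           final_estimator=LogisticRegression(max_iter=1000), cv=5)
print("stacked AUROC:", cross_val_score(stack, X, y, cv=5, scoring="roc_auc").mean().round(3))
```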

The best model will be re-trained with different configurations of training and test sets to derive an error distribution and thus add a measure of uncertainty to each slope unit of the landslide susceptibility maps. Further, we will develop a landslide susceptibility index based on the results (e.g., probability distributions of the outcomes) to produce quantile-based susceptibility maps.

This work has been developed thanks to the pre-doctoral grant for the Training of Research Personnel (PRE2021-100044) funded by MCIN/AEI/10.13039/501100011033 and by "FSE invests in your future" within the framework of the SARAI project "Towards a smart exploitation of land displacement data for the prevention and mitigation of geological-geotechnical risks" PID2020-116540RB-C22 funded by MCIN/AEI/10.13039/501100011033.

How to cite: Aguilera, H., Rivera Rivera, J. S., Guardiola-Albert, C., and Béjar-Pizarro, M.: Ensemble learning on the benchmark dataset for landslide susceptibility zonation in Central Italy, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-16251, https://doi.org/10.5194/egusphere-egu23-16251, 2023.

EGU23-1059 | ECS | Orals | EMRP1.6

Seismic reflectivity of fractures: the impact of secondary connected fractures 

Edith Sotelo, J. German Rubino, Nicolas D. Barbosa, Santiago G. Solazzi, and Klaus Holliger

Fractures are ubiquitous throughout the Earth's upper crust and dominate the mechanical and hydraulic properties of the affected rock masses. Indeed, open fractures act as fluid conduits and, commonly, flow is controlled by larger fractures, which are, in turn, likely to be connected to smaller ones. Therefore, fracture characterization is of paramount importance for many pertinent applications, such as geothermal energy production, CO2 sequestration, nuclear waste storage, and hydrocarbon exploration. Seismic reflection methods are useful tools for fracture characterization due to the generally high reflectivity that large fractures exhibit as a consequence of their strong mechanical contrast with the embedding intact background. The magnitude of this mechanical contrast is known to be strongly affected by fracture-to-background wave-induced fluid pressure diffusion (FPD). Conversely, the FPD effects associated with secondary connected fractures remain so far unexplored. We investigate the influence of FPD on the normal compliance and on the vertical-incidence PP reflectivity of a large fracture that is hydraulically connected to smaller fractures. To this end, we use several models consisting of an infinite horizontal main fracture connected to multiple vertical secondary fractures of finite length, embedded in an impermeable background. The individual models differ only with regard to the geometrical (e.g., length and aperture) and physical (e.g., permeability and bulk modulus) properties of the secondary fractures. For comparison, we also calculate the normal compliance and the reflectivity of an isolated infinite horizontal fracture. To assess the changes of fracture compliance due to FPD, we perform a vertical compressional oscillatory test on samples of the aforementioned models that include part of the fracture system and the embedding background. This test simulates the FPD effects that a vertically propagating P-wave generates between the main and secondary fractures. Specifically, the wave produces a pressure increase in the horizontal fracture that equilibrates as fluid flows into the secondary vertical fractures. Based on this oscillatory test, we compute the averages of the vertical components of strain and stress over the main fracture, which we use to estimate its normal compliance. We then calculate the PP reflectivity at normal incidence using the inferred P-wave modulus. Our results show that both the compliance and the PP reflectivity of the main fracture increase by as much as two orders of magnitude in response to the presence of secondary fractures. We also find that the physical and geometrical properties of the secondary connected fractures influence the normal compliance and reflectivity of the main fracture.
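
As a hedged illustration of the final step only, the normal-incidence PP reflection coefficient follows from the impedance contrast between the background and the effective fracture-zone properties; all numbers below are invented for demonstration and are not the study's results:

```python
# Illustration of the final step: normal-incidence PP reflection coefficient from an
# inferred P-wave modulus, via the standard impedance-contrast formula. All values
# are invented placeholders, not the study's models or results.
import numpy as np

rho_bg, M_bg = 2650.0, 60e9        # background density (kg/m^3) and P-wave modulus (Pa)
rho_fr, M_fr = 2400.0, 15e9        # effective properties of the compliant fracture zone

Z_bg = np.sqrt(rho_bg * M_bg)      # P-wave impedance Z = rho * Vp = sqrt(rho * M)
Z_fr = np.sqrt(rho_fr * M_fr)
R = (Z_fr - Z_bg) / (Z_fr + Z_bg)  # normal-incidence reflection coefficient
print(f"Vp background = {np.sqrt(M_bg / rho_bg):.0f} m/s, R = {R:.3f}")
```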

How to cite: Sotelo, E., Rubino, J. G., Barbosa, N. D., Solazzi, S. G., and Holliger, K.: Seismic reflectivity of fractures: the impact of secondary connected fractures, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-1059, https://doi.org/10.5194/egusphere-egu23-1059, 2023.

The bauxite reservoir of the new-type Taiyuan Formation in the Zhengning area, southwest Ordos Basin, is affected by the karst palaeogeomorphology, and its thickness varies greatly. In order to systematically study bauxite as a new type of reservoir, the petrological characteristics and pore structure characteristics of the bauxite reservoir were studied based on core observation, microscopic thin sections, high-pressure mercury injection, low-temperature nitrogen adsorption and other experimental methods, further verifying the reservoir's exploration significance. The results show that: (1) the upper and lower parts of the reservoir are bauxitic mudstones, and the middle part is argillaceous bauxite; the relatively well-developed dissolution pores are the main storage space of the bauxite; (2) the bauxite minerals of the Taiyuan Formation consist mainly of aluminum minerals and clay minerals, chiefly diaspore, kaolinite, illite and chlorite; (3) the bauxite reservoir space is mainly composed of intragranular dissolution pores, matrix dissolution pores, intergranular dissolution pores, intergranular pores and microcracks, with pore sizes mainly between 20 and 200 μm and main throat diameters of 150 nm to 4 μm. The pore structure is good, the mercury removal efficiency is high, and the pore throats are overall submicron to micron in scale. The average porosity of the reservoir is 10.6% and the average permeability is 4.04×10⁻³ μm², with permeability values greater than 0.3×10⁻³ μm² accounting for 36% of the samples, indicating good reservoir conditions. The research results provide a basis for bauxite gas exploration in the Ordos Basin.

How to cite: Li, H. Y.: Bauxite Reservoir Characteristics of Taiyuan Formation in Zhengning Area of Southwest Ordos Basin, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-1115, https://doi.org/10.5194/egusphere-egu23-1115, 2023.

Capillary force shows great potential to improve the recovery of shale oil and gas reservoirs through spontaneous imbibition. However, the mechanism by which capillary force drives shale oil migration, and its controlling factors, are still unclear. Using NMR, low-temperature nitrogen adsorption, high-pressure mercury injection and other experimental means, this work investigates the role of capillary force in improving shale oil recovery. The results show that the nuclear magnetic resonance T2 spectra obtained through spontaneous imbibition can be divided into three types, and that shale oil recovery can reach 38.72%–65.52%, contributed mainly by the first peak (P1). Water imbibition and oil imbibition experiments were carried out on samples of the same size, and the dynamic wettability index of the samples was calculated as a function of spontaneous imbibition time. Type 1 shale was found to be mainly lipophilic, while type 2 and type 3 samples are mainly hydrophilic; the P1 pores of all three shale types are hydrophilic to neutral, and the water imbibition volume of the three samples was greater than the oil imbibition volume. In addition, by comparing the relationship between pore throats and pores and combining the structural characteristics of the samples, three typical pore-throat types are summarized. Finally, a comprehensive study of wettability, shale pore structure and shale oil recovery shows that water can drive oil droplets in micropores or pore throats (P1) into the mesopores (P2), which then transmit the oil to the fractures by transferring the pressure difference. The oil-water distribution pattern before and after spontaneous imbibition under capillary force is summarized to provide a theoretical basis for shale oil exploration and development.

How to cite: Yin, N.: Shale oil mobility and pore size-associated wettability under capillary pressures, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-1152, https://doi.org/10.5194/egusphere-egu23-1152, 2023.

EGU23-1838 | ECS | Orals | EMRP1.6

Organic matter matters - The imaginary conductivity of sediments rich in solid organic carbon 

Cora Strobel, Manuel Dörrich, Olaf A. Cirpka, Johan A. Huisman, and Adrian Mellage

Solid organic matter (SOM) is an important component of natural sediments and plays a crucial role in providing substrate for microbial reactions and the degradation of contaminants in soil and groundwater. Knowledge about its distribution in the subsurface is crucial for the delineation of potential hotspots of microbial activity. The subsurface is, however, difficult to access, limiting our ability to reliably delineate the spatially heterogeneous distribution of SOM. Recently, the geophysical method induced polarization (IP) has been shown to be a potentially promising mapping tool, able to detect the presence of SOM. However, the mechanisms controlling IP signals in the presence of SOM are not (yet) well understood, with a handful of studies highlighting inconclusive results (Katona et al., 2021; Mellage et al., 2022; Ponziani et al., 2012; Schwartz & Furman, 2014). Moreover, a non-negligible contribution of polarization from the organic matrix can yield signals that may cause misinterpretation of other petro-physical relationships in unconsolidated sediments.

In this study, we measured the spectral IP (SIP) response of aquifer sediment cores (2 – 8 m depth) collected from an alluvial floodplain aquifer in southwest Germany. The total organic carbon (TOC) content in the cores and the cation exchange capacity (CEC) exhibit a positive correlation with the magnitude of polarization (i.e. imaginary conductivity). In addition, strong differences in the frequency dependence of the IP measurements as a function of TOC fraction were observed for the otherwise calcareous matrix devoid of other strongly polarizing mineral phases (e.g. pyrite or clay minerals). While the CEC at the site is strongly dominated by the amount of SOM, polarization is more strongly linked to SOM than CEC. We hypothesize that the weaker correlation between SOM and CEC highlights the contribution of poorly understood charge storage mechanisms within the polydisperse organic matrix that differ from polarization at mineral surfaces. Ongoing experiments with artificial soil mixtures of calcitic sand and varying fractions of peat, under controlled conditions (i.e. constant electrical conductivity of the pore fluid), will help to shed light on the controls behind our field-derived relationships. We expect that our combined field and laboratory investigations will provide insights into the petro-, or rather, organo-physical relationship between SOM and the imaginary conductivity, and thus contribute to a conceptualization of the underlying polarization mechanisms in organic matrices.

References

Katona, T., Gilfedder, B. S., Frei, S., Bücker, M., & Flores Orozco, A. (2021). High-resolution induced polarization imaging of biogeochemical carbon-turnover hot spots in a peatland. Biogeosciences, 18(13), 4039–4058.

Mellage, A., Zakai, G., Efrati, B., Pagel, H., & Schwartz, N. (2022). Paraquat sorption- and organic matter-induced modifications of soil spectral induced polarization (SIP) signals. Geophysical Journal International, 229(2), 1422–1433. https://doi.org/10.1093/gji/ggab531

Ponziani, M., Slob, E. C., Vanhala, H., & Ngan-Tillard, D. (2012). Influence of physical and chemical properties on the low-frequency complex conductivity of peat. Near Surface Geophysics, 10(6), 491–501. https://doi.org/10.3997/1873-0604.2011037

Schwartz, N., & Furman, A. (2014). On the spectral induced polarization signature of soil organic matter. Geophysical Journal International, 200(1), 589–595. https://doi.org/10.1093/gji/ggu410

How to cite: Strobel, C., Dörrich, M., Cirpka, O. A., Huisman, J. A., and Mellage, A.: Organic matter matters - The imaginary conductivity of sediments rich in solid organic carbon, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-1838, https://doi.org/10.5194/egusphere-egu23-1838, 2023.

Access to top research equipment facilitates top research. However, the research equipment needed may not always be available within individual institutes, while access to external facilities may not always be affordable. This restricts the research that any individual can do and hampers scientific breakthroughs, particularly across disciplines. To overcome this limitation, a collaborative infrastructure network was initiated: EPOS-NL (European Plate Observing System-Netherlands). EPOS-NL provides free-of-charge access to geophysical labs at Utrecht University and Delft University of Technology, both in the Netherlands, for research within rock physics, analogue modelling of tectonic processes, X-ray tomography and microscopy. These labs include capabilities for, among others: A) mechanical and transport testing at crustal stress, temperature and chemistry conditions; B) analogue tectonic modelling, including dynamic model imaging in 2D and 3D; C) X-ray tomography at sub-µm resolution; and D) a correlative workflow for imaging and microchemical mapping, down to nm resolution. As such, these labs can provide you with the means and expertise for your research into the physical behavior of the Earth’s crust and upper mantle.

Access to EPOS-NL can be requested by applying to a bi-annual call, posted on www.EPOS-NL.nl. This involves submitting a short (1-2 page) research proposal. Research proposals are reviewed on the basis of feasibility and excellence, but generally have a high chance of success (~80% in previous rounds). Interested? Have a look on the EPOS-NL website – and apply!

How to cite: Wessels, R. and Pijnenburg, R.: Access for free: How to get free-of-charge access to Dutch Earth scientific research labs through EPOS-NL, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-2834, https://doi.org/10.5194/egusphere-egu23-2834, 2023.

EGU23-3097 | ECS | Orals | EMRP1.6

Petroacoustic characterization of fractured and weathered limestone from the O-ZNS Critical Zone Observatory 

Abdoul Nasser Yacouba, Céline Mallet, Jacques Deparis, Philippe Leroy, Gautier Laurent, Mohamed Azaoural, and Damien Jougnot

In the context of the energy transition and the water resources crisis, studying fluid flow in the critical zone appears to be a major issue. The O-ZNS site (Observatory of transfers in the Vadose Zone, Orleans, France) has been designed for the development of innovative tools to characterize and monitor the dynamics of the vadose zone (VZ). The geological structure of this VZ is composed mainly of a lacustrine limestone formation located between 10 and 20 m depth, characterized by multiscale heterogeneities (facies variations; presence of cracks, fractures, pores, cavities and karstification). In order to predict fluid flow, heat transfer, and aquifer recharge through this VZ, the limestone heterogeneities have to be integrated into geological concepts and numerical models.

This study is a key part of the O-ZNS project, as it aims at (i) understanding and classifying the microstructural and petrophysical properties at laboratory scale; (ii) predicting these properties through quantitative geophysical parameters and; (iii) developing new geophysical interpretations through coupled approaches.

Based on the well-log analysis of the O-ZNS site, we collected limestone samples from four main facies (four samples per facies). We performed a state-of-the-art petrophysical characterization including connected and total porosity, density, and permeability measurements. We then carried out acoustic measurements on dry and water-saturated plugs (2.5 and 4 cm diameter) with P- and S-waves at two frequencies, 0.5 and 1 MHz.

The measurement results show a large dispersion of the petrophysical properties. For example, connected porosity ranges from 4 to 12%, and density from 2.3 to 2.5 g/cm³. This dispersion of petrophysical properties is interpreted in terms of heterogeneity in the type of porosity (micrometre- to centimetre-scale pore sizes, presence of cracks and fractures) and in mineralogy. However, it appears that the deepest facies (located at the aquifer level) is more homogeneous and shows the highest porosity. This is consistent with the directly observed (micro)structure from 3D sample and well scans.

Acoustic velocity results show values consistent with fractured limestone rocks. The different facies show dispersion, with Vp varying from 4950 to 5600 m/s for the shallowest facies at 9 m depth. Here also, the deepest facies appears to be the most homogeneous, with the lowest velocities (around 4875 m/s). The velocities are thus consistent with the petrophysical measurements, and one can draw a simple relationship between porosity, density and acoustic velocities. However, other petroacoustic relationships are necessary to better discriminate between the facies and therefore predict their microstructure and transport properties.
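
One classical porosity-velocity relationship that could serve as a first-order check here is Wyllie's time-average equation; the sketch below uses assumed textbook end-member velocities for calcite and water, not the authors' calibration:

```python
# Illustration of a first-order porosity-velocity check using Wyllie's time-average
# equation, 1/V = phi/V_fluid + (1 - phi)/V_matrix. End-member velocities are
# assumed textbook values, not the authors' calibrated parameters.
import numpy as np

v_matrix = 6500.0                     # P-wave velocity of a calcite matrix (m/s), assumed
v_fluid = 1500.0                      # P-wave velocity of water (m/s)
phi = np.array([0.04, 0.08, 0.12])    # connected porosities within the reported range

v_p = 1.0 / (phi / v_fluid + (1.0 - phi) / v_matrix)
for f, v in zip(phi, v_p):
    print(f"phi = {f:.2f} -> Vp ~ {v:.0f} m/s")
```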

The next step of this work is to add electrical measurements, develop petro-acoustic-electrical models, and enhance our capacity to upscale these properties from the laboratory to the field.

How to cite: Yacouba, A. N., Mallet, C., Deparis, J., Leroy, P., Laurent, G., Azaoural, M., and Jougnot, D.: Petroacoustic characterization of fractured and weathered limestone from the O-ZNS Critical Zone Observatory, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-3097, https://doi.org/10.5194/egusphere-egu23-3097, 2023.

EGU23-3347 | Posters on site | EMRP1.6

Estimation of normal and shear compliance for inclined fractures from full-waveform sonic log data 

Zhenya Zhou, Eva Caspari, Nicolás D Barbosa, Marco Favino, and Klaus Holliger

Fractures are ubiquitous throughout the Earth’s upper crust and represent localized zones of mechanical weakness as well as preferential pathways for fluid flow. Correspondingly, their detection and characterization are vital for a wide range of pertinent applications in geological, civil, and environmental engineering, hydrocarbon exploration, nuclear waste and carbon dioxide storage, as well as geothermal energy production. Particularly important mechanical characteristics of fractures are their normal and shear compliances, which relate the displacement perpendicular and parallel to the fracture plane, respectively, to the corresponding components of the prevailing stress tensor. Based on the linear slip model, previous works developed a phase delay method to estimate the normal compliance of individual fractures using the P-wave first arrivals in full-waveform sonic (FWS) log data. This approach is viable for a quasi-normal incidence scenario of the sonic wavefield. However, the conditions under which this technique remains valid at oblique P- and S-wave incidence angles, as well as the role played by the combined effects of the normal and shear compliances, remain enigmatic. To alleviate this problem, we have extended the phase delay technique to allow for non-normally incident P- and S-waves. In addition to improving the accuracy of the normal compliance estimates with respect to results computed under a normal incidence assumption, this method allows for a simultaneous estimation of the normal and shear compliances. The proposed approach has been validated through analytical tests and numerical simulations of wave propagation in a hard-rock-type borehole environment intersected by a single fracture with dip angles of 0, 30, and 40 degrees with regard to the horizontal. For fracture compliance values typical of mesoscale fractures (10⁻¹⁴ to 10⁻¹² m/Pa), the effects associated with oblique incidence become significant for dip angles larger than 50 and 30 degrees for P- and S-waves, respectively. However, our results also demonstrate that the normal incidence assumption can produce similar errors at even lower fracture dip angles in the presence of larger fracture compliance values and/or shear-to-normal compliance ratios. Finally, we apply the method to observed FWS data acquired in granitic rocks where the considered boreholes intersect fractures at a range of oblique angles. Direct in-situ estimates of compliances for discrete individual fractures are scarce, but essential to bridge the scale gap between laboratory estimates and input data for reservoir-scale models. While recent studies show the feasibility of estimating normal compliances from FWS data, this study aims to explore whether and to what extent this approach can be practically extended to shear compliances and to the corresponding shear-to-normal compliance ratios.

How to cite: Zhou, Z., Caspari, E., Barbosa, N. D., Favino, M., and Holliger, K.: Estimation of normal and shear compliance for inclined fractures from full-waveform sonic log data, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-3347, https://doi.org/10.5194/egusphere-egu23-3347, 2023.

EGU23-3376 | Orals | EMRP1.6

The heterogeneous near-surface velocity structure of carbonate-hosted seismogenic fault zones investigated at different length scales: from ultrasonic measurements to subsurface seismic tomography 

Michele Fondriest, Maurizio Vassallo, Stéphane Garambois, Thomas M. Mitchell, Di Giulio Giuseppe, Mai-Linh Doan, and Christophe Voisin

Field geological studies have revealed the heterogeneous structure of fault zones down to the sub-metric scale due to the juxtaposition of rocks presenting distinct deformation intensity and physical-transport properties. However, such internal variability is not generally resolved by most seismic tomography techniques due to spatial resolution limits. Quantifying the heterogeneous internal structure of fault zones is fundamental to understand their mechanical and hydrological characteristics. In this sense, determining seismic wave velocities and related physical properties (elastic moduli, porosity and fracture intensity) within fault zones, at different observational scales, is crucial.

Here, the near-surface velocity structure of two active seismogenic fault zones located in the Central Apennines of Italy was quantified at different length scales, from laboratory measurements of ultrasonic velocities (rock samples of few centimeters, 1 MHz source) to high-resolution first-arrival seismic tomography (spatial resolution of few meters). Detailed structural mapping was conducted within the Vado di Corno and Monte Marine fault zones, two NW-SE trending structures with length of ~ 15 km and up to 1.5 km of extensional displacement. Distinct structural units separated by fault strands were recognized in the fault zone footwall blocks cutting Mesozoic dolomitic carbonates: (i) fault core cataclastic units, (ii) breccia unit, (iii) high-strain damage zone, (iv) low-strain damage zone. The single units were systematically sampled along transects orthogonal to the average strike of the faults and characterized in the laboratory in terms of directional P and S ultrasonic wave velocities, porosity and microstructures. The fault core cataclastic units were significantly “slower” (VP = 4.5±0.4 kms-1, VS = 2.7±0.2 kms-1) compared to the damage zone units (VP = 5.6±0.6 kms-1, VS = 3.2±0.3 kms-1) at short length scales (i.e. few centimeters). A general negative correlation between ultrasonic velocity and porosity was observed, with some variability within the fault core mostly related to the textural maturity (clast/matrix volume ratio) of the fault rocks and the degree of pore space sealing by calcite cements.

Multiple P- and S-wave high-resolution seismic profiles (length 90-116 m, geophone spacing 1-1.5 m) were acquired across the two fault zones at different structural sites, moving from the principal fault surface into the outer damage zone. The derived first-arrival tomography models highlighted fault-bounded rock bodies with distinct velocities, characterized by geometries that compared well with those deduced from the structural mapping. At the larger length scale investigated by the active seismic survey, relatively “fast” fault core units (VP ≤ 3.0 kms-1, VS ≤ 1.8 kms-1) and very “slow” high-strain damage zones (VP < 1.6 kms-1, VS < 1 kms-1) were recognized. These velocity ranges were significantly different from those determined in the laboratory on small samples. This apparent discrepancy could be reconciled using an effective medium approach, considering the effect of the mesoscale fracture density and size distributions affecting each structural unit.

This combined study highlighted the high petrophysical variability of carbonate-hosted fault zones, with structural units characterized by sharp contacts and different velocity scaling. In particular, the persistence of compliant high-strain damage zones at shallow depth might strongly affect near-surface deformation.

How to cite: Fondriest, M., Vassallo, M., Garambois, S., Mitchell, T. M., Giuseppe, D. G., Doan, M.-L., and Voisin, C.: The heterogeneous near-surface velocity structure of carbonate-hosted seismogenic fault zones investigated at different length scales: from ultrasonic measurements to subsurface seismic tomography, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-3376, https://doi.org/10.5194/egusphere-egu23-3376, 2023.

EGU23-3630 | ECS | Posters on site | EMRP1.6

Fluid diffusion and pore-pressure distribution in microcracked rocks 

Gang Lin, Samuel Chapman, Jérôme Fortin, and Alexandre Schubnel

Pore pressure has a major influence on the effective stress and thus on the mechanical behaviour and the physical (elastic and transport) properties of microcracked rocks. In the field, in-situ measurements of pore pressure are difficult beyond local measurements around boreholes. Yet, fluid migration is observed ubiquitously in the continental crust, whether in fault zones or in volcanic geothermal areas. In particular, pore pressure perturbations change the effective stress, which may lead to microseismic activity. This may also occur in conventional reservoirs, during CO2 storage, or during deep geothermal energy extraction.

In this study, we focus, in the laboratory, on the hydro-mechanical behavior of thermally treated Westerly granite and naturally microcracked Etna basalt samples (40 mm in diameter and 80 mm in length). The goal is to determine the pore pressure distribution and diffusion laws under different pore pressure gradients. First, classical (constant-flow method) permeability measurements under a small pore pressure gradient (1 MPa over the length of the sample) were carried out as a function of increasing confining pressure Pc (up to 70 MPa). The results show that the permeability of the samples varies exponentially with effective pressure, as expected for cracked porous rocks. The pressure sensitivity factor for permeability was deduced to be of the order of 0.011–0.057 MPa⁻¹.

In a second step, permeability was measured at high (70 MPa) confining pressure, under large pore-pressure gradients (up to 60 MPa). During this part of the experiments, pore pressure was measured along the sample using newly developed fluid pressure sensors (with an absolute accuracy of +/-1MPa). Under small pore pressure gradient (2.5 MPa), our results show that the pore pressure varies linearly over the length of the sample, as expected from Darcy’s law and a constant permeability. However, with increasing pore pressure gradient (up to 60 MPa), the linearity is lost, as the permeability can no longer be assumed constant along the sample.

To interpret our results, we solved the diffusion equation, assuming that permeability varies exponentially with effective pressure. For steady-state flow conditions, our observations of the pore pressure distribution in the samples are consistent with the theoretical predictions. In particular, we show that the shape of the pore-pressure distribution at steady state does not depend on the permeability itself, but rather on the permeability pressure sensitivity factor: the larger the latter, the more non-linear the pore pressure profile in the samples.
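
A minimal sketch of this prediction, assuming k = k0·exp(-γ(Pc - P)) so that exp(γP) becomes linear in position at steady state (illustrative values chosen within the ranges reported above, not the measured profiles):

```python
# Sketch of the steady-state pore-pressure profile when permeability varies
# exponentially with effective pressure, k = k0 * exp(-gamma * (Pc - P)).
# For steady 1D Darcy flow, exp(gamma * P) is then linear in x, so
# P(x) = (1/gamma) * ln[exp(gamma*P0) + (exp(gamma*PL) - exp(gamma*P0)) * x/L].
# Values below are illustrative, within the ranges reported in the abstract.
import numpy as np

gamma = 0.05         # pressure sensitivity factor (1/MPa), within 0.011-0.057
P0, PL = 5.0, 65.0   # pore pressures at the two sample ends (MPa), for Pc = 70 MPa
x = np.linspace(0.0, 1.0, 11)    # normalized position along the sample

P = np.log(np.exp(gamma * P0) + (np.exp(gamma * PL) - np.exp(gamma * P0)) * x) / gamma
linear = P0 + (PL - P0) * x      # profile expected for a constant permeability
print("max deviation from the linear profile: %.1f MPa" % np.max(np.abs(P - linear)))
```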

How to cite: Lin, G., Chapman, S., Fortin, J., and Schubnel, A.: Fluid diffusion and pore-pressure distribution in microcracked rocks, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-3630, https://doi.org/10.5194/egusphere-egu23-3630, 2023.

EGU23-4452 | Posters on site | EMRP1.6

IP signature of metallic particles: lessons learnt from field and laboratory experiments 

Pauline Kessouri, Clémence Ryckebusch, Alejandro Fernandez-Visentini, and Lee D. Slater

Past metallurgical sites and deposits account for a significant proportion of potentially contaminated sites in the European Union (EU): about 100,000 have been identified in the north-west regions of the EU alone. While recent wastes from sites still in operation are commonly recovered, this is not the case for old aggregated materials with a high content of ferrous (and other) metals, white and black slag, etc., which are considered sources of pollution and are costly to manage or dispose of. These sites could be considered opportunities to recover large volumes of resources (metals, materials and land) using urban mining techniques if they were better characterized.

The induced polarization (IP) method is a geophysical method known to be sensitive to the presence of various metallic particles disseminated in soil layers. While qualitative interpretation of the IP parameters measured in the field (i.e., resistivity and chargeability) is widespread, quantitative interpretation in terms of concentrations of different metallic particles is yet to be developed.

The example of the Pompey field site (FR), investigated as part of the NWE-REGENERATIS project (https://www.nweurope.eu/projects/project-search/nwe-regeneratis-regeneration-of-past-metallurgical-sites-and-deposits-through-innovative-circularity-for-raw-materials/), is used in this study to present the interest in using time domain IP (TDIP) field measurements to characterize metallurgical past deposits. Several paths are explored to convert resistivity and chargeability TDIP tomographies into quantitative interpretation of metallic element concentrations: (1) extraction of frequency data from TDIP field measurements; and (2) upscaling of lab results through numerical simulations.

Regarding (1), TDIP measurements were made with different time windows (i.e., different frequencies), giving access to spectral IP (SIP) processing and interpretation at five frequencies. These frequency-domain interpretations of the TDIP data can be compared to laboratory measurements and facilitate the upscaling of the identified petrophysical relationships.

Regarding (2), in order to interpret the TDIP results in terms of concentration of metallic particles, known petrophysical relationships and geochemical measurements obtained at the lab scale need to be interpreted at the field scale. We propose to use a Bayesian framework for inferring field-scale metallic particles concentrations, taking into account heterogeneity and anisotropy within the inversion schemes. This work is ongoing.

For both (1) and (2), it is crucial to find the best petrophysical relationships linking the IP parameters to the concentration and size of metallic particles. Wong (1979) developed a physics-based electrochemical model that is still used today. We further investigate the Wong model to explore the role of the background porous medium itself in determining the IP signature of disseminated metallic particles, and we discuss the sensitivity of the model for estimating metallic grain concentrations.

All these different research paths lead to a better understanding of metallic particles IP signature at a small scale, as well as discussions on how to use these findings to better characterize and reevaluate past metallurgical sites and deposits.

This study was funded by the North West Europe (NWE) Interreg project called NWE-REGENERATIS that aims at the regeneration of past metallurgic sites and deposits through innovative circularity for raw materials, and by Schlumberger-Doll Research Center (USA, MA).

How to cite: Kessouri, P., Ryckebusch, C., Fernandez-Visentini, A., and Slater, L. D.: IP signature of metallic particles: lessons learnt from field and laboratory experiments, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-4452, https://doi.org/10.5194/egusphere-egu23-4452, 2023.

EGU23-5997 | ECS | Posters on site | EMRP1.6

Quantification of electrical properties of deep crustal rocks based on their mineral modal proportion, fabric, and pressure-temperature conditions 

Hadiseh Mansouri, Virginia Toy, Kevin Klimm, Nikolai Bagdassarov, Mattia Pistone, Andrew Greenwood, and György Hetényi

Electrical resistivity tomography and electromagnetic inverse modelling are particularly useful to explore orogenic systems because the most important conductive components of rock masses are economically significant minerals (semi-metals like graphite, and semi-conducting minerals like sulphides), as well as certain clays and permeating saline fluids. Despite the efficiency of electrical measurements, anisotropic properties of the crust, which affect almost all acquired data, may lead to serious misinterpretation of the subsurface geology if they are ignored during data analysis. Understanding the geological causes of electrical anisotropy and heterogeneity, and considering their influence on field-scale electrical measurements, can provide crucial information on the crustal architecture and pore fluid network, as well as revealing the internal structure of fault zones and increasing the accuracy of locating critical mineral deposits. To this end, we aim to quantify the electrical properties of mid- to lower-crustal metamorphic and magmatic lithologies based on their micro- to macrostructures, conductive components and fluid contents as measured by laboratory methods. Our research also contributes to, and advances, the likely outcomes of the ICDP-supported project DIVE (Drilling the Ivrea-Verbano ZonE). DIVE is currently exploring the hidden portions of the continental lower crust and the crust-to-mantle transition zone of the Ivrea-Verbano Zone (Western Alps, Italy) in two boreholes at the sites of Megolo (DT-1a) and Ornavasso (DT-1b), separated by a distance of 7 km in Val d’Ossola. The first DIVE borehole, DT-1b, was completed in December 2022, reaching a depth of 578.5 metres, and rock cores of metapelite, gneiss, amphibolite, migmatite, and pegmatite were recovered. Some drill cores contained a range of potentially conductive lithologies, including sulphide- and graphite-bearing metapelites. In this research, we are measuring the electrical conductivity of a representative benchmark suite of bedrock outcrop samples from the region around the DIVE boreholes at elevated pressure and temperature. We are currently characterising the microstructural arrangement and distribution of conductive phases within these samples by electron-beam methods. To properly understand the electrical property measurements of the natural samples, we determine the contributions of each key conductive phase (graphite and sulphides). The bulk resistivity of a mixture of quartz + 10% graphite, synthesized in a solid-medium piston-cylinder apparatus, was found to be 1 Ω·m at a temperature of 22.5 °C and a pressure of 0.5 GPa. No change in bulk resistivity was observed with increasing temperature up to 1000 °C. At this conference, we will present the results of additional tests undertaken between January and April 2023. Our data will be employed in the interpretation of wireline electrical logs and borehole-to-surface electrical surveys from DT-1a and DT-1b.

How to cite: Mansouri, H., Toy, V., Klimm, K., Bagdassarov, N., Pistone, M., Greenwood, A., and Hetényi, G.: Quantification of electrical properties of deep crustal rocks based on their mineral modal proportion, fabric, and pressure-temperature conditions, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-5997, https://doi.org/10.5194/egusphere-egu23-5997, 2023.

EGU23-6640 | Orals | EMRP1.6

A petrophysical model for the spectral induced polarization of clays 

Philippe Leroy, Alexis Maineult, Aida Mendieta, and Damien Jougnot

Clays are sedimentary minerals that are ubiquitous in the Earth’s continental crust. They have remarkable adsorption, catalytic and containment properties due to their high surface charge and very large specific surface area. However, their microstructural and electrochemical properties are not completely understood. In this study, we have developed a new petrophysical model to interpret laboratory spectral induced polarization measurements on kaolinite, illite and montmorillonite muds when salinity increases (from around 0.01 mol L⁻¹ to 1 mol L⁻¹ NaCl initially). Our model considers electrical conduction in the bulk and diffuse layer waters as well as polarization of the Stern layers of illite aggregates and of the Stern layers and interlayer spaces of Na-montmorillonite aggregates with different shapes and sizes. Maxwell-Wagner polarization was considered as well. By fitting predicted to measured SIP spectra, we found that the basal surface of clays controls Stern layer polarization and that the interlayer space of Na-montmorillonite may polarize in the mHz to kHz frequency range. Our study is a step forward to better understanding the high surface conductivity response of clays inferred from resistivity and induced polarization measurements.

How to cite: Leroy, P., Maineult, A., Mendieta, A., and Jougnot, D.: A petrophysical model for the spectral induced polarization of clays, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-6640, https://doi.org/10.5194/egusphere-egu23-6640, 2023.

EGU23-6657 | Orals | EMRP1.6

Predicting transport properties in porous and fractured media, how fractal-based models can help petrophysicists? 

Damien Jougnot, Luis Guarracino, Mariangeles Soldi, Flore Rembert, Haoliang Luo, Santiago Solazzi, and Luong Duy Thanh

Since the great paradigmatic revolution initiated by Mandelbrot, we know that fractals are ubiquitous in nature. From coastlines to plant growth, fractal mathematics helps us describe and quantify many of nature’s properties. In the same way, fractal theory can be applied to porous and fractured media. In recent decades, numerous research studies have shown that fractal theory provides a solid framework for describing the properties of geological media. Based on advanced physical knowledge at the microscale, it is possible to use fractal patterns to describe transport properties in porous and fractured media. Fractal laws can be applied to describe the size distribution of pores and fractures, fracture widths, and pore irregularities, but also to relate these pore sizes to pore tortuosities. In this contribution, we review the significant advances that have been made in the field of petrophysics by applying fractal mathematics to describe fundamental petrophysical properties such as porosity, permeability, electrical conductivity, thermal conductivity, and electrokinetic and electroosmotic coupling coefficients. These petrophysical models are based on upscaling procedures applied to different fractal objects such as the Sierpinski carpet, Koch curves, pigeon holes, and the Menger sponge, among others. Among the interesting results obtained by means of fractal-based petrophysics, one can derive the transport properties of saturated or partially saturated media, above and below the freezing temperature, and considering hysteretic behavior and reactive media dissolution/precipitation processes. Integrating these fractal-based petrophysical relationships into laboratory- or field-scale numerical simulations is now opening a wide range of potential avenues for progress in near-surface and reservoir geophysics.
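
As a concrete example of the scaling laws such models build on, a standard fractal capillary-bundle relation from the literature reads as follows (the notation is assumed here for illustration and is not quoted from the abstract):

```latex
% Cumulative number of pores/capillaries with radius larger than r in a
% fractal porous medium with fractal dimension D_f:
N(\geq r) = \left(\frac{r_{\max}}{r}\right)^{D_f},
\qquad
-\,\mathrm{d}N = D_f\, r_{\max}^{D_f}\, r^{-(D_f+1)}\,\mathrm{d}r
```

Upscaling then typically proceeds by integrating a single-pore transport law (e.g., Poiseuille flow for permeability) over this pore-size distribution, which is how fractal-based closed-form expressions for the properties listed above are usually derived.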

How to cite: Jougnot, D., Guarracino, L., Soldi, M., Rembert, F., Luo, H., Solazzi, S., and Thanh, L. D.: Predicting transport properties in porous and fractured media, how fractal-based models can help petrophysicists?, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-6657, https://doi.org/10.5194/egusphere-egu23-6657, 2023.

EGU23-6941 | ECS | Posters on site | EMRP1.6

Reviewing numerical simulation methods of nuclear magnetic resonance signals in porous media. 

Francisca Soto Bravo, Chi Zhang, and Lin Jia

Low-field nuclear magnetic resonance (NMR) is a minimally invasive geophysical method often used to characterize pore spaces, water content, and fluid transport and distribution in geologic materials. NMR measurements are based on the magnetization and relaxation behavior of the spin magnetic moment of hydrogen atoms in external magnetic fields. These measurements can be taken in the field, such as from a borehole or the surface of the Earth, or in the laboratory using a bench-top apparatus. Numerical simulations of NMR signals are valuable tools to better understand the relaxation behavior of pore water under different scenarios, explore the effect of changes in the composition or geochemical characteristics of the geologic material, verify experimental findings, and improve the interpretation of field measurements. They can also be used to examine situations where the traditional interpretation of NMR signals fails, such as in complex, heterogeneous geometries with pore coupling effects. In a pore-coupled system, significant magnetization exchange between pores of different sizes occurs during the measurement time, which makes it difficult to independently characterize the pore environments. Using numerical simulations, the factors that control pore coupling, such as surface relaxivity, pore-network connectivity and other pore-network characteristics, can be explored independently in a controlled setting. In this work, we introduce common numerical modelling approaches used for simulating NMR responses in geologic materials, along with their limitations and traditional workflows. We present two specific examples: a Random Walk (RW) simulation to test the effect of different pore-network connectivity features on pore coupling in a simplified pore geometry, and a Finite Element Method (FEM) simulation approach to visualize the distribution of magnetization density within a single pore. NMR is a promising hydrogeophysical tool gaining popularity and finding new applications in near-surface exploration. A better understanding of NMR signals in diverse and complex scenarios is essential for the adequate design of experiments and field campaigns and for the correct interpretation of NMR measurements at different scales. The use of numerical modelling strategies can help improve this understanding, leading to more accurate and reliable measurements and interpretations.
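
To make the random-walk approach concrete, the sketch below simulates surface relaxation in a single spherical pore using the standard "killing probability" scheme at the pore wall. It is a minimal illustration with assumed parameter values, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed, illustrative parameters
a = 5e-6        # pore radius (m)
D = 2.3e-9      # water self-diffusion coefficient (m^2/s)
rho = 1e-5      # surface relaxivity (m/s)
dt = 1e-6       # time step (s)
eps = np.sqrt(6 * D * dt)          # fixed random-walk step length
kill_p = 2 * rho * eps / (3 * D)   # common first-order wall-absorption probability

n_walkers, n_steps = 20000, 5000
pos = np.zeros((n_walkers, 3))     # all walkers start at the pore centre
alive = np.ones(n_walkers, dtype=bool)
m = np.empty(n_steps)              # normalized magnetization decay

for it in range(n_steps):
    step = rng.normal(size=(n_walkers, 3))
    step *= eps / np.linalg.norm(step, axis=1, keepdims=True)
    trial = pos + step
    hit = alive & (np.linalg.norm(trial, axis=1) >= a)
    killed = hit & (rng.random(n_walkers) < kill_p)     # walker relaxes at the wall
    alive &= ~killed
    pos = np.where((hit & alive)[:, None], pos, trial)  # surviving wall hits are reflected
    m[it] = alive.mean()

t = np.arange(1, n_steps + 1) * dt
# Fast-diffusion regime check: 1/T2 ~ rho*(S/V) = 3*rho/a for a sphere
print("simulated T2 ~", -t[-1] / np.log(m[-1]), "s (statistical);",
      "fast-diffusion estimate:", a / (3 * rho), "s")
```

Replacing the single sphere with coupled pores of different sizes (and letting walkers cross the connecting throats) is the natural extension used to study the pore-coupling effects described above.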

How to cite: Soto Bravo, F., Zhang, C., and Jia, L.: Reviewing numerical simulation methods of nuclear magnetic resonance signals in porous media., EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-6941, https://doi.org/10.5194/egusphere-egu23-6941, 2023.

EGU23-7902 | EMRP1.6

Impact of chemical subprocesses during calcite precipitation in sandstones on the measured SIP response and their identification

A. M. Mansfeld and A. Kemna

Interactions between mineral phases and fluids in the subsurface inevitably lead to mineral precipitation and dissolution reactions. While these processes are the major drivers behind many geochemical changes in aquifer systems, their detection, monitoring and characterization are difficult. Geoelectrical methods provide the potential to investigate precipitation and dissolution reactions in rocks non-invasively. However, with the measurement of the DC electrical conductivity alone, changes in pore water salinity, mineralogy, or pore space characteristics can hardly be differentiated. The ambiguity in the identification of these processes can be reduced by also measuring the spectral induced polarization (SIP) response, i.e., the frequency-dependent complex electrical conductivity, given the sensitivity of especially the imaginary component to textural and chemical characteristics. In order to assess the capability of this approach, we conducted multiple laboratory experiments on quartz-rich sandstone samples in which different precipitation scenarios were provoked under controlled conditions while being monitored with SIP. The experimental setup consists of two reactant solutions in contact with the two sides of the sample, leading to a reaction within the sample as diffusion from each side into the rock proceeds. We used reactant solutions of NaHCO3 and CaCl2 in varying molality, the mixing of which in the sample’s pore space results in CaCO3 formation. By varying samples and solutions, three different components contributing to the complex conductivity response during the ongoing precipitation could be identified. The onset of the chemical reaction is clearly visible in the temporal evolution of the imaginary conductivity at relatively low frequencies. The observed temporal peak can be associated with changes in the pH value due to the infiltration of the reactant at earlier times and the reduction in pH with calcite precipitation. This explanation is supported by additional experiments performed on a similar sample, where the pH was altered by infiltration of NaHCO3 only. A second spectral high-frequency peak shows up at later stages of the experiments, suggesting that here the main changes of the pore surfaces in response to the precipitation are occurring. This phenomenon could not be recreated by the infiltration of a pure electrolyte solution or the infiltration of NaHCO3. The last component in the complex conductivity response is the continuous increase of the real component due to the increasing salinity of the pore water, which could also be reproduced in comparative measurements. Our results show the potential of complex conductivity measurements for precipitation monitoring in rocks, including improved textural and chemical characterization. Given the applicability of complex conductivity imaging at the field scale, the method thus holds promise for monitoring tasks in the context of, for example, carbon capture and storage, enhanced geothermal energy, soil stabilization, and the capture of dissolved contaminants, which are of increasing societal relevance.
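
For reference, the monitored quantity can be written in the standard SIP form below (generic notation, added for illustration):

```latex
% Frequency-dependent complex electrical conductivity:
\sigma^{*}(\omega) = \sigma'(\omega) + i\,\sigma''(\omega)
= |\sigma(\omega)|\, e^{\,i\varphi(\omega)},
\qquad
\varphi = \arctan\!\frac{\sigma''}{\sigma'}
```

The real part σ′ carries the ohmic contribution that grows with pore-water salinity, while the imaginary part σ″ carries the polarization response used here to track the pH- and surface-related changes.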

How to cite: Mansfeld, A. M. and Kemna, A.: Impact of chemical subprocesses during calcite precipitation in sandstones on the measured SIP response and their identification, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-7902, https://doi.org/10.5194/egusphere-egu23-7902, 2023.

EGU23-8629 | Posters virtual | EMRP1.6

Oil saturation quantitative evaluation in lacustrine shale: A novel insight from NMR T1-T2 experiments 

Shaolong Zhang, Jingong Cai, Jianping Yan, Xiaojun Zhu, and Min Wang

Oil saturation is important in shale reservoirs for identifying favorable sections and mapping geological sweet spots. Current oil saturation evaluation methods, including experiments and empirical formulas, are not suitable for shale reservoirs because of their complex mineralogy, fluid components, and pore structure. To establish a shale oil saturation calculation model, X-ray diffraction, one-dimensional and two-dimensional nuclear magnetic resonance (NMR), and oil-water two-phase displacement experiments were performed on shale samples collected from the upper sub-member of the fourth member of the Eocene Shahejie Formation in the Dongying sag, Jiyang Depression, Bohai Bay Basin. After data analysis, the reasons why oil is or is not produced in the displacement experiments were explained, the distribution characteristics of different shale components in the NMR T1-T2 map were analyzed, and a new shale oil saturation calculation method was proposed using NMR T2-sensitive parameters that reflect the changes in NMR T2 spectrum morphology with oil saturation, calibrated against the NMR T1-T2 map at different displacement stages. The results indicate that the pore structure of the shale samples is complex and shows strong heterogeneity according to the NMR T2 spectrum, and that the shale pore-size distribution is the main factor determining whether oil reaches the volumetric cylinder in the displacement experiment, given only slight differences in wettability. The NMR T1-T2 map is an effective way to identify the different components (kerogen and solid bitumen, adsorbed oil, free oil, structural and adsorbed water, free water) of shale samples; typically, kerogen and solid bitumen plot in the top left of the T1-T2 map with T1 > 10 ms and T2 < 0.1 ms. On this basis, the T2 thresholds for free oil and adsorbed oil are 2 and 0.2 ms, and the corresponding pore-radius thresholds are 40 and 4 nm according to NMR theory. As NMR T2 spectrum sensitive parameters, the geometric mean and the interval porosity corresponding to the first peak are positively and negatively correlated with oil saturation, respectively. With this understanding, an oil saturation calculation method was established using the above two parameters; the Root Mean Square Error (RMSE) between the measured oil saturation and the calculated results is 5.78%, reflecting the accuracy and validity of the method. In general, this method allows shale oil saturation to be calculated accurately and provides a parameter basis for determining favorable sections and evaluating the resources of shale oil reservoirs. Moreover, it also offers a new approach to oil saturation prediction by NMR logging.
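
As a minimal numerical illustration of the two T2-sensitive parameters used in the proposed model (computed on a synthetic spectrum with assumed cutoffs, not the authors' data):

```python
import numpy as np

t2 = np.logspace(-2, 3, 200)   # T2 bins (ms)
amp = np.exp(-0.5 * ((np.log10(t2) - 0.3) / 0.5) ** 2)  # synthetic T2 spectrum
phi = amp / amp.sum()          # normalized incremental porosity per bin

t2_gm = np.exp(np.sum(phi * np.log(t2)))  # geometric mean of the T2 spectrum
# Approximation of the first-peak interval porosity, here taken below the
# 2 ms free-oil cutoff quoted in the abstract:
peak1_porosity = phi[t2 < 2.0].sum()

print(f"T2 geometric mean = {t2_gm:.2f} ms, first-peak interval porosity = {peak1_porosity:.2f}")
```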

How to cite: Zhang, S., Cai, J., Yan, J., Zhu, X., and Wang, M.: Oil saturation quantitative evaluation in lacustrine shale: A novel insight from NMR T1-T2 experiments, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-8629, https://doi.org/10.5194/egusphere-egu23-8629, 2023.

EGU23-8860 | ECS | Orals | EMRP1.6

Effective seismic properties of fractured rocks: the role played by fracture scaling characteristics 

Gabriel Quiroga, Santiago Solazzi, Nicolás Barbosa, J. Germán Rubino, Marco Favino, and Klaus Holliger

The seismic characterization of fractured geological formations is of importance for a wide range of applications throughout the Earth, environmental and engineering sciences, such as, for example, hydrocarbon exploration and production, CO2 sequestration, monitoring of enhanced geothermal reservoirs, nuclear waste storage, and tunneling operations. Seismic methods are indirect in nature and, hence, comprehensive modelling techniques are required to translate corresponding observations into rock physical properties. In this regard, numerous works have employed the theoretical framework of poroelasticity in order to explore the seismic response of particularly complex and elusive parameters of fluid-saturated fracture networks, such as their fracture density and interconnectivity. This is motivated by the fact that poroelasticity makes it possible to account for fluid pressure diffusion effects between connected fractures as well as between fractures and their embedding background. Fluid pressure diffusion prevails when zones of contrasting compliance are traversed by a seismic wave, as this results in pressure gradients, which induce oscillatory fluid flow and, consequently, energy dissipation. This form of energy dissipation has a significant impact on seismic velocity dispersion, attenuation, and anisotropic characteristics, which are key seismic observables. While a wide range of approximations is employed to represent fracture properties when computing the seismic response of formations, they tend to inherently ignore the complex interrelationships between the lengths, compliances, apertures, and permeabilities of fractures, which thus remain, as of yet, unaccounted for. In this work, we seek to alleviate this shortcoming by using a poroelastic modelling approach to explore how length-dependent fracture scaling characteristics affect the effective seismic properties of fractured rocks. We start by revisiting canonical models with two orthogonally intersecting fractures of different lengths to analyze the interactions occurring when the fractures are affected by a seismic wavefield. We then proceed to explore how scaling relations affect these results. Finally, we consider fracture networks with realistic stochastic length distributions, for which we compare the effective seismic response with and without the proposed length-dependent scaling of the fracture characteristics. Our results demonstrate that the scaling of fracture properties does indeed have a significant effect on the seismic response, as it dramatically reduces the contribution of smaller fractures to fluid pressure diffusion between connected fractures, which, in turn, affects the overall seismic characteristics of the formation.

How to cite: Quiroga, G., Solazzi, S., Barbosa, N., Rubino, J. G., Favino, M., and Holliger, K.: Effective seismic properties of fractured rocks: the role played by fracture scaling characteristics, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-8860, https://doi.org/10.5194/egusphere-egu23-8860, 2023.

EGU23-11037 | Orals | EMRP1.6

Mechanical Earth Modelling for Petroleum Reservoir in Western Offshore India: Tensile Failure Study 

Sarada Prasad Pradhan and Krishna Chandra Sundli

Quantifying in-situ stress is crucial for predicting drilling-induced tensile fractures, wellbore failures, proper well placement, hydro-fracture treatment optimization and sand production. A comprehensive mechanical earth model incorporating pore pressure, stress state, and rock mechanical properties enables us to study the cause of the failures observed in a well. The study is focused on a petroleum reservoir in Western Offshore India, and an attempt is made to estimate the in-situ stresses present in the field. Well-log data calibrated with the available direct pressure measurements, viz. Modular Dynamic Test (MDT) and Leak-Off Test (LOT) data, are used to predict the pore pressure and the minimum horizontal stress. Vertical stress is estimated by extrapolating the density log; for the minimum and maximum horizontal stresses, the poroelastic approach is adopted. Key rock strength parameters were estimated using standard correlations and regional studies. A wellbore stability analysis was carried out, and the results were calibrated against the actual mud weight used. Natural fractures present in the reservoir are sensitive to the stress distribution, which in turn is sensitive to changes in the pore pressure distribution. Many exploratory and development wells have been drilled in the area, but very few have recorded DSI (Dipole Shear Sonic Imager) and FMI (Formation MicroImager) logs. With the available log data, the study quantifies the rock mechanical parameters and the stress magnitudes of the field. The study aims to model the geomechanical behaviour of the study area for better prediction of drilling-induced challenges, thereby reducing non-productive time (NPT) and optimizing drainage.

How to cite: Pradhan, S. P. and Sundli, K. C.: Mechanical Earth Modelling for Petroleum Reservoir in Western Offshore India: Tensile Failure Study, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-11037, https://doi.org/10.5194/egusphere-egu23-11037, 2023.

EGU23-16816 | ECS | Posters on site | EMRP1.6

Correlating Seismic Wave Velocities with the Physicomechanical Properties of Carbonate Rocks 

Anamika Sahu, Sandeep Singh, Narendra Kumar Samadhiya, and Anand Joshi

Measurements of seismic wave velocities (both P- and S-waves) have been carried out on carbonate rock samples collected from Lesser Himalayan deposits exposed along the Alaknanda valley between Rudraprayag and Helang village in Uttarakhand, India. This study evaluates the effect of the petrophysical and mechanical properties of rocks on seismic wave velocities. On the core samples, petrophysical and mechanical measurements were performed: porosity, density, water absorption, and seismic wave velocities were determined first, followed by the uniaxial compressive strength (UCS) and the Brazilian tensile strength (BTS). Thin sections were prepared to measure the petrographic parameters (textural properties and mineralogical composition); this study focuses mainly on grain size and mineral composition. Petrographic investigation and X-ray diffraction (XRD) analysis were carried out to identify the mineralogy, and both revealed that the main constituent mineral is dolomite, with minor amounts of calcite, quartz, and opaque minerals. Interrelationships between seismic wave velocities and porosity, density, mineral constituents, grain size, uniaxial compressive strength, and Brazilian tensile strength were obtained using regression analysis. Significant positive correlations were found between compressional wave velocity and uniaxial compressive strength (r² = 0.82) and Brazilian tensile strength (r² = 0.67). Similarly, strong to moderate correlations were found between shear wave velocity and uniaxial compressive strength (r² = 0.73) and Brazilian tensile strength (r² = 0.68). Weak to moderate negative correlations were found between seismic wave velocities and porosity, and moderate positive correlations between seismic wave velocities and dry density. A moderate negative correlation was found between uniaxial compressive strength and grain size. Furthermore, it is concluded that the influence of grain size on rock strength is more important than that of mineral content.
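
For readers unfamiliar with the quoted r² values: they are coefficients of determination from simple linear regression between each velocity and each strength parameter. A minimal sketch (placeholder numbers, not the measured data) is:

```python
import numpy as np
from scipy.stats import linregress

vp = np.array([4500.0, 5200.0, 4800.0, 5600.0, 5100.0])  # P-wave velocity (m/s), placeholder
ucs = np.array([60.0, 95.0, 75.0, 120.0, 90.0])          # UCS (MPa), placeholder
res = linregress(vp, ucs)
print("r^2 =", res.rvalue ** 2)  # coefficient of determination
```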

How to cite: Sahu, A., Singh, S., Kumar Samadhiya, N., and Joshi, A.: Correlating Seismic Wave Velocities with the Physicomechanical Properties of Carbonate Rocks, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-16816, https://doi.org/10.5194/egusphere-egu23-16816, 2023.

EGU23-1562 | ECS | Posters virtual | GI2.1

A new finite-difference stress modeling method governed by elastic wave equations 

Zhuo Fan, Fei Cheng, and Jiangping Liu

Numerical stress or strain modeling has been a subject of focus in many fields, especially in assessing the stability of key engineering structures and better understanding local and tectonic stress patterns and seismicity. Here we propose a new stress modeling method governed by elastic wave equations using a finite-difference scheme. Based on the modeling scheme of wave propagation, the proposed method is able to solve both the dynamic stress evolution and the static stress state of equilibrium by introducing an artificial damping factor to the particle velocity. We validate the proposed method on three geophysical benchmarks: (a) a layered earth model under gravitational load, (b) a rock mass model under nonuniform loads on its exterior boundaries, and (c) a fault zone with strain localization driven by regional tectonic loading as measured by a GPS velocity field. Because the governing equations of the proposed method are wave equations instead of equilibrium equations, we are able to use the perfectly matched layer as the artificial boundary condition for models in unbounded domains, which substantially improves their accuracy. Also, the proposed scheme maps the physical model on simple computational grids and is therefore more memory efficient, since grid-point positions need not be stored. Besides, the efficient parallel computing of the finite-difference method guarantees the proposed method’s advantage in computational speed. As a minor modification to a wave modeling scheme, the proposed stress modeling method is not only accurate for geological models across different scales, but also physically reasonable and easy to implement for geophysicists.
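
The core idea, time-stepping the elastic wave equation with an artificial damping of the particle velocity until the solution relaxes to static equilibrium, can be sketched in one dimension as follows (a minimal illustration with assumed parameters, not the authors' code; gravitational loading of a column with a free surface on top and a rigid base):

```python
import numpy as np

# Staggered 1-D velocity-stress grid: v at nodes, s (vertical stress) at cell centres
nz, dz = 200, 5.0                 # cells, spacing (m)
rho, vp, g = 2500.0, 3000.0, 9.81
M = rho * vp ** 2                 # P-wave modulus
dt = 0.5 * dz / vp                # CFL-stable time step
damp = 0.01                       # artificial damping factor (assumed value)

v = np.zeros(nz + 1)              # particle velocity (z positive downward)
s = np.zeros(nz)                  # stress, tension positive

for _ in range(100000):
    # ghost cells: mirrored stress gives a traction-free surface at z = 0
    s_pad = np.concatenate(([-s[0]], s, [s[-1]]))
    v += dt * (np.diff(s_pad) / (dz * rho) + g)  # momentum eq. with gravity body force
    v *= 1.0 - damp               # damping drives the run toward the static state
    v[-1] = 0.0                   # rigid bottom boundary
    s += dt * M * np.diff(v) / dz # constitutive (Hooke's law) update

# At convergence, s ~ -rho*g*z (lithostatic stress; compression is negative here)
z = (np.arange(nz) + 0.5) * dz
print("relative misfit to lithostatic stress:",
      np.abs(s + rho * g * z).max() / (rho * g * z[-1]))
```

Removing the damping line recovers an ordinary wave simulation, which is exactly the dual use (dynamic evolution versus static equilibrium) described above.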

How to cite: Fan, Z., Cheng, F., and Liu, J.: A new finite-difference stress modeling method governed by elastic wave equations, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-1562, https://doi.org/10.5194/egusphere-egu23-1562, 2023.

EGU23-2228 | ECS | Posters on site | GI2.1

Non-destructive geophysical damage analysis of medieval plaster in the cloister of the St. Petri Cathedral Schleswig (Germany) 

Yunus Esel, Ercan Erkul, Detlef Schulte-Kortnack, Christian Leonhardt, Julika Heller, and Thomas Meier

Buildings that have existed for centuries undergo structural changes over time due to variations in use. In addition, many structures are severely damaged, for example by moisture intrusion. To determine the distribution of moisture in a structure, it is often examined pointwise by core sampling. In addition to such invasive methods, non-destructive methods may be applied to obtain three-dimensional hints on the moisture distribution within structures of interest.
The purpose of this paper is to show that the non-destructive determination of moisture distribution is possible by using and combining geophysical measurement methods such as infrared thermography (IR), ultrasound (US) and ground penetrating radar (GPR). There are examples of the combination of these methods for non-destructive examination, but such a combination is not yet commonly applied in the field of restoration and conservation of historic buildings.
We present results of geophysical investigations of medieval wall paintings in the cloister of the cathedral in Schleswig (Federal State Schleswig-Holstein, Northern Germany) in the framework of a project funded by the German Federal Foundation for the Environment (Deutsche Bundesstiftung Umwelt - DBU). In the cloister, large-scale alterations of the medieval red-line paintings occurred due to gypsum deposits and a shellac coating. In order to quantify the material properties of a vault section (yoke) in the cloister during the restoration, ultrasound surface wave measurements, passive and active thermography, and ground penetrating radar measurements were carried out.
Repeated measurements at intervals of several months made it possible to evaluate the effectiveness of the test treatments with different solvents to remove the shellac as well as the gypsum deposits. In addition, our results from the passive thermography measurements show that in one section a defect in the horizontal barrier could be responsible for moisture ingress and the associated damage. The radargrams recorded in this area confirm that a significant change in reflection amplitudes is present in the areas of increased moisture.

How to cite: Esel, Y., Erkul, E., Schulte-Kortnack, D., Leonhardt, C., Heller, J., and Meier, T.: Non-destructive geophysical damage analysis of medieval plaster in the cloister of the St. Petri Cathedral Schleswig (Germany), EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-2228, https://doi.org/10.5194/egusphere-egu23-2228, 2023.

EGU23-2347 | ECS | Posters on site | GI2.1

Non-destructive testing methods and numerical development for enhancing airfield pavement management 

Konstantinos Gkyrtis, Christina Plati, and Andreas Loizos

Pavements are an essential component of airport facilities. Airport infrastructures serve to safely transport people and goods on a day-to-day basis. They promote economic development, both regionally and internationally, by also boosting tourist flows. In times of crisis, they can be used for societal emergencies, such as managing migration flows. Therefore, airports need pavements in good physical condition to ensure uninterrupted operations. However, interventions on airfield pavements are costly and labor intensive. Aspects of pavement structural performance related to bearing capacity and damage potential remain of paramount importance as the service life of a pavement extends beyond its design life. Therefore, structural condition evaluation is required to ensure the long-term bearing capacity of the pavement. 

The design and evaluation of flexible airfield pavements are generally based on the Multi-Layered Elastic Theory (MLET) in accordance with Federal Aviation Administration (FAA) principles. The most informative tool for structural evaluation is the Falling Weight Deflectometer (FWD), which senses pavement surfaces using geophones that record load-induced deflections at various locations. Additional geophysical inspection data from Ground Penetrating Radar (GPR) are processed to estimate the stratigraphy of the pavement. The integration of the above data provides an estimate of the pavement's performance and damage potential. However, GPR is not always readily applicable.

In addition, the most important concern in pavement evaluation is the mechanical characterization of pavement materials. At the top of pavement structures, asphalt mixtures behave as a function of temperature and loading frequency. This viscoelastic behavior deviates from the MLET assumptions and needs further investigation. Therefore, this study integrates measured NDT data with sample data from cores taken in situ. The pavement under study is an existing asphalt pavement of a runway at a regional airport in Southern Europe. A comparative evaluation of the strain state within the pavement body is performed both at critical locations and at the pavement surface, taking into account elastic and viscoelastic behaviors. Strains are an important input to models of long-term pavement performance, which has a critical influence on aircraft maneuverability. In turn, the significant discrepancies found highlight the need for more mechanistic considerations in predicting the damage and stress potential of airfield pavements, so that maintenance and/or rehabilitation needs can be better managed and planned.
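
The temperature and loading-frequency dependence mentioned here is commonly captured by a sigmoidal dynamic-modulus master curve; the standard form from pavement engineering is shown below for illustration, with generic fitting parameters rather than values from this study:

```latex
% Sigmoidal master curve for the asphalt dynamic modulus |E^*| at the
% reduced loading frequency f_r (time-temperature superposition with
% shift factor a(T)):
\log |E^{*}| = \delta + \frac{\alpha}{1 + e^{\,\beta + \gamma \log f_r}},
\qquad
\log f_r = \log f + \log a(T)
```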

Overall, this study highlights the sensing capabilities of NDT data towards a structural health monitoring of airfield pavements. Ground-truth data from limited destructive testing enrich pavement evaluation processes and enhance conventional FAA evaluation procedures. The study proposes a numerical development for accurate field inspections and improved monitoring protocols for the benefit of airfield pavement management and rehabilitation planning. 

How to cite: Gkyrtis, K., Plati, C., and Loizos, A.: Non-destructive testing methods and numerical development for enhancing airfield pavement management, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-2347, https://doi.org/10.5194/egusphere-egu23-2347, 2023.

EGU23-2869 | GI2.1

Pleistocene/Holocene (P/H) boundary oceanic Koefels-comet Impact Series Scenario (KISS) of 12.850 yr BP Global-warming Threshold Triad (GTT)-Part III

M. Bujatti-Narbeshuber

The Laacher See Event- (LSE-) volcanism isochrone of 12.850 yrs BP (Bujatti-Narbeshuber, 1997), proxy for the P/H boundary KISS (Bujatti-Narbeshuber, 1996), was improved from Gerzensee varves to 13.034 cal yrs BP (Van Raden, 2019).

    This LSE date now separates the end-Pleistocene, first, mainly oceanic-water KISS from the second, Holocene Younger Dryas Onset (YDO) continental-ice impact, as predicted by the KISS hypothesis, separating: “a continental Koefels-comet ice-impact, from the mainly oceanic KISS, at the Pleistocene/Holocene boundary, associated with global warming, dendro C14 spikes, faunal mass extinction...” (Bujatti-Narbeshuber, 1996; Max, 2022).

    Oceanic-water LSE-KISS (13.034 cal yrs BP, varves), at the end-Alleroed temperature maximum, is separated by 157 yrs from continental-ice YDO-KISS (12.877 cal yrs BP, varve date). A larger gap of 184 yrs results when taking the C14-dated YD-KISS (12.850 cal yrs BP), approaching the 200 yrs of earlier varve studies (Bujatti-Narbeshuber, 1997).

    LSE-KISS varve-date differs by 47 yrs from geo-magnetic Gothenberg Excursion Onset- (GEO-) isochrone of 13.081 cal yrs BP (Chen, 2020), suggesting geo-magnetic reversal, True Polar Wander (TPW) GEO-TPW-KISS from 2 Koefels-comet (Taurid-) fragments. This considers end-paleolithic Magdalenian Impact Sequelae Symbolisations (MISS).

    Questioning P/H isostatic-unloading volcanism (Zielinsky, 1996), LSE-KISS volcanism is from Mid Atlantic Ridge & Mid Atlantic Plateau (MAR&MAP) impact (Bujatti-Narbeshuber, 1997, 2022), as further corroborated by Greenland (NGRIP) ice-core sulfate monitoring: from LSE-KISS-volcanism (12.978 cal yrs) to YDO (12.867 cal yr BP), within 110 yrs, an unprecedented, bipolar-volcanic-eruption-quadruplet resulted (Lin, 2022).

    The first Taurid LSE-KISS (Varves-date: 13.034 cal yrs BP, GEO-date: 13.084 cal yrs BP.) into oceanic-water is evident from two 700 km Mid Atlantic Ridge & Plateau Lowering Events (MARPLES) releasing two separate Tsunamis (Bujatti-Narbeshuber, 2022): Resulting in submarine explosive-magmatism-silicates, seafloor-carbonates, volcanic-ash and sea-water in huge strato-meso-spheric overheated steam-plume moving eastward by eolian transport, descending in drowning rain-flood, largely contributing to Eurasian loess sediment layer (Muck, 1976).

    This is stratigraphically verified in e.g. relative stratigraphic positions in Netherland, Geldrop-Aalsterhut, with Younger Coversand I, bleached (!) (AMS 13.080- 12.915 cal yrs BP) underlying intercalated (!), charcoal rich (AMS 12.785-12.650 cal yrs BP) Usselo Horizon (Andronikov, 2016). It corresponds to US, Black Mats stratigraphy from second Taurid, continental-ice, YD-KISS (12.850 cal yrs BP, C14) plus Carolina Bays (CB) with: 1. Soft, white, loess sediment from first oceanic LSE-KISS. 2. YD-KISS proxies-stratum. 3. e.g. Carolina-Florida-coast-sand-disturbances, within 1.500 km radius of continental-ice YD-KISS ice-ejecta impact-curtain of 500.000 CB (LIDAR) 4. Black Mats after YD-KISS.

    After visiting the Koefels crater, a “below continental-glacier-ice, circular geomagnetic-anomaly with paleoseismic Koefels-corridor of twelve Holocene rockfalls”, Eugene Shoemaker (Vienna, May 5th 1997), when asked about Carolina Bays causation, is quoted: “Eugene spoke of a late Pleistocene origin of the Bays and as glaciological features while I preferred the paleoseismic interpretation. I interpret them as paleoseismic impact-seismic liquefaction features. They … are the first evidence for a late Pleistocene impact event. Dated by me …12.850 BP (1950) in calendar years”. (Bujatti-Narbeshuber, NHM letter to John Grant III, Sept. 22nd 1997).

    Both P/H impacts break & make, Pleistocene criticality & Holocene damped flow, through 700 km geomorphological threshold (GLOVES) submersion & through (GTT) water and CO2 greenhouse-gas production, beyond the glaciation threshold for hot climate prediction.

How to cite: Bujatti-Narbeshuber, M.: Pleistocene/Holocene (P/H) boundary oceanic Koefels-comet Impact Series Scenario (KISS) of 12.850 yr BP Global-warming Threshold Triad (GTT)-Part III, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-2869, https://doi.org/10.5194/egusphere-egu23-2869, 2023.

EGU23-2980 | GI2.1

Numerical modelling of seismic field record with elastic velocity construction for CO2 sequestration in offshore, South Korea

S. Cheong, M. Kang, and K. J. Kim

To evaluate the feasibility of CO2 sequestration offshore South Korea, we performed numerical modelling with an elastic velocity model. The CO2 storage candidate is a brine-saturated aquifer formation overlain by a basalt caprock on the Southern Continental Shelf of Korea. A basalt formation without joints and fractures can seal the storage volume, preventing leakage of the injected CO2. A preliminary two-dimensional seismic exploration estimated the storage potential at 42.07 to 143.79 Mt of CO2. The input model includes the P- and S-wave velocities and densities of the shallow sediment and basalt layers. To simulate CO2 injection, we assumed an area of CO2 plume in the interval beneath the basalt formation and artificially decreased the P-wave velocity, S-wave velocity, and density values. The synthesized seismic records are comparable with the survey gathers in terms of direct arrivals and primary reflections. The ongoing work can be extended to a quantitative verification covering several cases of varying velocities and densities.

How to cite: Cheong, S., Kang, M., and Kim, K. J.: Numerical modelling of seismic field record with elastic velocity construction for CO2 sequestration in offshore, South Korea, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-2980, https://doi.org/10.5194/egusphere-egu23-2980, 2023.

EGU23-4861 | Orals | GI2.1

Decay diagnosis of tree trunks using 3D point cloud and reverse time migration of GPR data 

Zhijie Chen, Hai Liu, Meng Xu, Yunpeng Yue, and Bin Zhang

Health monitoring and disease mitigation of trees are essential to ensure the sustainability of the wood industry, the safety of ecosystems, and the maintenance of climatic conditions. Several non-destructive testing methods have been applied to monitor and detect decay inside trunks. Among them, ground penetrating radar (GPR) has gained recognition due to its high efficiency and good resolution. However, due to the wide beam width of the antenna pattern and the complicated scattering caused by the trunk structure, the recorded GPR profile is far from the actual geometry of the tree trunk. Moreover, the irregular contour of the tree trunk makes traditional data processing algorithms difficult to apply. Therefore, an efficient migration algorithm with high resolution, as well as a high-accuracy survey-line positioning method for the curved contour of the trunk, should be developed.

In this paper, a combined approach is proposed to image the inner structures of irregular-shaped trunks. In the first step, the 3D contour of the targeted tree trunk is built up by a 3D point cloud technique, photographing around the trunk at various angles. Subsequently, the 2D irregular contour of the cross-section of the trunk at the position of the GPR survey line is extracted by the Canny edge detection method to locate the accurate position of each GPR A-scan [1]. Thirdly, the raw GPR profile is pre-processed to suppress undesired noise and clutter. Then, a reverse time migration (RTM) algorithm based on the zero-time imaging condition is applied for image reconstruction using the extracted 2D contour [2]. Lastly, a denoising method based on total variation (TV) regularization is applied for artifact suppression in the reconstructed images [3].
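
Two of the steps above, contour extraction and TV-based artifact suppression, can be sketched with standard image-processing tools; the following is a minimal illustration on placeholder arrays using scikit-image, not the authors' implementation:

```python
import numpy as np
from skimage.feature import canny
from skimage.restoration import denoise_tv_chambolle

# Step 2: extract the 2D trunk contour from a cross-section image
cross_section = np.random.rand(256, 256)    # placeholder for the point-cloud-derived slice
contour = canny(cross_section, sigma=2.0)   # boolean edge map (Canny detector [1])

# Step 5: suppress artifacts in the migrated image with total-variation
# (ROF-type [3]) regularization
rtm_image = np.random.rand(256, 512)        # placeholder for the RTM output
rtm_clean = denoise_tv_chambolle(rtm_image, weight=0.1)
```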

Numerical, laboratory and field experiments are carried out to validate the applicability of the proposed approach. Both numerical and laboratory experimental results show that RTM can yield more accurate and higher-resolution images of the inner structures of the tree cross-section than the back-projection (BP) algorithm. The proposed approach is further applied to a diseased camphor tree, and an elliptical decay defect is found in the migrated GPR image. The results are validated by a visual inspection after the tree trunk was sawed down.

Fig. 1 Field experiment. (a) Geometric reconstruction result using point cloud data, (b) migrated result by the RTM algorithm and (c) bottom view of the tree trunk after sawing down. The red and yellow ellipses indicate the cavity and the decay region in the trunk, respectively.

References:

[1] Canny, "A Computational Approach to edge detection," IEEE Transactions on Pattern Analysis and Machine Interllgent, vol. PAMI-8, no. 6, pp. 679-698, 1986, doi: 10.1109/TPAMI.1986.4767851.

[2] S. Chattopadhyay and G. A. McMechan, "Imaging conditions for prestack reverse-time migration," Geophysics, vol. 73, no. 3, pp. S81-S89, 2008, doi: 10.1190/1.2903822.

[3] L. I. Rudin, S. Osher, and E. Fatemi, "Nonlinear total variation based noise removal algorithms," Physica D, vol. 60, pp. 259-268, 1992, doi: 10.1016/0167-2789(92)90242-F.

How to cite: Chen, Z., Liu, H., Xu, M., Yue, Y., and Zhang, B.: Decay diagnosis of tree trunks using 3D point cloud and reverse time migration of GPR data, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-4861, https://doi.org/10.5194/egusphere-egu23-4861, 2023.

EGU23-6795 | ECS | Orals | GI2.1

Relaxing requirements for spatio-temporal data fusion 

Harkaitz Goyena, Unai Pérez-Goya, Manuel Montesino-San Martín, Ana F. Militino, Peter M. Atkinson, and M. Dolores Ugarte

Satellite sensors need to make a trade-off between revisit frequency and spatial resolution. This work presents a spatio-temporal image fusion method called Unpaired Spatio-Temporal Fusion of Image Patches (USTFIP). The method combines data from different multispectral sensors and creates images combining the best of each satellite in terms of frequency and resolution. It generates synthetic images and selects optimal information from cloud-contaminated images, avoiding the need for cloud-free matching pairs of satellite images. The removal of this restriction makes it easier to run the fusion algorithm even in the presence of clouds, which are frequent in time series of satellite images. The increasing demand for larger datasets makes computationally optimized methods necessary; USTFIP is therefore programmed to run in parallel, reducing run-time relative to other methods. USTFIP is tested in an experimental scenario using procedures similar to those of Fit-FC, STARFM and FSDAF. Finally, USTFIP proves the most robust, since its prediction accuracy degrades at a much lower rate as classical requirements become progressively difficult to meet.

How to cite: Goyena, H., Pérez-Goya, U., Montesino-San Martín, M., F. Militino, A., Atkinson, P. M., and Ugarte, M. D.: Relaxing requirements for spatio-temporal data fusion, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-6795, https://doi.org/10.5194/egusphere-egu23-6795, 2023.

Continual monitoring of tree roots, which is essential when considering tree health and safety, is possible using a digital model. Non-destructive techniques, for instance, laser scanning, acoustics, and Ground Penetrating Radar (GPR) have been used in the past to study both the external and internal physical dimensions of objects and structures [1], including trees [2,3]. Recent studies have shown that GPR is effective in mapping the root system's network in street trees [3]. Light Detection and Ranging (LiDAR) technology has also been employed in infrastructure management to generate 3D data and to detect surface displacements with millimeter accuracy [4]. However, scanning such structures using current state-of-the-art technologies can be expensive and time consuming. Further, continual monitoring of tree roots requires multiple visits to tree sites and, oftentimes, repeated excavations of soil.

This work proposes a Virtual Reality (VR) system using smartphone-based LiDAR and GPR data to capture ground surface and subsurface information to monitor the location of tree roots. Both datasets can be visualized in 3D in a VR environment for future assessment. LiDAR technology has recently become available in smartphones (for instance, the Apple iPhone 12+) and can scan a surface, e.g., the base of a tree, and export the data to a 3D modelling and visualization application. Using GPR data, we combined subsurface information on the location of tree roots with the LiDAR scan to provide a holistic digital model of the physical site. The system can provide a relatively low-cost environmental modelling and assessment solution, which will allow researchers and environmental professionals to a) create digital 3D snapshots of a physical site for later assessment, b) track positional data on existing tree roots, and c) inform the decision-making process regarding locations for potential future excavations.

Acknowledgments: Sincere thanks to the following for their support: Lord Faringdon Charitable Trust, The Schroder Foundation, Cazenove Charitable Trust, Ernest Cook Trust, Sir Henry Keswick, Ian Bond, P. F. Charitable Trust, Prospect Investment Management Limited, The Adrian Swire Charitable Trust, The John Swire 1989 Charitable Trust, The Sackler Trust, The Tanlaw Foundation, and The Wyfold Charitable Trust. The Authors would also like to thank Mr Dale Mortimer (representing the Ealing Council) and the Walpole Park for facilitating this research.

References

[1] Alani A. M. et al., Non-destructive assessment of a historic masonry arch bridge using ground penetrating radar and 3D laser scanner. IMEKO International Conference on Metrology for Archaeology and Cultural Heritage Lecce, Italy, October 23-25, 2017.

[2] Ježová, J., Mertens, L., Lambot, S., 2016. “Ground-penetrating radar for observing tree trunks and other cylindrical objects,” Construction and Building Materials (123), 214-225.

[3] Lantini, L., Alani, A. M., Giannakis, I., Benedetto, A. and Tosti, F., 2020. "Application of ground penetrating radar for mapping tree root system architecture and mass density of street trees," Advances in Transportation Studies (3), 51-62.

[4] Lee, J. et al., Long-term displacement measurement of bridges using a LiDAR system. Struct Control Health Monit. 2019; 26:e2428.

How to cite: Uzor, S., Lantini, L., and Tosti, F.: Low-cost assessment and visualization of tree roots using smartphone LiDAR, Ground-Penetrating Radar (GPR) data and virtual reality, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-6908, https://doi.org/10.5194/egusphere-egu23-6908, 2023.

EGU23-8384 | ECS | Orals | GI2.1

A Study on the Effect of Target Orientation on the GPR Detection of Tree Roots Using a Deep Learning Approach 

Livia Lantini, Federica Massimi, Saeed Sotoudeh, Dale Mortimer, Francesco Benedetto, and Fabio Tosti

Monitoring and protection of natural resources have grown increasingly important in recent years, as the effects of emerging diseases have caused serious concern among environmentalists and communities. In this regard, tree roots are among the most crucial and fragile plant organs, as well as among the most difficult to assess [1].

Within this context, ground penetrating radar (GPR) applications have shown to be precise and effective for investigating and mapping tree roots [2]. Furthermore, in order to overcome limitations arising from natural soil heterogeneity, a recent study has proven the feasibility of deep learning image-based detection and classification methods applied to the GPR investigation of tree roots [3].

The present research proposes an analysis of the effect of root orientation on the GPR detection of tree root systems. To this end, a dedicated survey methodology was developed to compile a database of isolated roots. A set of GPR data was collected at different incidence angles with respect to each investigated root. The GPR signal is then processed in both the time and frequency domains to filter out noise-related information and obtain spectrograms (i.e., a visual representation of a signal's frequency spectrum over time). Subsequently, an image-based deep learning framework is implemented, and its performance in recognising outputs with different incidence angles is compared to that of traditional machine learning classifiers. The preliminary results of this research demonstrate the potential of the proposed approach and pave the way for the use of novel methods to enhance the interpretation of tree root systems.
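
As an illustration of the spectrogram step, a GPR A-scan can be converted into a time-frequency image with standard tools; the sketch below uses assumed parameters and a placeholder trace, not the authors' pipeline:

```python
import numpy as np
from scipy.signal import spectrogram

fs = 40e9                       # assumed equivalent sampling rate of the A-scan (Hz)
ascan = np.random.randn(2048)   # placeholder GPR trace
f, t, Sxx = spectrogram(ascan, fs=fs, nperseg=128, noverlap=96)
image = 10 * np.log10(Sxx + 1e-12)  # dB-scaled spectrogram image for the classifier
```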

 

Acknowledgements

The Authors would like to express their sincere thanks and gratitude to the following trusts, charities, organisations and individuals for their generosity in supporting this project: Lord Faringdon Charitable Trust, The Schroder Foundation, Cazenove Charitable Trust, Ernest Cook Trust, Sir Henry Keswick, Ian Bond, P. F. Charitable Trust, Prospect Investment Management Limited, The Adrian Swire Charitable Trust, The John Swire 1989 Charitable Trust, The Sackler Trust, The Tanlaw Foundation, and The Wyfold Charitable Trust. The Authors would also like to thank the Ealing Council and the Walpole Park for facilitating this research.

 

References

[1] Innes, J. L., 1993. Forest health: its assessment and status. CAB International.

[2] Lantini, L., Tosti, F., Giannakis, I., Zou, L., Benedetto, A. and Alani, A. M., 2020. "An Enhanced Data Processing Framework for Mapping Tree Root Systems Using Ground Penetrating Radar," Remote Sensing 12(20), 3417.

[3] Lantini, L., Massimi, F., Tosti, F., Alani, A. M. and Benedetto, F. "A Deep Learning Approach for Tree Root Detection using GPR Spectrogram Imagery," 2022 45th International Conference on Telecommunications and Signal Processing (TSP), 2022, pp. 391-394.

How to cite: Lantini, L., Massimi, F., Sotoudeh, S., Mortimer, D., Benedetto, F., and Tosti, F.: A Study on the Effect of Target Orientation on the GPR Detection of Tree Roots Using a Deep Learning Approach, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-8384, https://doi.org/10.5194/egusphere-egu23-8384, 2023.

EGU23-8667 | ECS | Posters on site | GI2.1

An Investigation into the Acquisition Parameters for GB-SAR Assessment of Bridge Structural Components 

Saeed Sotoudeh, Livia Lantini, Kevin Jagadissen Munisami, Amir M. Alani, and Fabio Tosti

Structural health monitoring (SHM) is a necessary measure to keep bridge infrastructure safe. To this purpose, remote sensing has proven effective in acquiring data with high accuracy in a relatively short time. Amongst the available methods, the ground-based synthetic aperture radar (GB-SAR) can detect sub-millimetre deflections, as small as 0.01 mm, generated by moving vehicles or the environmental excitation of bridges [1]. Interferometric radars are also capable of data collection regardless of weather, day, and night conditions [2]. However, in the available literature there is a lack of studies and methods focusing on the actual capabilities of the GB-SAR to target specific structural elements and components of a bridge, which makes it difficult to associate the measured deflection with the actual bridge section. Depending on the antenna type, the footprint of the radar signal widens with distance and thus encompasses more elements, and the presence of multiple targets in the same resolution cell adds uncertainty to the acquired data [3]. To this end, the purpose of the present research is to introduce a methodology for pinpointing targets using GB-SAR and aiding data interpretation. An experimental procedure is devised to control acquisition parameters and targets, making it possible to analyse the returned outputs under more controlled conditions. The outcome of this research will add to the existing literature in terms of collecting data with enhanced precision and certainty.
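
The displacement sensitivity quoted above follows from the standard interferometric phase-to-displacement relation (generic notation, added for illustration; the sign depends on the adopted convention):

```latex
% Line-of-sight displacement recovered from the interferometric phase change:
d_{\mathrm{LOS}} = -\frac{\lambda}{4\pi}\,\Delta\varphi
```

Because the phase change Δφ can be resolved to a small fraction of a cycle, displacements of the order of 0.01 mm become detectable at typical GB-SAR (Ku-band) wavelengths of roughly 18 mm.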

 

Keywords

Structural Health Monitoring (SHM), GB-SAR, Remote Sensing, Interferometric Radar

 

Acknowledgements

This research was funded by the Vice-Chancellor’s PhD Scholarship at the University of West London.

 

References

[1] Benedettini, F., & Gentile, C. (2011). Operational modal testing and FE model tuning of a cable-stayed bridge. Engineering Structures, 33(6), 2063-2073.

[2] Alba, M., Bernardini, G., Giussani, A., Ricci, P. P., Roncoroni, F., Scaioni, M., Valgoi, P., & Zhang, K. (2008). Measurement of dam deformations by terrestrial interferometric techniques. Int.Arch.Photogramm.Remote Sens.Spat.Inf.Sci, 37(B1), 133-139.

[3] Michel, C., & Keller, S. (2021). Advancing ground-based radar processing for bridge infrastructure monitoring. Sensors, 21(6), 2172.

How to cite: Sotoudeh, S., Lantini, L., Munisami, K. J., Alani, A. M., and Tosti, F.: An Investigation into the Acquisition Parameters for GB-SAR Assessment of Bridge Structural Components, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-8667, https://doi.org/10.5194/egusphere-egu23-8667, 2023.

EGU23-8762 | ECS | Orals | GI2.1

Joint Interpretation of Multi-Frequency Ground Penetrating Radar and Ultrasound Data for Mapping Cracks and Cavities in Tree Trunks 

Saeed Parnow, Livia Lantini, Stephen Uzor, Amir M. Alani, and Fabio Tosti

As the Earth's lungs, trees are a natural resource that provides, amongst other things, food, lumber, and oxygen. Therefore, monitoring these wooden structures with non-destructive testing (NDT) techniques such as ground penetrating radar (GPR) and ultrasound can provide valuable information about inner flaws and decay, which is an essential step in tree conservation.

In recent years, GPR and ultrasound have been used to delineate the interior architecture of tree trunks [1-3]. However, more research is required to improve the results and, consequently, achieve a more reliable interpretation. Due to limitations in penetration depth and signal-to-noise ratio [4], these approaches have a limited capacity for resolving features: gain functions applied to compensate for wave attenuation may exaggerate events, while the higher frequencies needed for resolution penetrate less deeply.

In this context, an integration of multi-frequency GPR and ultrasound data can be used to address this issue. Data were collected on a tree trunk log at the Faringdon Centre for Non-Destructive Testing and Remote Sensing using two high-frequency GPR systems (2 GHz and 4 GHz central frequencies) and ultrasound testing equipment (supporting a wide range of transducers from 24 kHz up to 500 kHz). Internal features of interest, including extended perimetric air gaps at the bark-wood interface, natural cracks and small artificial cavities, were investigated through electromagnetic and mechanical waves. After data compilation, a joint interpretation strategy for data analysis was developed. The processed data were mapped against the cut sections of the tree for validation.

Although the study of standing tree trunks would be more challenging, the findings of this research may be applied to wood timbers and pave the way for future research on living tree trunks.

 

Acknowledgements

This research was funded by the Vice-Chancellor’s PhD Scholarship at the University of West London.

 

References

[1] Arciniegas, A., et al., Literature review of acoustic and ultrasonic tomography in standing trees. Trees, 2014. 28(6): p. 1559-1567. 

[2] Giannakis, I., et al., Health monitoring of tree trunks using ground penetrating radar. IEEE Transactions on Geoscience and Remote Sensing, 2019. 57(10): p. 8317-8326.

[3] Espinosa, L., et al., Ultrasound computed tomography on standing trees: accounting for wood anisotropy permits a more accurate detection of defects. Annals of Forest Science, 2020. 77(3): p. 1-13.

[4] Tosti, F., et al., The use of GPR and microwave tomography for the assessment of the internal structure of hollow trees. IEEE Transactions on Geoscience and Remote Sensing, 2021. 60: p. 1-14.

 

How to cite: Parnow, S., Lantini, L., Uzor, S., Alani, A. M., and Tosti, F.: Joint Interpretation of Multi-Frequency Ground Penetrating Radar and Ultrasound Data for Mapping Cracks and Cavities in Tree Trunks, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-8762, https://doi.org/10.5194/egusphere-egu23-8762, 2023.

EGU23-10874 | ECS | Orals | GI2.1

Ground subsidence risk mapping and assessment along Shanghai metro lines by PS-InSAR and LightGBM 

Long Chai, Xiongyao Xie, Biao Zhou, and Li Zeng

Ground subsidence is a typical geological hazard in urban areas. It endangers the safety of infrastructure such as subways. In this study, the ground subsidence risk along Shanghai metro lines was mapped and assessed. Firstly, PS-InSAR was used for the ground subsidence survey, and subsidence intensity was divided into five classes according to subsidence velocity. Ten subsidence causal factors were collected, and the frequency ratio method was applied to analyze the correlation between subsidence and its causal factors. A LightGBM model was then used to generate a ground subsidence susceptibility map, and the receiver operating characteristic curve and the area under the curve (AUC) were adopted to assess the model; the AUC of 0.904 suggests excellent model performance. Finally, a risk matrix was introduced to combine the intensity and susceptibility of ground subsidence. The risk of ground subsidence was mapped and classified into five levels: R1 (very low), R2 (low), R3 (medium), R4 (high), and R5 (very high). The results showed that subway ground subsidence risk exhibited a region-dependent character: metro lines located in areas with higher regional ground subsidence risk also had higher line-level risk. Meanwhile, the statistics of subway ground subsidence risk levels showed that subway stations were safer than the sections between them.
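
A minimal sketch of the susceptibility-mapping step (placeholder data and a hypothetical feature matrix, not the authors' code):

```python
import numpy as np
import lightgbm as lgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

X = np.random.rand(5000, 10)                   # 10 causal factors per grid cell (placeholder)
y = (np.random.rand(5000) > 0.7).astype(int)   # 1 = subsidence observed (placeholder labels)

Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)
model = lgb.LGBMClassifier(n_estimators=400, learning_rate=0.05)
model.fit(Xtr, ytr)

susceptibility = model.predict_proba(Xte)[:, 1]    # per-cell susceptibility score in [0, 1]
print("AUC:", roc_auc_score(yte, susceptibility))  # ~0.5 on random data; 0.904 reported above
```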

How to cite: Chai, L., Xie, X., Zhou, B., and Zeng, L.: Ground subsidence risk mapping and assessment along Shanghai metro lines by PS-InSAR and LightGBM, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-10874, https://doi.org/10.5194/egusphere-egu23-10874, 2023.

EGU23-12226 | ECS | Orals | GI2.1

Evaluation of Spectral Mixing Techniques for Geological Mixture in a Laboratory Setup: Insights on the nature of mixing 

Maitreya Mohan Sahoo, Kalimuthu Rajendran, Arun Pattathal Vijayakumar, Shibu K. Mathew, and Alok Porwal

Geological mixtures having endmembers mixed at a fine scale pose a challenge to estimating their fractional abundances. Light incident on these mixtures interacts both at multilayered and surface levels, resulting in volumetric and albedo scattering, respectively. Accounting for these effects necessitates a nonlinear spectral mixing model rather than conventional linear mixing. In this study, we evaluate the performances of linear and various nonlinear spectral mixing models for an intimately mixed geological mixture, i.e., a banded hematite quartzite (BHQ) sample. The BHQ sample, with distinct endmembers of hematite and quartzite, facilitated our study of the behavior of light on two-component nonlinear mixtures. In a laboratory-based experimental setup, we used a full-spectral-range spectroradiometer covering the visible and near-infrared regions (350 to 2500 nm) to acquire a hyperspectral image of the BHQ sample. This was followed by the identification of nonlinearly mixed regions and the inference of changes in their spectral features. The nonlinearity induced in these regions was attributed to two significant causes: (1) the fine scale of spectral mixing, and (2) the spectroradiometer sensor’s limited ability to spatially distinguish between focused and neighboring points, thereby producing a point spread effect. We observed the effects of nonlinear spectral mixing for our sample by changing the sensor’s height from 1 mm to 5 mm, simulating fine- and coarse-resolution images, respectively. The spectral mixing was modeled using the mapped ground-truth fractional abundances and library endmember spectra by linear mixing and by the established nonlinear techniques of the generalized bilinear model (GBM), the polynomial post-nonlinear model (PPNM), and kernel-based support vector machines (k-SVMs). The evaluated performance metric of reconstruction error revealed the nonlinearity effect in image pixels through statistical tests and the nonlinearity parameters used in these models. It was further observed that the associated nonlinearity increases from fine- to coarse-resolution images. The minimum image reconstruction error was observed for the polynomial post-nonlinear model, with a single nonlinearity parameter and an average reconstruction error (ARE) of 0.05. Our study provides insights into how the nature of nonlinear mixing varies with endmember composition and particle size.
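
For reference, the three parametric model families compared here take the following standard forms from the unmixing literature (notation assumed, not quoted from the abstract); the k-SVM approach is a nonparametric kernel method with no comparable closed form:

```latex
% Pixel y with R endmembers e_i, abundances a_i, noise n, \odot = Hadamard product:
\mathbf{y}_{\mathrm{LMM}} = \sum_{i=1}^{R} a_i \mathbf{e}_i + \mathbf{n}
\qquad
\mathbf{y}_{\mathrm{GBM}} = \sum_{i=1}^{R} a_i \mathbf{e}_i
  + \sum_{i=1}^{R-1}\sum_{j=i+1}^{R} \gamma_{ij}\, a_i a_j\,(\mathbf{e}_i \odot \mathbf{e}_j)
  + \mathbf{n}
\qquad
\mathbf{y}_{\mathrm{PPNM}} = \mathbf{x} + b\,(\mathbf{x} \odot \mathbf{x}) + \mathbf{n},
\quad \mathbf{x} = \sum_{i=1}^{R} a_i \mathbf{e}_i
```

The single scalar b in the PPNM is the "single nonlinearity parameter" referred to above, which is one reason this model is attractive for a two-endmember mixture such as BHQ.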

How to cite: Sahoo, M. M., Rajendran, K., Pattathal Vijayakumar, A., Mathew, S. K., and Porwal, A.: Evaluation of Spectral Mixing Techniques for Geological Mixture in a Laboratory Setup: Insights on the nature of mixing, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-12226, https://doi.org/10.5194/egusphere-egu23-12226, 2023.

EGU23-13163 | ECS | Orals | GI2.1

High-resolution grain-size analysis and non-destructive hyperspectral imaging of sediments from the Gaoping canyon levee to establish past typhoon and monsoon activities affecting Taiwan during the late Holocene 

Joffrey Bertaz, Kévin Jacq, Christophe Colin, Zhifei Liu, Maxime debret, Hongchao Zhao, and Andrew Tien-Shun Lin

Non-destructive, high-resolution hyperspectral analyses are widely used in planetary and environmental sciences and in mining exploration. In recent years, the scanning method has been applied to lacustrine sediment cores as a complement to XRF core scanning; however, this approach has rarely been applied to marine sediments. The Gaoping canyon, located south of Taiwan island, is connected to the Gaoping River and is a very active canyon with a large sediment transfer capacity. In particular, about four typhoon-driven hyperpycnal flows per year have been recorded by mooring systems in recent years. Studying how their frequency and intensity responded to past climate and environmental changes is key to understanding future tropical storm frequency and related climate variability. Core MD18-3574 was collected on the western levee of the Gaoping canyon and displays numerous fine laminations (millimetric to centimetric) recording the deposition of the gravity flows occurring in the canyon and on the slope. In this study, we combined non-destructive analyses such as XRF core scanning and hyperspectral imaging with high-resolution grain-size and XRD bulk mineralogy analyses to understand the sedimentological and geochemical variations at the scale of the laminae. Core MD18-3574 sediments consist mainly of fine silt, presenting an alternation of fine-grained and coarse-grained laminations. The average mean grain size is 13.4 µm, ranging from 9 to 20.5 µm. Thick, coarser-grained laminations show the grain-size distributions and asymmetric sorting typical of turbidite sequences. Grain size and bulk mineralogy display strong visual and statistical correlation with XRF (Fe/Ca, Si/Al) and hyperspectral proxies (sediment darkness (Rmean), Clay_R2200). Principal component analysis (PCA) demonstrates that darker laminae are composed of coarser sediments with high Si/Al (quartz- and feldspar-rich) and Clay_R2200 values and low Fe/Ca (calcite-rich), resulting from gravity flows. Conversely, lighter laminae consist of finer sediments with low Si/Al (muscovite- and illite-rich) and Clay_R2200 and high Fe/Ca, resulting from hemipelagic deposition. This interpretation was then extended to the core scale to identify gravity-flow deposit layers. Moderate-intensity tropical storm frequency has decreased over the last 4 ka in response to the sea surface temperature (SST) decrease and the enhanced East Asian winter monsoon since the middle Holocene. Tropical storm intensity increased after 2 ka in La Niña-like periods, indicating that the surge of super-typhoons hitting Taiwan could be triggered by the El Niño Southern Oscillation (ENSO) state and variability. We can thus infer that tropical storm activity is controlled by SST, the monsoon system, and ENSO conditions. This study brings new insights into the prediction of the impacts of ongoing climate change on storm activity in the western Pacific Ocean.
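
A minimal sketch of the PCA step, using synthetic values for the proxies named above (mean grain size, Si/Al, Fe/Ca, Rmean, Clay_R2200) rather than the core data:

```python
# PCA on a proxy matrix to see whether a single component separates
# gravity-flow from hemipelagic laminae; all values below are synthetic.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
n = 300
turbidite = rng.random(n)                                   # latent flow intensity
data = np.c_[13 + 5 * turbidite + rng.normal(0, 1.0, n),    # mean grain size (um)
             1.0 + 0.8 * turbidite + rng.normal(0, 0.1, n),   # Si/Al
             1.5 - 0.9 * turbidite + rng.normal(0, 0.1, n),   # Fe/Ca
             0.4 - 0.2 * turbidite + rng.normal(0, 0.05, n),  # Rmean (darkness)
             0.1 + 0.3 * turbidite + rng.normal(0, 0.05, n)]  # Clay_R2200

pca = PCA(n_components=2).fit(StandardScaler().fit_transform(data))
print("explained variance:", pca.explained_variance_ratio_.round(2))
print("PC1 loadings:", dict(zip(["grain", "Si/Al", "Fe/Ca", "Rmean", "Clay_R2200"],
                                pca.components_[0].round(2))))
```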

How to cite: Bertaz, J., Jacq, K., Colin, C., Liu, Z., debret, M., Zhao, H., and Lin, A. T.-S.: High-resolution grain-size analysis and non-destructive hyperspectral imaging of sediments from the Gaoping canyon levee to establish past typhoon and monsoon activities affecting Taiwan during the late Holocene, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-13163, https://doi.org/10.5194/egusphere-egu23-13163, 2023.

EGU23-13329 | ECS | Orals | GI2.1

Combined use of NDT methods for steel rebar corrosion monitoring 

Giacomo Fornasari, Federica Zanotto, Andrea Balbo, Vincenzo Grassi, and Enzo Rizzo

This paper describes laboratory tests performed with NDT geophysical methods, namely Ground Penetrating Radar (GPR), Self-Potential (SP) and Direct Current (DC) methods, in order to monitor the corrosion of a rebar embedded in concrete. Even though GPR is a common geophysical method for reinforced concrete structures, the SP and DC techniques are not widely used. Rebar corrosion is one of the main causes of deterioration of reinforced engineering structures, and this degradation phenomenon reduces their service life and durability. Non-destructive testing and evaluation of rebar corrosion is a major issue for predicting the service life of reinforced concrete structures.

Several new experiments were performed at the Applied Geophysical laboratory of the University of Ferrara, building on the experience gained in previous tests (Fornasari et al., 2022), in which two reinforced concrete samples of about 50 cm x 30 cm, each with a central ribbed steel rebar 10 mm in diameter and 35 cm long, were cast and partially immersed in plastic boxes with salty and distilled water. In the present experiment, we applied a new protocol in which an epoxy resin was used to confine the corrosion to the exposed part of the rebar. The steel rebar was partially painted with a waterproof resin so that only its central part remained uncovered, over a length of 8 cm. The same waterproof epoxy resin was applied to part of the concrete sample, so that chloride diffusion occurred across a free zone of about 10 cm x 8 cm defined below the exposed rebar.

The experiments were carried out on two identically constructed reinforced concrete samples, one exposed to distilled water (sample “A”) and the second exposed to salty water containing chlorides (sample “B”). Both samples were immersed to a depth of only 1 cm from their lower surface. Sample B was immersed in a plastic box of salty water with increasing NaCl concentrations: an initial concentration of 0.1% was adopted for 7 days, then the concentration was increased to 1% and finally to 3.5% for a further 7 days. The experiment was set up in two phases. In the first phase, we monitored the “natural” corrosion occurring on sample B due to the diffusion of chlorides towards the steel rebar, comparing the obtained data with those of sample A exposed to distilled water. In the second phase, accelerated corrosion was applied to sample B in order to intensify the corrosion phenomena. The accelerated corrosion was designed to reach different theoretical levels of mass loss in the steel rebar: 2%, 5%, 10% and 20%. During the experiments, a 2 GHz C-Thrue GPR antenna, a multivoltmeter with a non-polarizing calomel reference electrode for SP, and an ABEM Terrameter LS for resistivity data were used to monitor rebar corrosion. The collected data were integrated to track the evolution of the corrosion phenomenon on the reinforcement steel rebar and to provide a quantitative analysis of the process.
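
The abstract does not give the impressed-current design, but target mass losses of this kind are conventionally translated into test durations via Faraday's law; the sketch below assumes an illustrative current value together with the exposed-rebar geometry described above.

```python
# Faraday's law: time needed to reach a target rebar mass loss at a given
# impressed current. The current value is an assumption for illustration.
import numpy as np

M_FE, Z, F = 55.85, 2.0, 96485.0       # g/mol, electrons per Fe ion, C/mol
rho_fe = 7.85                          # steel density (g/cm3)
d, L = 1.0, 8.0                        # exposed rebar: diameter, length (cm)
mass0 = rho_fe * np.pi * (d / 2) ** 2 * L   # mass of exposed segment (g)

I = 0.010                              # impressed current (A), assumed
for loss in (0.02, 0.05, 0.10, 0.20):
    m = loss * mass0                   # target mass loss (g)
    t = m * Z * F / (M_FE * I)         # t = m z F / (M I), in seconds
    print(f"{loss:4.0%} loss -> {m:5.2f} g -> {t / 86400:6.1f} days at {I*1e3:.0f} mA")
```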


How to cite: Fornasari, G., Zanotto, F., Balbo, A., Grassi, V., and Rizzo, E.: Combined use of NDT methods for steel rebar corrosion monitoring, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-13329, https://doi.org/10.5194/egusphere-egu23-13329, 2023.

EGU23-13720 | ECS | Posters on site | GI2.1

A fully customizable data management system for Built Cultural Heritage surveys through NDT 

Irene Centauro, Teresa Salvatici, Sara Calandra, and Carlo Alberto Garzonio

The diagnosis of Built Cultural Heritage using non-invasive methods is useful for deepening the understanding of building characteristics, assessing the state of conservation of materials, and monitoring over time the effectiveness of restoration interventions.

Ultrasonic and sonic tests are Non-Destructive Techniques widely used to evaluate the consistency of historic masonry and stone elements and to identify internal defects on site, such as voids, detachments, and fractures. These tests, in addition to being suitable for Cultural Heritage because they are non-invasive, provide a fundamental preliminary screening that is useful for targeting further analysis.

Ultrasonic and sonic velocity tests performed on monuments involve a large amount of heterogeneous information obtained from many surveys. It is therefore important to optimize the data collected during both the documentation and diagnostic phases, making them easily accessible and meaningful for analysis and monitoring. In addition, the investigation set-up should follow a standard methodology that is repeatable over time, suitable for different types of artifacts, and prepared for comparison with other techniques.

An integrated data management system is therefore also useful for supporting the decision-making processes behind maintenance actions.

This work proposes the development of a complete IT management solution for ultrasonic and sonic measurements on different types of masonry and stone artifacts. The system consists of a browser-based collaboration and document management platform, a mobile/desktop application for data entry, and a data visualization and reporting tool. This set of tools enables the complete processing of data, from the on-site survey to their analysis and visualization.
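
As a hedged sketch of what a standardized data-entry record might look like in such a system (all field names are illustrative assumptions, not the authors' schema):

```python
# A minimal, standardized ultrasonic test record with the derived pulse
# velocity; field names are hypothetical, for illustration only.
from dataclasses import dataclass, field
from datetime import date

@dataclass
class UltrasonicRecord:
    site: str                     # monument / case study
    element: str                  # e.g. "stone column", "plastered masonry"
    path_length_mm: float         # transducer spacing
    travel_time_us: float         # measured pulse travel time
    survey_date: date = field(default_factory=date.today)

    @property
    def velocity_m_s(self) -> float:
        # pulse velocity v = L / t  (mm/us equals km/s, hence * 1000 for m/s)
        return self.path_length_mm / self.travel_time_us * 1000.0

rec = UltrasonicRecord("Villa X", "stone column",
                       path_length_mm=300, travel_time_us=120)
print(rec.velocity_m_s)   # 2500.0 m/s
```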

The proposed methodology allows the standardization of the data-entry workflow and is scalable, so it can be adapted to different types of masonry and artifacts. Moreover, the system provides real-time verification of data, optimizes survey and analysis times, and reduces errors. The platform can also be integrated with machine learning models, useful for gaining insight from the data.

This solution, aimed at improving the approach to the diagnostics of Cultural Heritage, has been successfully applied by the LAM Laboratory of the Department of Earth Sciences (University of Florence) to different case studies (e.g., ashlars, frescoed walls, plastered masonries, stone columns, coats-of-arms, etc.) belonging to many important monuments.

How to cite: Centauro, I., Salvatici, T., Calandra, S., and Garzonio, C. A.: A fully customizable data management system for Built Cultural Heritage surveys through NDT, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-13720, https://doi.org/10.5194/egusphere-egu23-13720, 2023.

EGU23-13934 | Orals | GI2.1

Pavements Layered Media Characterizations using deep learning-based GPR full-wave inversion 

Li Zeng, Biao Zhou, Xiongyao Xie, and Sébastien Lambot

The possibility of accurately estimating the subsurface electric properties of pavements from ground-penetrating radar (GPR) signals using inverse modeling is limited by the adequacy of the forward model describing the GPR-subsurface system. In this presentation, we improve the recently developed approach of Lambot et al., whose success relies on a stepped-frequency continuous-wave (SFCW) radar combined with an off-ground monostatic transverse electromagnetic horn antenna. A deep-learning-based method was adopted to train an intelligent model that reproduces the waveform of the Green’s functions. The method was applied and validated under laboratory conditions on a tank filled with two-layered sand subject to different water contents. Results showed agreement between the Green’s functions predicted by the deep-learning model and the measured ones. Model inversions for the dielectric permittivity and the antenna height further demonstrated the performance of the presented method.
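
A minimal sketch of the surrogate-plus-inversion idea, with a toy closed-form waveform standing in for the true SFCW Green's functions (the network architecture, parameter ranges, and grid are all assumptions):

```python
# Learn a mapping (permittivity, antenna height) -> waveform with a small
# neural network, then invert a "measured" waveform by misfit search.
# toy_forward() is NOT the real Green's function, just a placeholder.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(3)
t = np.linspace(0.0, 1.0, 64)

def toy_forward(eps, h):
    # reflection delayed by antenna height, scaled by a toy reflection
    # coefficient (illustrative physics only)
    r = (np.sqrt(eps) - 1.0) / (np.sqrt(eps) + 1.0)
    return r * np.exp(-((t - 0.2 * h) ** 2) / 0.002)

params = np.c_[rng.uniform(3, 25, 1500), rng.uniform(0.5, 2.0, 1500)]  # eps, h (m)
waves = np.array([toy_forward(e, h) for e, h in params])

net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=1000, random_state=0)
net.fit(params, waves)

# invert a "measured" waveform (true eps = 9, h = 1.2 m) by grid search
y_obs = toy_forward(9.0, 1.2)
grid = np.c_[np.repeat(np.linspace(3, 25, 60), 40),
             np.tile(np.linspace(0.5, 2.0, 40), 60)]
misfit = np.sum((net.predict(grid) - y_obs) ** 2, axis=1)
print("estimated (eps, h):", grid[np.argmin(misfit)].round(2))
```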

How to cite: Zeng, L., Zhou, B., Xie, X., and Lambot, S.: Pavements Layered Media Characterizations using deep learning-based GPR full-wave inversion, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-13934, https://doi.org/10.5194/egusphere-egu23-13934, 2023.

EGU23-14658 | Orals | GI2.1

Influence of tectonic deformation on the mechanical properties of calcareous rocks: drawbacks of the non-destructive techniques  

Elisa Mammoliti, Veronica Gironelli, Danica Jablonska, Stefano Mazzoli, Antonio Ferretti, Michele Morici, and Mirko Francioni

Discontinuity surfaces are well known to influence the mechanical behaviour of rocks under compression. Non-destructive techniques, such as ultrasonic pulse velocity and sclerometers, are increasingly used to estimate the uniaxial compressive strength of rocks. In this study, several core samples obtained during the track-doubling works of the railway network near Genga (Marche Region, Central Italy) were analysed in order to assess the influence of the structural geological context (proximity to folds, faults, etc.) and of tectonic deformation on rock strength. Tests were conducted on rock specimens through: i) conventional uniaxial compression experiments, ii) non-destructive rebound-based methods such as the Schmidt hammer and Equotip, and iii) ultrasound. In this way, it was possible to critically analyse the use of these techniques for estimating the uniaxial compressive strength (also considering information about discontinuity type, orientation, and the nature of the filling). Finally, a petrographic analysis using an optical microscope was undertaken to support the observations derived from the analysis at the sample scale. The results indicate two main factors influencing strength at the specimen scale. The first and most decisive factor is the presence of natural pre-existing fractures. The second is the degree of tectonic deformation: the greater the deformation, the lower the strength. Furthermore, the combined use of uniaxial compression experiments, non-destructive rebound-based methods, and ultrasound made it possible to highlight the advantages and limitations of each technique and to define and propose new guidelines for their use.
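
Rebound readings are usually converted to strength through empirical correlations; below is a sketch of fitting one common functional form, UCS = a·exp(b·R), to synthetic data (the coefficients and scatter are assumptions, not the study's results):

```python
# Fit an empirical exponential correlation between Schmidt rebound number R
# and uniaxial compressive strength (UCS); data are synthetic placeholders.
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(4)
R = rng.uniform(20, 60, 40)                          # rebound values
ucs_true = 5.0 * np.exp(0.045 * R)                   # synthetic "truth" (MPa)
ucs = ucs_true * rng.lognormal(0, 0.1, R.size)       # measured UCS with scatter

model = lambda R, a, b: a * np.exp(b * R)
(a, b), _ = curve_fit(model, R, ucs, p0=(1.0, 0.05))
resid = ucs - model(R, a, b)
r2 = 1 - np.sum(resid**2) / np.sum((ucs - ucs.mean())**2)
print(f"UCS = {a:.2f} * exp({b:.3f} R),  R^2 = {r2:.2f}")
```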

How to cite: Mammoliti, E., Gironelli, V., Jablonska, D., Mazzoli, S., Ferretti, A., Morici, M., and Francioni, M.: Influence of tectonic deformation on the mechanical properties of calcareous rocks: drawbacks of the non-destructive techniques , EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-14658, https://doi.org/10.5194/egusphere-egu23-14658, 2023.

EGU23-14846 | ECS | Orals | GI2.1

Combined NDT data for road management through BIM models 

Luca Bertolini, Fabrizio D'Amico, Antonio Napolitano, Jhon Romer Diezmos Manalo, and Luca Bianchini Ciampoli

One of the main priorities for road administrations and stakeholders is the management and monitoring of critical infrastructures, especially transportation infrastructures. In this context, Building Information Modeling (BIM) can be one of the most effective methodologies for optimizing the management process. In Italy, several laws and regulations have been issued making the use of BIM procedures mandatory for the design of new infrastructures and emphasizing its role in the management of existing civil works [1, 2].

Monitoring operations on transportation infrastructures are generally conducted through on-site surveys. Non-Destructive Testing methods (i.e., GPR, LiDAR, laser profilometer, InSAR, etc.) have been used to perform these inspections, as their outputs have proven effective in determining the conditions of the infrastructure and its assets [3]. Moreover, the BIM methodology can prove a valuable tool for managing the data provided by these surveys, as it consists in creating digital models that carry information about the object they represent. These models can be used to store over time the different information obtained from NDT surveys and to carry out integrated analyses of the conditions of the infrastructure [4].

This study aims to analyze a potential BIM process capable of integrating the outputs of different NDT surveys to generate an informative digital model of an infrastructure and its assets. The proposed methodology is able to merge the data provided by the inspections, which are typically obtained by different operators and come in different file formats, into a single BIM model. The main goal of the research is to provide a process that optimizes the management procedures of transportation infrastructures by creating digital models capable of reducing the problems typically associated with the monitoring and maintenance of these critical civil works. By merging different information in a single environment and relying on survey data that are commonly analyzed separately, an integrated analysis of the infrastructure can be carried out and data loss can be reduced.

The study was developed by relying on real data, obtained from on-site surveys carried out over Italian infrastructures. As different outputs have been collected, BIM models of different assets of the analyzed infrastructures were defined. Preliminary results have shown that the proposed methodology can be a viable tool for optimizing the management process of these critical civil works.

Acknowledgements

The research is supported by the Italian Ministry of Education, University and Research under the National Project “Extended resilience analysis of transport networks (EXTRA TN): Towards a simultaneously space, aerial and ground sensed infrastructure for risks prevention”, PRIN 2017. Prot. 20179BP4SM.

References

[1] MIT, 2018. Ministero delle Infrastrutture e dei Trasporti, D. Lgs 109/2018

[2] MIT, 2021. Ministero delle Infrastrutture e dei Trasporti, D.M. 312/2021

[3] D’Amico F. et al., 2020. Integration of InSAR and GPR Techniques for Monitoring Transition Areas in Railway Bridges. NDT&E Int

[4] D’Amico, F. et al., 2022. Integrating Non-Destructive Surveys into a Preliminary BIM-Oriented Digital Model for Possible Future Application in Road Pavements Management. Infrastructures 7, no. 1: 10

How to cite: Bertolini, L., D'Amico, F., Napolitano, A., Manalo, J. R. D., and Bianchini Ciampoli, L.: Combined NDT data for road management through BIM models, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-14846, https://doi.org/10.5194/egusphere-egu23-14846, 2023.

EGU23-14899 | ECS | Orals | GI2.1

Fusion of in-situ and spaceborne sensing for environmental monitoring 

Konstantinos Karyotis, Nikolaos Tsakiridis, and George Zalidis

Measuring soil reflectance in the field, rather than in a laboratory setting, can be very useful for numerous applications such as mapping the distribution of various soil properties, especially when prompt estimations are needed. Recent advances in spectroscopy, and specifically the development of low-cost spectrometers based on Micro-Electro-Mechanical Systems (MEMS), pave the way for real-time applications in agriculture and environmental monitoring. Compared to high-end spectrometers, whose spectral range extends from the Visible (VIS) and Near-InfraRed (NIR) to the Shortwave InfraRed (SWIR), MEMS cover limited parts of the electromagnetic spectrum, and important information is therefore missing. In parallel, new space-based products such as Planet Fusion are operationally ready and provide optical imagery (RGB and NIR) with high spatial (3 m) and temporal (daily) resolution. To this end, we assessed the potential of augmenting the bands captured by a commercial MEMS sensor (Spectral Engines Nirone S2.2 @ 1750–2150 nm) by adjoining the Planet Fusion bands at the exact sampling date and location from which the in-situ scans originate.

Employing the above, a set of portable MEMS sensors was used in a pilot area in Cyprus (Agia Varvara, Nicosia district) to develop a regional in-situ Soil Spectral Library (SSL). Sixty distinct locations were selected for capturing in-situ spectral reflectance after stratification of the Planet Fusion pixels of the pilot area, and a physical soil sample from each location was analyzed in the laboratory to determine the Soil Organic Carbon (SOC) content. During the visits, topsoil moisture was also measured.

The resulting SSL, containing the in-situ spectra, SOC, and moisture content, was further augmented with the four bands of the Planet Fusion imagery acquired on the exact date of the field visit. Three Random Forest models for SOC content estimation were then fitted, using as explanatory variables first only the MEMS data with moisture content, then the Planet Fusion bands, and finally all three available inputs.

The results showed a clear decrease in the RMSE of SOC content estimates when fusing in-situ with spaceborne data, highlighting the importance of the information contained in the VIS-NIR range when modeling SOC. The synergy of the two sensors is mutually beneficial: SOC absorption bands are also found in the SWIR region and are hard to detect by remote sensing, since they fall within the strong water absorption region (around 1950 nm). MEMS-based systems operating in the SWIR can support this process and, if combined with ancillary environmental measurements such as soil moisture, can provide a cost-effective solution for measuring SOC and other soil-related parameters. To reduce the reliance on laboratory analysis, it is necessary to establish protocols and guidelines for spectral data collection and management, ensuring that the collected data are consistent and of high quality, and to develop representative SSLs that can serve different modeling scenarios.
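
A minimal sketch of the three-model comparison on synthetic stand-in data (feature dimensions and the cross-validation choice are assumptions, not the study's setup):

```python
# Random Forests fitted on (i) MEMS bands + moisture, (ii) Planet Fusion
# bands, (iii) all inputs, compared by cross-validated RMSE.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
n = 60                                            # sampling locations
mems = rng.random((n, 20))                        # MEMS bands, 1750-2150 nm
moisture = rng.random((n, 1))                     # topsoil moisture
planet = rng.random((n, 4))                       # Planet Fusion R, G, B, NIR
soc = 2 + mems[:, 0] + planet[:, 3] - moisture[:, 0] + rng.normal(0, 0.2, n)

for name, X in [("MEMS+moisture", np.c_[mems, moisture]),
                ("Planet Fusion", planet),
                ("all inputs", np.c_[mems, moisture, planet])]:
    rmse = -cross_val_score(RandomForestRegressor(random_state=0), X, soc,
                            scoring="neg_root_mean_squared_error", cv=5).mean()
    print(f"{name:14s} RMSE = {rmse:.3f}")
```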

How to cite: Karyotis, K., Tsakiridis, N., and Zalidis, G.: Fusion of in-situ and spaceborne sensing for environmental monitoring, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-14899, https://doi.org/10.5194/egusphere-egu23-14899, 2023.

EGU23-14981 | ECS | Orals | GI2.1

Implementation of a Digital Twin integrating remote sensing information for network-level infrastructure monitoring 

Antonio Napolitano, Valerio Gagliardi, Luca Bertolini, Jhon Romer Diezmos Manalo, Alessandro Calvi, and Andrea Benedetto

Nowadays, there is an emerging demand from public authorities and managing bodies to evaluate the overall health of infrastructures and identify the most critical transport assets. At the national scale, thousands of transport infrastructures are in critical condition and require urgent maintenance actions. Currently, most available Digital Twins (DTs) allow data to be explored and visualized with only a limited kind of information, which still limits their operative and practical use by infrastructure owners, who require fast solutions for managing large amounts of data. Moreover, this idea is fully in line with European and national actions related to the development of a DT of the Earth’s systems, including the European Commission's “DestinE” programme with EUSPA and the European Space Agency (ESA). For this purpose, a dynamic DT model of a critical infrastructure is developed, using the available data on design information, historical maintenance operations, and monitoring surveys based on satellite imagery.

In this context, this study presents an innovative concept of a Digital Twin that integrates the details coming from NDT surveys, on-site inspections, and satellite-based information to store, manage, and visualize valuable information. This is achieved by analysing the main gaps and limitations of existing platforms and providing a viable integrated solution in the form of an upgradable strategic analysis tool. To this purpose, remote sensing methods are identified as viable technologies for continuous monitoring operations. More specifically, satellite data and processing techniques such as the Multi-Temporal SAR Interferometry approach are strategic for the continuous monitoring of the displacements associated with transport infrastructures. An advantage of these techniques is the lighter data processing required for the assessment of displacements and the detection of critical areas [1, 2].

The study introduces two main levels of innovation. The first is associated with the integrated approach to transportation planning, which incorporates quantitative data from multiple sources into more traditional territorial analysis models. The second relates to the technological engineering discipline and consists of the fusion of multi-source observation data with last-generation dynamic data connected to the environment.

Acknowledgements

This research is supported by the Project “M.LAZIO”, accepted and funded by the Lazio Region, Italy.

References

[1] D'Amico, F. et al., “Implementation of an interoperable BIM platform integrating ground based and remote sensing information for network-level infrastructures monitoring”, Spie Remote Sensing 2022.

[2] Gagliardi, V. et al., “Bridge monitoring and assessment by high-resolution satellite remote sensing technologies”, Spie Future Sensing Technologies 2020.

How to cite: Napolitano, A., Gagliardi, V., Bertolini, L., Manalo, J. R. D., Calvi, A., and Benedetto, A.: Implementation of a Digital Twin integrating remote sensing information for network-level infrastructure monitoring, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-14981, https://doi.org/10.5194/egusphere-egu23-14981, 2023.

EGU23-15542 | ECS | Orals | GI2.1

Novel perspectives in transport infrastructure management: Data-Fusion, integrated monitoring and augmented reality 

Valerio Gagliardi, Luca Bianchini Ciampoli, Fabrizio D'Amico, Alessandro Calvi, and Andrea Benedetto

Infrastructure networks are crucial to ensuring the sustainability of the current development model, in which the movement of people and goods is essential. At the same time, transport assets are increasingly exposed to several issues, including changing climatic conditions and vulnerability and exposure to natural hazards such as hydraulic, geomorphological, landslide, and seismic phenomena, which can affect structural integrity and cause damage and deterioration. The context is made even more serious by the degradation of materials and the progressive ageing of infrastructure, often accelerated by environmental conditions and by inadequate, or not always effective, maintenance actions. This calls for novel methods that provide both large-scale coverage for network-scale linear infrastructures and the level of detail needed to diagnose causes and determine the priorities for the most effective countermeasures.

The proposed solution is based on a Data-Fusion approach, merging multi-source and multi-scale data to enhance the interpretation process in a holistic sense. The information comes from spaceborne Multi-temporal SAR Interferometry, complemented by more detailed aerial data acquired by UAVs and by ground-based Non-Destructive Testing methods: laser scanner surveys for resolution and digital integrability, high-resolution camera measurements assisted by artificial intelligence for surface degradation, and prospecting data collected with Ground Penetrating Radar technology. All these data can be analyzed simultaneously in a comprehensive digital platform, providing a useful tool to support operators and public bodies in prioritizing maintenance actions.

The digital platform can also be explored using augmented reality tools capable of generating and reproducing the Digital Twin of the inspected infrastructure in a real environment. This enables monitoring evaluations through a diagnostic approach that integrates spaceborne, aerial, ground-based, and geophysical surveys, allowing navigation within the infrastructure. Potential applications are numerous, ranging from the mapping of wide areas affected by potential criticalities to the definition of the main vulnerabilities related to seismic and hydraulic risks, the analysis of land changes surrounding the assets following extreme natural events, and the reconstruction of historical deformation trends of roads, railways, and bridges through the interpretation of SAR data.

Acknowledgments

This research is supported by the Italian Ministry of Education, University, and Research under the National Project “EXTRA TN”, PRIN2017, Prot. 20179BP4SM. In addition, this research is supported by the Project “MLAZIO” funded by Lazio Region (Italy).

How to cite: Gagliardi, V., Bianchini Ciampoli, L., D'Amico, F., Calvi, A., and Benedetto, A.: Novel perspectives in transport infrastructure management: Data-Fusion, integrated monitoring and augmented reality, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-15542, https://doi.org/10.5194/egusphere-egu23-15542, 2023.

EGU23-16471 | ECS | Orals | GI2.1

Hydrogen isotope fractionation between leaf wax compounds and source water in tropical angiosperms 

Amrita Saishree, Shreyas Managave, and Vijayananda Sarangi

The hydrogen isotope fractionation between leaf wax compounds and source water, the apparent fractionation (εapp), necessary for reconstructing the hydrogen isotopic composition (δD) of precipitation, is mainly assessed through field and transect studies. The current εapp dataset, however, exhibits a bias toward mid-latitude regions of the Northern Hemisphere. Here we report the results of an outdoor experiment in which four evergreen and three deciduous species were grown with water of known δD value (-1.8‰) in a tropical semi-arid monsoon region. This allowed us to estimate εapp more accurately and to quantify εapp variability within a species and among different species. Among-species εapp values were -119 ± 23‰ (for the n-alkane of chain length n-C31) and -126 ± 27‰ (for the n-alkanoic acid of chain length n-C30). The similarity between the among-species variability in εapp reported here and that observed in field and transect studies suggests that the species effect, rather than uncertainty in the δD of source water, controls the uncertainty in community-averaged εapp. The δD fractionation between the n-C29 alkane and the n-C30 alkanoic acid (ε29/30) and between the n-C31 alkane and the n-C32 alkanoic acid (ε31/32) was 7 ± 25‰ and 6 ± 15‰, respectively, suggesting minimal hydrogen isotope fractionation during decarboxylation. Further, as we did not observe a systematic difference between the εapp of deciduous and evergreen species, changes in the relative proportion of these vegetation types in a community might not affect its εapp value.
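
For reference, the apparent fractionation follows from the standard definition; a short sketch with the experiment's source-water value and an illustrative wax value:

```python
# Apparent fractionation between leaf wax and source water:
# eps_app = [(dD_wax + 1000) / (dD_water + 1000) - 1] * 1000  (permil)
def epsilon_app(dD_wax, dD_water):
    return ((dD_wax + 1000.0) / (dD_water + 1000.0) - 1.0) * 1000.0

dD_water = -1.8          # irrigation water dD (permil), as in the experiment
dD_wax = -121.0          # an illustrative n-C31 alkane value, not a measurement
print(f"eps_app = {epsilon_app(dD_wax, dD_water):.1f} permil")   # about -119
```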

How to cite: Saishree, A., Managave, S., and Sarangi, V.: Hydrogen isotope fractionation between leaf wax compounds and source water in tropical angiosperms, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-16471, https://doi.org/10.5194/egusphere-egu23-16471, 2023.

EGU23-16632 | ECS | Orals | GI2.1

Development of a flexible 2D DC Resistivity modelling technique for use in space domain 

Deepak Suryavanshi and Rahul Dehiya

Geoelectric non-destructive imaging and monitoring of the earth's subsurface require robust and adaptable numerical methods to solve the governing differential equation. Since DC data are usually acquired along a straight line, we solve the DC problem for the 2D case; the source of the DC method, however, exhibits a 3D nature. To account for this, 2D DC resistivity modeling is often carried out in the wavenumber domain. Studies have suggested ways of selecting optimum wavenumbers and weights, but this does not guarantee a universal choice: the chosen wavenumbers and related weights strongly influence the precision of the resulting solution in the space domain. Many forward modeling studies demonstrate that selecting effective wavenumbers is challenging, especially for complicated models with topography, anisotropy, and significant resistivity contrasts. Moreover, forward modeling requires many wavenumbers as the models become more complex.

This study focuses on developing a method that completely omits wavenumbers in 2D DC resistivity modeling. The work is motivated by a numerical experiment on a simple half-space model. Since the analytical response of such a model is easily calculated, we matched the analytical solution against the responses obtained with various wavenumbers and weights used in the literature. All the responses deviated from the analytical solution beyond a certain distance, and none of them was accurate at large offsets. Thorough testing of the numerical scheme showed that the wavenumbers selected for the forward modeling significantly affect how practical the approach is at large offsets.
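
A hedged sketch of the issue: for a homogeneous half-space the wavenumber-domain potential is known in closed form, and reconstructing the space-domain potential from a fixed wavenumber set amounts to a quadrature of an increasingly oscillatory integrand, which degrades at large offsets. The geometry and the wavenumber set below are illustrative.

```python
# Half-space check: the transformed potential is (rho*I/2pi) * K0(k*R) and
# V(y) = (2/pi) * integral of Vtilde(k) * cos(k*y) dk. A coarse fixed k-grid
# reproduces short offsets but fails at long ones.
import numpy as np
from scipy.special import k0

rho, I, R = 100.0, 1.0, 1.0      # resistivity (ohm*m), current (A), in-plane dist. (m)

def v_analytic(y):
    # 3D point-source potential on a homogeneous half-space surface
    return rho * I / (2 * np.pi * np.sqrt(R**2 + y**2))

def v_from_wavenumbers(y, ks):
    # trapezoid quadrature over a fixed set of wavenumbers
    vt = rho * I / (2 * np.pi) * k0(ks * R)
    return 2 / np.pi * np.trapz(vt * np.cos(ks * y), ks)

ks = np.logspace(-4, 1.5, 25)    # an illustrative "fixed" wavenumber selection
for y in (0.5, 2.0, 10.0, 50.0):
    va, vk = v_analytic(y), v_from_wavenumbers(y, ks)
    print(f"offset {y:5.1f} m: analytic {va:.4e}  wavenumber sum {vk:.4e}")
```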

To overcome this problem, a new boundary condition is derived and implemented in the existing numerical scheme, the Mimetic Finite Difference Method (MFDM). We consider the source to be placed at the origin of the coordinate system, which removes the dependence of the source term, expressed in the Fourier domain, on the wavenumber. The solution of the resulting equation is then a real-valued even function of the wavenumber. This ensures that the potential in the space domain for the 2D model is also a real-valued even function, symmetric about a plane perpendicular to the strike direction and passing through the origin. Because the first-order derivative of an even function vanishes at the plane of symmetry, this can be expressed mathematically as a Neumann boundary condition on that plane. We therefore propose a scheme to solve the 2D resistivity problem directly in the space domain using this boundary condition.

The developed algorithm is tested on isotropic and anisotropic two-layer models with large contrasts. The numerical solutions obtained using the modified boundary condition show considerable accuracy even at large offsets when compared with the analytical solution, whereas the results obtained using wavenumbers available in the literature deviate considerably from the analytical solution at large offsets.

How to cite: Suryavanshi, D. and Dehiya, R.: Development of a flexible 2D DC Resistivity modelling technique for use in space domain, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-16632, https://doi.org/10.5194/egusphere-egu23-16632, 2023.

Data fusion in civil engineering: personal experience, vision and historical considerations 

Andrea Benedetto

Approximately eight years ago, after a research activity that I had started in the nineties on the application of GPR and, later, of NDTs to civil engineering, I realized that no technology can be considered self-standing. This is a consequence of the high complexity of civil engineering works and of the highly unpredictable impacts of ordinary processes and exceptional natural events. At the beginning of this century it was already clear that reliable and comprehensive monitoring of a phenomenon affecting bridges, tunnels, structures, or any civil engineering work is possible only by integrating data from different sources.

GPR was at that time a very promising technology, and many researchers worked in this field, measuring e.g. pavement deformation, asphalt moisture, ballast degradation, and the mechanical properties of materials. The accurate outcomes represented a great step forward for the science in this sector, but the final results proved partial, because the approach failed under a holistic perspective.

So, in the second decade of the 2000s, the need for a novel investigation paradigm arose, aiming not only to identify and quantify a problem, but also to diagnose its causes.

This was the stimulus to fuse data from different NDTs, under the assumption that information A and B together give much more than A+B: one piece of information (A) can explain one or more characters contained in a second (B) that cannot be inferred from knowledge of B alone.

On this basis I decided, together with international colleagues of the highest level, to establish a new session at EGU. That was in 2018; today marks the sixth edition!

Over these years, between 80 and 120 researchers took part in each session. The number of countries involved is also impressive, ranging from 10 to 17 per session, and the number of institutions from 36 to 50.

The number of contributions presented in the five editions is 141.

Since 2018, several special issues of prominent journals have been dedicated to data fusion. Recently, beyond the typical technologies such as GPR, UT, and ERT, great attention has been given to LiDAR, satellite, and UAV data.

Data fusion has also been directed to other interesting and promising fields such as archaeology, agriculture, and urban planning, to cite only a few.

I would like to underline that this great interest started in Europe and in the USA, but the geographical coverage is now much wider and includes, at the same level, Asian and emerging countries.

There is now a new frontier to be explored. My vision is that this holistic approach can be used to develop an innovative immersive environment through integration into augmented reality platforms, on which a digital twin can be generated and dynamically upgraded through an adaptive interface, also exploiting AI and machine learning paradigms.

How to cite: Benedetto, A.: Data fusion in civil engineering: personal experience, vision and historical considerations, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-16864, https://doi.org/10.5194/egusphere-egu23-16864, 2023.

Integration of non-destructive surveys for BIM-based and structural-verified digital reconstruction of archaeological sites 

Building Information Modeling is a software-based parametric design approach that allows full interoperability between the various actors involved in a design or management process. Notwithstanding that it was specifically created for building projects, its use has been adapted to a wide range of applications, including transport infrastructure design and, more recently, cultural heritage. In this field, it has mainly been applied to raise the accuracy and effectiveness of restoration and stabilization activities for historical architecture.
The present study aims to demonstrate how the use of BIM may return remarkable outcomes in improving the current quality of digital valorisation and virtual reconstruction of historical structures, especially when their state of conservation is limited. Indeed, even though current digital reconstruction models are usually verified from an archaeological perspective, their structural consistency is never tested. This implies that many virtual reconstruction models are likely to represent structures that are historically accurate but make no structural sense, as, according to their geometric features and construction materials/techniques, they would not bear their own weight.
In this perspective, this study proposes a novel BIM-based methodology capable of both driving the archaeological reconstruction hypotheses and testing them on a structural basis. The model can be schematically represented by the following process (a minimal numerical sketch of step 5 follows the list):
1- Survey of the emerging: acquisition of data from surface archaeological surveys (topographic data, laser scanner, aerial photogrammetry, satellite images);
2- Survey of the hidden: acquisition of data from hypogeal surveys (georadar, electrical tomography, magnetometry);
3- Mechanical characterization: gathering of information on the materials of the find, whose mechanical qualities are also verified through load stress tests;
4- Virtual reconstruction: proposal of a possible virtual reconstruction hypothesis linked to structural and morphological features known to be present in the relevant historical periods;
5- Structural test: engineering and structural verification of the proposed hypothesis by means of finite element algorithms.
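
The following is only a back-of-the-envelope stand-in for that finite element verification, checking whether a prismatic masonry wall of hypothesised height can carry its self-weight; density, strength, and safety factor are assumed values:

```python
# Simplest structural plausibility check: base stress of a prismatic wall
# under self-weight is sigma = rho * g * h; compare with an assumed
# compressive strength. The real project uses FEM, not this shortcut.
RHO = 1800.0        # masonry density (kg/m3), assumed
G = 9.81            # gravity (m/s2)
SIGMA_C = 2.0e6     # compressive strength of weak historic masonry (Pa), assumed

def max_height(rho=RHO, sigma_c=SIGMA_C, safety=3.0):
    return sigma_c / (safety * rho * G)

h_hyp = 12.0        # hypothesised wall height (m)
verdict = "passes" if h_hyp < max_height() else "fails"
print(f"max admissible height ~ {max_height():.1f} m; {h_hyp} m hypothesis {verdict}")
```
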
The proposed methodology was tested on the archaeological area of the Villa and Circus of Maxentius along the Ancient Appian Way in Rome; all the planned activities were shared with and authorized by the Sovrintendenza Capitolina ai Beni Culturali, within the context of the Project BIMHERIT, funded by Regione Lazio (DTC Lazio Call, Prot. 305-2020-35609).

How to cite: Santarelli, R. and Ten, A.: Integration of non-destructive surveys for BIM-based and structural-verified digital reconstruction of archaeological sites, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-17489, https://doi.org/10.5194/egusphere-egu23-17489, 2023.

Fukushima and Chernobyl: similarities and differences of radiocesium behavior in the soil-water environment 

Alexei Konoplev

In the wake of the Chernobyl and Fukushima accidents, radiocesium has become the radionuclide of greatest environmental concern. The ease with which this radionuclide moves through the environment and is taken up by plants and animals is governed by its chemical forms and by site-specific environmental characteristics. Distinctions in climate and geomorphology, as well as in the 137Cs speciation of the fallout, result in differences in the migration rates of 137Cs in the environment and in the rates of its natural attenuation. In Fukushima areas, 137Cs was found to be strongly bound to soil and sediment particles, its bioavailability being reduced as a result. Up to 80% of the 137Cs deposited on the soil was reported to be incorporated in hot glassy particles (CsMPs) insoluble in water. Disintegration of these particles in the environment is much slower than that of Chernobyl-derived fuel particles. The higher annual precipitation and steep slopes of the contaminated Fukushima areas are conducive to higher erosion and higher total radiocesium wash-off. Typhoons Etau in 2015 and Hagibis in 2019 demonstrated the pronounced redistribution of 137Cs on river watersheds and floodplains, and in some cases natural self-decontamination occurred. Among the common features of 137Cs behavior in Chernobyl and Fukushima are a slow decrease in 137Cs activity concentration in small, closed, and semi-closed lakes and its particular seasonal variations: an increase in summer and a decrease in winter.

How to cite: Konoplev, A.: Fukushima and Chernobyl: similarities and differences of radiocesium behavior in the soil-water environment, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-1081, https://doi.org/10.5194/egusphere-egu23-1081, 2023.

Eight-year variations in atmospheric radiocesium in Fukushima city and simulated resuspension from contaminated ground surfaces in eastern Japan 

After the Fukushima nuclear accident, atmospheric 134Cs and 137Cs measurements were taken in Fukushima city for 8 years, from March 2011 to March 2019. The airborne surface concentrations and deposition of radiocesium (radio-Cs) were high in winter and low in summer; these trends are the opposite of those observed in a contaminated forest area. The effective half-lives of 137Cs in the concentrations and deposition before 2015 (0.754 and 1.30 years, respectively) were significantly shorter than those after 2015 (2.07 and 4.69 years, respectively), likely because dissolved radio-Cs was discharged from the local terrestrial ecosystems more rapidly than particulate radio-Cs. In fact, the dissolved fractions in precipitation were larger than the particulate fractions before 2015, whereas the particulate fractions were larger after 2016. X-ray fluorescence analysis suggested that biotite may have played a key role in the environmental behavior of particulate radio-Cs after 2014.
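
As a sketch of how such effective half-lives are derived, assuming a simple exponential model fitted to a synthetic concentration series:

```python
# Fit C(t) = C0 * exp(-ln2 * t / Teff) to a concentration time series.
# The series below is synthetic, generated with Teff = 1.30 years.
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(6)
t = np.linspace(0, 4, 50)                               # years since deposition
c = 10.0 * np.exp(-np.log(2) * t / 1.30) * rng.lognormal(0, 0.15, t.size)

decay = lambda t, c0, teff: c0 * np.exp(-np.log(2) * t / teff)
(c0, teff), _ = curve_fit(decay, t, c, p0=(10.0, 1.0))
print(f"effective half-life = {teff:.2f} y")            # ~1.30 for this data
```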

Resuspension of 137Cs from the contaminated ground surface to the atmosphere is essential for understanding the long-term environmental behavior of 137Cs. We assessed the 137Cs resuspension flux from bare soil and forest ecosystems in eastern Japan in 2013 using a numerical simulation constrained by surface air concentration and deposition measurements. In this estimation, the total areal annual resuspension of 137Cs is 25.7 TBq, equivalent to 0.96% of the initial deposition (2.68 PBq). The current simulation underestimated the 137Cs deposition in Fukushima city in winter by more than an order of magnitude, indicating the presence of additional resuspension sources. The site in Fukushima city is surrounded by major roads. Heavy traffic on wet and muddy roads after snow removal operations could generate superlarge (approximately 100 μm in diameter) road dust or road salt particles, which are not included in the model but might contribute to the observed 137Cs at the site.

The current presentation is based on two published papers: Watanabe et al., ACP, https://doi.org/10.5194/acp-22-675-2022 (2022) and Kajino et al., ACP, https://doi.org/10.5194/acp-22-783-2022 (2022). The presenters would like to thank all the co-authors of the two papers for their significant contributions.

How to cite: Kajino, M. and Watanabe, A.: Eight-year variations in atmospheric radiocesium in Fukushima city and simulated resuspension from contaminated ground surfaces in eastern Japan, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-1607, https://doi.org/10.5194/egusphere-egu23-1607, 2023.

EGU23-2540 | Posters on site | GI2.2

Hydrological setting control 137Cs and 90Sr concentration at headwater catchments in the Chornobyl Exclusion Zone 

Yasunori Igarashi, Yuichi Onda, Koki Matsushita, Hikaru Sato, Yoshifumi Wakiyama, Hlib Lisovyi, Gennady Laptev, Dmitry Samoilov, Serhii Kirieiev, and Alexei Konoplev

Concentration-discharge relationships are widely used to understand the hydrologic processes controlling river water chemistry. We investigated how hydrological processes affect radionuclide concentrations (137Cs and 90Sr) in surface water in headwater catchments of the Chornobyl Exclusion Zone in Ukraine. In the flat wetland catchment, the depth of the saturated soil layer changed little throughout the year, but changes in the saturated soil surface area during snowmelt and immediately after rainfall affected water chemistry by changing the opportunities for contact between surface water and the soil surface. In slope catchments with few wetlands, on the other hand, river water chemistry is shaped by changes in the contributions of "shallow" and "deep" water as the pathways supplying the river change. Dissolved and suspended 137Cs concentrations did not correlate with discharge rate or with competitive cations, but the solid/liquid ratio of 137Cs showed a significant negative relationship with water temperature, and further studies are needed on the sorption/desorption reactions involved. 90Sr concentrations in surface water were strongly related to water pathways in each of the catchments: the contact between surface water and the soil surface, and the changing contributions of shallow and deep water to streamflow, can change 90Sr concentrations in the wetland and slope catchments, respectively. In this study, we revealed that radionuclide concentrations in Chornobyl rivers are strongly affected by the water pathways in the headwater catchments.
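
As a sketch of the concentration-discharge diagnostic underlying such analyses, a power law C = a·Q^b fitted in log-log space to synthetic data (b < 0 indicates dilution, b near 0 chemostatic behaviour):

```python
# Fit C = a * Q^b by linear regression of log(C) on log(Q).
# Synthetic data stand in for the 90Sr observations.
import numpy as np

rng = np.random.default_rng(7)
Q = rng.lognormal(0, 0.8, 200)                       # discharge (m3/s), synthetic
C = 5.0 * Q**-0.3 * rng.lognormal(0, 0.2, Q.size)    # 90Sr-like concentration

b, log_a = np.polyfit(np.log(Q), np.log(C), 1)       # slope, intercept
print(f"C = {np.exp(log_a):.2f} * Q^{b:.2f}")        # recovers b ~ -0.3
```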

How to cite: Igarashi, Y., Onda, Y., Matsushita, K., Sato, H., Wakiyama, Y., Lisovyi, H., Laptev, G., Samoilov, D., Kirieiev, S., and Konoplev, A.: Hydrological setting control 137Cs and 90Sr concentration at headwater catchments in the Chornobyl Exclusion Zone, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-2540, https://doi.org/10.5194/egusphere-egu23-2540, 2023.

EGU23-2561 | Posters on site | GI2.2

Dispersion of particle-reactive elements caused by the phase transitions in scavenging 

Kyeong Ok Kim, Vladimir Maderich, Igor Brovchenko, Kyung Tae Jung, Sergey Kivva, Katherine Kovalets, and Haejin Kim

A generalized model of scavenging of the reactive radionuclide 239,240Pu was developed, in which the sorption-desorption processes of the oxidized and reduced forms on multifraction suspended particulate matter are described by first-order kinetics. One-dimensional transport-diffusion-reaction equations were solved analytically and numerically. In the idealized case of an instantaneous release of 239,240Pu at the ocean surface, the concentration profile asymptotically tends to a symmetric spreading bulge in the form of a Gaussian moving downward at constant velocity. The corresponding diffusion coefficient is the sum of the physical diffusivity and the apparent diffusivity caused by the reversible phase transitions between the dissolved and particulate states. Using the method of moments, we analytically obtained formulas for both the center-of-mass velocity and the apparent diffusivity. It was found that, in ocean waters where oxygen is present at great depths, the complete problem can be replaced, to a first approximation, by a simplified problem for a mixture of forms with a single effective distribution coefficient. This conclusion was confirmed by the modeling results for the well-ventilated Eastern Mediterranean. In agreement with the measurements, the calculations demonstrate the presence of a slowly descending maximum for all forms of the concentration. The ratio of the reduced to the oxidized form was approximately 0.22-0.24. At the same time, 239,240Pu scavenging calculations for the anoxic Black Sea deep water reproduced the transition from the oxidized to the reduced form of 239,240Pu with depth, in accordance with the measurement data.
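
The mechanism behind the apparent diffusivity can be illustrated with a two-state toy model (a strong simplification of the full multifraction redox model): particles alternate by first-order kinetics between a dissolved state with no settling and a particulate state settling at ws, and the reversible exchange adds a Taylor-like dispersion term on top of the physical diffusivity. All parameter values below are invented for illustration.

```python
# Monte Carlo check of the two-state result: effective velocity ws*fp and
# apparent diffusivity D + ws^2 * fp * (1 - fp) / (k12 + k21).
import numpy as np

rng = np.random.default_rng(8)
ws, D = 100.0, 300.0              # settling (m/y), physical diffusivity (m2/y)
k12, k21 = 20.0, 60.0             # sorption / desorption rates (1/y)
fp = k12 / (k12 + k21)            # equilibrium particulate fraction
dt, nt, n = 1e-3, 5000, 10000     # time step (y), steps, particles

z = np.zeros(n)
particulate = rng.random(n) < fp
for _ in range(nt):
    z += ws * particulate * dt + np.sqrt(2 * D * dt) * rng.standard_normal(n)
    rates = np.where(particulate, k21, k12)      # rate of leaving current state
    particulate ^= rng.random(n) < rates * dt

T = nt * dt
print("bulge velocity   :", z.mean() / T, "  theory:", ws * fp)
print("bulge diffusivity:", z.var() / (2 * T),
      "  theory:", D + ws**2 * fp * (1 - fp) / (k12 + k21))
```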

How to cite: Kim, K. O., Maderich, V., Brovchenko, I., Jung, K. T., Kivva, S., Kovalets, K., and Kim, H.: Dispersion of particle-reactive elements caused by the phase transitions in scavenging, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-2561, https://doi.org/10.5194/egusphere-egu23-2561, 2023.

EGU23-3049 | ECS | Posters on site | GI2.2

Changes in Air Dose Rates due to Soil Water Content in Forests in Fukushima Prefecture, Japan 

Miyu Nakanishi, Yuichi Onda, Hiroaki Kato, Junko Takahashi, Hikaru Iida, and Momo Takada

Radionuclides released and deposited by the 2011 Fukushima Daiichi Nuclear Power Plant accident caused an increase in air dose rates in forests in Fukushima Prefecture. Although air dose rates have been reported to increase during rainfall, we found that they decreased during rainfall in Fukushima forests, which is attributed to the shielding effect of soil moisture. This study aimed to develop a method for estimating rainfall-driven changes in air dose rates even in the absence of soil moisture data. We therefore used the preceding rainfall (Rw), an indicator that also takes past rainfall into account: we calculated Rw in Namie Town, Futaba District, Fukushima Prefecture, from May to July 2020 and estimated the air dose rates. In this area, air dose rates decreased with increasing soil moisture. Air dose rates could be estimated by combining Rw computed with half-lives of 2 hours and 7 days and by considering hysteresis in the absorption and drainage processes. The coefficient of determination (R2) exceeded 0.70 for the estimation of soil water content, and good agreement was also observed when estimating air dose rates from Rw (R2 > 0.65). The same method was used to estimate air dose rates at the Kawauchi site from May to July 2019. Due to the high water repellency of the Kawauchi site, the increase in soil water content was very small, and the change in air dose rate was almost negligible when soil water content was below 15% and rainfall below 10 mm. This study enabled the estimation of soil water content and air dose rate from rainfall and captured the effect of rainfall on the decreasing trend of air dose rates. The method can therefore be used in the future as an indicator to determine whether temporary changes in air dose rates are caused by influences other than rainfall. It also contributes to improving methods for estimating external dose rates for humans and for terrestrial animals and plants in forests.
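
A minimal sketch of the preceding-rainfall index: hourly rainfall accumulated with exponential decay at the two half-lives used in the study (2 hours and 7 days); the rainfall series and any final regression weights are assumptions.

```python
# Preceding-rainfall index Rw: recursive exponentially decaying sum of
# hourly rainfall. Two indices (fast and slow) would be combined with
# regression coefficients to estimate soil water content.
import numpy as np

rng = np.random.default_rng(9)
hours = 24 * 90                                   # three months of hourly data
rain = np.where(rng.random(hours) < 0.05, rng.exponential(2.0, hours), 0.0)

def preceding_rainfall(rain_mm, half_life_h):
    decay = np.exp(-np.log(2) / half_life_h)      # per-hour decay factor
    rw = np.zeros_like(rain_mm)
    for i in range(1, rain_mm.size):
        rw[i] = rw[i - 1] * decay + rain_mm[i]    # recursive update
    return rw

rw_fast = preceding_rainfall(rain, 2.0)           # 2-hour half-life
rw_slow = preceding_rainfall(rain, 7 * 24.0)      # 7-day half-life
# soil water content ~ b0 + b1*rw_fast + b2*rw_slow (coefficients by regression)
print(rw_fast.max().round(1), rw_slow.max().round(1))
```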

How to cite: Nakanishi, M., Onda, Y., Kato, H., Takahashi, J., Iida, H., and Takada, M.: Changes in Air Dose Rates due to Soil Water Content in Forests in Fukushima Prefecture, Japan, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-3049, https://doi.org/10.5194/egusphere-egu23-3049, 2023.

Modeling and sensitivity study of wet scavenging models for the Fukushima accident using 1-km-resolution meteorological field data 

Wet scavenging remains a challenge in modeling the atmospheric transport of 137Cs following the Fukushima Daiichi Nuclear Power Plant accident, as it significantly influences the detailed spatiotemporal 137Cs distribution. Numerous wet deposition schemes have been proposed for 137Cs, but it is often difficult to evaluate them consistently because of the limited resolution of meteorological field data and detailed differences in model implementations. This study evaluated the behavior of 25 combinations of in-cloud and below-cloud wet scavenging models in the framework of the Weather Research and Forecasting-Chemistry model, using high-resolution (1 km × 1 km) meteorological input. This implementation enables consistent and detailed evaluation, revealing the complex local behaviors of these combinations. The 1-km-resolution simulations were compared with simulations obtained previously using 3-km-resolution meteorological field data, with respect to the rainfall pattern over eastern Japan during the accident, the atmospheric concentrations acquired at the regional SPM monitoring sites, and the total ground deposition. The capability of these models to reproduce local-scale observations was also investigated using observations at the Naraha site, which is only 17.5 km from the Fukushima Daiichi Nuclear Power Plant, and the performance of the ensemble mean was evaluated. Results revealed that the 1-km simulations reproduce the cumulative rainfall pattern during the Fukushima accident better than the 3-km simulations, although with spatiotemporal variability in accuracy; rainfall below 1 mm/h is critical for simulation accuracy. Single-parameter wet deposition models that rely solely on rainfall improved in the 1-km simulations relative to the 3-km simulations because of the improved rainfall simulation. Multiparameter models that rely on both cloud and rainfall showed more robust performance in both the 3-km and 1-km simulations, and the Roselle–Mircea model performed best among the 25 models considered. Besides rainfall, wind transport substantially influenced the removal of atmospheric 137Cs and was non-negligible even during periods in which wet deposition was dominant. The ensemble mean of the 1-km simulations reproduces the high-deposition area better, and its total deposition amount is closer to the observations, than the 3-km simulation. At the local scale, the 1-km-resolution simulations effectively reproduced the 137Cs concentrations observed at the Naraha site, but with deviations in peak timing, mainly because of biased wind direction. These findings indicate the necessity of a multi-parameter model for robust regional-scale wet deposition simulation and of refined wind and dispersion modeling for local-scale simulation of 137Cs concentrations.
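
A hedged sketch of the simplest class of schemes discussed above, a single-parameter below-cloud model in which the scavenging coefficient is a power law of the rain rate; the constants are illustrative values of a plausible order of magnitude, not those of any of the 25 schemes.

```python
# Single-parameter washout: Lambda = A * Rain^B (1/s), column depletion
# C(t + dt) = C(t) * exp(-Lambda * dt). A and B are illustrative constants.
import numpy as np

A, B = 8.4e-5, 0.79          # assumed power-law constants
dt = 3600.0                  # one model hour (s)

for rain in (0.1, 0.5, 1.0, 5.0):            # rain rate (mm/h)
    lam = A * rain**B                        # scavenging coefficient (1/s)
    frac_removed = 1.0 - np.exp(-lam * dt)
    print(f"{rain:4.1f} mm/h -> Lambda = {lam:.2e} 1/s, "
          f"{frac_removed:5.1%} removed per hour")
```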

How to cite: Zhuang, S., Dong, X., Xu, Y., and Fang, S.: Modeling and sensitivity study of wet scavenging models for the Fukushima accident using 1-km-resolution meteorological field data, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-4152, https://doi.org/10.5194/egusphere-egu23-4152, 2023.

EGU23-4697 | ECS | Orals | GI2.2

Quantifying the riverine sources of sediment and associated radiocaesium deposited off the coast of Fukushima Prefecture 

Pierre-Alexis Chaboche, Wakiyama Yoshifumi, Hyoe Takata, Toshihiro Wada, Olivier Evrard, Toshiharu Misonou, Takehiko Shiribiki, and Hironori Funaki

The Fukushima Daiichi Nuclear Power Plant (FDNPP) accident, triggered by the Great East Japan Earthquake and the subsequent tsunami in March 2011, released large quantities of radionuclides into the terrestrial and marine environments of Fukushima Prefecture. Although radiocaesium (i.e. 134Cs and 137Cs) activity in these environments has decreased since the accident, secondary inputs via the rivers draining and eroding the main terrestrial radioactive plume were shown to sustain high levels of 137Cs in riverine and coastal sediments, which are likely deposited off the coast of the Prefecture. Accordingly, identifying the sources of sediment is required to elucidate the links between terrestrial and marine radiocaesium dynamics and to anticipate the fate of these persistent radionuclides in the environment.

The objective of this study is to develop an original sediment source tracing technique to quantify the riverine sources of the sediment and associated radionuclides accumulated in the Pacific Ocean. Target coastal sediment cores (n=6), 20 to 60 cm in length, were collected during cruise campaigns between July and September 2022 at the Ota (n=2), Niida (n=1) and Ukedo (n=3) river mouths. Prior to gamma spectrometry measurements, the sediment cores were opened and cut into 2-cm increments, oven-dried at 50°C for at least 48 hours, ground, and passed through a 2-mm sieve.

Preliminary results on the spatial and depth distribution of radiocaesium in these samples show a strong heterogeneity, with the highest radiocaesium levels (up to 134 ± 2 and 4882 ± 11 Bq kg-1 for 134Cs and 137Cs, respectively) found in coastal sediment cores located at the Ukedo river mouth. In contrast, no trace or only low levels of Fukushima-derived radiocaesium were found in the core from the Niida river mouth and in one sediment core from the Ota river mouth. Additional measurements will be conducted to determine the physico-chemical properties of this sediment in order to select the optimal combination of tracers, which will then be introduced into un-mixing models. This increased knowledge will undoubtedly be useful for watershed and coastal management in the FDNPP post-accidental context.

How to cite: Chaboche, P.-A., Yoshifumi, W., Takata, H., Wada, T., Evrard, O., Misonou, T., Shiribiki, T., and Funaki, H.: Quantifying the riverine sources of sediment and associated radiocaesium deposited off the coast of Fukushima Prefecture, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-4697, https://doi.org/10.5194/egusphere-egu23-4697, 2023.

EGU23-4925 | Posters on site | GI2.2

Verification of reproductivity of 137Cs activity concentration in the database by an ocean general circulation model 

Daisuke Tsumune, Frank Bryan, Keith Lindsay, Kazuhiro Misumi, Takaki Tsubono, and Michio Aoyama

Radioactive cesium (137Cs) is distributed in the global ocean due to global fallout from atmospheric nuclear tests, releases from reprocessing plants in Europe, and the supply to the ocean from the Fukushima Daiichi Nuclear Power Plant accident. In order to detect future contamination by radionuclides, it is necessary to understand the global distribution of radionuclides such as 137Cs. For this purpose, the IAEA is compiling a database of observation results (MARIS). However, since the spatio-temporal density of observed data varies widely, it is difficult to obtain a complete picture from the database alone. Comparative validation using ocean general circulation model (OGCM) simulations is useful for interpreting these observations, and global OGCM (CESM2, POP2) simulations were conducted to clarify the behavior of 137Cs in the ocean. The horizontal resolution is 1.125° in longitude and 0.28° to 0.54° in latitude. The minimum layer spacing near the sea surface is 10 m, and the spacing increases with depth to a maximum of 250 m over 60 vertical levels. Climatological values were used as the driving force. As a source term of 137Cs to the ocean, atmospheric fallout from atmospheric nuclear tests was newly established based on rainfall and other data, and it was confirmed to reproduce observations better than previous estimates. Furthermore, the releases from reprocessing plants in Europe and the leakage due to the accident at the Fukushima Daiichi Nuclear Power Plant were taken into account. Input conditions for 2020 were assumed to continue after 2020, and calculations were performed from 1945 to 2030. The simulated 137Cs activities were found to be in good agreement with observations, especially in the Atlantic and Pacific Oceans, where the observation density is high. On the other hand, they were underestimated in the Southern Hemisphere, suggesting the need for further improvement of the fallout data. 137Cs concentrations from the Fukushima Daiichi Nuclear Power Plant accident in March 2011 were generally in good agreement, although the reproducibility remained somewhat problematic due to insufficient model resolution. In other basins, the concentration characteristics could be determined, although the observations were insufficient. Radioactivity concentrations of atmospheric nuclear test-derived 137Cs may continue to be detected in the global ocean after 2030. The results of this simulation are useful for planning future observations to fill the gaps in the database.

How to cite: Tsumune, D., Bryan, F., Lindsay, K., Misumi, K., Tsubono, T., and Aoyama, M.: Verification of reproductivity of 137Cs activity concentration in the database by an ocean general circulation model, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-4925, https://doi.org/10.5194/egusphere-egu23-4925, 2023.

EGU23-4947 | ECS | Posters on site | GI2.2

Vertical distribution of radioactive cesium-rich microparticles in forest soil of Hamadori area, Fukushima Prefecture 

Takahiro Tatsuno, Hiromichi Waki, Naoto Nihei, and Nobuhito Ohte

Large amounts of radionuclides were scattered by the Fukushima Daiichi Nuclear Power Plant (FDNPP) accident. Previous studies showed that FDNPP-derived radioactive cesium-rich microparticles (CsMPs), a few μm in size, are present in the soil and river water around Fukushima Prefecture [1]. CsMPs have a high radioactive cesium (Cs) concentration per unit mass and can therefore be one of the factors leading to overestimation of the Cs concentration in samples. Because Cs in CsMPs may not react directly with clay particles, unlike Cs ions in the liquid phase, CsMPs are considered to act as Cs carriers in soils [2]. However, unlike ionic Cs and Cs adsorbed onto clay particles, the distribution and dynamics of CsMPs in soils have not been clarified. In this study, we investigated the vertical distribution of CsMPs in forest soil and the soil properties in Fukushima Prefecture, Japan.

Soil samples were collected from a forest in the difficult-to-return zone, approximately 10 km from the FDNPP. Undisturbed soil samples were collected from 0-35 cm soil depth at 5 cm intervals using a core sampler to investigate soil properties. Furthermore, litter samples on the surface soil layer were collected. Using these samples, the vertical distributions of the Cs concentration in the soil and of Cs derived from CsMPs were investigated. The Cs concentration in samples placed in a 100 mL U8 container was measured using a germanium semiconductor detector. Cs derived from CsMPs was evaluated using an imaging plate, with reference to the method for quantification of CsMPs [3].

Like Cs adsorbed on the soil, CsMPs were mostly distributed in the soil surface layer between 0 and 5 cm depth. We consider that straining may be one of the mechanisms of CsMP retention at the soil surface. Bradford et al. (2003) [4] showed that straining might be a significant mechanism for colloid retention when the average particle size in the porous medium is less than 200 times the colloidal particle size. In this study, assuming a CsMP size of approximately 1 µm, the average particle size of the soil collected from the 0-5 cm surface layer was less than 200 times that of the CsMPs. Moreover, the average particle size decreased below 5 cm depth; therefore, the straining mechanism is considered to be even stronger there.
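
The 200-times criterion invoked above can be checked with simple arithmetic; a minimal Python sketch (grain sizes hypothetical):

def straining_possible(d50_soil_um, d_colloid_um, ratio_threshold=200.0):
    # Bradford-type criterion: straining can be significant when the median
    # grain size is less than ~200 times the colloid (CsMP) size.
    return d50_soil_um / d_colloid_um < ratio_threshold

# For ~1 um CsMPs and hypothetical median grain sizes:
print(straining_possible(150.0, 1.0))  # surface layer (0-5 cm): True
print(straining_possible(60.0, 1.0))   # finer subsoil: True, stronger straining expected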

This work was supported by FY2022 Sumitomo Foundation and FY2022 Internal Project of Institute of Environmental Radioactivity, Fukushima University.


References

[1] Igarashi, Y. et al., 2019. J. Environ. Radioact. 205–206, 101–118.

[2] Tatsuno, T. et al., 2022. J. Environ. Manage. 329, 116983.

[3] Ikehara et al., 2018. Environ. Sci. Technol. 52, 6390–6398.

[4] Bradford et al., 2003. Environ. Sci. Technol. 37, 2242–2250.

How to cite: Tatsuno, T., Waki, H., Nihei, N., and Ohte, N.: Vertical distribution of radioactive cesium-rich microparticles in forest soil of Hamadori area, Fukushima Prefecture, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-4947, https://doi.org/10.5194/egusphere-egu23-4947, 2023.

EGU23-5042 | ECS | Posters on site | GI2.2

Changes in 90Sr transport dynamics in groundwater after large-scale groundwater drawdown in the vicinity of the cooling pond at the Chornobyl Nuclear Power Plant 

Hikaru Sato, Naoaki Shibasaki, Maksym Gusyev, Yuichi Onda, and Dmytro Veremenko

Migration of long-lived radioactive 90Sr introduced by nuclear accidents and radioactive waste requires long-term monitoring and protection management due to its half-life of 28.8 years and its high mobility in water. Presently, 37 years have passed since the largest worldwide 90Sr contamination was released and deposited around the Chornobyl Nuclear Power Plant (ChNPP). In the vicinity of the ChNPP, the water level of the cooling pond (CP) has declined since May 2014 following the decommissioning phase of the Unit 3 reactor. The drawdown of the CP lowered the groundwater level over an extensive area (about 70 km2), and the resulting change in the groundwater system has raised concerns about possible changes in 90Sr concentrations in water and in transport dynamics to the Pripyat River. Therefore, this study evaluated how 90Sr transport dynamics were influenced by changes in the groundwater flow system from 2011 to 2020, based on observed data and the results of a groundwater flow simulation in the CP vicinity.

The numerical simulation was conducted from 2011 to 2020 at a monthly time step using USGS MODFLOW with the PM11 GUI and was calibrated to groundwater heads measured at monitoring wells. Between the CP and the Pripyat River, estimated pore velocities near the river were reduced compared with velocities before the CP drawdown, owing to the decrease in the hydraulic gradient between the CP and the river. The decrease in groundwater velocity results in decreased groundwater discharge and delayed 90Sr transport. Therefore, the amount of 90Sr transported from the CP to the river is smaller than in the period prior to the CP drawdown. The reduced 90Sr transport is expected to have less impact on the radioactivity in the river water, even in the Pripyat River floodplain northwest of the CP, where 90Sr concentrations significantly increased after the CP drawdown. In addition, the measured and simulated changes in groundwater flow direction and velocity suggested the possibility of 90Sr accumulation in the floodplain, caused by stagnant groundwater resulting from the reduced velocity and by additional 90Sr infiltration from surrounding ponds located on the Pripyat River floodplain. Therefore, enhancing the current monitoring of 90Sr concentrations near the floodplain is needed for long-term monitoring and protection management to mitigate this risk.
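
The pore velocities referred to above follow from Darcy's law, v = K·i/n_e; a minimal Python sketch with hypothetical aquifer values (not the calibrated MODFLOW parameters):

def pore_velocity(k_m_per_d, gradient, effective_porosity):
    # Pore (seepage) velocity: Darcy flux K*i divided by effective porosity.
    return k_m_per_d * gradient / effective_porosity

# Halving the hydraulic gradient after the CP drawdown halves the advection
# velocity of 90Sr toward the river (values hypothetical):
print(pore_velocity(10.0, 2e-3, 0.25))  # before drawdown: 0.08 m/day
print(pore_velocity(10.0, 1e-3, 0.25))  # after drawdown: 0.04 m/day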

How to cite: Sato, H., Shibasaki, N., Gusyev, M., Onda, Y., and Veremenko, D.: Changes in 90Sr transport dynamics in groundwater after large-scale groundwater drawdown in the vicinity of the cooling pond at the Chornobyl Nuclear Power Plant, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-5042, https://doi.org/10.5194/egusphere-egu23-5042, 2023.

EGU23-6019 | GI2.2

Transport of H-3 and I-129 in water and their uptake by marine organisms due to the planned release of Fukushima storage water 

R. Bezhenar, H. Takata, and V. Maderich

The 3D model THREETOX was applied for the long-term simulation of the planned release of radioactively contaminated water from the Fukushima storage tanks to the marine environment. Two radionuclides were considered: 3H, which has the largest activity in the tanks, and 129I, which can cause the largest radiation dose to humans. A constant release rate of 3H equal to 22 TBq/y, according to TEPCO estimations, and a constant release rate of 129I equal to 361 MBq/y, according to estimations from the current study, were used in the simulations.

The THREETOX model used monthly averaged currents from the KIOST-MOM model. A dynamic food web model was included in THREETOX. In the model, organisms take up activity directly from the water and through the food chain. The food chain consists of phytoplankton, zooplankton, non-piscivorous (prey) fish, and piscivorous (predatory) fish. In the case of 129I, macro-algae were also considered. The modelling area covers the Fukushima coastal waters and extends eastward for 1600 km from the coast and for 1300 km from north to south.
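
Dynamic food-web models of this kind are typically built from first-order uptake and elimination terms; the Python sketch below shows the generic structure with hypothetical rate constants, not the actual THREETOX parameterization:

def step_organism(c_org, c_water, c_food, ku_water, ku_food, k_elim, dt):
    # One Euler step of dC/dt = ku_water*C_water + ku_food*C_food - k_elim*C_org:
    # uptake directly from water plus uptake through the food chain, minus elimination.
    return c_org + dt * (ku_water * c_water + ku_food * c_food - k_elim * c_org)

# Constant 3H water concentration (Bq/m3) driving prey and then predatory fish:
c_water, c_prey, c_pred = 100.0, 0.0, 0.0
for _ in range(5000):
    c_prey = step_organism(c_prey, c_water, 0.0, 1e-4, 0.0, 0.05, 0.1)
    c_pred = step_organism(c_pred, c_water, c_prey, 5e-5, 0.02, 0.05, 0.1)
print(c_prey, c_pred)  # both approach equilibria set by uptake/elimination ratios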

From the model results, we can see how contamination will spread along the coast in different seasons. For example, in summer the currents near the coast are directed to the north, which leads to contamination of Sendai Bay. This means that at different points along the coast, the concentration of radionuclides can change periodically according to the currents, which vary during the year. Calculated activity concentrations at several points along the coast of Japan, corresponding to the largest cities in the area of interest, were extracted from the model results. For example, the calculated concentration of 3H in water at the Tomioka point, which is quite close to the FDNPP, can sometimes exceed 200 Bq/m3. At the Soma point, the concentration will exceed 50 Bq/m3, and at the Iwaki-Onahama point, 20 Bq/m3 at some moments in time. At the other points, the calculated concentration of 3H in water will not exceed 10 Bq/m3, which is less than the background concentration of 50 Bq/m3. Concerning 129I, its maximum concentration in water will be around 10-3 – 10-2 Bq/m3 at points close to the FDNPP and around 10-4 Bq/m3 at points further from the NPP, which is around 100 000 times less than the calculated concentrations of 3H.

Calculated concentrations of OBT (organically bound tritium) in predatory and prey fish are less than 0.01 Bq/kg at all points except the FDNPP point, where they are around 0.02 Bq/kg. This value is 10 times less than the measured concentration of OBT in fish (0.2 Bq/kg) obtained in 2014 in the coastal area near the damaged NPP. Calculated concentrations of 129I in predatory and prey fish are in the range 10-6 – 10-4 Bq/kg at all considered points. Concentrations of 129I in macro-algae are about 100 times higher due to the ability of iodine to accumulate in macro-algae.

How to cite: Bezhenar, R., Takata, H., and Maderich, V.: Transport of H-3 and I-129 in water and their uptake by marine organisms due to the planned release of Fukushima storage water, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-6019, https://doi.org/10.5194/egusphere-egu23-6019, 2023.

EGU23-6026 | Orals | GI2.2

Dynamic change of dissolved Cs-137 from headwaters to downstream in the Kuchibuto River catchment 

Yuichi Onda, Taichi Kawano, Keisuke Taniguchi, and Junko Takahashi

The Fukushima Daiichi Nuclear Power Plant (FDNPP) accident on March 11, 2011 resulted in the release of large amounts of radioactive cesium-137 (137Cs) into the environment. It is important to characterize the Cs-137 dynamics throughout the river system, from the headwaters to the downstream reaches. Previous studies have suggested that dissolved Cs-137 derives mainly from organic matter in small watersheds and from suspended solids in large watersheds. Since the concentration of suspended-form Cs has been shown to decrease significantly after decontamination in evacuated areas (Feng et al. 2022), this rapid decrease in suspended-form Cs-137 concentration can be used to determine the origin of dissolved-form Cs. Therefore, we attempted to evaluate whether the dissolved Cs-137 was derived from organic matter or from suspended solids by comparing data before and after decontamination.

The objective of this study is to compare the decreasing trends of Cs-137 concentrations in decontaminated and undecontaminated areas, based on long-term monitoring of Cs-137 concentrations in suspended solids, the dissolved form, and coarse organic matter since 2011. The study area includes four headwater basins and four river basins (eight sites in total) in the Kuchibuto River watershed in the Yamakiya district of Fukushima Prefecture, located approximately 35 km northwest of the FDNPP.

In the Kuchibuto River watershed, a large inflow of decontaminated soil with low Cs-137 concentrations, due to the increase in bare land caused by decontamination, resulted in a rapid decrease in the concentration of suspended-form 137Cs in the decontaminated area in the headwaters and in the upper reaches of the river. However, no clear effect of decontamination was observed in the concentrations of dissolved Cs-137 or of Cs-137 in coarse organic matter. Comparison of the slopes of the Cs-137 concentration trends in the suspended, dissolved, and coarse organic matter fractions showed that the slope of the dissolved form was similar to that of the coarse organic matter in the headwater basins, and similar to that of the suspended solids in the downstream basins. These results suggest that the contribution of organic matter to dissolved Cs-137 is significant in small watersheds, and that of suspended solids in large watersheds.
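
The slope comparison described above amounts to fitting an effective decline rate to each log-transformed time series; a minimal Python sketch on synthetic data (rates hypothetical):

import numpy as np

def decline_rate(t_years, conc):
    # Fit ln(C) = ln(C0) - lam*t and return the effective decline rate lam (1/yr).
    lam, _ = np.polyfit(t_years, -np.log(conc), 1)
    return lam

t = np.arange(2011.0, 2022.0)
dissolved = 10.0 * np.exp(-0.21 * (t - 2011.0))   # synthetic dissolved series
coarse_om = 50.0 * np.exp(-0.20 * (t - 2011.0))   # synthetic coarse organic matter series
print(decline_rate(t, dissolved), decline_rate(t, coarse_om))
# Similar slopes, as here, would point to a common source, the pattern found
# for dissolved Cs-137 and coarse organic matter in the headwater basins.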

How to cite: Onda, Y., Kawano, T., Taniguchi, K., and Takahashi, J.: Dynamic change of dissolved Cs-137 from headwaters to downstream in the Kuchibuto River catchment, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-6026, https://doi.org/10.5194/egusphere-egu23-6026, 2023.

EGU23-10093 | Posters on site | GI2.2

Riverine 137Cs dynamics and remobilization in coastal waters during high flow events 

Yoshifumi Wakiyama, Hyoe Takata, Keisuke Taniguchi, Takuya Niida, Yasunori Igarashi, and Alexei Konoplev

Understanding riverine 137Cs dynamics during high-flow events is crucial for improving the predictability of 137Cs transport and the relevant hydrological responses. It is frequently documented that the majority of 137Cs is exported during high-flow events triggered by intensive rainfall. Studies on 137Cs in coastal seawater suggested that huge high-flow events resulted in high dissolved 137Cs concentrations in seawater. Different temporal patterns of 137Cs concentrations in river water are found in the existing literature on 137Cs dynamics during high-flow events. Although such differences may reflect catchment characteristics, there has been no comprehensive analysis of these relationships. This study explores the catchment characteristics affecting 137Cs transport via rivers to the ocean, based on datasets obtained by sampling campaigns during high-flow events. 137Cs datasets obtained at 13 points in 6 river water systems were subjected to the analysis. The analyses explored the relationships between catchment characteristics (scale and land use composition) and 137Cs dynamics in terms of variations in concentration, fluxes, and potential remobilization in seawater. We could not find any significant correlations between the parameters of catchment characteristics and the mean values of normalized 137Cs concentrations and apparent Kd. However, when approximating 137Cs concentrations and Kd values as a power function of the suspended solid concentration (Y = αX^β), the exponent β in the equations for the dissolved 137Cs concentration and for Kd showed negative and positive correlations with the logarithm of the watershed area, respectively, and positive β values for Kd were found when the catchment area was on the order of 100 km2 or larger, and vice versa. This indicates that the dissolved 137Cs concentration tends to decrease with increased water discharge in larger catchments and to increase in smaller catchments. These results suggest that the temporal pattern of dissolved 137Cs concentrations depends on the watershed scale. The 137Cs flux during a single event ranged from 1.9 GBq to 1.1 TBq and accounted for 0.00074% to 0.22% of the total 137Cs deposited in the relevant catchments. Particulate 137Cs accounted for more than 92% of the total 137Cs flux, except in the Ukedo River basin with its large dam reservoir. The R-factor, an erosivity index in the Universal Soil Loss Equation model family, is a good parameter for reproducing sediment discharge and particulate 137Cs flux. The efficiency of particulate 137Cs flux, calculated by dividing the flux by the R-factor of the event, tended to be high in catchments with relatively low forest cover. The desorption ratio of 137Cs, obtained by a 1-day shaking experiment of suspended solids in seawater, ranged from 2.8 to 6.6%. The ratio was almost proportional to the ratio of exchangeable 137Cs. The estimated amounts of desorbed 137Cs, obtained by multiplying the particulate 137Cs flux by the desorption ratios, were greater than the direct flux of dissolved 137Cs. Reanalysis of riverine 137Cs datasets from high-flow events is revealing the relationships between catchment characteristics and 137Cs dynamics. Further analyses, such as evaluation of decontamination impacts and inter-catchment comparisons of 137Cs fluxes, are required for better understanding.
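
The power-function approximation Y = αX^β used in the analysis is conveniently fitted as a straight line in log-log space; a minimal Python sketch on synthetic data:

import numpy as np

def fit_power_law(x, y):
    # Fit Y = alpha * X**beta via linear regression of log(Y) on log(X).
    beta, log_alpha = np.polyfit(np.log(x), np.log(y), 1)
    return np.exp(log_alpha), beta

# Synthetic event samples: dissolved 137Cs falling as suspended solids (and
# hence discharge) rise, i.e. beta < 0, the pattern reported for larger catchments.
ss = np.array([5.0, 20.0, 80.0, 300.0])   # suspended solids, mg/L
dissolved = 0.5 * ss**-0.3                 # dissolved 137Cs, arbitrary units
print(fit_power_law(ss, dissolved))        # recovers (0.5, -0.3)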

How to cite: Wakiyama, Y., Takata, H., Taniguchi, K., Niida, T., Igarashi, Y., and Konoplev, A.: Riverine 137Cs dynamics and remobilization in coastal waters during high flow events, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-10093, https://doi.org/10.5194/egusphere-egu23-10093, 2023.

EGU23-10539 | Posters on site | GI2.2 | Highlight

Long-term dynamics of 137Cs accumulation at an urban pond 

Honoka Kurosawa, Kenji Nanba, Toshihiro Wada, and Yoshifumi Wakiyama

Semi-enclosed water bodies such as ponds and dam reservoirs are known to be readily subject to 137Cs accumulation because of secondary inflows from their catchment areas. We present long-term monitoring data of the 137Cs concentration in bottom sediment and pond water of an urban pond located in the central area of Koriyama City, Fukushima Prefecture, to discuss the 137Cs dynamics of the urban pond. The pond was decontaminated by bottom sediment removal in 2017. Bottom sediment cores and pond water were collected in 2015 and in 2018-2021. Inflow and outflow water were collected in 2020-2021. River water around the pond was collected in 2021. The bottom sediment and water samples were measured for 137Cs concentration, particle size distribution, and N and C stable isotopes. Between 2015 and 2018, the 137Cs inventory and the 137Cs concentration in the 0-10 cm depth of the bottom sediment at 7 points decreased by 81% (mean 1.50 to 0.28 MBq/m2) and 85% (mean 31.5 to 4.8 kBq/kgDW), respectively. Although the mean 137Cs inventory in the bottom sediment did not change drastically during 2018-2021, its variability became wider. Points with an increased 137Cs inventory in the bottom sediment showed a year-by-year increase in the thickness of the layer with concentrations higher than 8 kBq/kgDW, a criterion considered for decontamination. The 137Cs concentration in suspended solids (SS) in the pond water was lower after decontamination, although it still remained above 8 kBq/kgDW. The 137Cs concentrations in SS of the inflow water were also high, exceeding 8 kBq/kgDW. The 137Cs concentration in SS of the river water around the pond was higher after it passed through the urban area, suggesting that the inflow of particles of urban origin maintained the high 137Cs level in the pond.

How to cite: Kurosawa, H., Nanba, K., Wada, T., and Wakiyama, Y.: Long-term dynamics of 137Cs accumulation at an urban pond, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-10539, https://doi.org/10.5194/egusphere-egu23-10539, 2023.

EGU23-10868 | Posters on site | GI2.2

Estimation of annual Cesium-137 influx from the FDNPP to the coastal water 

Shun Satoh and Hyoe Takata

Due to the accident at the Fukushima Daiichi Nuclear Power Plant (1F) in March 2011, radionuclides were introduced into the environment, and one of the release pathways to the ocean is direct discharge from 1F (on-going release). This occurred mainly immediately after the accident, but the on-going release still continues. In this study, we first estimated the on-going release of 137Cs from 1F over the 10 years after the accident, using TEPCO's 137Cs monitoring results in the coastal area around 1F. Secondly, the monitoring data changed in response to countermeasures taken by TEPCO (e.g. construction of ice walls) to reduce the introduction of contaminated water into the ocean, so the effects of these countermeasures on the estimation of the on-going release were also discussed. A box model comprising the inside and outside of the port was assumed for the area around 1F, and the amount of 137Cs in the box was estimated (estimated value: modeled data). Then, the difference between the estimated value and the amount of 137Cs obtained from the actually observed concentrations (measured value: monitoring data) was calculated. The result showed that the measured value was higher than the estimated value, suggesting an on-going release from 1F. As for the decrease in the monitoring data after the countermeasures, it implies that the estimated rate of on-going release has been reduced by the countermeasures.
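
The box-model reasoning can be made concrete with a one-box mass balance for the port, in which the inventory evolves through water exchange, radioactive decay and a release term; the residual between observed and modelled inventories is then the signature of the on-going release. A minimal Python sketch with entirely hypothetical values:

import numpy as np

LAMBDA_CS137 = np.log(2) / (30.1 * 365.25)  # 137Cs decay constant, 1/day

def step_inventory(inv, c_outside, volume, q_exchange, release, dt=1.0):
    # One Euler step of dI/dt = Q*(C_out - I/V) - lambda*I + S, where I is the
    # 137Cs inventory (Bq) in the port box and S the on-going release (Bq/day).
    c_inside = inv / volume
    return inv + dt * (q_exchange * (c_outside - c_inside) - LAMBDA_CS137 * inv + release)

# With S = 0 the modelled inventory relaxes toward V*C_out; a persistently
# higher measured inventory implies S > 0, i.e. an on-going release.
inv = 1.0e9
for _ in range(365):
    inv = step_inventory(inv, c_outside=10.0, volume=1.0e6, q_exchange=2.0e5, release=0.0)
print(inv)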

How to cite: Satoh, S. and Takata, H.: Estimation of annual Cesium-137 influx from the FDNPP to the coastal water, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-10868, https://doi.org/10.5194/egusphere-egu23-10868, 2023.

EGU23-11671 | Posters on site | GI2.2

Changes in Cs-137 concentrations in river-bottom sediments and their factors in Fukushima Prefecture rivers 

Naoyuki Wada, Yuichi Onda, Xiang Gao, and Chen Tang

The Fukushima Daiichi Nuclear Power Plant (FDNPP) accident in 2011 resulted in the release of large amounts of Cs-137 into the atmosphere. Cs-137 deposited on land was mainly distributed in forests, but some of it has been discharged to the sea through rivers. Previous work has focused on the dissolved and suspended forms of Cs-137 in rivers, and it is known that the discharge mechanism and concentration formation of Cs-137 differ depending on the land use in the river basin. On the other hand, few studies have focused on the dynamics of Cs-137 in river-bottom sediments. River-bottom sediment is less likely to flow downstream than suspended sediment, so contamination in the downstream area may persist over the long term.
We aim to clarify the migration mechanism of Cs-137 in rivers, including river-bottom sediment. To this end, we analyzed data collected from 2011 to 2018 in 89 watersheds in Fukushima Prefecture. In analyzing the data, we removed sampling points with brackish water using electrical conductivity and corrected for particle size to standardize the surface area of the particles that adsorb Cs-137. As a result, it was found that, unlike the dissolved and suspended forms, the Cs concentration in river-bottom sediments can increase within the initial year. This is related to the average initial deposition in the watershed and the amount of initial deposition at the river-bottom sediment sampling sites, with a tendency to increase with relatively higher initial deposition in the upstream area. It was also found that the decrease in suspended Cs concentration is more pronounced when anthropogenic activities in the watershed are more active, but there was no clear relationship between land use in the watershed and changes in river-bottom sediment Cs concentration. This indicates that suspended sediment Cs concentrations are controlled by initial deposition on the suspended sediment source areas, whereas river-bottom Cs concentrations are controlled by multiple factors such as sediment traction and Cs supply from river water.
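
One simple way to implement such a particle-size correction (the exact scheme used in this study is not detailed here) is to normalize concentrations by an estimated specific surface area, assuming spherical grains and surface-controlled sorption; a hypothetical Python sketch:

def specific_surface_area(d50_um, density_g_cm3=2.65):
    # Geometric specific surface area of spheres: SSA = 6 / (rho * d), in m2/g.
    return 6.0 / (density_g_cm3 * 1.0e6 * d50_um * 1.0e-6)

def size_corrected_cs(conc_bq_kg, d50_um, d50_ref_um=20.0):
    # Scale a sediment Cs-137 concentration to a reference grain size by the
    # SSA ratio (hypothetical normalization, not the study's actual scheme).
    return conc_bq_kg * specific_surface_area(d50_ref_um) / specific_surface_area(d50_um)

print(size_corrected_cs(1000.0, d50_um=10.0))  # fine sample scaled down to 500 Bq/kg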

How to cite: Wada, N., Onda, Y., Gao, X., and Tang, C.: Changes in Cs-137 concentrations in river-bottom sediments and their factors in Fukushima Prefecture rivers, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-11671, https://doi.org/10.5194/egusphere-egu23-11671, 2023.

EGU23-12670 | ECS | Orals | GI2.2

Minimizing the loss of radioactively contaminated sediment from the Niida watershed (Fukushima, Japan) through spatially targeted afforestation. 

Floris Abrams, Lieve Sweeck, Johan Camps, Grethell Castillo-Reyes, Bin Feng, Yuichi Onda, and Jos Van Orshoven

Government-led decontamination of agricultural land in the region affected by the Fukushima accident (2011) has lowered the on-site radiation risk considerably. From 2013 to early 2017, 11.9% of the land in the affected Niida watershed in Japan was remediated through topsoil removal. However, this resulted in a 237.1% increase in suspended sediment loads in the river in 2016 compared with 2013. In contrast, sediment loads decreased by 41% from 2016 to 2017; this can be attributed to the effect of natural vegetation restoration on sediment yield and transfer patterns (Bin et al., 2022). Since radiocaesium binds firmly to the clay minerals in the soil, it is inevitably transported along with the sediments downstream to the river systems. These observations confirm that rapid, spatially targeted interventions, such as revegetation, e.g. through afforestation, have the potential to decrease the magnitude and duration of increased exports of contaminated sediments. The CAMF tool (Cellular Automata-based heuristic for Minimizing Flow) (Vanegas et al., 2012) was originally designed to find the cells in a raster representation of a watershed for which afforestation would lead to a maximal reduction of sediment exports with minimal effort or cost, while taking sediment flow from cell to cell into account. In our research, we adapted the CAMF tool to account for the radiocaesium budgets associated with the transported sediments. We applied the approach to the Niida catchment, where land-cover changes in upstream decontaminated regions are detected using drone imagery and linked to increased sediment loads in the Niida River using long-term river monitoring systems. For example, in 2014, agricultural land (18.02 km2) was one of the major land uses in the regions where decontamination was ordered, resulting in increased sediment loads from 2014 to 2016. By recognizing both the on- and off-site impacts of the remediation interventions and their temporal dynamics, the modified CAMF tool offers scope for supporting the formulation of spatio-temporal schemes for the remediation of agricultural land. These schemes aim to decrease the radiation risk for downstream communities and minimize the potential recontamination of already decontaminated sites.
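
Conceptually, a CAMF-style optimization ranks candidate cells by the avoided export per unit cost, re-evaluating cell-to-cell flows as cells are selected. The toy Python sketch below illustrates only the greedy selection idea, with the flow routing collapsed into a precomputed per-cell delivery to the outlet; it is not the actual CAMF implementation:

import numpy as np

def greedy_afforestation(delivered_kg, cs_bq_per_kg, cost, budget):
    # delivered_kg: sediment from each cell reaching the outlet (kg/yr);
    # cs_bq_per_kg: 137Cs activity of that sediment; cost: effort per cell.
    benefit = delivered_kg * cs_bq_per_kg   # Bq/yr avoided if the cell is afforested
    order = np.argsort(-benefit / cost)     # best benefit-cost ratio first
    chosen, spent = [], 0.0
    for i in order:
        if spent + cost[i] <= budget:
            chosen.append(int(i))
            spent += cost[i]
    return chosen

# Hypothetical 5-cell example with a budget of two cells:
print(greedy_afforestation(np.array([120.0, 40.0, 300.0, 10.0, 80.0]),
                           np.array([5e3, 2e4, 1e3, 4e4, 8e3]),
                           np.ones(5), budget=2.0))   # -> [1, 4]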

How to cite: Abrams, F., Sweeck, L., Camps, J., Castillo-Reyes, G., Feng, B., Onda, Y., and Van Orshoven, J.: Minimizing the loss of radioactively contaminated sediment from the Niida watershed (Fukushima, Japan) through spatially targeted afforestation., EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-12670, https://doi.org/10.5194/egusphere-egu23-12670, 2023.

EGU23-13366 | Orals | GI2.2

Similarity of long-term temporal decrease in atmospheric Cs-137 between Chernobyl and Fukushima 

Kentaro Akasaki, Shu Mori, Eiichi Suetomi, and Yuko Hatano

We compare the atmospheric concentrations of Cs-137 more than a decade after the Chernobyl and Fukushima accidents. We plotted 8 datasets on log-log axes (5 cases in Chernobyl and 3 cases in Fukushima) and found that they appear to follow a single function.

The atmospheric concentration has been measured for more than 30 years after the Chernobyl accident [1]. On the other hand, several teams of Japanese researchers have been carrying out measurements in Fukushima and its vicinity for almost 10 years [2][3]. In this study, we compare 5 sites in Chernobyl (Pripyat, Chernobyl, Baryshevka, Kiev, and Polesskoe) and 3 sites in Fukushima (FDNPP O-6 and O-7, Univ. Fukushima).

We adjust the magnitude of the data because it depends on the amount of the initial deposition. After the adjustment, we plot the 8 cases on a log-log plot. We found that the 8 cases collapse onto a single curve, with a power-law index of -1.6. Namely,

C(t) ~ t^{-1.6}    …(1)

Incidentally, we have previously proposed a formula that reproduces the long-term behavior of the atmospheric concentration at a fixed location:

C(t) = A exp(-bt) t^{-4/3}    …(2)

where A is a parameter related to the amount of the initial deposition and b is the total rate of all first-order reactions (including the radioactive decay rate, the vegetation uptake rate, the runoff rate, etc.). We will investigate the difference between the power-law indices in Eqs. (1) and (2). The parameter b is highly dependent on the environment. When we take a proper value of b, the apparent decrease of the concentration deviates from t^{-4/3}, and the apparent power-law index can become close to -1.6.
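
This reconciliation can be checked numerically: for Eq. (2) the local log-log slope is d ln C / d ln t = -4/3 - b·t, so a single power law fitted over a finite window steepens beyond -4/3 as b grows. A minimal Python sketch (b values illustrative):

import numpy as np

def apparent_index(b, t_start=1.0, t_end=30.0, n=200):
    # Fit one power law to C(t) = A*exp(-b*t)*t**(-4/3) over [t_start, t_end]
    # years and return the apparent (fitted) power-law index.
    t = np.linspace(t_start, t_end, n)
    log_c = -b * t - (4.0 / 3.0) * np.log(t)
    slope, _ = np.polyfit(np.log(t), log_c, 1)
    return slope

for b in (0.0, 0.01, 0.025):
    print(b, apparent_index(b))
# b = 0 recovers -4/3; b around 0.025 1/yr yields an apparent index near -1.6
# over a 30-year window.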


[1] E. K. Garger, et al., J. Env. Radioact., 110 (2012) 53-58.

[2] A. Watanabe, et al., Atmos. Chem. Phys. 22 (2022) 675-692.

[3] T. Abe, K. Yoshimura, Y. Sanada, Aerosol and Air Quality Research, 21 (2021) 200636.

How to cite: Akasaki, K., Mori, S., Suetomi, E., and Hatano, Y.: Similarity of long-term temporal decrease in atmospheric Cs-137 between Chernobyl and Fukushima, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-13366, https://doi.org/10.5194/egusphere-egu23-13366, 2023.

EGU23-13486 | ECS | Posters virtual | GI2.2

Distributions of tritium in the marine water and biota around Rokkasho Reprocessing Plant 

Satoru Ohtsuki, Yuhei Shirotani, and Hyoe Takata

For the decommissioning of the Fukushima Daiichi Nuclear Power Station (FDNPS), one of the biggest problems is treating the radioactively contaminated stagnant water in the buildings. It is difficult to remove H-3 from the contaminated water by the Advanced Liquid Processing System (ALPS) treatment alone. Thus, the Japanese Government announced the release of the ALPS-treated water containing H-3. To predict the alteration of the dose rate to marine biota caused by the change in the H-3 concentration in marine water after the release of the ALPS water, it is necessary to understand the dynamics of H-3 in the marine ecosystem. In this study, we examined the behavior of H-3 in the marine environment (water and biota) off Aomori and Iwate prefectures from FY2003 to FY2012 as background data for the Pacific Ocean along the coast of northeastern Japan. To clarify the dynamics of H-3 in marine biota, we compared H-3 and Cs-137. Excluding the period of the intermittent test operation of the Rokkasho Reprocessing Plant (FY2006-FY2008), the concentrations of H-3 in seawater, tissue free water tritium (TFWT) and organically bound tritium (OBT) were 0.052-0.20 Bq/L with a mean of 0.12±0.031 Bq/L, 0.050-0.34 Bq/kg-wet with a mean of 0.11±0.039 Bq/kg-wet, and 0.0070-0.099 Bq/kg-wet with a mean of 0.042±0.019 Bq/kg-wet, respectively. Before the FDNPS accident (FY2003-FY2010), the Cs-137 concentrations in seawater and marine biota were 0.00054-0.0027 Bq/L with a mean of 0.0016±0.00041 Bq/L and 0.022-1.8 Bq/kg-wet with a mean of 0.090±0.037 Bq/kg-wet, respectively. The Concentration Ratio (CR), the ratio of the concentration in marine biota to that in seawater, was 0.34-2.37 for TFWT with a mean of 0.97±0.31 across all species, meaning the concentration in marine biota was almost equal to that in seawater. For Cs-137, CRs were 46-78 with a mean of 56±22. We compared the CRs for TFWT of Gadus macrocephalus, Lophius litulon and Oncorhynchus keta with those for Cs-137. Comparing CR-TFWT and CR-Cs-137 for these three species, the Spearman R was <0.4 and p was >0.05, indicating that the dynamics of TFWT and Cs-137 in the marine ecosystem are decoupled.
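
The two statistics used in this comparison are straightforward to compute; a minimal Python sketch with hypothetical paired data (scipy's spearmanr):

import numpy as np
from scipy.stats import spearmanr

def concentration_ratio(c_biota, c_seawater):
    # CR = concentration in biota / concentration in ambient seawater.
    return c_biota / c_seawater

# Hypothetical paired values for one species across four sampling dates:
cr_tfwt = concentration_ratio(np.array([0.10, 0.15, 0.09, 0.20]),
                              np.array([0.12, 0.13, 0.11, 0.18]))
cr_cs137 = np.array([61.0, 47.0, 52.0, 70.0])
rho, p = spearmanr(cr_tfwt, cr_cs137)
print(rho, p)  # here |rho| < 0.4 and p > 0.05: the decoupled pattern reported above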

How to cite: Ohtsuki, S., Shirotani, Y., and Takata, H.: Distributions of tritium in the marine water and biota around Rokkasho Reprocessing Plant, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-13486, https://doi.org/10.5194/egusphere-egu23-13486, 2023.

EGU23-15515 | Posters on site | GI2.2

137Cs transport flux to surface water due to shallow groundwater discharge from forest hillslope 

Yuma Niwano, Hiroaki Kato, Satoru Akaiwa, Donovan Anderson, Hikaru Iida, Miyu Nakanishi, Yuichi Onda, Hikaru Sato, and Tadafumi Niizato

Groundwater systems and surface water can interact in a complex manner that influences catchment discharge, and these interactions become even more complex on forest slopes. A large amount of the radioactive cesium (137Cs) deposited on forests by the Fukushima Daiichi Nuclear Power Plant accident remains in terrestrial environments and is transported downstream in suspended or dissolved form by surface water. Generally, the concentration of dissolved 137Cs in surface water increases especially during runoff. While the leaching behavior of 137Cs from contaminated forest materials and soils to surface water has been heavily studied, the influence of the 137Cs concentration in shallow groundwater systems on forest slopes has not been investigated. Detailed hydrological observations of groundwater on a forest hillslope therefore enable a quantitative analysis of the influence of groundwater flow on the formation of dissolved 137Cs concentrations in surface water during base flow and during runoff. Our results showed that the dissolved 137Cs concentration in surface water increases during discharge events. The average concentration of dissolved 137Cs in shallow groundwater was 0.64 Bq/L, which was higher than that in surface water (average 0.10 Bq/L). Furthermore, part of the shallow groundwater on the slope was observed to move toward the river channel during runoff. This suggests that shallow groundwater may have flowed into the surface water during runoff and contributed to the increase of 137Cs in the surface water. In this study, the contribution of groundwater on forest slopes to the dissolved 137Cs concentration in surface water was estimated using the hydraulic gradient distribution of groundwater on the forest slope and the measured dissolved 137Cs concentrations in groundwater.
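
The contribution estimate can be framed as a two-endmember mixing problem; a minimal Python sketch using the mean concentrations quoted above, with a hypothetical event-water value:

def groundwater_fraction(c_stream, c_gw, c_base):
    # Two-endmember mixing: C_stream = f*C_gw + (1 - f)*C_base, solved for f.
    return (c_stream - c_base) / (c_gw - c_base)

# Dissolved 137Cs (Bq/L): shallow groundwater 0.64 (measured mean), baseline
# surface water 0.10 (measured mean), event stream water 0.15 (hypothetical):
print(groundwater_fraction(0.15, 0.64, 0.10))  # ~0.09, i.e. ~9% groundwater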

How to cite: Niwano, Y., Kato, H., Akaiwa, S., Anderson, D., Iida, H., Nakanishi, M., Onda, Y., Sato, H., and Niizato, T.: 137Cs transport flux to surface water due to shallow groundwater discharge from forest hillslope, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-15515, https://doi.org/10.5194/egusphere-egu23-15515, 2023.

ESSI2 – Data, Software and Computing Infrastructures across Earth and space sciences

EGU23-2454 | ECS | Orals | ESSI2.2

The future of NASA Earth Science in the commercial cloud: Challenges and opportunities 

Alexey Shiklomanov, Manil Maskey, Yoseline Angel, Aimee Barciauskas, Philip Brodrick, Brian Freitag, and Jonas Sølvsteen

NASA produces a large volume and variety of data products that are used every day to support research, decision making, and education. The widespread use of NASA’s Earth Science data is enabled by NASA’s Earth Science Data System (ESDS) program, which oversees the archiving and distribution of these data and invests in the development of new data systems and tools. However, NASA’s current approach to Earth Science data distribution — based on distributed institutional archives with individual on-premises high-performance computing capabilities — faces some significant challenges, including massive increases in data volume from upcoming missions, a greater need for transdisciplinary science that synthesizes many different kinds of observations, and a push to make science more open, inclusive, and accessible. To address these challenges, NASA is aggressively migrating its Earth Science data and related tools and services into the commercial cloud. Migration of data into the commercial cloud can significantly improve NASA’s existing data system capabilities (1) by providing more flexible options for storage and compute (including rapid, as-needed access to state-of-the-art capabilities); (2) by centralizing and standardizing data access, which gives all of NASA’s institutional data centers access to all of each other’s datasets; and (3) by facilitating “analysis-in-place”, whereby users can bring their own computational workflows and tools to the data rather than having to maintain their own copies of NASA datasets. However, migration to the commercial cloud also poses some significant challenges, including (1) managing costs under a “pay-as-you-go” model; (2) incompatibility of existing tools and data formats with object-based storage and network access; (3) vendor lock-in; (4) challenges with data access for workflows that mix on-premises and cloud computing; and (5) standardization of the highly diverse data present in NASA’s data archive. I conclude with two examples of recent NASA activities showcasing capabilities enabled by the commercial cloud: an interactive analysis and development platform for analyzing airborne imaging spectroscopy data, and a new collection of tools and services for data discovery, analysis, publication, and data-driven storytelling (Visualization, Exploration, and Data Analysis, VEDA).
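
As an illustration of the analysis-in-place pattern, the snippet below opens a dataset directly from cloud object storage with xarray and s3fs and reduces it without keeping a local copy; the bucket, file and variable names are hypothetical placeholders, not actual NASA holdings:

import s3fs
import xarray as xr

# Anonymous access to a public bucket (names are placeholders).
fs = s3fs.S3FileSystem(anon=True)
with fs.open("s3://example-earthdata-bucket/granules/sample_granule.nc") as f:
    ds = xr.open_dataset(f, engine="h5netcdf")               # open over the network
    monthly = ds["surface_temp"].resample(time="1M").mean()  # reduce in place
    print(monthly)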

How to cite: Shiklomanov, A., Maskey, M., Angel, Y., Barciauskas, A., Brodrick, P., Freitag, B., and Sølvsteen, J.: The future of NASA Earth Science in the commercial cloud: Challenges and opportunities, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-2454, https://doi.org/10.5194/egusphere-egu23-2454, 2023.

EGU23-3657 | Orals | ESSI2.2

CADS 2.0: A FAIRest Data Store infrastructure blooming in a landscape of Data Spaces. 

Angel lopez alos, Baudouin raoult, Edward comyn-platt, and James varndell

First launched as the Climate Data Store (CDS) supporting the Climate Change Service (C3S) and later instantiated as the Atmosphere Data Store (ADS) for the Atmosphere Monitoring Service (CAMS), the shared underlying Climate & Atmosphere Data Store Infrastructure (CADS) represents the technical backbone for the implementation of the Copernicus services entrusted to ECMWF on behalf of the European Commission. In addition, the CDS also offers access to a selection of datasets from the Emergency Management Service (CEMS). As the flagship instance of the infrastructure, the CDS counts more than 160k registered users and delivers a daily average of over 100 TB of data from a catalogue of 141 datasets.

The CADS software infrastructure is designed as a distributed system and open framework that facilitates improved access to a broad spectrum of data and information via a powerful service-oriented architecture offering seamless web-based and API-based search and retrieve capabilities. CADS also provides a generic software toolbox that allows users to make use of the available datasets, and a series of state-of-the-art data tools that can be combined into more elaborate processes and present their results graphically in the form of interactive web applications. The CADS infrastructure is hosted in an on-premises cloud physically located within the ECMWF Data Centre and implemented using a collection of virtual machines, networks and large data volumes. Fully customized instances of CADS, including the dedicated virtual hardware infrastructure, software application and catalogued content, can be easily deployed thanks to automation and configuration software tools and a set of configuration files managed by a distributed version control system. Tailored scripts and templates make it easy to accommodate different standards and to interoperate with external platforms.
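
API-based retrieval from the CDS uses the public cdsapi client; a typical request looks like the sketch below (the dataset identifier and request keys follow the CDS catalogue documentation and should be checked there for the dataset of interest):

import cdsapi

client = cdsapi.Client()  # reads the API URL and key from ~/.cdsapirc
client.retrieve(
    "reanalysis-era5-single-levels",   # a CDS dataset identifier
    {
        "product_type": "reanalysis",
        "variable": "2m_temperature",
        "year": "2022",
        "month": "01",
        "day": "01",
        "time": "12:00",
        "format": "netcdf",
    },
    "era5_t2m_20220101.nc",            # local target file
)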

ECMWF, in partnership with EUMETSAT, ESA and the EEA, also implements the Data and Information Access Services (DIAS) platform called WEkEO, a distributed cloud-computing infrastructure used to process the data generated by the Copernicus Services and make them accessible to users, together with derived products and all satellite data from the Copernicus Sentinels. Within the partnership, ECMWF is responsible for the procurement of the software implementing the Data Access Services, Processing and Tools, whose specifications build on the same fundamentals as CADS. The adoption of FAIR principles has proven to be a cornerstone for maximizing synergies and interactions between CADS, WEkEO and other related platforms.


Driven by increasing demand and the evolving landscape of platforms and services, a major project for the modernization of the CADS infrastructure is currently underway. The coming CADS 2.0 aims to capitalize on the experience, feedback, lessons learned and know-how from the current CADS, embrace advanced technologies, engage with a broader user community, make the current platform more versatile and cloud-oriented, improve workflows and methodologies, ensure compatibility with state-of-the-art solutions such as machine learning, data cubes and interactive notebooks, consolidate the adoption of FAIR principles and strengthen synergies with related platforms.


As complementary infrastructures, WEkEO will allow users to harness compute resources without the networking and storage costs associated with public cloud offerings; the CADS Toolbox 2.0 will be deployed and run there, allowing heavy jobs (retrieval and reduction) to be submitted to the CADS 2.0 core infrastructure as services.

How to cite: lopez alos, A., raoult, B., comyn-platt, E., and varndell, J.: CADS 2.0: A FAIRest Data Store infrastructure blooming in a landscape of Data Spaces., EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-3657, https://doi.org/10.5194/egusphere-egu23-3657, 2023.

EGU23-5038 | Posters on site | ESSI2.2

EO4EU - AI-augmented ecosystem for Earth Observation data accessibility with Extended reality User Interfaces for Service and data exploitation 

Vasileios Baousis, Stathes Hadjiefthymiades, Charalampos Andreou, Kakia Panagidh, and Armagan Karatosun

EO4EU is a European Commission-funded innovation project bringing forward the EO4EU Platform, which will make access to and use of EO data easier for environmental, governmental, and even business forecasts and operations.

The EO4EU Platform, which will be accessible at www.eo4eu.eu, will link already existing major EO data sources such as GEOSS, INSPIRE, Copernicus, Galileo and DestinE, among others, and will provide a number of tools and services to assist users in finding and accessing the data they are interested in, as well as in analysing and visualising these data. The platform will leverage machine learning to support the handling of the characteristically large volumes of EO data, as well as a combination of cloud computing infrastructure and pre-exascale high-performance computing to manage processing workloads.

Specific attention is also given to developing user-friendly interfaces for EO4EU allowing users to intuitively use EO data freely and easily, even with the use of extended reality.

EO4EU objectives are:

  • A holistic DataOps ecosystem to enhance the access and usability of EO information.
  • A semantic-enhanced knowledge graph that augments the FAIRness of EO data and supports sophisticated data representation and dynamics.
  • A machine learning pipeline that enables the dynamic annotation of the various EO data sources.
  • Efficient, reliable and interoperable inter- and intra-data-layer communications.
  • Advancing stakeholders’ knowledge capacity through informed decision-making and policy-making support.
  • A full range of use case scenarios addressing current data needs, capitalizing on existing digital services and platforms, fostering their usability and practicality, and taking into account ethical aspects aiming at social impact maximization.

Technical and scientific innovation can be summarised as follows:

  • Improve compression rates for a given image quality and reduce data volumes.
  • Improve the quality of reconstructed compressed images while maintaining the same compression rates.
  • Facilitate the design of custom services with a minimized labelled-data requirement.
  • Learn robust and transferable representations of EO data.
  • Publish the original models trained on EO data, with all relevant assisting material to support reusability, in a public repository.
  • Optimize the execution of data fusion in HPC and GPU environments.
  • Better accuracy of data representation.
  • Customizable visualization tools tailored to the needs of each use case.
  • Dedicated graphs for end-users with various granularities, modalities, metrics and statistics to observe overall trends in time, correlations, and cause-and-effect relationships through a responsive web-interfaced module.

In this presentation, the status of the project, the adopted architecture and the findings from our initial user surveys pertaining to EO data access and discovery will be analysed. Finally, the next steps of the project, the early access to the developed platform and the challenges and opportunities will be discussed.  

How to cite: Baousis, V., Hadjiefthymiades, S., Andreou, C., Panagidh, K., and Karatosun, A.: EO4EU - AI-augmented ecosystem for Earth Observation data accessibility with Extended reality User Interfaces for Service and data exploitation, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-5038, https://doi.org/10.5194/egusphere-egu23-5038, 2023.

EGU23-5862 | Orals | ESSI2.2

The ESA Green Transition Information Factories – using Earth Observation and cloud-based analytics to address the Green Transition information needs. 

Patrick Griffiths, Stefanie Lumnitz, Christian Retscher, Frank-Martin Seifert, and Yves-Louis Desnos

In response to the global climate and sustainability crisis, many countries have expressed ambitious goals in terms of carbon neutrality and a green economy. In this context, the European Green Deal comprises several policy elements aimed at achieving carbon neutrality by 2050.

In response to these ambitions, the European Space Agency (ESA) is initiating various efforts to leverage space technologies and data in support of the Green Deal ambitions. The ESA Space for Green Future (S4GF) Accelerator will explore new mechanisms to promote the use of space technologies and advanced modelling approaches for scenario investigations on the Green Transition of the economy and society.

A central element of the S4GF accelerator are the Green Transition Information Factories (GTIF). GTIF takes advantage of Earth Observation (EO) capabilities, geospatial and digital platform technologies, as well as cutting edge analytics to generate actionable knowledge and decision support in the context of the Green Transition.

A first national-scale GTIF demonstrator has now been developed for Austria. It addresses the information needs and national priorities for the Green Deal in Austria, which were established through a bottom-up consultation and co-creation process with various national stakeholders and expert entities. These requirements are matched with various EO industry teams that develop the corresponding GTIF capabilities.

The current GTIF demonstrator for Austria (GTIF-AT) builds on top of federated European cloud services, providing efficient access to key EO data repositories and rich interdisciplinary datasets. GTIF-AT initially addresses five Green Transition domains: (1) Energy Transition, (2) Mobility Transition, (3) Sustainable Cities, (4) Carbon Accounting and (5) EO Adaptation Services.

For each of these domains, scientific narratives are provided and elaborated using scrollytelling technologies. The GTIF interactive explore tools allow users to explore the domains and subdomains in more detail and to better understand the challenges, complexities, and underlying socio-economic and environmental conflicts. They combine domain-specific scientific results with intuitive graphical user interfaces and modern frontend technologies. In the Energy Transition domain, users can interactively investigate the suitability of locations at 10 m resolution for the expansion of renewable (wind or solar) energy production. The tools also allow investigating the underlying conflicts, e.g. with existing land uses or biodiversity constraints. Satellite-based altimetry is used to dynamically monitor the water levels in hydropower reservoirs to infer the related energy storage potentials. In the Sustainable Cities domain, users can investigate photovoltaic installations on rooftops and assess their suitability in terms of roof geometry and expected energy yields.
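
The kind of suitability screening exposed by these explore tools can be illustrated as a raster overlay that combines resource, terrain and land-use constraints into a Boolean mask per grid cell; a toy numpy sketch in which all layers and thresholds are hypothetical:

import numpy as np

def wind_suitability(wind_ms, slope_deg, protected, built_up,
                     min_wind=6.0, max_slope=15.0):
    # Suitable where mean wind exceeds a threshold, terrain is not too steep,
    # and neither protection nor existing built-up land use applies.
    return (wind_ms >= min_wind) & (slope_deg <= max_slope) & ~protected & ~built_up

# Hypothetical 3x3 grid of 10 m cells:
wind = np.array([[7.1, 5.2, 6.8], [6.5, 6.9, 4.9], [7.4, 6.1, 6.3]])
slope = np.array([[5.0, 3.0, 20.0], [8.0, 2.0, 1.0], [12.0, 30.0, 4.0]])
protected = np.zeros((3, 3), dtype=bool); protected[0, 2] = True
built_up = np.zeros((3, 3), dtype=bool); built_up[2, 0] = True
print(wind_suitability(wind, slope, protected, built_up))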

GTIF enables various users to inform themselves about, and interactively investigate, the challenges as well as the opportunities related to the Green Transition ambitions. This enables, for example, citizens to engage in the discussion process around renewable energy expansion, and supports energy start-ups in developing new services. The GTIF development follows an open-science and open-source approach, and several new GTIF instances are planned for the coming years, addressing the Green Deal information needs and accelerating the Green Transition. This presentation will showcase some of the GTIF interactive explore tools and provide an outlook on future efforts.

How to cite: Griffiths, P., Lumnitz, S., Retscher, C., Seifert, F.-M., and Desnos, Y.-L.: The ESA Green Transition Information Factories – using Earth Observation and cloud-based analytics to address the Green Transition information needs., EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-5862, https://doi.org/10.5194/egusphere-egu23-5862, 2023.

EGU23-5936 | Orals | ESSI2.2

What does the European Spatial Data Infrastructure INSPIRE need in order to become a Green Deal Data Space? 

Joan Masó, Alba Brobia, Ivette Serral, Ingo Simonis, Francesca Noardo, Lucy Bastin, Carlos Cob Parro, Joaquín García, Raul Palma, and Sébastien Ziegler

In May 2007, the INSPIRE directive established the path towards creating the European Spatial Data Infrastructure (ESDI). While the Joint Research Centre (JRC) defined a set of detailed implementation guidelines, the European member states determined the agencies responsible for delivering the different topics specified in the directive’s annexes. INSPIRE’s goal was - and still is - to organize and share Europe’s data supporting environmental policies and actions. However, the way that INSPIRE was defined limited contributions to the public sector and limited topics to those specifically listed in its annexes. Technical challenges and a lack of appropriate tools have prevented INSPIRE from fully implementing its own guidelines, and even after 15 years, the dream of a continuous, consistent description of Europe’s environment has still not completely materialized. We should apply the lessons learnt from INSPIRE when we build the Green Deal Data Space (GDDS). To create the GDDS, we should start from the ESDI, but also engage and align with the ongoing preparatory actions for data spaces (e.g. for the Green Deal and agriculture) as well as include actors and networks that have emerged or become organized in recent years. These include: networks of in situ observations (e.g. the Environmental Research Infrastructures (ENVRI) community); Citizen Science initiatives (such as the biodiversity observations integrated in the Global Biodiversity Information Facility (GBIF), or sensor communities for e.g. air quality); predictive algorithms, machine learning models and simulations based on artificial intelligence (such as the ones deployed in the European Open Science Cloud, the International Data Spaces Association and Gaia-X - services driven both by the scientific community and the private sector); and remote sensing derived products developed by the Copernicus Services. Most of these data providers have already embraced the FAIR principles and open data, providing many examples of best practice which can assist newer adopters on the path to open science. In the Horizon Europe project AD4GD (AllData4GreenDeal), we believe that, instead of trying to force data producers to adopt cumbersome new protocols, we should take advantage of the latest developments in geospatial standards and APIs. These allow loosely coupled but well documented and interlinked data sources and models in the GDDS, while achieving scientifically robust integration and easy access to data in the resulting workflows. Another fundamental element will be the adoption of a common and extensible information model enabling the representation and exchange of Green Deal related data in an unambiguous manner, including vocabularies for Essential Variables to organize the observable measurements and increase the level of semantic interoperability. This will allow systems and components from different technology providers to seamlessly interoperate and exchange data, and to provide an integrated view of, and access to, the full value of the available data. The project will validate the approach in three pilot cases: water quality and availability of Berlin lakes, biodiversity corridors in the metropolitan area of Barcelona, and low-cost air quality sensors across Europe. The AD4GD project is funded by the European Union under the Horizon Europe programme.
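
The loosely coupled integration argued for here builds on standardized web APIs such as OGC API - Features, where collections are discovered and queried through plain HTTP and JSON; a minimal Python sketch against a hypothetical endpoint (the server URL, collection id and parameters are placeholders):

import requests

BASE = "https://example.org/ogcapi"  # hypothetical OGC API - Features endpoint

# List the collections offered by the server (standard OGC API - Features path):
collections = requests.get(f"{BASE}/collections", timeout=30).json()
print([c["id"] for c in collections["collections"]])

# Query water-quality observations for a bounding box around Berlin:
items = requests.get(
    f"{BASE}/collections/lake-water-quality/items",
    params={"bbox": "13.0,52.3,13.8,52.7", "limit": 10},
    timeout=30,
).json()
print(len(items["features"]))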

How to cite: Masó, J., Brobia, A., Serral, I., Simonis, I., Noardo, F., Bastin, L., Cob Parro, C., García, J., Palma, R., and Ziegler, S.: What does the European Spatial Data Infrastructure INSPIRE need in order to become a Green Deal Data Space?, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-5936, https://doi.org/10.5194/egusphere-egu23-5936, 2023.

EGU23-7052 | Orals | ESSI2.2

FAIRiCUBE: Enabling Gridded Data Analysis for All 

Katharina Schleidt and Stefan Jetschny

Previously, collecting, storing, owning and, if necessary, digitizing data was vital for any data-driven application. Nowadays, we are swimming in data, whereby one could postulate that we are drowning. However, downloading vast data to local storage and subsequent in-house processing on dedicated hardware is inefficient and not in line with the big data processing philosophy. While the FAIR principles are fulfilled as the data is findable, accessible, and interoperable, the actual reuse of the data to gain new insights depends on the data user’s local capabilities. Scientists aware of the potentially available data and processing capabilities are still not able to easily leverage these resources as required to perform their work; while the analysis gap entailed by the information explosion is being increasingly highlighted, remediation lags.

The core objective of the FAIRiCUBE project is to enable players from beyond classic Earth Observation (EO) domains to provide, access, process, and share gridded data and algorithms in a FAIR and TRUSTable manner. To reach this objective, we are creating the FAIRiCUBE HUB, a crosscutting platform and framework for data ingestion, provision, analysis, processing, and dissemination, to unleash the potential of environmental, biodiversity and climate data through dedicated European data spaces.
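
As a hint of what such gridded data access can look like for users outside classic EO domains, the sketch below opens a cloud-hosted data cube with xarray; the Zarr URL and variable name are assumptions for illustration, not actual FAIRiCUBE HUB endpoints:

    import xarray as xr

    # Hypothetical cloud-hosted cube; a real catalogue entry would supply the URL
    ds = xr.open_zarr("https://example.org/cubes/land-surface-temperature.zarr")

    # A typical cube-style query: subset in space and time, then reduce
    subset = ds["lst"].sel(
        lon=slice(16.2, 16.6),
        lat=slice(48.3, 48.1),   # latitude is stored descending in many EO grids
        time=slice("2020-06-01", "2020-08-31"),
    )
    print(subset.mean(dim="time"))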

To better understand the various obstacles to leveraging available assets, with regard to both data and the modalities for analysis and processing, several use cases have been defined addressing diverse aspects of European Green Deal (EGD) priority actions. Each use case has a defined objective, approach, research question and data requirements.

The use cases selected to guide the creation of the FAIRiCUBE HUB are as follows:

  • Urban adaptation to climate change
  • Biodiversity and agriculture nexus
  • Biodiversity occurrence cubes
  • Drosophila landscape genomics
  • Spatial and temporal assessment of neighborhood building stock

Many of the issues encountered within the FAIRiCUBE project are formally considered solved: catalogues detail the available datasets, and standards define how datasets are to be structured and annotated with the relevant metainformation. A vast array of processing functionality has emerged that can be applied to such resources. However, while all of this is considered state of the art in the EO community, a subtle delta still blocks wider communities from making good use of the available resources in their own domains of work. These obstacles include, but are not limited to:

  • Identifying available data sources
  • Determining fitness for use
  • Interoperability of data with divergent spatiotemporal basis
  • Understanding access modalities
  • Scoping required resources
  • Providing non-gridded data holdings in a gridded manner (see the sketch after this list)
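
The last item, turning non-gridded (point) holdings into a grid, can be illustrated with a minimal numpy sketch; the coordinates, values and cell size are made-up placeholders:

    import numpy as np

    # Made-up point observations: longitude, latitude, measured value
    lon = np.array([4.35, 4.40, 4.38, 4.52])
    lat = np.array([50.84, 50.85, 50.90, 50.88])
    val = np.array([12.1, 13.4, 11.8, 12.9])

    # Regular 0.05-degree target grid (placeholder resolution)
    lon_edges = np.arange(4.30, 4.61, 0.05)
    lat_edges = np.arange(50.80, 50.96, 0.05)

    # Per-cell sum and count, then the mean value where observations exist
    sums, _, _ = np.histogram2d(lon, lat, [lon_edges, lat_edges], weights=val)
    counts, _, _ = np.histogram2d(lon, lat, [lon_edges, lat_edges])
    gridded = np.where(counts > 0, sums / np.maximum(counts, 1), np.nan)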

There is great potential in integrating the diverse gridded resources available from EO sources into wider research domains. At present, however, subtle barriers block this potential. Within FAIRiCUBE, these issues are being collected and evaluated, and mitigation measures are being explored together with researchers from outside traditional EO domains, with the goal of breaking down these barriers and opening powerful research and data analysis capabilities to a wide range of scientists.

How to cite: Schleidt, K. and Jetschny, S.: FAIRiCUBE: Enabling Gridded Data Analysis for All, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-7052, https://doi.org/10.5194/egusphere-egu23-7052, 2023.

EGU23-7074 | ECS | Posters on site | ESSI2.2

An EOSC-enabled Data Space environment for the climate community 

Fabrizio Antonio, Donatello Elia, Guillaume Levavasseur, Atef Ben Nasser, Paola Nassisi, Alessandro D'Anca, Alessandra Nuzzo, Sandro Fiore, Sylvie Joussaume, and Giovanni Aloisio

The exponential increase in data volumes and complexity is causing a radical change in the scientific discovery process in several domains, including climate science. This affects all stages of the data lifecycle, posing significant data management challenges in terms of data archiving, access, analysis, visualization, and sharing. The data space concept can support scientists' workflows and simplify the path towards more FAIR use of data.

In the context of the European Open Science Cloud (EOSC) initiative launched by the European Commission, the ENES Data Space (EDS) represents a domain-specific implementation of the data space concept. The service, developed in the frame of the EGI-ACE project, aims to provide an open, scalable, cloud-enabled data science environment for climate data analysis on top of the EOSC Compute Platform. It is accessible through the EOSC Catalogue and Marketplace (https://marketplace.eosc-portal.eu/services/enes-data-space) and also provides a web portal (https://enesdataspace.vm.fedcloud.eu) with information, tutorials and training materials on how to get started with its main features.

The EDS integrates into a single environment ready-to-use climate datasets, compute resources and tools, all made available through the Jupyter interface, with the aim of supporting the overall scientific data processing workflow. Specifically, the data store linked to the ENES Data Space provides access to a multi-terabyte set of variable-centric collections from large-scale global climate experiments. The data pool consists of a mirrored subset of CMIP (Coupled Model Intercomparison Project) datasets from the ESGF (Earth System Grid Federation) federated data archive, collected and kept synchronized with the remote copies using the Synda tool developed within the scope of the IS-ENES3 H2020 project. Community-based, open-source frameworks (e.g., Ophidia) and libraries from the Python ecosystem provide the capabilities for data access, analysis and visualization. Results and experiment definitions (i.e., Jupyter Notebooks) can be easily shared among users, promoting data sharing and application re-use towards a more Open Science approach.
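
A typical analysis in such a Jupyter environment might look like the sketch below; the CMIP6 file name follows ESGF conventions but is an assumption for illustration, not a guaranteed entry in the data pool:

    import numpy as np
    import xarray as xr

    # Hypothetical CMIP6 file from the mirrored ENES data pool
    ds = xr.open_dataset("tas_Amon_EC-Earth3_historical_r1i1p1f1_gr_185001-201412.nc")

    # Area-weight by cos(latitude) and compute a global-mean temperature series
    weights = np.cos(np.deg2rad(ds["lat"]))
    tas_global = ds["tas"].weighted(weights).mean(dim=("lat", "lon"))

    # Annual means, ready to plot or share as part of a notebook
    print(tas_global.groupby("time.year").mean())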

This work will present an overview of the data space capabilities, along with the key aspects of data management.

How to cite: Antonio, F., Elia, D., Levavasseur, G., Ben Nasser, A., Nassisi, P., D'Anca, A., Nuzzo, A., Fiore, S., Joussaume, S., and Aloisio, G.: An EOSC-enabled Data Space environment for the climate community, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-7074, https://doi.org/10.5194/egusphere-egu23-7074, 2023.

EGU23-7786 | Posters on site | ESSI2.2

Constructing a Searchable Knowledge Repository for FAIR Climate Data 

Mark Roantree, Branislava Lalić, Stevan Savić, Dragan Milošević, and Michael Scriney

The development of a knowledge repository for climate science data is a multidisciplinary effort among domain experts (climate scientists), data engineers, whose skills include designing and building a knowledge repository, and machine learning researchers, who provide expertise on data preparation tasks such as gap filling and advise on the different machine learning models that can exploit this data.

One of the main goals of the CA20108 COST Action is to develop a knowledge portal that is fully compliant with the FAIR principles for scientific data management. In the first year, a bespoke Knowledge Portal was developed to capture metadata for FAIR datasets. Its purpose is to provide detailed metadata descriptions for shareable micro-meteorological (micromet) data using the WMO standard. While storing Network, Site and Sensor metadata locally, the system passes the actual data to Zenodo and receives back the DOI, thus creating a permanent link between the Knowledge Portal and the Zenodo storage platform (a sketch of this deposit step follows the list below). When a user searches the Knowledge Portal (metadata), the results provide both detailed descriptions and links to the data on Zenodo. Our adherence to the FAIR principles is documented below:

  • Findable. Machine-readable metadata is required for the automatic discovery of datasets and services. A metadata description is supplied by the data owners for all micro-meteorological data shared on the system; this metadata subsequently drives the search engine, using keywords or network, site and sensor search terms.
  • Accessible. When suitable datasets have been identified, access details should be provided. Assuming data is freely accessible, Zenodo DOIs and links are provided for direct data access.
  • Interoperable. Data interoperability means the ability to share and integrate data from different users and sources. This can only happen if a standard (meta)data model is employed to describe the data, an important concept which generally requires data engineering skills to deliver. In the Knowledge Portal presented here, the WMO guide provides the design and structure for the metadata.
  • Reusable. To truly deliver reusability, metadata should be expressed in as detailed a manner as possible; in this way, data can be replicated and integrated according to different scientific requirements. While the Knowledge Portal facilitates very detailed metadata descriptions, not all metadata is compulsory, as it was accepted that in some cases the overhead of providing this information can be very high.
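
The deposit-and-DOI handshake described above can be sketched against Zenodo's public REST API as follows; the access token, file name and metadata are placeholders, and error handling is omitted:

    import requests

    TOKEN = "..."  # placeholder Zenodo access token
    API = "https://zenodo.org/api"

    # 1. Create an empty deposition
    dep = requests.post(f"{API}/deposit/depositions",
                        params={"access_token": TOKEN}, json={}).json()

    # 2. Upload the dataset file into the deposition's file bucket
    with open("station_data.csv", "rb") as fp:          # placeholder file
        requests.put(f"{dep['links']['bucket']}/station_data.csv",
                     params={"access_token": TOKEN}, data=fp)

    # 3. Attach minimal descriptive metadata (placeholder values)
    meta = {"metadata": {"title": "Micromet station data",
                         "upload_type": "dataset",
                         "description": "Placeholder description.",
                         "creators": [{"name": "Doe, Jane"}]}}
    requests.put(f"{API}/deposit/depositions/{dep['id']}",
                 params={"access_token": TOKEN}, json=meta)

    # 4. Publish; the returned DOI is what the Knowledge Portal stores
    pub = requests.post(f"{API}/deposit/depositions/{dep['id']}/actions/publish",
                        params={"access_token": TOKEN}).json()
    print(pub["doi"])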

Simple analytics are in place to monitor the volume and size of networks in the system. Current metrics include: network count; average network size (number of sites); dates and sizes of datasets per network/site; numbers and types of sensors at each site; etc. The Portal is currently in beta, meaning that the system is functional but open only to members of the COST Action who are nominated testers. This status is due to change in Q1/2023, when access will be opened to the wider climate science community.

Current plans include new tools and services to assess data quality, including the level of gaps; in some cases, machine learning tools will be provided to attempt gap filling for datasets that meet certain requirements.

How to cite: Roantree, M., Lalić, B., Savić, S., Milošević, D., and Scriney, M.: Constructing a Searchable Knowledge Repository for FAIR Climate Data, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-7786, https://doi.org/10.5194/egusphere-egu23-7786, 2023.

EGU23-7842 | Orals | ESSI2.2 | Highlight

Destination Earth - Processing Near Data and Massive Data Handling 

Danaele Puechmaille, Jordi Duatis Juarez, Miruna Stoicescu, Michael Schick, and Borys Saulyak

Destination Earth is an operational service led by the European Commission and implemented jointly by ESA, ECMWF and EUMETSAT.

The presentation will provide insights into how Destination Earth implements near-data processing and massive data handling.

The objective of the European Commission's Destination Earth (DestinE) initiative is to deploy several highly accurate digital replicas of the Earth (Digital Twins) in order to monitor and simulate natural and human activities and their interactions, and to develop and test "what-if" scenarios that would enable more sustainable development and support European environmental policies. DestinE addresses the challenge of managing and making accessible the sheer amount of data generated by the Digital Twins, together with observation data located at external sites. This data will be made available fast enough, and in an analysis-ready format, to support the analysis scenarios proposed by the DestinE service users.