ESSI – Earth & Space Science Informatics

EGU22-6482 | Presentations | MAL16 | Ian McHarg Medal Lecture

On Machine Learning from Environmental Data 

Mikhail Kanevski

Geo- and environmental sciences produce numerous and highly diverse data, which are extensively used both in fundamental research on Earth processes and in important real-life decision-making. Most natural phenomena are non-linear, multivariate, highly variable and correlated at many spatio-temporal scales. The analysis and treatment of such complex data, and their integration/assimilation with science-based models, is a difficult problem. Contemporary machine learning (ML) offers an important set of effective approaches to address this problem at all phases of a study.

Nowadays, the Geosciences are among the major customers of ML ideas and technologies. To a large degree, this is connected to the local and global challenges facing humanity: sustainable development, biodiversity, social and natural hazards and risks, meteo- and climate forecasting, remote sensing Earth observation, etc. Although ML is theoretically a universal modelling tool, the success of its applications depends significantly on the problem formulation, the quantity and quality of the data, and the objectives of the study. Therefore, efficient application of ML demands good knowledge of the phenomena under study and a profound understanding of the learning algorithms, which can be achieved through close collaboration between experts in the corresponding domains.

In the current presentation, the study of geo- and environmental data using different machine learning algorithms is reviewed. A problem-oriented approach, which follows a generic data-driven methodology, is applied. The methodology consists of several important steps, in particular: optimization of monitoring and data collection, comprehensive exploratory data analysis and visualization, feature engineering and selection of relevant variables, modelling with careful validation and testing, and explanation and communication of the results. Advanced experimentation with data using different supervised and unsupervised ML algorithms helps to better understand the original data and the constructed input feature space, to obtain more reliable and robust results, and to make intelligent decisions. The presentation is accompanied by simulated and real data case studies from natural hazards (avalanches, forest fires, landslides), environmental risks (pollution) and renewable energy assessment. In conclusion, some general remarks and future perspectives are discussed.
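
The data-driven methodology outlined above can be sketched in miniature. The toy pipeline below uses synthetic data and ordinary least squares standing in for the ML algorithms discussed in the talk; all data and variable names are hypothetical, and the point is only the sequence of steps (explore, model, validate):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical monitoring data: 200 samples, 3 candidate environmental features
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Exploratory step: per-feature summary statistics
means, stds = X.mean(axis=0), X.std(axis=0)

# Modelling step: ordinary least squares as a simple baseline learner
train, test = slice(0, 150), slice(150, None)
coef, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)

# Validation step: held-out root-mean-square error
rmse = float(np.sqrt(np.mean((X[test] @ coef - y[test]) ** 2)))
```

In a real study each step would be far richer (monitoring network optimization, feature selection, spatial validation), but the train/validate separation shown here is the backbone.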


How to cite: Kanevski, M.: On Machine Learning from Environmental Data, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-6482, 2022.

EGU22-12895 | Presentations | MAL16 | ESSI Division Outstanding ECS Award Lecture

Artificial Intelligence and Earth System Modeling - revisiting Research of the Past and Future 

Christopher Kadow, David M. Hall, Uwe Ulbrich, Igor Kröner, Sebastian Illing, and Ulrich Cubasch

Today's climate science is driven by IT more than ever. Earth system models on high-performance computers (HPC) are common tools for researching the past and projecting it into the future. In addition, statistical modelling has been reborn thanks to modern computer architectures equipped with artificial intelligence (from ensemble to deep learning). Future advances in machine learning will also shape climate research through analysis tools, prediction techniques, signal and event classification, post-processing, Model Output Statistics (MOS), evaluation and verification, etc. This presentation looks at current research on the future (part one) and the past (part two) of our climate system using AI/ML ideas and technologies in combination with numerical climate models, drawing on two corresponding publications. A special focus is on the importance of climate science, where the needs are, and how to choose the AI/ML hammer wisely:

(1) FUTURE: Derived from machine (ensemble) learning and bagging, a new hybrid climate prediction technique called the 'Ensemble Dispersion Filter' is developed. It exploits two important climate prediction paradigms: the ocean's heat capacity and the advantage of the ensemble mean. The Ensemble Dispersion Filter averages the ocean temperatures of the ensemble members every three months, uses this ensemble mean as a restart condition for each member, and then continues the prediction. The evaluation shows that the Ensemble Dispersion Filter yields a significant improvement in predictive skill compared to the unfiltered reference system. Even in comparison with prediction systems of larger ensemble size and higher resolution, the Ensemble Dispersion Filter system performs better. In particular, the prediction of the global average temperature for forecast years 2 to 5 shows a significant skill improvement.
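
The filter's restart mechanism can be illustrated with a toy ensemble. In this sketch a first-order autoregressive process stands in for the coupled ocean model (an assumption for illustration only); every three "months" each member is restarted from the ensemble mean, which visibly damps ensemble dispersion relative to a free-running ensemble:

```python
import numpy as np

rng = np.random.default_rng(42)
n_members, n_months = 10, 24

def step(state):
    # Toy AR(1) "ocean" dynamics standing in for the full climate model
    return 0.9 * state + rng.normal(scale=0.5, size=state.shape)

def run(filtered):
    state = rng.normal(size=n_members)
    history = []
    for month in range(1, n_months + 1):
        state = step(state)
        if filtered and month % 3 == 0:
            # Ensemble Dispersion Filter: restart every member from the ensemble mean
            state = np.full(n_members, state.mean())
        history.append(state.copy())
    return np.array(history)

free = run(filtered=False)
edf = run(filtered=True)
```

Comparing `edf.std(axis=1)` with `free.std(axis=1)` shows the reduced spread between restart points; the published method applies this restart to the ocean temperature fields of a full decadal prediction system.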

Kadow, C., Illing, S., Kröner, I., Ulbrich, U., and Cubasch, U. (2017), Decadal climate predictions improved by ocean ensemble dispersion filtering, J. Adv. Model. Earth Syst., 9, 1138–1149, doi:10.1002/2016MS000787.

(2) PAST: Today's climate change research relies on climate information of the past. Historic temperature observations form global gridded datasets like HadCRUT4, which is investigated, e.g., in the IPCC reports. However, datasets combining such records are sparse in the past, and even today they contain missing values. Here we show that artificial intelligence (AI) technology can be applied to reconstruct these missing climate values. We found that recently successful image inpainting technologies, using partial convolutions in a CUDA-accelerated deep neural network, can be trained on 20CR reanalysis and CMIP5 experiments. The derived AI networks are capable of independently reconstructing artificially trimmed versions of 20CR and CMIP5 in grid space for every given month using the HadCRUT4 missing-value mask. The evaluation reaches high temporal correlations and low errors for the global mean temperature.

Kadow, C., Hall, D.M. & Ulbrich, U. Artificial intelligence reconstructs missing climate information. Nat. Geosci. 13, 408–413 (2020).
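
The key building block of the inpainting network, the partial convolution, can be sketched by hand. The version below is a single non-learned layer on a toy temperature field (the published method stacks many learned partial-convolution layers in a deep network): each output cell averages only over observed neighbours, renormalizing by the count of valid cells, and the mask shrinks as gaps are filled.

```python
import numpy as np

def partial_conv(field, mask, k=3):
    """One partial-convolution pass: average only over valid (observed) cells."""
    pad = k // 2
    f = np.pad(field * mask, pad)
    m = np.pad(mask.astype(float), pad)
    out = np.zeros_like(field, dtype=float)
    new_mask = np.zeros_like(mask, dtype=float)
    for i in range(field.shape[0]):
        for j in range(field.shape[1]):
            win_f = f[i:i + k, j:j + k]
            win_m = m[i:i + k, j:j + k]
            valid = win_m.sum()
            if valid > 0:
                out[i, j] = win_f.sum() / valid   # renormalize by valid count
                new_mask[i, j] = 1.0
    return out, new_mask

# Smooth toy "temperature" field with a HadCRUT4-style missing-value hole
x = np.linspace(0, 1, 8)
field = np.add.outer(x, x)
mask = np.ones((8, 8))
mask[3:5, 3:5] = 0
filled, _ = partial_conv(field, mask)
```

Repeated passes propagate information deeper into large gaps, which is what the trained network learns to do with realistic climate structure rather than simple local averages.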

How to cite: Kadow, C., Hall, D. M., Ulbrich, U., Kröner, I., Illing, S., and Cubasch, U.: Artificial Intelligence and Earth System Modeling - revisiting Research of the Past and Future, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-12895, 2022.

ESSI1 – Next Generation Analytics for Scientific Discovery: Data Science, Machine Learning, AI

EGU22-1346 | Presentations | ESSI1.2

Enhance pluvial flood risk assessment using spatio-temporal machine learning models 

Andrea Critto, Marco Zanetti, Elena Allegri, Anna Sperotto, and Silvia Torresan

Extreme weather events (e.g., heavy rainfall) are natural hazards that pose increasing threats to many sectors and across sub-regions worldwide (IPCC, 2014), exposing people and assets to damaging effects. In order to predict pluvial flood risk under different spatio-temporal conditions, three generalized Machine Learning models were developed and applied to the Metropolitan City of Venice: Logistic Regression, Neural Networks and Random Forest. The models considered 60 historical pluvial flood events that occurred in the timeframe 1995-2020. The historical events helped to identify and prioritize sub-areas that are more likely to be affected by pluvial flood risk due to heavy precipitation. In addition, 13 triggering factors were selected and assessed while developing the model: aspect, curvature, distance to river, distance to road, distance to sea, elevation, land use, NDVI, permeability, precipitation, slope, soil and texture. A forward feature selection method, based on the AUC score, was applied to understand which features best mitigate spatio-temporal overfitting in pluvial flood prediction. Results of the analysis showed that the most accurate models were obtained with the Logistic Regression approach, which was used to provide pluvial flood risk maps for each of the 60 major historical events that occurred in the case study area. The model showed high accuracy, and most of the events that occurred in the Metropolitan City of Venice were properly predicted, demonstrating that Machine Learning could substantially improve and speed up disaster risk assessment and mapping, helping to overcome the most common bottlenecks of physically-based simulations, such as computational complexity and the need for large datasets of high-resolution information.
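
The AUC-driven forward feature selection can be sketched as a greedy loop. This simplified version uses synthetic "triggering factors" and a naive unweighted linear scorer in place of the fitted models, and scores on a single hold-out split rather than the study's spatio-temporal validation; it only illustrates the add-one-feature-at-a-time logic:

```python
import numpy as np

def auc(scores, labels):
    # Rank-based AUC: probability that a random positive outranks a random negative
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    pos = labels == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

rng = np.random.default_rng(1)
n = 400
X = rng.normal(size=(n, 4))            # 4 synthetic candidate triggering factors
y = (X[:, 0] + 0.8 * X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)
va = slice(300, None)                  # hold-out split used for scoring

def subset_auc(cols):
    # Naive scorer: unweighted sum of the selected factors
    return auc(X[va][:, cols].sum(axis=1), y[va])

selected, remaining, best = [], list(range(4)), 0.0
while remaining:
    cand = max(remaining, key=lambda c: subset_auc(selected + [c]))
    score = subset_auc(selected + [cand])
    if score <= best:                  # stop once validation AUC stops improving
        break
    selected.append(cand)
    remaining.remove(cand)
    best = score
```

The greedy stop rule is what keeps uninformative (noise) factors out of the final model, which is the overfitting control the abstract describes.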

How to cite: Critto, A., Zanetti, M., Allegri, E., Sperotto, A., and Torresan, S.: Enhance pluvial flood risk assessment using spatio-temporal machine learning models, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-1346, 2022.

EGU22-3131 | Presentations | ESSI1.2

Language model for Earth science for semantic search 

Rahul Ramachandran, Muthukumaran Ramasubramanian, Prasanna Koirala, Iksha Gurung, and Manil Maskey

Recent advances in technology have transformed the Natural Language Technology (NLT) landscape, specifically the use of transformers to build language models such as BERT and GPT-3. Furthermore, it has been shown that the quality and domain-specificity of the input corpus to language models can improve downstream application results. However, Earth science research has seen minimal effort focused on building and using a domain-specific language model.

We utilize a transfer learning solution that uses an existing language model trained for general science (SciBERT) and fine-tune it using abstracts and full text extracted from various Earth science journals to create BERT-E (BERT for Earth Science). The training process used over 270k Earth science articles with almost 6 million paragraphs. We used Masked Language Modeling (MLM) to train the transformer model. MLM works by masking random words in a paragraph and optimizing the model to predict the masked words. BERT-E was evaluated on a downstream keyword classification task, and its performance was compared against classification results using the original SciBERT language model. The SciBERT-based model attained an accuracy of 89.99%, whereas the BERT-E-based model attained an accuracy of 92.18%, showing an improvement in overall performance.
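
The MLM objective described here boils down to constructing (masked input, original target) training pairs. A minimal sketch of that data preparation, independent of any particular transformer library, might look like this (the sample sentence and mask probability are illustrative):

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]", seed=0):
    """Build an MLM training pair: masked input plus the original targets."""
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(mask_token)
            targets[i] = tok          # the model is optimized to predict these
        else:
            masked.append(tok)
    return masked, targets

tokens = "aerosol optical depth retrieved from modis observations".split()
masked, targets = mask_tokens(tokens, mask_prob=0.3)
```

During fine-tuning, the network sees `masked` as input and is penalized only on the positions listed in `targets`, which is how domain vocabulary from the Earth science corpus gets absorbed into BERT-E.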

We investigate employing language models to provide new semantic search capabilities for unstructured text such as papers. This search capability utilizes a knowledge graph generated from Earth science corpora together with a language model and graph convolutions to surface latent and related sentences for a natural language query. The sentences in the papers are modeled as nodes in the graph, and these nodes are connected through entities. The language model gives each sentence a numeric representation. Graph convolutions are then applied to the sentence embeddings to obtain a vector representation of each sentence along with a combined representation of the surrounding graph structure. This approach exploits both the power of adjacency inherently encoded in graph structures and the latent knowledge captured in the language model. Our initial proof-of-concept prototype used the SimCSE training algorithm (and the TinyBERT architecture) as the embedding model. This framework has demonstrated an improved ability to surface relevant, latent information based on the input query. We plan to show new results using the domain-specific BERT-E model.
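
The graph-convolution step over sentence embeddings can be sketched with a single mean-aggregation layer. In this toy example (hand-made embeddings and adjacency, standing in for the SimCSE vectors and entity links), each sentence node blends its own embedding with those of sentences connected through shared entities:

```python
import numpy as np

def graph_convolve(embeddings, adjacency):
    """One mean-aggregation graph convolution over sentence nodes."""
    a = adjacency + np.eye(adjacency.shape[0])   # add self-loops
    deg = a.sum(axis=1, keepdims=True)
    return (a / deg) @ embeddings                # row-normalized aggregation

# Toy graph: sentences 0 and 1 share an entity, sentence 2 is isolated
emb = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [4.0, 4.0]])
adj = np.array([[0, 1, 0],
                [1, 0, 0],
                [0, 0, 0]], dtype=float)
out = graph_convolve(emb, adj)
```

After the pass, connected sentences move toward each other in embedding space while the isolated one is unchanged, which is what lets a query surface latent, entity-linked sentences.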

How to cite: Ramachandran, R., Ramasubramanian, M., Koirala, P., Gurung, I., and Maskey, M.: Language model for Earth science for semantic search, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-3131, 2022.

EGU22-3855 | Presentations | ESSI1.2

CGC: an open-source Python module for geospatial data clustering 

Ou Ku, Francesco Nattino, Meiert Grootes, Emma Izquierdo-Verdiguier, Serkan Girgin, and Raul Zurita-Milla

With the growing ubiquity of large multi-dimensional geodata cubes, clustering techniques have become essential to extracting patterns and creating insights from data cubes. Aiming to meet this increasing need, we present Clustering Geodata Cubes (CGC): an open-source Python package designed for partitional clustering of geospatial data. CGC provides efficient clustering methods to identify groups of similar data. In contrast to traditional techniques, which act on a single dimension, CGC is able to perform both co-clustering (clustering across two dimensions, e.g., spatial and temporal) and tri-clustering (clustering across three dimensions, e.g., spatial, temporal, and thematic), as well as to subsequently refine the identified clusters. CGC also provides scalable approaches that suit both small and big datasets, and it can be efficiently deployed on a range of computational infrastructures, from single machines to computing clusters. As a case study, we present an analysis of spring onset indicator datasets at continental scale.
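
The co-clustering idea (simultaneous grouping of rows and columns, e.g. locations and time steps) can be sketched with a minimal alternating-reassignment loop. This is not the CGC implementation, which offers optimized, scalable algorithms; it is a bare-bones illustration on a toy "space × time" matrix with two obvious co-clusters, using a few random restarts:

```python
import numpy as np

def block_means(data, rows, cols, n_row, n_col):
    means = np.zeros((n_row, n_col))
    for r in range(n_row):
        for c in range(n_col):
            blk = data[np.ix_(rows == r, cols == c)]
            means[r, c] = blk.mean() if blk.size else 0.0
    return means

def co_cluster(data, n_row, n_col, n_iter=20, seed=0):
    """Minimal co-clustering by alternating row/column reassignment."""
    rng = np.random.default_rng(seed)
    rows = rng.integers(n_row, size=data.shape[0])
    cols = rng.integers(n_col, size=data.shape[1])
    for _ in range(n_iter):
        means = block_means(data, rows, cols, n_row, n_col)
        rows = np.array([np.argmin([((data[i] - means[r, cols]) ** 2).sum()
                                    for r in range(n_row)])
                         for i in range(data.shape[0])])
        cols = np.array([np.argmin([((data[:, j] - means[rows, c]) ** 2).sum()
                                    for c in range(n_col)])
                         for j in range(data.shape[1])])
    means = block_means(data, rows, cols, n_row, n_col)
    sse = ((data - means[rows][:, cols]) ** 2).sum()
    return rows, cols, sse

# Toy matrix with a 2x2 co-cluster block structure; keep the best of 5 restarts
data = np.block([[np.zeros((5, 5)), 5 * np.ones((5, 5))],
                 [5 * np.ones((5, 5)), np.zeros((5, 5))]])
rows, cols, sse = min((co_cluster(data, 2, 2, seed=s) for s in range(5)),
                      key=lambda t: t[2])
```

Tri-clustering extends the same alternation to a third (e.g. thematic) axis of the data cube.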

How to cite: Ku, O., Nattino, F., Grootes, M., Izquierdo-Verdiguier, E., Girgin, S., and Zurita-Milla, R.: CGC: an open-source Python module for geospatial data clustering, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-3855, 2022.

EGU22-3940 | Presentations | ESSI1.2

The Analysis of the Aftershock Sequence of the Recent Mainshock in Arkalochori, Crete Island Greece 

Alexandra Moshou, Antonios Konstantaras, and Panagiotis Argyrakis

Forecasting the evolution of natural hazards is a critical problem in the natural sciences. Earthquake forecasting is one such example and is a difficult task due to the complexity of earthquake occurrence. To date, earthquake prediction has relied mainly on empirical methods, specifically on the seismic history of a given area in the time before the occurrence of the main earthquake. The analysis and processing of seismicity play a critical role in modern statistical seismology. In this work, a first attempt is made to study and draw sound conclusions regarding the prediction of the seismic sequence, using appropriate statistical methods such as Bayesian prediction and taking into account the uncertainties of the model parameters. The above theory was applied to the recent seismic sequence in the area of Arkalochori on Crete Island, Greece (2021, Mw 6.0). The rich seismic sequence that took place immediately after the main earthquake, with approximately 4,000 events of magnitude ML > 1 over the following three months, allowed calculating the probability of the largest expected earthquake occurring within a given time, as well as the probability that the largest aftershock expected after a major earthquake exceeds a certain magnitude.
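
The quantity computed at the end, the probability that the largest aftershock exceeds some magnitude, can be sketched with a plug-in Poisson/Gutenberg–Richter model. Note this is a simplification: the abstract's Bayesian predictive approach additionally integrates over parameter uncertainty, and the numbers below (b-value, rate) are illustrative, not the fitted Arkalochori values:

```python
import math

def prob_aftershock_above(m, b_value, rate, m_min):
    """P(at least one aftershock with magnitude >= m) under a Poisson
    occurrence model with a Gutenberg-Richter magnitude distribution.
    `rate` is the expected number of events >= m_min in the time window."""
    frac = 10 ** (-b_value * (m - m_min))   # G-R: fraction of events >= m
    return 1.0 - math.exp(-rate * frac)

# Illustrative only: ~4000 events with ML > 1 over three months, b-value near 1
p = prob_aftershock_above(5.0, b_value=1.0, rate=4000, m_min=1.0)
```

In the Bayesian treatment, `b_value` and `rate` would carry posterior distributions, and `p` would be averaged over them.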



How to cite: Moshou, A., Konstantaras, A., and Argyrakis, P.: The Analysis of the Aftershock Sequence of the Recent Mainshock in Arkalochori, Crete Island Greece, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-3940, 2022.

EGU22-5487 | Presentations | ESSI1.2

3D Mapping of Active Underground Faults Enabled by Heterogeneous Parallel Processing Spatio-Temporal Proximity and Clustering Algorithms 

Alexandra Moshou, Antonios Konstantaras, Nikitas Menounos, and Panagiotis Argyrakis

Underground faults act as storage elements for the strain energy accumulated at the borders of active tectonic plates. Particularly along the southern front of the Hellenic seismic arc, a steady yearly inflow of strain energy accumulates due to the constant rate at which the African plate subducts beneath the Eurasian plate. Partial release of the energy stored in a particular underground fault manifests as an earthquake once it reaches the surface of the Earth's crust. The information obtained for each recorded earthquake includes, among others, the surface location and the estimated hypocentre depth. Considering that hundreds of thousands of earthquakes have been recorded in this particular area, the accumulated hypocentre depths provide a most valuable source of information regarding the in-depth extent of the seismically active parts of underground faults. This research work applies expert-knowledge spatio-temporal clustering to previously reported distinct seismic cluster zones, aiming to associate each individual main earthquake, along with its recorded foreshocks and aftershocks, with a single underground fault in existing two-dimensional mappings. This process is enabled by heterogeneous parallel processing algorithms encompassing both proximity and agglomerative density-based clustering, applied to main seismic events only. Once a main earthquake is associated with a particular known underground fault, the fault point with maximum proximity to the earthquake's hypocentre takes on the hypocentre's location parameters, adding the dimension of depth to the initial planar dimensions of latitude and longitude. The ranges of depth variations provide a notable indication of the in-depth extent of the seismically active part(s) of underground faults, enabling their 3D model mapping.

Indexing terms: spatio-temporal proximity and clustering algorithms, heterogeneous parallel processing, CUDA, 3D underground fault mapping
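
The core association step, finding the mapped fault point closest to an event and appending the hypocentre depth to it, can be sketched as a nearest-neighbour search. The fault traces and event below are hypothetical, and real implementations would use geodesic distances and the parallel clustering described above rather than this serial scan:

```python
import numpy as np

def assign_to_fault(hypocentre, fault_traces):
    """Associate an event with the mapped fault whose 2D trace point lies
    closest to its epicentre, and append the hypocentre depth to that point."""
    best = None
    for fault_id, trace in fault_traces.items():
        d = np.hypot(trace[:, 0] - hypocentre[0], trace[:, 1] - hypocentre[1])
        i = int(d.argmin())
        if best is None or d[i] < best[2]:
            best = (fault_id, i, float(d[i]))
    fault_id, i, _ = best
    # The 2D trace point (lon, lat) gains a depth estimate -> 3D fault geometry
    return fault_id, np.append(fault_traces[fault_id][i], hypocentre[2])

# Two hypothetical fault traces as (lon, lat) polylines; depth in km
faults = {"F1": np.array([[25.0, 35.0], [25.1, 35.1]]),
          "F2": np.array([[26.0, 35.5], [26.1, 35.6]])}
fault_id, point3d = assign_to_fault((25.12, 35.09, 8.5), faults)
```

Accumulating such depth-tagged points over many events is what sketches out the in-depth extent of each fault's seismically active part.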



How to cite: Moshou, A., Konstantaras, A., Menounos, N., and Argyrakis, P.: 3D Mapping of Active Underground Faults Enabled by Heterogeneous Parallel Processing Spatio-Temporal Proximity and Clustering Algorithms, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-5487, 2022.

EGU22-6955 | Presentations | ESSI1.2

Novel approaches to model assessment and interpretation in geospatial machine learning

Alexander Brenning

As the interpretability and explainability of artificial intelligence decisions has been gaining attention, novel approaches are needed to develop diagnostic tools that account for the unique challenges of geospatial and environmental data, including spatial dependence and high dimensionality, which are addressed in this contribution. Building upon the geostatistical tradition of distance-based measures, spatial prediction error profiles (SPEPs) and spatial variable importance profiles (SVIPs) are introduced as novel model-agnostic assessment and interpretation tools that explore the behavior of models at different prediction horizons. Moreover, to address the challenges of interpreting the joint effects of strongly correlated or high-dimensional features, often found in environmental modeling and remote sensing, a model-agnostic approach is developed that distills aggregated relationships from complex models. The utility of these techniques is demonstrated in two case studies representing a regionalization task in an environmental-science context and a classification task from multitemporal remote sensing of land use. In these case studies, SPEPs and SVIPs successfully highlight differences and surprising similarities of geostatistical methods, linear models, random forest, and hybrid algorithms. With 64 correlated features in the remote-sensing case study, the transformation-based interpretation approach successfully summarizes high-dimensional relationships in a small number of diagrams.

The novel diagnostic tools enrich the toolkit of geospatial data science, and may improve machine-learning model interpretation, selection, and design.
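
The basic mechanics behind a prediction error profile over prediction distance can be sketched simply: bin hold-out residuals by each test point's distance to the nearest training location and summarize the error per bin. This toy version (synthetic coordinates and residuals) is a simplification of the published SPEP methodology, which builds on spatial cross-validation:

```python
import numpy as np

def nearest_dist(a, b):
    return np.min(np.hypot(a[:, None, 0] - b[None, :, 0],
                           a[:, None, 1] - b[None, :, 1]), axis=1)

def spep(train_xy, test_xy, residuals, bins):
    """Sketch of a spatial prediction error profile: RMSE as a
    function of distance to the nearest training location."""
    which = np.digitize(nearest_dist(test_xy, train_xy), bins)
    return np.array([np.sqrt(np.mean(residuals[which == k] ** 2))
                     for k in range(1, len(bins)) if (which == k).any()])

rng = np.random.default_rng(0)
train_xy = rng.uniform(0, 1, size=(50, 2))
test_xy = rng.uniform(0, 3, size=(200, 2))
# Toy residuals whose magnitude grows with extrapolation distance
residuals = nearest_dist(test_xy, train_xy) * rng.normal(1.0, 0.1, size=200)
profile = spep(train_xy, test_xy, residuals, bins=np.array([0.0, 0.5, 1.0, 2.0]))
```

A rising profile, as produced here by construction, is the diagnostic signature of a model whose skill degrades away from the sampled locations.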

How to cite: Brenning, A.: Novel approaches to model assessment and interpretation in geospatial machine learning, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-6955, 2022.

EGU22-7529 | Presentations | ESSI1.2

Global maps from local data: Towards globally applicable spatial prediction models 

Marvin Ludwig, Álvaro Moreno Martínez, Norbert Hölzel, Edzer Pebesma, and Hanna Meyer

Global-scale maps are an important tool to provide ecologically relevant environmental variables to researchers and decision makers. Usually, these maps are created by training a machine learning algorithm on field-sampled reference data and applying the resulting model to associated information from satellite imagery or globally available environmental predictors. However, field samples are often sparse and clustered in geographic space, representing only parts of the global environment. Machine learning models are therefore prone to overfit to the specific environments they are trained on, especially when a large set of predictor variables is utilized. Consequently, model validations have to include an analysis of the models' transferability to regions where no training samples are available, e.g., by computing the Area of Applicability (AOA, Meyer and Pebesma 2021) of the prediction models.

Here we reproduce three recently published global environmental maps (soil nematode abundances, potential tree cover and specific leaf area) and assess their AOA. We then present a workflow to increase the AOA (i.e., transferability) of the machine learning models. The workflow utilizes spatial variable selection in order to train generalized models which include only those predictors that are most suitable for predictions in regions without training samples. We compared the results to the three original studies in terms of prediction performance and AOA. The results indicate that reducing the predictors to those relevant for spatial prediction leads to a significant increase in model transferability without a significant decrease in prediction quality in areas with high sampling density.

Meyer, H. & Pebesma, E. Predicting into unknown space? Estimating the area of applicability of spatial prediction models. Methods in Ecology and Evolution 2041–210X.13650 (2021) doi:10.1111/2041-210X.13650.
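
The AOA idea can be sketched as a feature-space nearest-neighbour test. The version below is deliberately simplified relative to Meyer & Pebesma (2021), which additionally weights features by importance and normalizes distances by the mean training distance; here a new location counts as "inside" the AOA when its distance to the nearest training sample stays within a threshold derived from the training data itself:

```python
import numpy as np

def aoa_mask(train_X, new_X, quantile=0.95):
    """Simplified Area of Applicability: flag new samples whose feature-space
    nearest-neighbour distance to the training set is within a threshold
    taken from leave-one-out nearest-neighbour distances of the training data."""
    def pairwise(a, b):
        return np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(-1))
    dt = pairwise(train_X, train_X)
    np.fill_diagonal(dt, np.inf)                       # leave-one-out
    threshold = np.quantile(dt.min(axis=1), quantile)
    return pairwise(new_X, train_X).min(axis=1) <= threshold

rng = np.random.default_rng(0)
train = rng.normal(0, 1, size=(100, 3))    # sampled environments
inside = rng.normal(0, 1, size=(20, 3))    # similar, well-represented environments
outside = rng.normal(8, 1, size=(20, 3))   # environments never sampled
```

Dropping predictors that are irrelevant for spatial prediction shrinks the feature space, which is precisely why the spatial variable selection in the workflow enlarges the AOA.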

How to cite: Ludwig, M., Moreno Martínez, Á., Hölzel, N., Pebesma, E., and Meyer, H.: Global maps from local data: Towards globally applicable spatial prediction models, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-7529, 2022.

EGU22-8323 | Presentations | ESSI1.2

Multi-attribute geolocation inference from tweets 

Umair Qazi, Ferda Ofli, and Muhammad Imran

Geotagged social media messages, especially from Twitter, can have a substantial impact on decision-making processes during natural hazards and disasters. For example, such geolocation information can be used to enhance natural hazard detection systems where real-time geolocated tweets can help identify the critical human-centric hotspots of an emergency where urgent help is required.

Our work extracts geolocation information from tweets by making use of five metadata attributes provided by Twitter. Three of these are free-form text, namely the tweet text, the user profile description, and the user location. The other two attributes are GPS coordinates and place tags.

The tweet text may or may not contain information relevant to extracting geolocation. In cases where location information is available within the tweet text, we perform toponym extraction using Named Entity Recognition and Classification (NERC). The extracted toponyms are then geocoded using Nominatim (the open-source geocoding software that powers OpenStreetMap) at various levels, such as country, state, county, and city.

A similar process is followed for the user profile description, where only the location toponyms identified by NERC are stored and then geocoded using Nominatim at various levels.

The user location field, which is also free-form text, can mention multiple locations, such as USA and UK. To extract a location from this field, a heuristic algorithm based on a ranking mechanism is adopted, which resolves the field to a single location that can then be mapped at various levels such as country, state, county, and city.

GPS coordinates provide the exact longitude and latitude of the device's location. We perform reverse geocoding to obtain additional location details, e.g., street, city, or country the GPS coordinates belong to. For this purpose, we use Nominatim’s reverse API endpoint to extract city, county, state, and country information.

The place tag provides a bounding box, an exact longitude and latitude, or the name of a location tagged by the user. The place field contains several location attributes, from which we extract location information using different algorithms. We use Nominatim's search API endpoint and extract city, county, state, and country names from the Nominatim response where available.

Our geo-inference pipeline is designed to be used as a plug-in component. The system runs on an Elasticsearch cluster with six nodes for efficient and fast querying and insertion of records. It has already been tested on geolocating more than two billion COVID-related tweets, and it is able to handle high insertion and query loads. We have implemented smart caching mechanisms to avoid repeated Nominatim calls, since these are expensive operations. Caches are available both for free-form text (Nominatim's search API) and for exact latitude and longitude (Nominatim's reverse API). These caches help reduce the load on Nominatim and give quick access to the most commonly queried terms.
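
The caching idea can be sketched with a memoized wrapper around the geocoding call. Here the Nominatim request is replaced by a stub with a made-up gazetteer (the real pipeline issues HTTP requests and caches in Elasticsearch rather than in process memory); the point is that a repeated query string triggers only one backend call:

```python
from functools import lru_cache

calls = {"count": 0}

def nominatim_search(query):
    """Stand-in for a Nominatim /search request (a network call in production)."""
    calls["count"] += 1
    fake_gazetteer = {"Doha, Qatar": {"city": "Doha", "country": "Qatar"}}
    return fake_gazetteer.get(query)

@lru_cache(maxsize=100_000)
def geocode(query):
    # Cache layer: a repeated user-location string hits the backend only once
    result = nominatim_search(query)
    return tuple(sorted(result.items())) if result else None

first = geocode("Doha, Qatar")
second = geocode("Doha, Qatar")
```

Because user-location strings repeat heavily across billions of tweets, the cache hit rate, not the geocoder's speed, dominates the pipeline's throughput.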

With this effort, we hope to provide the necessary means for researchers and practitioners who intend to explore social media data for geo-applications.

How to cite: Qazi, U., Ofli, F., and Imran, M.: Multi-attribute geolocation inference from tweets, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-8323, 2022.

EGU22-8648 | Presentations | ESSI1.2

A graph-based fractality index to characterize complexity of urban form using deep graph convolutional neural networks 

Lei Ma, Stefan Seipel, S. Anders Brandt, and Ding Ma

Inspection of the complexity of urban morphology facilitates understanding of human behaviors in urban space, leading to better conditions for the sustainable design of future cities. Fractal indicators, such as the fractal dimension, ht-index, and cumulative rate of growth (CRG) index, have been proposed as measures of such complexity. However, these major fractal indicators are statistical rather than spatial, so they fail to characterize the spatial complexity of urban morphology, such as building footprints. To overcome this problem, this paper proposes a graph-based fractality index (GFI), based on a hybrid of fractal theory and deep learning techniques. To quantify spatial complexity, several fractal variants were synthesized to train a deep graph convolutional neural network. Building footprints of London were used to test the method, and the results show that the proposed framework performs better than the traditional indices. Moreover, the possibility of bridging fractal theory and deep learning techniques on complexity issues opens up new possibilities for data-driven GIScience.
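
For context, the classical fractal dimension mentioned above is typically estimated by box counting: count the boxes occupied by the geometry at several box sizes and take the slope of log(count) against log(1/size). The sketch below applies this to a plain line of points, which should come out close to dimension 1 (the GFI itself is a learned, graph-based index, not this estimator):

```python
import numpy as np

def box_count_dimension(points, sizes):
    """Box-counting estimate of fractal dimension: slope of
    log(number of occupied boxes) versus log(1/box size)."""
    counts = []
    for s in sizes:
        boxes = {tuple(np.floor(p / s).astype(int)) for p in points}
        counts.append(len(boxes))
    slope, _ = np.polyfit(np.log(1.0 / np.array(sizes)), np.log(counts), 1)
    return slope

# Points along a straight line should give a dimension close to 1
line = np.column_stack([np.linspace(0, 1, 2000), np.linspace(0, 1, 2000)])
dim = box_count_dimension(line, sizes=[0.1, 0.05, 0.025, 0.0125])
```

Such purely statistical estimates are exactly what the abstract argues cannot distinguish spatial arrangements of building footprints, motivating the graph-based index.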

How to cite: Ma, L., Seipel, S., Brandt, S. A., and Ma, D.: A graph-based fractality index to characterize complexity of urban form using deep graph convolutional neural networks, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-8648, 2022.

EGU22-8891 | Presentations | ESSI1.2

Infilling Spatial Precipitation Recordings with a Memory-Assisted CNN 

Johannes Meuer, Laurens Bouwer, Étienne Plésiat, Roman Lehmann, Markus Hoffmann, Thomas Ludwig, Wolfgang Karl, and Christopher Kadow

Missing climate data is a widespread problem in climate science and leads to uncertainty in prediction models that rely on these data resources. So far, existing approaches for infilling missing precipitation data are mostly numerical or statistical techniques that require considerable computational resources and are not suitable for large regions with missing data. Most recently, there have been several approaches to infill missing climate data with machine learning methods such as convolutional neural networks or generative adversarial networks. They have proven to perform well on infilling missing temperature or satellite data. However, these techniques consider only the spatial variability in the data, whereas precipitation data are much more variable in both space and time, and rainfall extremes with high amplitudes play an important role. We propose a convolutional inpainting network extended with a memory module: one approach captures the temporal variability in the missing-data regions using a long short-term memory (LSTM). An attention-based module has also been added to consider further atmospheric variables provided by reanalysis data. The model was trained and evaluated on the RADOLAN dataset, which is based on radar precipitation recordings and weather station measurements. With this method we are able to complete gaps in this high-quality, highly resolved spatial precipitation dataset over Germany. In conclusion, we compare our approach to statistical techniques for infilling precipitation data as well as to other state-of-the-art machine learning techniques. This combination of computer science and atmospheric research components will be presented as a dedicated climate service component and dataset.
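
The role of the memory module can be illustrated with a much simpler stand-in: fill the gaps in each frame from a running temporal memory of previous frames. Here an exponential moving average plays the part of the learned LSTM state, and the frames and masks are toy arrays, not RADOLAN data:

```python
import numpy as np

def infill_with_memory(frames, masks, alpha=0.5):
    """Fill gaps in each precipitation frame from a running temporal memory
    (an exponential moving average standing in for the learned LSTM state)."""
    memory = np.zeros_like(frames[0])
    filled = []
    for frame, mask in zip(frames, masks):
        out = np.where(mask, frame, memory)   # observed where valid, memory in gaps
        memory = alpha * out + (1 - alpha) * memory
        filled.append(out)
    return filled

frames = [np.full((4, 4), 1.0) for _ in range(3)]
masks = [np.ones((4, 4), bool) for _ in range(3)]
masks[2][1:3, 1:3] = False                    # a gap appears in the last frame
filled = infill_with_memory(frames, masks)
```

The trained network replaces this fixed averaging with learned spatio-temporal structure, which is what lets it handle highly intermittent rainfall rather than smooth fields.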

How to cite: Meuer, J., Bouwer, L., Plésiat, É., Lehmann, R., Hoffmann, M., Ludwig, T., Karl, W., and Kadow, C.: Infilling Spatial Precipitation Recordings with a Memory-Assisted CNN, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-8891, 2022.

The real world does not live on a regular grid. The observations with the best spatiotemporal resolution are generally irregularly distributed over space and time, even though as data they are generally stored in arrays in files. Storing the diverse data types of Earth science, including grid, swath, and point based spatiotemporal distributions, in separate files leads to computer-native array layouts on disk or working memory having little or no connection with the spatiotemporal layout of the observations themselves. For integrative analysis, data must be co-aligned both spatiotemporally and in computer memory, a process called data harmonization. For data harmonization to be scalable in both diversity and volume, data movement must be minimized. The SpatioTemporal Adaptive Resolution Encoding (STARE) is a hierarchical, recursively subdivided indexing scheme for harmonizing diverse data at scale. 

STARE indices are integers embedded with spatiotemporal attributes key to efficient spatiotemporal analysis. As a more computationally efficient alternative to conventional floating-point spatiotemporal references, STARE indices apply uniformly to all spatiotemporal data regardless of their geometric layouts. Through this unified reference, STARE harmonizes diverse data in their native states to enable integrative analysis without requiring homogenization of the data by interpolating them to a common grid first.
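
The "integers embedded with spatiotemporal attributes" idea can be sketched with a toy hierarchical index. The real STARE library subdivides a sphere; this simplified stand-in bisects a lon/lat rectangle, packs the cell path into the high bits and the resolution level into the low bits, so containment between cells at different resolutions becomes a cheap prefix test.

```python
def index(lon, lat, level):
    """Toy hierarchical index: 2 bits per level, level stored in low 5 bits."""
    lon0, lon1, lat0, lat1 = -180.0, 180.0, -90.0, 90.0
    code = 0
    for _ in range(level):
        midlon = (lon0 + lon1) / 2
        midlat = (lat0 + lat1) / 2
        bit_lon = int(lon >= midlon)
        bit_lat = int(lat >= midlat)
        code = (code << 2) | (bit_lat << 1) | bit_lon
        lon0, lon1 = (midlon, lon1) if bit_lon else (lon0, midlon)
        lat0, lat1 = (midlat, lat1) if bit_lat else (lat0, midlat)
    return (code << 5) | level

def contains(coarse, fine):
    """True if the coarse cell contains the fine cell (bit-prefix match)."""
    lc, lf = coarse & 31, fine & 31
    if lc > lf:
        return False
    return (coarse >> 5) == (fine >> 5) >> (2 * (lf - lc))
```

Because containment is integer arithmetic, co-located observations from different instruments can be joined on their indices without any geometric computation.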

The current implementation of STARE supports solid angle indexing, i.e. longitude-latitude, and time. To fully support Earth science applications, STARE must be extended to index the radial dimension for full 4D spatiotemporal indexing. As STARE’s scalability is based on having a universal encoding scheme mapping spatiotemporal volumes to integers, the variety of existing approaches to encoding the radial dimension in Earth science raises complex design issues for applying STARE’s principles. For example, the radial dimension can be usefully expressed via length (altitude) or pressure coordinates. Both length and pressure raise the question of what reference surface should be used. As STARE’s goal is to harmonize different kinds of data, we must determine whether it is better to have separate radial scale encodings for length and pressure, or a single radial encoding together with tools for translating between various (radial) coordinate systems. The questions become more complex when we consider the wide range of Earth science data and applications, including, for example, model simulation output, lidar point clouds, spacecraft swath data, aircraft in-situ measurements, vertical or oblique parameter retrievals, and earthquake-induced movement detection.
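
As one concrete example of translating between radial coordinate systems, pressure levels can be converted to altitude with the standard-atmosphere relation for the troposphere. STARE does not prescribe this formula; it is simply one plausible translation tool of the kind the text mentions, using the usual ICAO constants.

```python
# Standard-atmosphere constants: sea-level pressure (hPa), sea-level
# temperature (K), tropospheric lapse rate (K/m), and exponent g*M/(R*L).
P0, T0, L, EXPONENT = 1013.25, 288.15, 0.0065, 5.2559

def pressure_from_altitude(h_m):
    """Standard-atmosphere pressure (hPa) at altitude h_m metres (< ~11 km)."""
    return P0 * (1.0 - L * h_m / T0) ** EXPONENT

def altitude_from_pressure(p_hpa):
    """Inverse relation: altitude (m) of the pressure level p_hpa."""
    return (T0 / L) * (1.0 - (p_hpa / P0) ** (1.0 / EXPONENT))
```

Any single-encoding design would hide such conversions behind the index, so that a lidar altitude and a reanalysis pressure level can be compared directly.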

In this work, we will review STARE’s unifying principle and the unique nature of the radial dimension. We will discuss the challenges of enabling scalable Earth science data harmonization in both diversity and volume, particularly in the context of detection, cataloging, and statistical study of fully 4D hierarchical phenomena events such as extratropical cyclones. With the twin challenges of exascale computing and increasing model simulation resolutions opening new views into physical processes, scalable methods for bringing best-resolution observations and simulations together, like STARE, are becoming increasingly important.

How to cite: Rilee, M. and Kuo, K.-S.: Design Considerations for the 3rd Spatial Dimension of the Spatiotemporal Adaptive Resolution Encoding (STARE), EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-10799, 2022.

EGU22-10823 | Presentations | ESSI1.2

Scalable Feature Extraction and Tracking (SCAFET): A general framework for feature extraction from large climate datasets 

Arjun Nellikkattil, June-Yi Lee, and Axel Timmermann

The study describes a generalized framework to extract and track features from large climate datasets. Unlike other feature extraction algorithms, Scalable Feature Extraction and Tracking (SCAFET) is independent of any physical thresholds, making it more suitable for comparing features from different datasets. Features of interest are extracted by segmenting the data on the basis of a scale-independent bounded variable called the shape index (Si). Si gives a quantitative measurement of the local shape of the field with respect to its surroundings. To illustrate the capabilities of the method, we have employed it in the extraction of different types of features. Cyclones and atmospheric rivers are extracted from the ERA5 reanalysis dataset to show how the algorithm extracts points as well as surfaces from climate datasets. Extraction of sea surface temperature fronts depicts how SCAFET handles unstructured grids. Lastly, the 3D structures of jet streams are extracted to demonstrate that the algorithm can extract 3D features as well. The detection algorithm is implemented as a Jupyter notebook accessible to anyone wishing to test out the algorithm.
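
A minimal sketch of a shape-index computation, assuming the Koenderink-style definition on the eigenvalues of the local Hessian (the SCAFET paper's exact conventions may differ): the result is bounded in [-1, 1] and unchanged when the field is rescaled, which is what makes threshold-free segmentation possible.

```python
import numpy as np

def shape_index(field, eps=1e-12):
    """Bounded local-shape measure of a 2D field from Hessian eigenvalues."""
    fy, fx = np.gradient(field)
    fyy, fyx = np.gradient(fy)
    fxy, fxx = np.gradient(fx)
    # Eigenvalues of the 2x2 Hessian [[fxx, fxy], [fyx, fyy]]
    mean = 0.5 * (fxx + fyy)
    det = fxx * fyy - fxy * fyx
    disc = np.sqrt(np.maximum(mean**2 - det, 0.0))
    k1, k2 = mean + disc, mean - disc          # k1 >= k2
    # arctan2 keeps the value defined where k1 == k2 (umbilical points).
    return (2.0 / np.pi) * np.arctan2(k1 + k2, k1 - k2 + eps)
```

With this sign convention a local maximum of the field (e.g., a warm dome) maps to values near -1 and a local minimum to values near +1; segmentation then selects ranges of this bounded variable instead of physical thresholds.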

How to cite: Nellikkattil, A., Lee, J.-Y., and Timmermann, A.: Scalable Feature Extraction and Tracking (SCAFET): A general framework for feature extraction from large climate datasets, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-10823, 2022.

With the far-reaching impact of Artificial Intelligence (AI) becoming more widely acknowledged across various dimensions and industries, the Geomatics scientific community has reasonably turned to automated (in some cases, autonomous) solutions when looking to efficiently extract and communicate patterns in high-dimensional geographic data. This, in turn, has led to a range of AI platforms providing grounds for cutting-edge technologies such as data mining, image processing and predictive/prescriptive modelling. Meanwhile, coastal management bodies around the world are striving to harness the power of AI and Machine Learning (ML) applications to act upon the wealth of coastal information emanating from disparate data sources (e.g., geodesy, hydrography, bathymetry, mapping, remote sensing, and photogrammetry). The cross-disciplinarity of stakeholder engagement calls for thorough risk assessment and coastal defence strategies (e.g., erosion/flooding control), consistent with the emerging need for participatory and integrated policy analyses. This paper addresses the issue of seeking techno-centric solutions in human-understandable language, for holistic knowledge engineering (from acquisition to dissemination) in a spatiotemporal context; namely, the benefits of setting up a unified Visual Analytics (VA) system, which allows for real-time monitoring and on-demand Online Analytical Processing (OLAP) operations via role-based access. Working from an all-encompassing data model could form seamlessly collaborative workspaces that support multiple programming languages (packaging ML libraries designed to interoperate) and enable heterogeneous user communities to visualize Big Data at different granularities, as well as perform task-specific queries with little, or no, programming skill.
The proposed solution is an integrated coastal management dashboard, built natively for the cloud (i.e., leveraging batch and stream processing), to dynamically host live Key Performance Indicators (KPIs) whilst ensuring wide adoption and sustainable operation. The results reflect the value of effectively collecting and consolidating coastal (meta-)data into open repositories, to jointly produce actionable insight in an efficient manner.
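
An OLAP roll-up at different granularities can be shown with a toy example. The field names and numbers below are invented; the point is only that the same records answer queries at whatever grouping level a user requests.

```python
from collections import defaultdict

# Hypothetical shoreline-change observations (region, site, year, measure).
records = [
    {"region": "north", "site": "A", "year": 2020, "erosion_m": 1.2},
    {"region": "north", "site": "B", "year": 2020, "erosion_m": 0.8},
    {"region": "south", "site": "C", "year": 2020, "erosion_m": 2.1},
    {"region": "north", "site": "A", "year": 2021, "erosion_m": 1.5},
]

def rollup(rows, dims, measure):
    """Sum `measure` grouped by the requested dimensions (the granularity)."""
    out = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dims)
        out[key] += row[measure]
    return dict(out)

by_region = rollup(records, ["region"], "erosion_m")
by_region_year = rollup(records, ["region", "year"], "erosion_m")
```

A production system would push these group-bys down to a cloud warehouse; the dashboard only chooses the dimensions.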

How to cite: Anthis, Z.: Reading Between the (Shore)Lines: Real-Time Analytical Processing to Monitor Coastal Erosion, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-13102, 2022.

Greku R.Kh., Greku D.R.

Institute of Geological Sciences, Ukraine

SATMAR Laboratory, DDS Capital Investments, Australia


The geoid gravity potential inversion to dense anomalies and their comparison with the seismic tomography models


The gravitational tomography method is based on the use of algorithms for inverting the values of the gravitational potential (geoid) to calculate the Earth's density anomalies over the entire range of depths down to 5300 km [H. Moritz, The Figure of the Earth's Interior, Wichmann, Karlsruhe, 1990]. The initial data are the anomalies of the geoid heights according to the EGM2008 model, expanded in spherical functions up to harmonics n, m = 2190. The spatial resolution of the data on the surface is 10 km. The depths of the disturbing masses are determined taking into account the harmonic number. The result is maps of density distribution at specified depths, vertical sections and 3D models.
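
A quick consistency check on the quoted surface resolution, using the common rule of thumb (not taken from the abstract's references) that a spherical harmonic expansion to degree n resolves half-wavelengths of roughly 20000 km / n:

```python
def half_wavelength_km(degree, circumference_km=40000.0):
    """Approximate smallest resolvable half-wavelength for a given degree."""
    return circumference_km / (2 * degree)

# EGM2008 expansion to degree 2190 -> roughly 9 km, consistent with the
# "10 km" surface resolution stated above.
resolution = half_wavelength_km(2190)
```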

Examples of the distribution of density anomalies for certain regions of Ukraine, Europe and Antarctica are given. Discrepancies with known works on seismic tomography are mainly due to the different physical properties probed by each method: the density versus the acoustic properties of rocks.

Density anomaly results are reported as the percent deviation from the Earth's PREM density model for a given location and depth. The entire range of density anomalies, expressed as deviations from the PREM model, does not exceed 12%. Complete agreement between the methods is observed, for example, at great depths of 2800 km throughout the Earth. A section through the continent of Antarctica, with its complex relief and structure, down to a depth of 400 km also shows similar images from seismic and gravity tomography. The gravitomographic model of the tectonically active Vrancea region confirms the delamination nature of the formation of the disturbing mass and the occurrence of earthquakes in Europe.

The original call for the present topic of the GD7.5 session (Prof. Saskia Goes) rightly notes the important role of rheological variability in the mantle layers in deforming the Earth's crust and surface, which can cause catastrophic destruction of large-block structures. In this sense, constraining the state of the inner layers from data on structural inhomogeneities becomes increasingly urgent.

How to cite: Greku, R. and Greku, D.: The geoid gravity potential inversion to dense anomalies and their comparison with the seismic tomography models, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-3316, 2022.

Increased observation frequency is a current trend in optical remote sensing. However, there are still challenges on the night side when sunlight is not available. Due to their powerful low-light sensing capabilities, nightlight satellite sensors have been deployed to capture nightscapes of the Earth from space, observing anthropogenic and natural activities at night. At present, most nightlight remote sensing applications have focused on artificial lights, particularly within cities, or on self-luminous entities such as fisheries, oil and shale gas operations, and offshore rigs. Little attention has been paid to examining the potential of nightlight remote sensing for mapping land surfaces in low-light suburban areas. Observations taken under moonlight are often discarded or corrected to reduce the lunar effects. Some researchers have discussed the possibility of moonlight as a useful illuminating source at night for the detection of nocturnal features on Earth, but no quantitative analysis has been reported so far. This study systematically evaluates the potential of moonlight remote sensing with a whole month of mono-spectral Visible Infrared Imaging Radiometer Suite/Day-Night-Band (VIIRS/DNB) and multi-spectral Unmanned Aerial Vehicle (UAV) nighttime images. The present study aims: 1) to study the potential of moonlight remote sensing for mapping land surfaces in low-light suburban areas; 2) to investigate the Earth observation capability of moonlight data under different lunar phases; 3) to produce two daily uniform nightlight datasets (moonlight included and removed) for various night-scene research topics, such as diurnal weather forecasting and circadian rhythms in plants; and 4) to discuss the requirements for the next-generation nightlight remote sensing satellite sensors.
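
The lunar-phase dependence in aim 2) is governed by simple illumination geometry. The following is standard astronomy rather than anything specific to this study: the illuminated fraction of the lunar disc as a function of phase angle.

```python
import math

def illuminated_fraction(phase_angle_deg):
    """Fraction of the lunar disc illuminated: 0 deg = full moon (1.0),
    90 deg = quarter moon (0.5), 180 deg = new moon (0.0)."""
    return (1.0 + math.cos(math.radians(phase_angle_deg))) / 2.0
```

Moonlight reaching the ground also depends on lunar distance and elevation, so an operational correction would include those terms as well.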

How to cite: Liu, D. and Zhang, Q.: The Potential of Moonlight Remote Sensing: A Systematic Assessment with Multi-Source and Multi-Moon phase Nightlight Data, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-3380, 2022.

EGU22-4300 | Presentations | ESSI1.4

Synergetic use of Sentinel-1 and Sentinel-2 data for large-scale Land Use/Land Cover Mapping 

Melanie Brandmeier, Maximilian Hell, Eya Cherif, and Andreas Nüchter

One of the largest threats to the vast ecosystem of the Brazilian Amazon Forest is deforestation and forest degradation caused by human activity. Continuously monitoring these degradation events has recently become more feasible through the use of freely available satellite remote sensing data and machine learning algorithms suited for big datasets.

A fundamental challenge of such large-scale monitoring tasks is the automatic generation of reliable and correct land use and land cover (LULC) maps. This is achieved by the development of robust deep learning models that generalize well on new data. However, these approaches require large amounts of labeled training data. We use the latest results of the MapBiomas project as the ‘ground truth’ for developing new algorithms. In this project, Souza et al. [1] used yearly composites of USGS Landsat imagery to classify the LULC for the whole of Brazil. The latest iteration of their work became available for the years 1985–2020 as Collection 6. However, this reference data cannot be considered real ground truth, as it is itself generated from machine learning models and therefore requires novel approaches suited to overcoming such problems of weakly supervised learning.

As tropical regions are often covered by clouds, radar data is better suited for continuous mapping than optical imagery, due to its cloud-penetrating capabilities. In a preliminary study, we combined data from ESA’s Sentinel-1 (radar) and Sentinel-2 (multispectral) missions for developing algorithms suited to act on multi-modal and -temporal data to obtain accurate LULC maps. The best performing proposed deep learning network, DeepForestM2, employed a seven-month radar time series combined with a single optical scene. This model configuration reached an overall accuracy of 75.0% on independent test data. A state-of-the-art (SotA) DeepLab model, trained on the very same data, reached an overall accuracy of 69.9%.

Currently, we are further developing this approach of fusing multi-modal data with a temporal aspect to improve on LULC classification. Larger amounts of more recent data, both Sentinel-1 and Sentinel-2 from 2020, are included in training experiments. Additional deep learning networks and approaches to deal with weakly supervised learning [2] are developed and tested on the data. The need for weakly supervised methods arises from the reference data, which is both inaccurate and inexact, i.e., has a coarser spatial resolution than the training data. We aim to improve the classification results qualitatively, as well as quantitatively compared to SotA methods, especially with respect to generalizing well on new datasets. The resulting deep learning methods, together with the trained weights, will also be made accessible through a geoprocessing tool in Esri’s ArcGIS Pro for users without coding background.
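
For reference, an overall-accuracy figure such as the 75.0% vs. 69.9% above is simply the diagonal fraction of a class confusion matrix computed on test pixels; the matrix values below are invented for illustration.

```python
import numpy as np

def overall_accuracy(confusion):
    """Correctly classified pixels (diagonal) divided by all test pixels."""
    confusion = np.asarray(confusion, dtype=float)
    return confusion.trace() / confusion.sum()

# Hypothetical 3-class confusion matrix (rows = predicted, cols = reference).
cm = np.array([[50, 5, 5],
               [4, 60, 6],
               [6, 4, 40]])
oa = overall_accuracy(cm)
```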

  • Carlos M. Souza et al. “Reconstructing Three Decades of Land Use and Land Cover Changes in Brazilian Biomes with Landsat Archive and Earth Engine”. In: Remote Sensing 12.17 (2020), p. 2735. DOI: 10.3390/rs12172735.
  • Zhi-Hua Zhou. “A brief introduction to weakly supervised learning”. In: National Science Review 5.1 (Jan. 2018), pp. 44–53. ISSN: 2095-5138. DOI: 10.1093/nsr/nwx106.

How to cite: Brandmeier, M., Hell, M., Cherif, E., and Nüchter, A.: Synergetic use of Sentinel-1 and Sentinel-2 data for large-scale Land Use/Land Cover Mapping, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-4300, 2022.

EGU22-4678 | Presentations | ESSI1.4

Lithology Mapping with Satellite, Fieldwork-based Spectral data, and Machine Learning: the case study of Beiras Group (Central Portugal) 

João Pereira, Alcides J.S.C. Pereira, Artur Gil, and Vasco M. Mantas

The lack of cartography exacerbates the problem of poor knowledge of geological resources and land management in regions that could benefit greatly from this information. Remote sensing has been an invaluable means of obtaining data to perform geological mapping objectively and with high scientific accuracy. In Portugal, there is a large gap in cartographic information at the 1:50 000 scale throughout the territory, so this work aims to address this problem through a set of techniques and methodologies applied to the study of a region of the Beiras Group (Grupo das Beiras).

Spectral databases serve as an initial tool for any methodology involving spectral analysis, namely for the development of cartography methods and quick characterization of rock samples.

To address these issues, a multispectral analysis was performed on January and July 2015 Landsat 8 (LS8) scenes with low cloud cover and atmospheric corrections (Level-2). Statistical tests such as ANOVA and Tukey's test were applied to both images to determine whether significant differences exist between lithologies.
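
A one-way ANOVA F statistic of the kind applied here can be computed directly; `scipy.stats.f_oneway` would return the same value. Groups would be per-lithology reflectance samples for one band; the numbers below are synthetic.

```python
import numpy as np

def anova_f(groups):
    """One-way ANOVA F statistic: between-group vs. within-group variance."""
    all_vals = np.concatenate(groups)
    grand = all_vals.mean()
    k = len(groups)
    n = all_vals.size
    ss_between = sum(len(g) * (np.mean(g) - grand) ** 2 for g in groups)
    ss_within = sum(((np.asarray(g) - np.mean(g)) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

A large F for a band suggests that band separates the lithologies; Tukey's test then identifies which pairs differ.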

For the hyperspectral analysis, two sampling campaigns were carried out, collecting rock samples of metasediments and granites as well as soil. The analysis was performed on fresh samples, crushed samples (2 mm - 500 μm; 500 μm - 125 μm; <125 μm) and soil samples, demonstrating significantly different spectral behavior among the various particle sizes and between fresh and crushed samples. X-ray fluorescence (XRF) was used to obtain geochemical data on major elements to validate the spectral results obtained. Correspondences were identified between the obtained hyperspectral data and reference databases as well as the literature, meaning that the spectral signatures of this research are consistent with the studied samples.

Machine learning models are an emerging tool for cartography; in this work, LS8 reflectance data were used to build them. In this context, the models proved useful and successful for image classification.

How to cite: Pereira, J., Pereira, A. J. S. C., Gil, A., and Mantas, V. M.: Lithology Mapping with Satellite, Fieldwork-based Spectral data, and Machine Learning: the case study of Beiras Group (Central Portugal), EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-4678, 2022.

EGU22-5333 | Presentations | ESSI1.4

Remote sensing – based analysis of the islands dynamics in the Lower Danube River 

Marina Virghileanu and Gabriela Ioana-Toroimac

River islands are important components of river morpho-dynamics, which can provide essential information on fluvial processes, as well as on sediment and flow regimes. At the same time, river islands play an essential role from the political, environmental and socio-cultural points of view. Thus, understanding the temporal dynamics of river islands is a required task for channel navigation safety, port functionality, agricultural production and biodiversity. The aim of this study is to analyse the spatial and temporal changes of the river islands during the last 40 years, based on satellite remotely sensed images. The study focuses on the Lower Danube River, downstream of the Iron Gates dams, which alter the flow and sediment load; the river also suffers from dredging for navigation. The islands of the Lower Danube River have major impacts on the relationships between riparian states, interfere with port activity and EU investments (as in the case of Rast port in Romania), or are the subject of ecological restoration. Multispectral satellite data, including Landsat and Sentinel-2 images, were used for river island mapping at different points in time, with medium spatial resolution (up to 15 m for pansharpened Landsat data and 10 m for Sentinel-2). Spectral indices, such as NDVI and NDWI, allowed the automatic extraction of island boundaries and land cover information. Based on these, two processing steps were carried out: 1) characterization of the river islands' morphology, and 2) quantification of the spatial and temporal changes over time. The resulting data are connected with in-situ measurements of flow regime and sediment supply, as well as with flood events and human activities, in order to identify the potential drivers of change. The results demonstrate a strong correlation between river island dynamics and flood events in the Lower Danube River, as the major flood event of 2006 significantly modified the islands' size and shape.
This research can support the identification of the evolutionary model of the Danube River.
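
The index-based extraction step can be sketched as follows. The reflectance values and the zero threshold are illustrative only; NDWI > 0 flags water-dominated pixels, and its complement gives candidate island pixels.

```python
import numpy as np

def ndvi(nir, red):
    """Normalized Difference Vegetation Index."""
    return (nir - red) / (nir + red)

def ndwi(green, nir):
    """Normalized Difference Water Index (McFeeters formulation)."""
    return (green - nir) / (green + nir)

# Tiny 2x2 toy scene: left column vegetated island, right column open water.
green = np.array([[0.10, 0.30], [0.08, 0.28]])
red   = np.array([[0.08, 0.05], [0.07, 0.06]])
nir   = np.array([[0.40, 0.05], [0.45, 0.04]])

water_mask = ndwi(green, nir) > 0.0     # True where water dominates
island_mask = ~water_mask               # candidate island/land pixels
```

On real imagery the island boundary is then vectorized from this mask, and NDVI within the island separates vegetated from bare surfaces.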


This research work was conducted as part of the project PCE 164/2021 “State, Communities and Nature of the Lower Danube Islands: An Environmental History (1830-2020)”, financed by the UEFISCDI.

How to cite: Virghileanu, M. and Ioana-Toroimac, G.: Remote sensing – based analysis of the islands dynamics in the Lower Danube River, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-5333, 2022.

EGU22-7726 | Presentations | ESSI1.4

Investigating the links between primary metabolites of medicinal species with leaf hyperspectral reflectance 

Ayushi Gupta, Prashant K Srivastava, and Karuna Shanker

Recent studies have shown that the turnover in tree species composition across edaphic and elevational gradients is strongly correlated with functional traits. However, our understanding of functional traits has been limited by the lack of detailed studies of foliar chemistry across habitats and the logistical and economic challenges associated with the analysis of plant functional traits at large geographical scales. Advances in remote sensing and spectroscopic approaches that measure spectrally detailed light reflectance and transmittance of plant foliage provide accurate predictions of several functional chemical traits. In this study, Pyracantha crenulata (D. Don) M. Roemer was used, an evergreen thorny shrub species found on open slopes between 1,000 and 2,400 m above mean sea level. P. crenulata is used in the treatment of hepatic, cardiac, stomach, and skin disease. Spectra of P. crenulata leaf samples were recorded using an ASD spectroradiometer, and the following primary metabolites were analyzed: chlorophyll, anthocyanin, phenolics, and sterols. The spectroradiometer data were preprocessed using a filter and then reduced to a few sensitive bands by applying feature selection to the hyperspectral data. The band values were directly correlated with the measured values. The analysis indicates a significant correlation with P. crenulata primary metabolites in the visible and infrared region (VISIR). This result suggests that molecules with important functional attributes could be identified by VISIR spectroscopy, saving considerable time and expense compared to wet laboratory analysis.
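
The band-metabolite correlation step looks like this in miniature: Pearson's r between the reflectance of one candidate band and measured metabolite concentrations. The band, wavelength and concentration values are all synthetic.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xm, ym = x - x.mean(), y - y.mean()
    return (xm * ym).sum() / np.sqrt((xm**2).sum() * (ym**2).sum())

band_550nm = [0.12, 0.15, 0.11, 0.18, 0.16]    # hypothetical reflectances
chlorophyll = [30.0, 24.0, 33.0, 18.0, 21.0]   # hypothetical concentrations
r = pearson_r(band_550nm, chlorophyll)          # strong negative correlation
```

Bands whose |r| passes a significance test against each metabolite would be retained by the feature-selection step.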

How to cite: Gupta, A., Srivastava, P. K., and Shanker, K.: Investigating the links between primary metabolites of medicinal species with leaf hyperspectral reflectance, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-7726, 2022.

EGU22-7859 | Presentations | ESSI1.4

Predictive performance of deep-learning-enhanced remote-sensing data for ecological variables of tidal flats over time 

Logambal Madhuanand, Katja Phillippart, Wiebe Nijland, Jiong Wang, Steven M. De Jong, Allert I. Bijleveld, and Elisabeth A. Addink

Tidal flat systems with a diverse benthic community (e.g., bivalves, polychaetes and crustaceans) are important in the food chain for migratory birds and fish. The geographical distribution of macrozoobenthos depends on physical factors, among which sediment characteristics are key aspects. Although high-resolution and high-frequency mapping of benthic indices (i.e., sediment composition and benthic fauna) of these coastal systems is essential to coastal management plans, it is challenging to gather such information on tidal flats through in-situ measurements. The Synoptic Intertidal Benthic Survey (SIBES) database provides this field information on a 500 m grid annually for the Dutch Wadden Sea, but continuous coverage and seasonal dynamics are still lacking. Remote sensing may be the only feasible monitoring method to fill this gap, but it is hampered by the lack of spectral contrast and variation in this environment. In this study, we used a deep-learning model to enhance the information extraction from remote-sensing images for the prediction of environmental and ecological variables of the tidal flats of the Dutch Wadden Sea. A Variational Auto Encoder (VAE) deep-learning model was trained with Sentinel-2 satellite images with four bands (blue, green, red and near-infrared) over three years (2018, 2019 and 2020) of the tidal flats of the Dutch Wadden Sea. The model was trained to derive important characteristics of the tidal flats as image features by reproducing the input image. These features contain representative information from the four input bands, like spatial texture and band ratios, to complement the low-contrast spectral signatures. The VAE features, the spectral bands and the field-collected samples together were used to train a random forest model to predict the sediment characteristics (median grain size and silt content) and macrozoobenthic biomass and species richness.
The prediction was done for the tidal flats of Pinkegat and Zoutkamperlaag in the Dutch Wadden Sea. The encoded features consistently increased the accuracy of the predictive model. Compared to a model trained with just the spectral bands, the use of encoded features improved the prediction (coefficient of determination, R2) by 10–15 percentage points for 2018, 2019 and 2020. Our approach improves the available techniques for mapping and monitoring of sediment and macrozoobenthic properties of tidal flat systems and thereby contributes towards their sustainable management.
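
The reported 10-15 point gain is expressed in the coefficient of determination, which can be computed as below. The observation and prediction values are synthetic; only the R2 formula itself is standard.

```python
import numpy as np

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    observed = np.asarray(observed, float)
    predicted = np.asarray(predicted, float)
    ss_res = ((observed - predicted) ** 2).sum()
    ss_tot = ((observed - observed.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot

obs = [2.0, 4.0, 6.0, 8.0]              # hypothetical field measurements
bands_only = [2.5, 3.0, 6.5, 7.0]       # hypothetical spectral-bands model
with_vae = [2.2, 3.8, 6.1, 7.8]         # hypothetical bands + VAE features
gain = r_squared(obs, with_vae) - r_squared(obs, bands_only)
```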

How to cite: Madhuanand, L., Phillippart, K., Nijland, W., Wang, J., De Jong, S. M., Bijleveld, A. I., and Addink, E. A.: Predictive performance of deep-learning-enhanced remote-sensing data for ecological variables of tidal flats over time, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-7859, 2022.

The definition of urbanized areas, both regionally and globally, is an important basis for urban development monitoring and management, as well as an important condition for studying social policies, economics, culture and the environment.

With the development of science and technology, urban areas are expanding rapidly. Methods for extracting urbanized areas quickly and accurately have therefore become a focus of research.

In the 1970s, with the beginning of the Defense Meteorological Satellite Program (DMSP), nightlight images were born, providing a new method for the extraction of urbanized areas.

However, due to the limits of spatial resolution and spectral range, there are defects in urbanized area extraction based on DMSP-OLS nightlight images.

In recent years, with the development of remote sensing technology, remote sensing data with higher resolution have emerged, providing an effective and applicable data source for urban planning monitoring.

We hypothesize that nightlight images with a higher resolution offer greater precision than the older ones in the extraction of urbanized areas.

This work uses nightlight images (NPP-VIIRS and Luojia1-01) and urbanized area maps (FROM-GLC 2017) to construct a logistic regression model that evaluates and compares the accuracy of the two nightlight image sources in the extraction of urbanized areas.

The case study is the Barcelona metropolitan area, Spain (636 km², 3.3 million inhabitants).
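
A minimal version of the logistic regression model used here can be fitted by gradient descent. The radiance values and labels below are synthetic; brighter pixels are labelled urban.

```python
import numpy as np

def fit_logistic(x, y, lr=0.5, steps=2000):
    """Fit p(urban) = sigmoid(w*x + b) by gradient descent on log-loss."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(w * x + b)))
        w -= lr * ((p - y) * x).mean()
        b -= lr * (p - y).mean()
    return w, b

radiance = np.array([0.1, 0.3, 0.5, 2.0, 3.0, 4.0])  # toy nightlight values
urban = np.array([0, 0, 0, 1, 1, 1])                 # toy FROM-GLC labels
w, b = fit_logistic(radiance, urban)

prob_bright = 1.0 / (1.0 + np.exp(-(w * 3.5 + b)))   # bright pixel -> urban
prob_dark = 1.0 / (1.0 + np.exp(-(w * 0.2 + b)))     # dark pixel -> non-urban
```

The goodness of fit of such a model on each sensor's radiance is one way to compare the two nightlight sources against the reference urban map.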

How to cite: Zheng, Q. and Roca, J.: The extraction of urbanized areas based on the high-resolution night lights images: A case study in Barcelona, Spain, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-8019, 2022.

EGU22-9012 | Presentations | ESSI1.4 | Highlight

Mapping the World at 10 m: A Novel Deep-Learning Land Use Land Cover Product and Beyond 

Dawn Wright, Steve Brumby, Sean Breyer, Abigail Fitzgibbon, Dan Pisut, Zoe Statman-Weil, Mark Hannel, Mark Mathis, and Caitlin Kontgis

Land use / land cover (LULC) maps provide critical information to governments, land use planners, and decision-makers about the spatial layout of the environment and how it is changing.  While a variety of LULC products exist, they are often coarse in resolution, not updated regularly, or require manual editing to be useful.  In partnership, Esri, Microsoft Planetary Computer, and Impact Observatory created the world’s first publicly available 10-m LULC map by automating and sharing a deep-learning model that was run on over 450,000 Sentinel-2 scenes.  The resulting map, released freely on Esri’s Living Atlas in June 2021, displays ten classes across the globe: built area, trees, scrub/shrub, cropland, bare ground, flooded vegetation, water, grassland, permanent snow/ice, clouds.  Here, we discuss key findings from the resulting map, including a quantitative analysis of how 10-m resolution allows us to assess small, low density urban areas compared to other LULC products, including the Copernicus CGLS-LC100 100-m resolution global map.  We will also share how we support project-based, on-demand LULC mapping and will present preliminary findings from a new globally consistent 2017-2021 annual LULC dataset across the entire Sentinel-2 archive.

How to cite: Wright, D., Brumby, S., Breyer, S., Fitzgibbon, A., Pisut, D., Statman-Weil, Z., Hannel, M., Mathis, M., and Kontgis, C.: Mapping the World at 10 m: A Novel Deep-Learning Land Use Land Cover Product and Beyond, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-9012, 2022.

EGU22-9265 | Presentations | ESSI1.4

Application of unsupervised machine learning techniques for lithological and soil mapping in Ossa-Morena Zone 

Marcelo Silva, Pedro Nogueira, Renato Henriques, and Mário Gonçalves

Unsupervised methods are a good entry point for satellite image classification, requiring little to no input and outputting an analysis, in the form of a thematic map, that may act as a guide for more user-input-intensive methods. For this work, we use K-means methods to classify satellite and drone imagery covering the Ossa-Morena Zone (OMZ), in Portugal, and assess their capacity for lithological and soil mapping. The drone is equipped with a High Precision NDVI Single Sensor and was flown over the ancient mines of Mociços, Mostardeira and Santa Eulália. The OMZ is a tectonostratigraphic domain shared between Portugal and Spain, divided into sectors, extraordinarily rich and diverse from a lithological, stratigraphical, and structural point of view. For this work, we focus on the Estremoz-Barrancos sector, comprising a Neoproterozoic to Devonian metasedimentary succession with low-grade metamorphism in greenschist facies, and the Santa Eulália Plutonic Complex (SEPC), an elliptical late-Variscan granitic massif that crosscuts the Alter do Chão-Elvas Sector and the Blastomylonitic belt, constituted by two granitic facies, a few small mafic bodies, and some roof pendants that belong to the Alter do Chão-Elvas Sector.

The imagery used corresponds to high-level satellite imagery products gathered between 2004 and 2006 (ASTER) and between 2017 and 2021 (Landsat 8 and Sentinel-2), and drone imagery captured on May 6th and August 31st, 2021.

K-means was applied to a variable number of selected bands, including band ratios, and tested for different numbers of initial clusters and different distance algorithms (Minimum Distance and Spectral Angle Mapping). Afterwards, we assessed its ability to outline and classify different geological structures by comparing the results to the geological map of the OMZ.

The obtained thematic maps point towards poorer results when using a larger selection of bands - for instance, ASTER bands 1 to 9 (in which bands 1 to 3N were resampled to 30 m) - due to interspersion of different classes, whereas when using band-ratio combinations, such as 4/2 and 6/(5+7) (ASTER), the produced map successfully classifies the major geological features present in the region, with increased sharpness between contacts at a higher number of classes.
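As an illustration, the band-ratio clustering step can be sketched with scikit-learn on synthetic bands (a sketch, not the study's processing chain; note that scikit-learn's KMeans only supports Euclidean (minimum) distance, so a Spectral Angle Mapping variant would need a custom implementation):

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-ins for ASTER reflectance bands (100 x 100 grids);
# real bands would be read from the resampled 30 m scenes.
rng = np.random.default_rng(42)
b2, b4, b5, b6, b7 = (rng.uniform(0.05, 0.6, (100, 100)) for _ in range(5))

# Band-ratio features, e.g. 4/2 and 6/(5+7) as used for ASTER.
features = np.column_stack([
    (b4 / b2).ravel(),
    (b6 / (b5 + b7)).ravel(),
])

# K-means with a chosen number of initial clusters; each pixel is
# assigned the label of its nearest cluster centroid.
km = KMeans(n_clusters=6, n_init=10, random_state=0).fit(features)
thematic_map = km.labels_.reshape(b2.shape)  # one class label per pixel
```

The reshaped label grid is the thematic map, which can then be compared against the geological map to assess the clustering.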

Results show that K-means, when used under the correct conditions and parameters, has the potential for lithological and soil mapping through image classification, both for satellite and drone imagery.

Future work will focus on the integration of a pre-processing step for band selection using ML techniques such as Principal Component Analysis, Minimum Noise Fraction and Random Forest.

The authors acknowledge the funding provided by FCT through the Institute of Earth Sciences (ICT) with the reference UIDB/GEO/04683/2020.

How to cite: Silva, M., Nogueira, P., Henriques, R., and Gonçalves, M.: Application of unsupervised machine learning techniques for lithological and soil mapping in Ossa-Morena Zone, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-9265, 2022.

EGU22-10163 | Presentations | ESSI1.4

Utilizing hyperspectral imagery for burnt area mapping in a Greek setting 

Christina Lekka, Spyridon E. Detsikas, George P. Petropoulos, Petros Katsafados, Dimitris Triantakonstantis, and Prashant K. Srivastava

Earth observation (EO) - particularly from hyperspectral imagers - is gaining increasing interest in wildfire mapping, as it offers prompt, highly accurate and low-cost delineation of a burnt area. A key hyperspectral orbital sensor with over 20 years of operational life is the Compact High-Resolution Imaging Spectrometer (CHRIS), onboard ESA’s PROBA platform. The sensor collects spectral data in the VNIR range (400-1050 nm) simultaneously at 5 viewing angles and at spatial resolutions of 17 m and 34 m, which contain 19 and 63 spectral bands respectively. The present study focuses on exploring the use of CHRIS PROBA legacy data combined with machine learning (ML) algorithms to obtain a burnt area cartography. In this context, a further objective of the study has been to examine the contribution of the sensor's multi-angle capabilities to enhancing burn scar detection. As a case study, we selected a wildfire that occurred during the summer of 2007 on the island of Evvoia, in central Greece, for which imagery from the CHRIS PROBA archive acquired shortly after the fire outbreak was available. For the accuracy assessment of the derived burnt area estimate, the error matrix statistics were calculated in ENVI. Burnt area estimates were also further validated against the operational product developed in the framework of ESA’s Global Monitoring for Environmental Security/Service Element. This study’s results evidenced the added value of satellite hyperspectral imagery combined with ML classifiers - particularly of the multi-angle capability in this case - as a cost-effective and robust approach to evaluating burnt area extent. All in all, the study findings can also provide important insights towards the exploitation of hyperspectral imagery acquired from current missions (e.g. HySIS, PRISMA, CHRIS, DESIS) as well as upcoming ones (e.g. EnMAP, Shalom, HySpiri and Chime).
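The error matrix statistics computed in ENVI can be reproduced in a few lines; a minimal sketch with scikit-learn on hypothetical validation labels (the label values and counts are illustrative, not the study's data):

```python
import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score, cohen_kappa_score

# Hypothetical reference (ground truth) and classified labels for
# validation pixels: 1 = burnt, 0 = unburnt.
reference  = np.array([1, 1, 1, 0, 0, 0, 1, 0, 1, 0])
classified = np.array([1, 1, 0, 0, 0, 0, 1, 1, 1, 0])

cm = confusion_matrix(reference, classified)      # the error matrix
oa = accuracy_score(reference, classified)        # overall accuracy
kappa = cohen_kappa_score(reference, classified)  # kappa coefficient
```

Producer's and user's accuracies follow from the same matrix as the diagonal divided by the row and column sums, respectively.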

KEYWORDS: CHRIS-PROBA, hyperspectral, machine learning, burnt area mapping

How to cite: Lekka, C., Detsikas, S. E., Petropoulos, G. P., Katsafados, P., Triantakonstantis, D., and Srivastava, P. K.: Utilizing hyperspectral imagery for burnt area mapping in a Greek setting, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-10163, 2022.

EGU22-11946 | Presentations | ESSI1.4

A GEOBIA-based approach for mapping Urban Green Spaces using PlanetScope imagery: the case of Athens 

Evangelos Dosiadis, Dimitris Triantakonstantis, Ana-Maria Popa, Spyridon E. Detsikas, Ionut Sandric, George P. Petropoulos, Diana Onose, and Christos Chalkias

The technological developments in geoinformatics in recent decades have allowed the inclusion of geospatial data and analysis techniques in a wide range of scientific disciplines. One such field is the study of urban green spaces (UGS). These are defined as open, undeveloped areas that provide residents with recreational space, improving the aesthetic and environmental quality of the neighboring areas. Accurately mapping their spatial extent is an essential requirement in urban planning, and their preservation and expansion in metropolitan areas are of high importance for protecting the environment and public health.

The objective of this study is to explore the use of high spatial resolution satellite imagery from PlanetScope combined with the Geographic Object-Based Image Analysis (GEOBIA) classification approach in mapping UGS in Athens, Greece. For the UGS retrieval, an object-based classification (GEOBIA) method was developed utilizing a multispectral PlanetScope image acquired in June 2020. Accuracy assessment was performed with a confusion matrix utilizing a set of randomly selected control points within the image, derived from field visits and image photo-interpretation. In addition, the obtained UGS were compared against independent estimates of the Green Urban Areas from the Urban Atlas operational product. All the geospatial data analysis was conducted in a GIS environment (ArcGIS Pro).

Results demonstrated the usefulness of the GEOBIA technique when combined with very high spatial-resolution satellite imagery from PlanetScope in mapping UGS, as shown by the high accuracy obtained in the statistical comparisons. With the technological evolution in Earth Observation data acquisition and image processing techniques, mapping UGS has been optimized and facilitated, and this study contributes in this direction.

KEYWORDS: Urban Green Spaces, Athens, PlanetScope, Earth Observation, GEOBIA

How to cite: Dosiadis, E., Triantakonstantis, D., Popa, A.-M., Detsikas, S. E., Sandric, I., Petropoulos, G. P., Onose, D., and Chalkias, C.: A GEOBIA-based approach for mapping Urban Green Spaces using PlanetScope imagery: the case of Athens, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-11946, 2022.

EGU22-12092 | Presentations | ESSI1.4

Assessment of 10m Spectral and Broadband Surface Albedo Products from Sentinel-2 and MODIS data 

Jan-Peter Muller, Rui Song, Alistair Francis, Nadine Gobron, Jian Peng, and Nathan Torbick

In Song et al. (2021) [1] a framework for the retrieval of 10 m and 20 m spectral and 20 m broadband surface albedo products was described. This framework consists of four modules: 1) a machine learning based cloud detection method, Spectral ENcoder for SEnsor Independence (SEnSeI) [2]; 2) an advanced atmospheric correction model, Sensor Invariant Atmospheric Correction (SIAC) [3]; 3) an endmember-based class extraction method, which enables the retrieval of 10 m/20 m albedos based on a regression between the MODIS Bidirectional Reflectance Distribution Function (BRDF) derived surface albedo and Sentinel-2 surface reflectance resampled to MODIS resolution; and 4) a novel method of using the MODIS BRDF prior developed within the QA4ECV programme to fill in the gaps in a time series caused by cloud obscuration. We describe how ~1100 scenes were processed over 22 Sentinel-2 tiles at the STFC JASMIN facility. These tiles spanned different 4-month time periods for different users, with a maximum of 22 dates per tile, and cover Italy, Germany, South Africa, South Sudan, Ukraine and the UK for 6 different users. For the Italian site, a detailed analysis was performed of the impact of this HR albedo on the fAPAR and LAI derived using TIP [5], whilst a second user employed a method described in [6] to compare MODIS and Sentinel-2, and a third user looked at the impact on agricultural yield forecasting. Lessons learnt from these different applications will be described, including both the opportunities and the areas where further work is required to improve the data quality.


We thank ESA for their support through ESA-HR-AlbedoMap (Contract CO 4000130413) and the STFC JASMIN facility, in particular Victoria Bennett, for their assistance.

[1] Song, R., Muller, J.-P., and Francis, A.: A Method of Retrieving 10-m Spectral Surface Albedo Products from Sentinel-2 and MODIS data, 2021 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2021, pp. 2381-2384, doi: 10.1109/IGARSS47720.2021.9554356

[2] Francis, A., Mrziglod, J., Sidiropoulos, P.  and J.-P. Muller, "SEnSeI: A Deep Learning Module for Creating Sensor Independent Cloud Masks," in IEEE Transactions on Geoscience and Remote Sensing, doi: 10.1109/TGRS.2021.3128280.

[3] Feng et al. (2019) A Sensor Invariant Atmospheric Correction: Sentinel-2/MSI and Landsat 8/OLI

[4] Song, R.; Muller, J.-P.; Kharbouche, S.; Yin, F.; Woodgate, W.; Kitchen, M.; Roland, M.; Arriga, N.; Meyer, W.; Koerber, G.; Bonal, D.; Burban, B.; Knohl, A.; Siebicke, L.; Buysse, P.; Loubet, B.; Leonardo, M.; Lerebourg, C.; Gobron, N. Validation of Space-Based Albedo Products from Upscaled Tower-Based Measurements Over Heterogeneous and Homogeneous Landscapes. Remote Sensing 2020, 12, 1–23, doi: 10.3390/rs12050833

[5] Gobron, N.; Marioni, M.; Muller, J.-P.; Song, R.; Francis, A. M.; Feng, Y.; Lewis, P. ESA Sentinel-2 Albedo Case Study: FAPAR and LAI downstream products.; 2021; pp. 1–30. JRC TR (in press)

[6] Peng, J.; Kharbouche, S.; Muller, J.-P.; Danne, O.; Blessing, S.; Giering, R.; Gobron, N.; Ludwig, R.; Mueller, B.; Leng, G.; Lees, T.; Dadson, S. Influences of leaf area index and albedo on estimating energy fluxes with HOLAPS framework. J Hydrol 2020, 580, 124245.

How to cite: Muller, J.-P., Song, R., Francis, A., Gobron, N., Peng, J., and Torbick, N.: Assessment of 10m Spectral and Broadband Surface Albedo Products from Sentinel-2 and MODIS data, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-12092, 2022.

EGU22-12524 | Presentations | ESSI1.4

Error-reducing Structure-from-Motion derived Digital Elevation Models in data-scarce environments 

Dirk Bakker, Phuoc Phùng, Marc van den Homberg, Sander Veraverbeke, and Anaïs Couasnon

High-accuracy Digital Elevation Models (DEMs) improve the quality of flood risk assessments and many other environmental applications, yet these products are often unavailable in developing countries due to high survey costs. Structure-from-Motion (SfM) photogrammetry combined with Unmanned Aerial Vehicles (UAVs) has proven to be an effective and low-cost technique that enables a wide audience to construct local-scale DEMs. However, deviation from strict survey designs and guidelines regarding the number and distribution of Ground Control Points (GCPs) can result in linear and doming errors. Two surveys that suffer from these errors were supplied for error reduction, but neither area had an available high-accuracy DEM or could afford an additional differential Global Navigation Satellite System (dGNSS) ground survey from which to extract control points for a relative georeferencing approach. Little attention has been given to error reduction using global open-access elevation data, such as the TerraSAR-X add-on for Digital Elevation Measurements (TanDEM-X) 90; the Ice, Cloud and land Elevation Satellite-2 (ICESat-2); and Hydroweb.

The aim of this study was to improve and validate the two DEMs using control points extracted from the above data, and to analyze the validation results to determine the impact on error reduction using regression analyses between the vertical error and the distance from the nearest control point. The outcomes show that ICESat-2 and Hydroweb can support surveys in the absence of dGNSS GCPs with similar impact, but cannot replace the necessity of dGNSS measurements in georeferencing and validation. These findings suggest that survey guidelines can be maintained with global open-access elevation data, but the effectiveness depends on the number, distribution and estimated accuracy of the reference data. Doming errors can be prevented by correct camera lens calibration, which depends on stable lens conditions or a stratified distribution of high-accuracy reference data. The validation of the SfM DEM in data-scarce areas proves difficult due to the lack of an independent validation dataset, but the Copernicus GLO-30 can quantify and show the spatial variability of the error. This study highlights the increasing accuracy of global open-access elevation data and shows that these databases allow the user to easily acquire more independent data for georeferencing and validation, although the RMSE cannot reliably be reduced to the sub-meter level.
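The regression between vertical error and distance to the nearest control point can be sketched as an ordinary least-squares fit (the check-point values below are synthetic and purely illustrative):

```python
import numpy as np

# Hypothetical check points: vertical error (m) of the SfM DEM and the
# distance (m) of each point to its nearest control point.
distance = np.array([50.0, 120.0, 200.0, 310.0, 450.0, 600.0, 780.0, 950.0])
v_error  = np.array([0.10, 0.18, 0.25, 0.41, 0.55, 0.78, 0.95, 1.20])

# Least-squares linear regression: error = a * distance + b. A positive,
# significant slope indicates the error grows away from control points,
# as expected for linear (tilt/scale) survey errors.
a, b = np.polyfit(distance, v_error, 1)
residual_rmse = np.sqrt(np.mean((v_error - (a * distance + b)) ** 2))
```

The fitted slope quantifies how quickly accuracy degrades with distance from control, which is the basis for comparing the different control-point sources.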

How to cite: Bakker, D., Phùng, P., van den Homberg, M., Veraverbeke, S., and Couasnon, A.: Error-reducing Structure-from-Motion derived Digital Elevation Models in data-scarce environments, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-12524, 2022.

EGU22-13002 | Presentations | ESSI1.4

ORBiDANSe: Orbital Big Datacube Analytics Service 

Peter Baumann and Dimitar Misev

Datacubes form an accepted cornerstone for analysis (and visualization) ready spatio-temporal data offerings. The increase in user friendliness is achieved by abstracting away from the zillions of files in provider-specific organization. Datacube query languages additionally establish actionable datacubes, enabling users to ask "any query, any time" with zero coding.

However, datacube deployments typically aim at large-scale data center environments, accommodating Big Data and massively parallel processing capabilities to achieve decent performance. In this contribution, we conversely report on a downscaling experiment. In the ORBiDANSe project a datacube engine, rasdaman, has been ported to a cubesat, ESA OPS-SAT, and is operational in space. Effectively, the satellite thereby becomes a datacube service offering the standards-based query capabilities of the OGC Web Coverage Processing Service (WCPS) geo datacube analytics language.
We believe this will pave the way for on-board ad-hoc processing and filtering of Big EO Data, thereby unleashing them to a larger audience and in substantially shorter time.
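To illustrate the "any query, any time" idea, a WCPS request can express band math and aggregation server-side; in the sketch below the coverage name, band names and endpoint are hypothetical, not the actual OPS-SAT service:

```python
import urllib.parse

# A hypothetical WCPS query: NDVI averaged over a whole datacube,
# returned as JSON, so only a single number leaves the satellite.
wcps = """
for $c in (S2_cube)
return encode(
  avg( ($c.nir - $c.red) / ($c.nir + $c.red) ),
  "json")
"""

# The query would be sent as a KVP ProcessCoverages request to a
# WCPS-capable WCS endpoint (URL is illustrative):
url = ("https://example.org/rasdaman/ows?service=WCS&version=2.0.1"
       "&request=ProcessCoverages&query=" + urllib.parse.quote(wcps))
```

Because the aggregation runs where the data sit, only the result - not the imagery - needs to be downlinked.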

In our talk, we report about the concept, technology, and experimental results of ad-hoc on-board datacube query processing.


How to cite: Baumann, P. and Misev, D.: ORBiDANSe: Orbital Big Datacube Analytics Service, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-13002, 2022.

EGU22-592 | Presentations | NH6.1

Producing a High-Resolution Land Cover Map for Southwest Ethiopia Using Sentinel-2 Images and Google Earth Engine 

Farzad Vahidi Mayamey, Navid Ghajarnia, Saeid Aminjafari, Zahra Kalantari, and Kristoffer Hylander

Accurate knowledge of local land cover and land use and their changes is crucial for many applications, such as natural resources management, environmental studies, ecological and biodiversity change evaluations, and food security. Global land cover maps can be useful as reference sources and starting points; however, they usually show areas of geographical disagreement when compared to one another. Moreover, global land cover products mostly generalize different land cover types and may not fit the specific needs of different projects and user communities. For instance, different types of forests are mostly treated as one category, as they are not easy to differentiate. In this study, we used high-resolution time-series images of Sentinel-2 to produce a local land cover map for southwest Ethiopia, focusing on 8 major land cover classes: Forests, Plantations of exotic trees, Woodlands, Home Gardens, Annual crop fields, Grazing Wetlands, Urban areas, and Open water bodies. We also utilized high-resolution Google Maps satellite imagery and local expert knowledge of the study area to produce an observational dataset for the training and validation steps. Different machine learning algorithms, land cover combinations, and seasonal scenarios were used to produce the best local land cover map for the study area. For this purpose, a two-step approach was implemented to produce the final high-resolution land cover map. First, we produced the best individual map for each land cover class based on the highest producer accuracy among the different scenarios. Then, to produce the final land cover map for all land cover classes, the individual maps were combined using the consumer accuracy index: for each pixel, we selected the land cover class with the highest consumer accuracy across all individually produced maps from the first step.
In the end, we evaluated the results with the validation dataset using different confusion indices. The final high-resolution land cover map produced in this study showed that combining remote sensing and local field-based knowledge in cloud computing platforms such as Google Earth Engine (GEE) improves the mapping of different land cover classes across southwest Ethiopia.
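The second step, in which each pixel receives the class with the highest consumer accuracy across the individual maps, can be sketched as follows (the class names, per-class maps, and accuracy values are hypothetical, chosen only to show the mechanics):

```python
import numpy as np

# Hypothetical per-class binary maps from the first step (1 = pixel
# assigned to that class) and each map's consumer (user's) accuracy.
class_maps = {
    "forest":   np.array([[1, 1], [0, 0]]),
    "cropland": np.array([[0, 1], [1, 0]]),
    "water":    np.array([[0, 0], [1, 1]]),
}
consumer_acc = {"forest": 0.92, "cropland": 0.85, "water": 0.97}

names = list(class_maps)
# Per-pixel score: the class's consumer accuracy where its map claims
# the pixel, else 0; the final label is the highest-scoring class.
scores = np.stack([class_maps[n] * consumer_acc[n] for n in names])
final = np.array(names)[scores.argmax(axis=0)]
```

Where several individual maps claim the same pixel (e.g. the top-right pixel above), the class whose map is more trustworthy at the pixel level wins.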


Keywords: Land cover map; Sentinel-2; High resolution; Machine Learning; Google Earth Engine; Ethiopia

How to cite: Vahidi Mayamey, F., Ghajarnia, N., Aminjafari, S., Kalantari, Z., and Hylander, K.: Producing a High-Resolution Land Cover Map for Southwest Ethiopia Using Sentinel-2 Images and Google Earth Engine, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-592, 2022.

EGU22-1004 | Presentations | NH6.1 | Highlight

Remote sensing big data characterization of tectonic and hydrological sources of ground deformation in California 

Xie Hu, Roland Bürgmann, and Xiaohua Xu

Although scientific advances have been achieved in every individual geoscience discipline, enabled by more extensive and accurate observations and more robust models, our knowledge of the Earth’s complexity remains limited. California represents an ideal natural laboratory that hosts active tectonic processes associated with the San Andreas fault system and hydrological processes dominated by the Central Valley, which contribute to dynamic surface deformation across the state. The spatiotemporal characteristics and three-dimensional patterns of the tectonic and hydrological sources of ground motion differ systematically. Spatially, interseismic creep is distributed along several strands of the San Andreas Fault (SAF) system. The elastic deformation off the locked faults usually spreads out over tens of kilometers in a long-wavelength pattern. Hydrologically driven displacements are distinct between water-bearing sedimentary basins and the bounding fault structures. Temporally, both displacement sources involve long-term trends, such as from interseismic creep and prolonged climate change. In addition, episodic signals are due to seismic and aseismic fault slip events, seasonal elastic surface and groundwater loading, and poroelastic groundwater volume strain. The orientation of tectonic strain accumulation in California mainly represents a northwest-trending shear zone associated with the right-lateral strike-slip SAF system. Hydrological processes mainly deform the Earth vertically, while horizontal motions concentrate along the aquifer margins.

We used time-series ground displacements during 2015-2019 from four ascending tracks and five descending tracks of ESA’s Sentinel-1 Interferometric Synthetic Aperture Radar (InSAR) observations. We considered the secular horizontal surface velocities and strain rates, constrained from GNSS measurements and tectonic models, as proxies for tectonic processes. The InSAR time series and GNSS velocity maps benefit from the Southern California Earthquake Center (SCEC) Community Geodetic Model (CGM) developments. We further extracted the seasonal displacement amplitudes from the InSAR-derived time-series displacements as proxies for hydrological processes. We synergized multidisciplinary remote sensing and auxiliary big data, including ground deformation, sedimentary basins, precipitation, soil moisture, topography, and hydrocarbon production fields, using an ensemble random forest machine learning algorithm. We succeeded in predicting 86%-95% of the representative data sets.

Interestingly, high strain rates along the SAF system mainly occur in areas with a low-to-moderate vegetation fraction, suggesting a correlation of rough/high-relief coastal range morphology and topography with the active faulting, seasonal and orographic rainfall, and vegetation growth. Linear discontinuities in the long-term, seasonal amplitude and phase of the surface displacement fields coincide with some fault strands, the boundary zone between the sediment-fill Central Valley and bedrock-dominated Sierra Nevada, and the margins of the inelastically deforming aquifer in the Central Valley, suggesting groundwater flow interruptions, contrasting elastic properties, and heterogeneous hydrological units.

How to cite: Hu, X., Bürgmann, R., and Xu, X.: Remote sensing big data characterization of tectonic and hydrological sources of ground deformation in California, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-1004, 2022.

EGU22-1082 | Presentations | NH6.1 | Highlight

Monitoring of rehabilitation of a raised bog in Ireland using a machine learning model 

Richa Marwaha and Matthew Saunders

Peatlands cover ~3% of the global land area and are under threat from land-use change such as drainage for peat extraction and conversion to agriculture and commercial forestry. Historically, peatlands in Ireland have been used for industrial peat extraction and domestic turf cutting. One such example is Cavemount bog, County Offaly, Ireland, a former raised bog where peat extraction started in the 1970s and ceased in 2015. After 2015, a programme of rehabilitation commenced, rewetting the site to raise water levels and to promote the establishment of wetland habitats. Some of the key species associated with the vegetation communities that have been developing across the site include Betula pubescens, Calluna vulgaris, Eriophorum angustifolium, Typha latifolia and Phragmites australis.

To monitor the progress of the colonisation of natural vegetation as part of the rehabilitation plan, reliable habitat maps are required. Google Earth Engine (GEE) is a cloud computing platform where satellite images can be processed to obtain cloud-free composite images. GEE was used to develop an automated approach to map the habitats at Cavemount using multispectral satellite imagery (Sentinel-2) and a machine learning model, i.e. a random forest classifier. In this study, 9 habitat classes were used: bare peat, coniferous trees, heather, heather and scrub, open water, pioneer open cutaway habitats, scrub pioneer open cutaway habitats, wetland, and a mosaic of wetland and scrub. Cloud-free composites for the growing season (May to September), using satellite imagery from 2018-2021, were used to derive spectral indices such as NDVI (normalised difference vegetation index), NDWI (normalised difference water index), mNDWI (modified normalised difference water index), the red-edge vegetation index, EVI (enhanced vegetation index) and BSI (bare soil index). To extract open water, a seasonal composite of mNDWI was used, which could differentiate water from bare peat. The seasonal composite of mNDWI was also used to monitor flooding over winter periods due to increased rainfall, and was compared with summer conditions. These indices, along with 10 spectral bands (10-20 m resolution), were used as input to a random forest model, and a yearly habitat map from 2018 to 2021 was developed. The overall accuracies for the testing data from 2018, 2019, 2020 and 2021 were 87.42%, 86.81%, 87.16% and 87.50%, and the kappa coefficients were 0.81, 0.80, 0.81 and 0.81, respectively. Over time, the former peat extraction area showed a transformation from bare peat to a mosaic of wetland vegetation. This methodology will provide a useful tool for the long-term monitoring of the habitats at this site and for evaluating the effect of rehabilitation on the ecological composition of the site.
The final habitat map will also be integrated with the eddy covariance data from the site to provide further insight into the carbon and greenhouse gas dynamics of each habitat in the future.   
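The index-plus-classifier workflow can be sketched with scikit-learn; synthetic reflectances and toy labels below stand in for the GEE composites and field training data (a sketch of the approach, not the study's implementation):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-ins for three Sentinel-2 bands (green, red, NIR);
# the real workflow derives these from cloud-free seasonal composites.
rng = np.random.default_rng(1)
green = rng.uniform(0.02, 0.4, 500)
red   = rng.uniform(0.02, 0.5, 500)
nir   = rng.uniform(0.02, 0.6, 500)

ndvi = (nir - red) / (nir + red)      # vegetation index
ndwi = (green - nir) / (green + nir)  # water index
X = np.column_stack([ndvi, ndwi, green, red, nir])

# Toy two-class labels: 1 = open water where NDWI is high, else 0;
# the study uses 9 habitat classes from field data instead.
y = (ndwi > 0.1).astype(int)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
predicted = rf.predict(X)
```

Feeding the indices alongside the raw bands, as the abstract describes, lets the forest exploit both the physical contrast of the indices and the full spectral information.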

How to cite: Marwaha, R. and Saunders, M.: Monitoring of rehabilitation of a raised bog in Ireland using a machine learning model, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-1082, 2022.

EGU22-2114 | Presentations | NH6.1

Multi-annual InSAR solution of vertical land motion in 2021 lethal building collapse site in Miami 

Yu, X. and Hu, X.

NOAA reported that sea level has risen by 203-228 mm since 1880 and that the rate accelerated to 3.556 mm/year during 2006-2015. Coastal regions, home to about half of the world’s population (~3 billion), are subject to erosion from wind and waves and to subsidence from natural compaction and artificial exploitation of subsurface resources, and are at high risk of floods from storms and of inundation from prolonged sea level rise. The vertical land motion (VLM) directly determines the relative sea level rise. To be specific, locally upward VLM can help alleviate the risks, while locally downward VLM may hasten the arrival of inundation. Therefore, monitoring coastal VLM is fundamental to coastal resilience and hazard mitigation.

A 12-floor building, Champlain Towers South, in the Miami suburb of Surfside collapsed catastrophically and claimed 98 lives on June 24th, 2021. No confident conclusion has been drawn on the cause of the collapse, but it might be related to multiple processes, including ground-floor pool deck instability, concrete damage, and land subsidence.

Subsidence has been noted in populous Surfside since the 1990s. However, we still lack a detailed mapping of contemporary coastal subsidence. Here we focus on multi-source Synthetic Aperture Radar (SAR) datasets from C-band Sentinel-1 and X-band TerraSAR-X satellite imagery.

We use time-series SAR interferometry of ascending Sentinel-1 path 48 to extract the VLM from 2015 to 2021. A comparatively stable GPS station, ZMA1, obtained from the Nevada Geodetic Laboratory acts as the reference site to calibrate the InSAR results. The long-wavelength atmospheric phase screen and orbit errors are approximated by low-order polynomial fitting. The average subsidence rates derived from stacking help reduce temporally high-frequency noise, and a comparison with the GPS network solution helps verify the InSAR measurements. Beyond that, we will also rely on high-resolution X-band TerraSAR-X data (Path 36, strip_014) to elaborate VLM details in the building clusters. In addition, NOAA reported that the relative sea level increase in Florida was 2.97 mm/year from 1931 to 2020, i.e., >0.3 m in one century. The 2019 Unified Sea Level Rise Projection for Southeast Florida predicted that sea level in Florida will rise by 254 to 432 mm compared to the level in 2000. We aim to extract high-accuracy VLM to provide scientific evidence for safer urban planning and effective adaptation strategies in coastal cities, with an ultimate goal of coastal resilience during global climate change.
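The rate-estimation idea behind stacking can be illustrated as a per-pixel least-squares fit that averages down temporally high-frequency noise (synthetic time series with a known rate; a sketch, not the processing chain used in the study):

```python
import numpy as np

# Synthetic LOS displacement time series (mm) for one pixel: monthly-
# spaced acquisitions, a known rate of -3 mm/yr, plus a seasonal
# oscillation and random noise.
t = np.linspace(2015.0, 2021.0, 73)           # decimal years
rng = np.random.default_rng(7)
disp = (-3.0 * (t - t[0])
        + 1.5 * np.sin(2 * np.pi * t)         # seasonal signal
        + rng.normal(0.0, 0.5, t.size))       # high-frequency noise

# Least-squares linear fit; with many epochs the seasonal and random
# terms average out, leaving the secular subsidence rate.
rate, offset = np.polyfit(t - t[0], disp, 1)  # mm/yr, mm
```

Applied pixel by pixel over the calibrated interferogram stack, the fitted slopes form the average subsidence-rate map.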

How to cite: Yu, X. and Hu, X.: Multi-annual InSAR solution of vertical land motion in 2021 lethal building collapse site in Miami, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-2114, 2022.

EGU22-3291 | Presentations | NH6.1

Land subsidence in Liaohe River Delta, China due to oil and gas withdrawal, measured from multi-geometry InSAR data 

Wei Tang, Zhiqiang Gong, Jinbao Jiang, and Zhicai Li

Liaohe River Delta (LRD) is one of the major centers for hydrocarbon production, agriculture, and fisheries in Northeastern China. Liaohe Oilfield, located in the deltaic region, is China’s third-largest oilfield with an annual production capacity of 10 million tons of crude oil and 800 million m3 of natural gas. Since its operation in 1970, Liaohe Oilfield had produced more than 480 million tons of crude oil and 88 billion m3 of natural gas by the end of 2019.

Pore pressure drawdown due to oil/gas production has resulted in reservoir compaction and surface subsidence above the reservoir. This compaction and subsidence can cause significant damage to production and surface facilities. The main concerns relate to low-lying coastal areas in the context of eustatic sea-level rise (SLR), where land subsidence contributes to relative SLR and exacerbates flooding hazards. In addition, regional and local land subsidence have combined with global SLR to cause wetland loss in the LRD.

Our main aim in this study is to investigate time-dependent land subsidence induced by reservoir depletion in the LRD by analyzing Synthetic Aperture Radar (SAR) images from the Sentinel-1 satellite. We retrieved vertical land subsidence and horizontal displacements by processing and merging multi-geometry images from two ascending and two descending tracks covering the area over the 2017 to 2021 time span. We observed significant local subsidence features in several active production oilfields, and the areal extent of subsidence is basically consistent with the spatial extent of the production wells. The most prominent subsidence is occurring in the Shuguang oilfield. Due to reservoir depletion, it forms a land subsidence bowl in an elliptical shape with a major axis of ~6.3 km and a minor axis of ~3.2 km, and the maximum subsidence rate exceeds 230 mm/yr. Because of the large depth D relative to the areal extent L, that is, a relatively small ratio L/D, the displacement field caused by oil production is three-dimensional. An inward, symmetrical, east-west horizontal movement was observed around the subsidence bowl in the Shuguang oilfield, with an average eastward movement rate of ~40 mm/yr and an average westward rate of ~30 mm/yr. This three-dimensional deformation is well reproduced by a cylindrical reservoir compaction/subsidence model.

In September 2021, a storm surge accompanied by heavy rainfall caused water levels to rise by 50-130 cm in Liaodong Bay, resulting in extreme flooding in oilfields along the coast. The most severe flooding occurred in the Shuguang oilfield, which has the highest land subsidence rate. Our new InSAR-derived surface subsidence associated with the oilfield operations raises the question of the potential impact of land subsidence on flood severity. This work highlights the importance of incorporating reservoir depletion-induced subsidence into flood management to ensure the security of the oil and gas industry along coastal regions.

How to cite: Tang, W., Gong, Z., Jiang, J., and Li, Z.: Land subsidence in Liaohe River Delta, China due to oil and gas withdrawal, measured from multi-geometry InSAR data, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-3291, 2022.

EGU22-4618 | Presentations | NH6.1

Supervised LSTM Modelling for Classification of Sinkhole-related Anomalous InSAR Deformation Time Series 

Anurag Kulshrestha, Ling Chang, and Alfred Stein

Recently, we have shown that sinkholes can be characterized at an early stage by precursory deformation patterns from InSAR time series [1]. These patterns are often related to sudden changes in deformation or deformation velocity. With such a priori information, accurate deformation modelling and early detection of precursory patterns are feasible. It remains a challenge, however, to scale up methods for classifying larger numbers of sinkholes over large areas that may contain tens of thousands of InSAR observations. To address this, we explore the use of Long Short-Term Memory (LSTM) networks to classify multi-temporal datasets by learning unique and distinguishable hidden patterns in the deformation time series samples.

We propose a two-layered Bi-directional LSTM model, trained as a supervised classifier to separate sinkhole-related anomalous deformation patterns from non-anomalous deformation time series. Samples for the linear, Heaviside, and Breakpoint deformation classes are extracted by applying Multiple Hypothesis Testing (MHT) [2] to the deformation time series and are used to compile the training dataset. These samples are randomly divided into a training set and a testing set, and each is associated with a target label using one-hot encoding. Hyperparameters of the model are tuned over a broad range of commonly used values. Using categorical cross-entropy as the loss function, the model is optimized with the Adam optimizer.
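
The sample preparation described above (random train/test split and one-hot target labels) can be sketched with NumPy; the array shapes and values are synthetic placeholders, not the study's data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the three deformation classes named in the abstract;
# the time series values are synthetic, for illustration only.
CLASSES = ["linear", "Heaviside", "Breakpoint"]
labels = rng.integers(0, 3, size=100)   # integer class index per sample
series = rng.normal(size=(100, 52))     # 100 time series, 52 epochs each

# One-hot encoding of the target labels
one_hot = np.eye(len(CLASSES))[labels]

# Random split into training and testing sets (80/20, an assumed ratio)
idx = rng.permutation(len(series))
train_idx, test_idx = idx[:80], idx[80:]
X_train, y_train = series[train_idx], one_hot[train_idx]
X_test, y_test = series[test_idx], one_hot[test_idx]
```

The `(samples, epochs)` arrays would then be reshaped as sequences and fed to the Bi-LSTM with a categorical cross-entropy loss, as described in the text.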

We tested our method on an oil extraction field in Wink, Texas, USA, where sinkholes have been continuously evolving since 1980 and a recent sinkhole occurred in mid-2015. We used 52 Sentinel-1 SAR images acquired between 2015 and 2017. The results show that the supervised LSTM model classifies linear deformation samples with an accuracy of ~98%, while the accuracy for the Heaviside and Breakpoint classes is ~75% at most. Temporal periodicity was observed in the occurrence of anomalies, which may be related to the frequency of oil extraction and water injection events. Heaviside anomalies were observed to cluster in space, with a higher density close to the sinkhole location, whereas Breakpoint anomalies were much more uniformly distributed. Close to the sinkhole spot, we found that two InSAR measurement points were classified into the Breakpoint class and showed considerable changes in deformation velocity (a velocity-change angle of ~60°) shortly before the occurrence of this sinkhole, likely associated with sinkhole-related precursory patterns. From this study we conclude that our supervised LSTM is an effective classification method for identifying anomalies in time. The resulting classification map of InSAR deformation temporal behavior can be used to identify areas that are vulnerable to future sinkhole occurrence and require further investigation. In the future, we plan to further develop methods to increase the classification accuracy of the anomalous classes.


[1] Anurag Kulshrestha, Ling Chang, and Alfred Stein. Sinkhole Scanner: A New Method to Detect Sinkhole-related Spatio-temporal Patterns in InSAR Deformation Time Series. Remote Sensing, 13(15), 2021.

[2] Ling Chang and Ramon F. Hanssen. A Probabilistic Approach for InSAR Time-Series Postprocessing. IEEE Transactions on Geoscience and Remote Sensing, 54(1):421–430, 2016.

How to cite: Kulshrestha, A., Chang, L., and Stein, A.: Supervised LSTM Modelling for Classification of Sinkhole-related Anomalous InSAR Deformation Time Series, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-4618, 2022.

EGU22-4800 | Presentations | NH6.1

A methodology for the analysis of InSAR Time Series for the detection of ground deformation events 

Laura Pedretti, Massimiliano Bordoni, Valerio Vivaldi, Silvia Figini, Matteo Parnigoni, Alessandra Grossi, Luca Lanteri, Mauro Tararbra, Nicoletta Negro, and Claudia Meisina

The Sentinel-1 dataset, with its high temporal resolution (6-12 days) and long time span, can be considered a form of near-real-time monitoring, since it provides a sampling frequency sufficient to track the evolution of some ground deformations (e.g. landslides, subsidence) compared with other sensors. However, the analysis and elaboration of such a huge dataset, covering large areas, can be tricky and time-consuming without a first screening to identify areas of potential interest for significant ground deformation. A-InSAR Time Series (TS) interpretation is advantageous for understanding the relation between ground movement processes and triggering factors (snow, heavy rainfall), both in areas where A-InSAR TS can be compared with in-situ monitoring instruments and in areas where in-situ instruments are scarce or absent. Exploiting the availability of Sentinel-1 data, this work aims to develop a new methodology ("ONtheMOVE" - InterpolatiON of SAR Time series for the dEtection of ground deforMation eVEnts) to classify the trend of TS (uncorrelated, linear, non-linear); to identify breaks in non-linear TS; and to provide descriptive parameters (beginning and end of the break, length in days, cumulative displacement, average rate of displacement) characterizing the magnitude and timing of changes in ground motion. The methodology has been tested on two Sentinel-1 datasets available from 2014 to 2020 in the Piemonte region, northwestern Italy, an area prone to slow-moving slope instabilities. The methodology can be applied to any satellite dataset, with low or high temporal resolution, and it can be tested in any area to identify ground instabilities (slow-moving landslides, subsidence) at local or regional scale. The thresholds used for event detection should be calibrated according to the geological and geomorphological processes and characteristics of the specific site or region.
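
The break identification and descriptive parameters can be illustrated with a deliberately simplified two-segment least-squares sketch; this is an illustrative stand-in, not the ONtheMOVE algorithm itself, and the series below is synthetic:

```python
import numpy as np

def detect_break(t, d, min_seg=5):
    """Crude break detection in a deformation time series: try every
    candidate break epoch, fit a line on each side, and keep the break
    that minimises the total squared residual. Returns a few of the
    descriptive parameters mentioned in the text."""
    best = None
    for k in range(min_seg, len(t) - min_seg):
        res = 0.0
        for sl in (slice(None, k), slice(k, None)):
            p = np.polyfit(t[sl], d[sl], 1)
            res += np.sum((d[sl] - np.polyval(p, t[sl])) ** 2)
        if best is None or res < best[1]:
            best = (k, res)
    k = best[0]
    rate_after = np.polyfit(t[k:], d[k:], 1)[0]
    return {"break_epoch": t[k],
            "cumulative_displacement": d[-1] - d[k],
            "rate_after": rate_after}

# Synthetic series: stable, then subsiding at ~2 mm per epoch after t=50
t = np.arange(100.0)
d = np.where(t < 50, 0.0, -2.0 * (t - 50))
params = detect_break(t, d)
print(params)
```

A production method would also have to classify uncorrelated and linear series first, and calibrate thresholds per site, as the abstract notes.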
This innovative methodology provides a supporting tool, integrated with conventional methods, for planning and management of the territory, furnishing a further validation of the real kinematic behaviour of ground movement processes at each test site and indicating where further investigation is necessary. In addition, the elaboration applied to Sentinel-1 data is helpful both for back analysis and for near-real-time monitoring of the territory, as regards the characterization and mapping of the kinematics of ground instabilities and the assessment of susceptibility, hazard and risk.

How to cite: Pedretti, L., Bordoni, M., Vivaldi, V., Figini, S., Parnigoni, M., Grossi, A., Lanteri, L., Tararbra, M., Negro, N., and Meisina, C.: A methodology for the analysis of InSAR Time Series for the detection of ground deformation events, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-4800, 2022.

Accurate spatial-extent changes in urban built-up areas are essential for detecting urbanization and for analyzing the drivers of urban development and the impact of urbanization on the environment. In recent years, nighttime light images have been widely used for extracting urban built-up areas, but traditional extraction methods need improvement in terms of accuracy and automation. In this experiment, a U-Net model was built and trained with NPP-VIIRS and MOD13A1 data from 2020, and the optimally tuned model was used to retrieve the spatial extent of built-up areas in China from 2012 to 2021 and to analyze their changing trend over that decade. The results showed that U-Net outperformed random forest (RF) and support vector machine (SVM), with an overall accuracy (OA) of 0.9969 and an mIoU of 0.7342. Built-up area growth rates are higher in the south and northwest, but the largest growth areas are still concentrated in the east and southeast, which is consistent with China's economic development and urbanization process. This experiment produced a method to extract China's urban built-up areas effectively and rapidly, which provides some reference value for China's urbanization.
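
The two reported metrics (OA and mIoU) can be computed from a confusion matrix; a minimal sketch with toy label maps (not the study's data):

```python
import numpy as np

def oa_and_miou(y_true, y_pred, n_classes=2):
    """Overall accuracy and mean intersection-over-union from two label
    maps, the two metrics reported in the abstract."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    for t, p in zip(y_true.ravel(), y_pred.ravel()):
        cm[t, p] += 1
    oa = np.trace(cm) / cm.sum()
    ious = []
    for c in range(n_classes):
        inter = cm[c, c]
        union = cm[c, :].sum() + cm[:, c].sum() - inter
        if union:
            ious.append(inter / union)
    return float(oa), float(np.mean(ious))

# Toy built-up (1) / background (0) maps, for illustration only
truth = np.array([[0, 0, 1, 1], [0, 1, 1, 1]])
pred  = np.array([[0, 0, 1, 0], [0, 1, 1, 1]])
oa, miou = oa_and_miou(truth, pred)
print(oa, miou)  # 0.875 0.775
```

mIoU is typically lower than OA, as in the reported 0.9969 vs 0.7342, because it penalizes per-class errors rather than averaging over all pixels.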

How to cite: Bai, M.: Detecting China's urban built-up areas expansion over the last decade based on the deep learning through NPP-VIIRS images, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-6822, 2022.

EGU22-7215 | Presentations | NH6.1 | Highlight

Scalable Change Detection in Large Sentinel-2 data with SEVA 

Mike Sips and Daniel Eggert

We present SEVA, a scalable exploration tool that supports users in detecting land-use changes in large optical remote sensing data sets. SEVA addresses three current scientific and technological challenges of detecting changes in large data sets: a) the automated extraction of relevant changes from many high-resolution optical satellite observations, b) the exploration of the spatial and temporal dynamics of the extracted changes, and c) the interpretation of the extracted changes. To address these challenges, we developed a distributed change detection pipeline consisting of a data browser, an extraction component, an error assessment component, and an interactive exploration component. The data browser supports users in assessing the spatial and temporal distribution of available Sentinel-2 images for a region of interest. The extraction component extracts changes from Sentinel-2 images using the post-classification change detection (PCCD) method. The error assessment component supports users in interpreting the relevance of extracted changes with global and local error metrics. The interactive exploration component supports users in investigating the spatial and temporal dynamics of extracted changes. SEVA supports users through interactive visualization in all components of the change detection pipeline.
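
The PCCD core can be illustrated with a minimal sketch: each date's image is assumed to be classified already, and change is flagged wherever the class label differs between dates. This is a toy illustration, not SEVA's implementation:

```python
import numpy as np

def post_classification_change(map_t1, map_t2, nodata=-1):
    """Post-classification change detection (PCCD): given two per-date
    classification maps, flag pixels whose class label differs.
    Returns a boolean change mask and the 'from-to' label pairs."""
    valid = (map_t1 != nodata) & (map_t2 != nodata)
    change = valid & (map_t1 != map_t2)
    transitions = np.stack([map_t1[change], map_t2[change]], axis=1)
    return change, transitions

# Toy land-use maps (0=water, 1=forest, 2=built-up), illustration only
t1 = np.array([[1, 1, 0], [2, 1, 0]])
t2 = np.array([[1, 2, 0], [2, 2, 0]])
mask, trans = post_classification_change(t1, t2)
print(mask.sum(), trans.tolist())  # 2 changed pixels, both forest->built-up
```

The 'from-to' pairs are what makes PCCD interpretable: they state not only where change occurred but which land-use transition it was.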

How to cite: Sips, M. and Eggert, D.: Scalable Change Detection in Large Sentinel-2 data with SEVA, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-7215, 2022.

EGU22-7236 | Presentations | NH6.1

Application of remote sensing big data in landslide identification 

Yuqi Song and Xie Hu

Landslides are common natural disasters worldwide. Knowledge of the landslide distribution is fundamental for landslide monitoring, disaster mitigation and reduction. Traditional in-situ observations (e.g., leveling, GPS, extensometers, inclinometers) usually have high accuracy, but they are expensive and labor-intensive and may also involve risks in the field. Alternatively, remote sensing data can capture regional land surface features and are thus efficient for landslide mapping. Recent studies on landslide identification mainly rely on pixel-based or object-oriented classification using optical images. Nonetheless, landslide activity is governed by multiple processes including topography, geology, land cover, catchment, precipitation, and tectonics (e.g., dynamic shaking or aseismic creeping). Remote sensing data and products are beneficial for extracting some of these critical parameters on a regional scale. The rapid development of machine learning algorithms makes it possible to systematically construct landslide inventories by interpreting multi-source remote sensing big data. Populous California suffers from high risks of landsliding. The United States Geological Survey (USGS) compiles the landslide inventory in the State and reports that California has about 86k landslides. Steep slopes in the coastal ranges, a wet climate in northern California, youthful materials at the surface from the active tectonics of the San Andreas Fault and secondary fault systems, and dynamic and aseismic movements instigated by the faults all contribute to high landslide susceptibility in California. In May 2017, the steep slopes at Mud Creek on California's Big Sur coast collapsed catastrophically. In January and February 2019, several landslides occurred in the southern part of the Santa Monica Mountains. In January 2021, a large debris flow hit Rat Creek in Big Sur due to extreme precipitation.
In addition, a fairly complete collection of remote sensing data and products is available for California. Here we use machine learning methods to refine landslide identification in California using remote sensing big data, including elevation, slope, and aspect derived from SRTM digital elevation models (DEM), the normalized difference vegetation index (NDVI) derived from Landsat 8 OLI images, hydrometeorological observations, the nearest distance to rivers and faults, geological and land cover maps, as well as Synthetic Aperture Radar (SAR) images. We will use the archived landslide inventory for model training and testing. We plan to further explore the critical variables determining landslide occurrence and the inferred triggering mechanisms.
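
Some of the terrain features listed above can be derived from a DEM with finite differences; a minimal sketch, assuming a 30 m cell size (the SRTM 1-arcsecond resolution) and rows running north to south:

```python
import numpy as np

def slope_aspect(dem, cellsize=30.0):
    """Slope (degrees) and aspect (degrees clockwise from north, of the
    downslope direction) from a DEM grid via finite differences.
    Assumes rows run north to south, columns west to east."""
    dz_dy, dz_dx = np.gradient(dem, cellsize)  # axis0 = southward, axis1 = eastward
    slope = np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))
    # downslope vector = (-dz/deast, -dz/dnorth); dz_dy already is -dz/dnorth
    aspect = np.degrees(np.arctan2(-dz_dx, dz_dy)) % 360.0
    return slope, aspect

# Synthetic inclined plane rising 10 degrees toward the east
dem = np.tile(np.arange(5, dtype=float), (5, 1)) * 30.0 * np.tan(np.radians(10))
slope, aspect = slope_aspect(dem)
print(round(float(slope[2, 2]), 1), round(float(aspect[2, 2]), 1))  # 10.0 270.0
```

These per-pixel features, stacked with NDVI, distance-to-fault and the other layers, would form the input table for a machine learning classifier trained against the archived inventory.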

How to cite: Song, Y. and Hu, X.: Application of remote sensing big data in landslide identification, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-7236, 2022.

EGU22-7803 | Presentations | NH6.1

Detection of Volcanic Deformations in InSAR Velocity Maps - a contribution to TecVolSA project 

Teo Beker, Homa Ansari, Sina Montazeri, and Qian Song

TecVolSA (Tectonics and Volcanoes in South America) is a project aimed at developing intelligent Earth Observation (EO) data processing and exploitation for monitoring various geophysical processes in the central South American Andes. A large amount of Sentinel-1 data covering a period of about 5 years has been processed using mixed Permanent Scatterer and Distributed Scatterer (PS/DS) approaches. The resulting products are velocity maps with an InSAR relative error on the order of 1 mm/yr at large scale (>100 km). The second milestone of the project was the automatic extraction of information from the data; in this work, the focus is on detecting volcanic deformations. Since real data prepared in such a manner are limited, a synthetic training set is used to train a deep learning model for the detection of volcanic deformations. Models were trained from scratch, and InceptionResNet v2 was selected for further experiments as it gave the best performance among the tested models. Explainable AI (XAI) techniques were used to understand and analyze the confidence of the model and how to improve it. The models trained on the synthetic training set underperformed on the real test set. Using the GradCAM technique, it was identified that slope-induced signals and salt lake deformations were mistakenly classified as volcanic deformations. These patterns are difficult to simulate and were not contained in the synthetic training set. This distribution gap was bridged using a hybrid synthetic-real fine-tuning set, consisting of real slope-induced signal data and synthetic volcanic data. Additionally, the false positive rate of the model was reduced by low-pass spatial filtering of the real test set, and finally by adjustments of the temporal baseline derived from a sensitivity analysis. The model successfully detected all 10 deforming volcanoes in the region, with deformation rates ranging from 0.4 to 1.8 cm/yr.
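
The low-pass spatial filtering step can be illustrated with a simple moving-average (box) filter; the kernel choice here is an assumption for illustration, not necessarily the filter the authors used:

```python
import numpy as np

def box_lowpass(velocity, k=3):
    """k x k moving-average (box) low-pass filter: suppresses
    short-wavelength, pixel-scale noise in a velocity map while keeping
    broad deformation patterns, which reduces spurious detections."""
    pad = k // 2
    v = np.pad(velocity, pad, mode="edge")
    out = np.zeros(velocity.shape, dtype=float)
    for i in range(velocity.shape[0]):
        for j in range(velocity.shape[1]):
            out[i, j] = v[i:i + k, j:j + k].mean()
    return out

# A single-pixel noise spike on a flat 1 cm/yr field is spread and damped
field = np.full((5, 5), 1.0)
field[2, 2] = 10.0
sm = box_lowpass(field)
print(round(float(sm[2, 2]), 2))  # spike reduced from 10.0 to 2.0
```

Broad volcanic signals (kilometres across) survive such filtering largely intact, while isolated noisy pixels that trigger false positives are averaged away.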

How to cite: Beker, T., Ansari, H., Montazeri, S., and Song, Q.: Detection of Volcanic Deformations in InSAR Velocity Maps - a contribution to TecVolSA project, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-7803, 2022.

EGU22-8948 | Presentations | NH6.1 | Highlight

Decrease of anthropogenic emission from aviation and detection of natural hazards with potential application in geosciences using satellite sensors, ground-based networks and model forecasts in the context of the SACS/ALARM early warning system 

Hugues Brenot, Nicolas Theys, Erwin de Donder, Lieven Clarisse, Pierre de Buyl, Nicolas Clerbaux, Simone Dietmüller, Sigrun Matthes, Volker Grewe, Sandy Chkeir, Alessandra Mascitell, Aikaterini Anesiadou, Riccardo Biondi, Igor Mahorčič, Tatjana Bolić, Ritthik Bhattacharya, Tim Winter, Adam Durant, Michel Van Roozendael, and Manuel Soler

Aviation safety can be jeopardised by multiple hazards arising from natural phenomena, e.g., severe weather, aerosols and gases from natural hazards, and space weather. Furthermore, the anthropogenic emissions and climate impact of aviation could be reduced. The use of satellite sensors, ground-based networks, and model forecasts is essential to detect and mitigate the risk of airborne hazards for aviation, as flying through them can have a strong impact on engines (abrasion and damage caused by aerosols) and on the health of passengers (e.g. due to associated hazardous trace gases).

The goal of this work is to give an overview of the alert data products in development in the ALARM SESAR H2020 Exploratory Research project. The overall objective of ALARM (multi-hAzard monitoring and earLy wARning system) is to develop a prototype global multi-hazard monitoring and Early Warning System (EWS), building upon SACS (Support to Aviation Control Service). This work presents the creation of alert data products, which have a potential use in the geosciences (e.g. meteorology, climatology, volcanology). These products include observational data, alert flagging and tailored information (e.g., height of hazard and contamination of flight level – FL). We provide information about the threat to aviation, but also notifications for geoscience applications. Three types of products are generated, i.e., early warnings (with geolocation, level of severity, quantification, …), nowcasting (up to 2 hours), and forecasting (from 2 to 48 hours) of hazard evolution at different FLs. Note that nowcasting and forecasting concern SO2 contamination at FLs around selected airports and the risk of environmental hotspots. This study shows the detection of four types of risks and weather-related phenomena, for which our EWS generates homogenised NetCDF Alert Products (NCAP). The first type is the near-real-time detection of recent volcanic plumes, smoke from wildfires, and desert dust clouds, highlighting the interest of combining geostationary and polar-orbiting satellite observations. For the second type, the ALARM EWS uses satellite and ground-based (GB) observations, together with model forecasts, to create NCAP related to real-time space weather activity. Exploratory research is being developed by ALARM partners to improve the detection of a third type of risk, i.e., the initiation of small-scale deep convection (under 2 km) around airports.
GNSS data (ground-based networks and radio occultations), lightning and radar data are used to implement NCAP data, designed with the objective of bringing relevant information for improving nowcasts around airports. The fourth type is related to the detection of environmental hotspots, which describe regions that are strongly sensitive to aviation emissions. ALARM partners investigate the climate impact of aviation emissions with respect to the prevailing synoptic atmospheric conditions, relying on algorithmic Climate Change Functions (a-CCFs). These a-CCFs describe the climate impact of individual non-CO2 forcing compounds (contrails, nitrogen oxides and water vapour) as a function of time, geographical location and cruise altitude.


ALARM has received funding from the SESAR Joint Undertaking (JU) under grant agreement No 891467. The JU receives support from the European Union’s Horizon 2020 research and innovation programme and the SESAR JU members other than the Union.

How to cite: Brenot, H., Theys, N., de Donder, E., Clarisse, L., de Buyl, P., Clerbaux, N., Dietmüller, S., Matthes, S., Grewe, V., Chkeir, S., Mascitell, A., Anesiadou, A., Biondi, R., Mahorčič, I., Bolić, T., Bhattacharya, R., Winter, T., Durant, A., Van Roozendael, M., and Soler, M.: Decrease of anthropogenic emission from aviation and detection of natural hazards with potential application in geosciences using satellite sensors, ground-based networks and model forecasts in the context of the SACS/ALARM early warning system, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-8948, 2022.

The Arctic region is a very remote and vulnerable ecosystem but also rich in natural resources, which have been exploited for many decades. These ecosystems are particularly vulnerable to industrial accidents. The Arctic has short summers, low temperatures, and limited sunlight, so it can take decades for Arctic ecosystems to recover from anthropogenic pollution. Examples of the potential hazards of exploiting natural resources in such fragile environments, and of the detrimental impact on the polar ecosystem and communities, are all too frequent. In the case of the oil and gas industry, spills caused by the failure of old pipelines are a very regular occurrence. Given the geographical isolation of these activities, remote sensing is an obvious technology to underpin any effective monitoring solution. Increasing availability in the public domain, together with recent advances in resolution, suggests that satellite imagery can play a key role in effectively monitoring oil spills, and it is the focus of this study.

The remote sensing of polar regions and the detection of terrestrial oil spills have both been studied previously; however, there has been little work investigating the two in combination. The challenge is how to detect an oil spill arising from an unknown incident or from illegal activity such as discharge. Oil spill detection by applying image processing techniques to Earth Observation (EO) data has historically focused on marine pollution. Satellite-based Synthetic Aperture Radar (SAR), with its day/night and all-weather capability and wide coverage, has proven to be effective there. Oil spill detection with remote sensing in terrestrial environments has received less attention, due to the typically smaller regional scale of terrestrial oil spill contamination together with the overlapping spectral signatures of the impacted vegetation and soils. SAR has not proven very effective onshore because of the false positives, and consequent ambiguities in interpretation, that reflect the complexity of land cover.

A number of studies have highlighted the potential of airborne hyperspectral sensors for oil spill detection, either through the identification of vegetation stress or directly on bare sites, with absorption bands identified in the short-wave infrared (SWIR) range at 1730 and 2300 nm. However, unlike spaceborne sensors, these devices do not provide regular coverage over broad areas. Several hyperspectral satellites have been launched to date but have technical constraints. The medium spatial resolution and long revisit times of most current hyperspectral instruments limit their use for identifying smaller incidents that often occur with high unpredictability.

No single sensor currently has all the characteristics required to detect the extent, impact and recovery from onshore oil spills. This study will look at the potential of combining medium spatial resolution imagery (Sentinel-2) for initial screening with high spatial/temporal resolution (WorldView-3) and high spectral resolution (PRISMA) data, both covering the key SWIR bands, for site-specific analysis.

How to cite: Sadler, G. and Rees, G.: Monitoring anthropogenic pollution in the Russian sub-Arctic with high resolution satellite imagery: An oil spill case study, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-10041, 2022.

EGU22-10256 | Presentations | NH6.1

Automatic Interferogram Selection for SBAS-InSAR Based on Deep Convolutional Neural Networks 

Yufang He, Guangzong Zhang, Hermann Kaufmann, and Guochang Xu

The small baseline subset of spaceborne interferometric synthetic aperture radar (SBAS-InSAR) technology has become a classical method for monitoring slow deformations through time series analysis, with an accuracy in the centimeter or even millimeter range. The process of calculating interferograms itself directly affects the accuracy of the SBAS-InSAR measurements, so the selection of high-quality interferogram pairs is crucial for SBAS data processing. Especially in the era of big data, the demand for an automatic and effective method for selecting high-quality interferograms in SBAS-InSAR technology is growing. Some automatic methods exist, including the simulated annealing (SA) search strategy and graph theory (GT), but to date the most effective approach to high-quality interferogram selection still relies on manual inspection. Due to the high degree of human interaction and the large amount of repetitive work involved, this traditional manual method increases the instability and inconsistency of the deformation calculation.
Considering that interferogram pairs of different quality show different color characteristics, a deep convolutional neural network (DCNN) approach is adopted in this study. The ResNet50 model (a standard DCNN architecture) has the advantages of a well-established network structure and easy programming. The idea is based on the fact that interferograms less contaminated by noise display smaller color (phase) fluctuations within a certain phase range. Hence, a training set containing almost 3000 interferograms with varying noise contamination, obtained from land subsidence in several subregions of Shenzhen, China, was established. Next, the ResNet50–DCNN model was set up, its parameters were determined through analysis of the training data, and traditional interferogram selection methods were used to evaluate its performance. For the simulation experiments and the evaluation and validation on real data, phase-unwrapped interferograms obtained by the temporal-spatial baseline threshold method were classified into high- and low-quality interferograms with the ResNet50 model. The proportion of high-quality interferograms correctly extracted by the ResNet50–DCNN method is above 90% in the simulation experiment and above 87% in the real-data experiment, which reflects the accuracy and reliability of the proposed method. A comparison of the overall surface subsidence rates and the deformation information of local PS points reveals little difference between the land subsidence rates obtained by the ResNet50–DCNN method and the actual simulations or the manual method.
The proposed method provides an automated and fast interferogram selection process for high-quality data, which contributes significantly to engineering applications of SBAS-InSAR. In future research, we will expand the training samples and study further DCNN models to improve the general accuracy and widen the applicability of this method.
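
The intuition that noisy interferograms show stronger colour (phase) fluctuations can be captured by a crude hand-crafted statistic; this is a stand-in for intuition only, and is not the abstract's trained ResNet50 classifier:

```python
import numpy as np

def phase_noise_score(unwrapped_phase):
    """Crude interferogram quality proxy: the standard deviation of the
    local phase gradients. Clean interferograms vary smoothly (nearly
    constant gradients); noisy ones show erratic gradients."""
    gy, gx = np.gradient(unwrapped_phase)
    return float(np.hypot(gx, gy).std())

rng = np.random.default_rng(1)
x = np.linspace(0, 4 * np.pi, 64)
clean = np.tile(x, (64, 1))                      # smooth fringe ramp
noisy = clean + rng.normal(0, 1.0, clean.shape)  # same ramp plus noise
print(phase_noise_score(clean) < phase_noise_score(noisy))  # True
```

A learned classifier generalizes far beyond such a single statistic, but a proxy like this is useful for sanity-checking labels in the training set.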

How to cite: He, Y., Zhang, G., Kaufmann, H., and Xu, G.: Automatic Interferogram Selection for SBAS-InSAR Based on Deep Convolutional Neural Networks, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-10256, 2022.

The European spruce bark beetle (Ips typographus) is one of the most detrimental insects of European spruce forests. An effective mitigation measure consists of removing infected trees before the beetles leave the bark, which generally happens before the end of June. To minimize economic loss and prevent tree destruction, fast and early detection of the European spruce bark beetle is therefore crucial for the future of spruce forests.

In order to detect forest stressed regions, possibly associated with beetle infestation, we investigated changes in forest vigour over time. One of the most damaged regions is Northern Italy, where beetle diffusion has increased sharply after Storm Adrian in late 2018.

In this work we used Sentinel-2 images of a study area in the mountain territory of Val di Fiemme (Trento, Italy) from early 2017 to late 2021. A preliminary field investigation was necessary to localize healthy (green) and stressed (red) trees. NDVI trends from Sentinel-2 showed an evident vigour discrepancy between green and red regions.

We therefore conceived a classification algorithm based on the slope of lines fitted to NDVI over time. Model accuracy is around 86%. The result is a classified map useful for distinguishing stressed and healthy forest areas.
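
The slope-based classification can be sketched as follows; the slope threshold and the NDVI sample values are illustrative assumptions, not the calibrated values from the study:

```python
import numpy as np

def classify_by_ndvi_slope(dates, ndvi, threshold=-0.05):
    """Fit a line to an NDVI time series and label the pixel 'stressed'
    when the fitted slope falls below a threshold (here an assumed
    value, in NDVI units per year)."""
    slope = np.polyfit(dates, ndvi, 1)[0]
    return ("stressed" if slope < threshold else "healthy"), slope

years = np.array([2017.0, 2018.0, 2019.0, 2020.0, 2021.0])
healthy_px = np.array([0.80, 0.82, 0.79, 0.81, 0.80])   # stable vigour
stressed_px = np.array([0.78, 0.75, 0.60, 0.45, 0.30])  # declining vigour
print(classify_by_ndvi_slope(years, healthy_px)[0])   # healthy
print(classify_by_ndvi_slope(years, stressed_px)[0])  # stressed
```

Applied per pixel over the whole scene, this yields a classified map of the kind described above.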

By using the proposed method and Google Earth Engine computational capabilities, we highlight the potential of a simple and effective model to predict and detect stressed forest areas, potentially associated with the diffusion of the European spruce bark beetle.

How to cite: Giomo, M., Moretto, J., and Fantinato, L.: Detection of forest stress from European spruce bark beetle attack in Northern Italy through a stress classification algorithm based on NDVI temporal changes, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-10630, 2022.

EGU22-10780 | Presentations | NH6.1

Morphometric analysis of volcanic structures using digital elevation models and models developed from radar images in the Apan volcanic field, México. 

Jesús Octavio Ruiz Sánchez, Jesús Eduardo Méndez Serrano, Mariana Patricia Jácome Páz, Nelly Ramírez Serrato, and Nestor López Váldes

The present project aims to make a preliminary assessment of the volcanic risk represented by the Apan Volcanic Field (CVA). The methodology was divided into two parts. In the first, Digital Elevation Models (DEM) published by official sources were used to identify unreported structures and to perform morphometric analysis of previously dated structures. In the second stage, a new DEM was developed using interferometric methods to compare the results with those obtained from official sources. Two SAR images from the SENTINEL-1 satellite of ESA's Copernicus program were used: the first, acquired on October 14, 2021, as the leader image, and the second, acquired on October 26, 2021, as the slave image. These images were processed in ESA's SNAP software. For the morphometric analysis, volcanic structures were classified into three major categories: young cones (0.18 Ma - 0.5 Ma), intermediate cones (0.5 Ma - 1 Ma), and old cones (1 Ma - 3 Ma). From the official DEM analysis, 243 volcanic structures were reported within the study area, with a preliminary predominance of structures that fall in the range of old cones. Four areas with a higher concentration of volcanic structures were detected, in which some highly populated localities are found. In addition, demographic parameters were used for a better preliminary risk assessment of the study area. Both the official and the radar-derived DEMs were used for the morphometric analysis, and the results were compared with previously published models. Finally, comparison with two other Mexican volcanic fields indicates that the CVA represents a moderate volcanic risk, for which more studies and monitoring in the area are recommended. This project provides a new understanding of the volcanic hazard and risk associated with the CVA and the development of the surrounding social environment.

How to cite: Ruiz Sánchez, J. O., Méndez Serrano, J. E., Jácome Páz, M. P., Ramírez Serrato, N., and López Váldes, N.: Morphometric analysis of volcanic structures using digital elevation models and models developed from radar images in the Apan volcanic field, México., EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-10780, 2022.

Traditional fertilization techniques in crop production consist of a homogeneous distribution of inputs all over the cultivated field. Alternatively, variable-rate fertilization methods could minimize the environmental impact and increase economic benefits.

The objective of this study is to evaluate the capabilities of a Google Earth Engine code conceived to rapidly study the variability of cultivated fields with a view to variable fertilization. The tool is semi-automatic, as it requires only the field boundary, and it returns a few outputs ready to be inspected by the user. This work presents an application of this model to a corn field in Northern Italy (province of Venice).

Field variability is evaluated through the NDVI index extracted from Sentinel-2 images from 2017 to 2021. For this purpose, the tool provides NDVI statistics, classified maps, classified area percentages, and punctual NDVI trends.
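
The NDVI computation and classified area percentages can be sketched as follows; the class breaks and reflectance values are assumptions for illustration, not the tool's calibrated settings:

```python
import numpy as np

def ndvi(nir, red):
    """NDVI from near-infrared and red reflectance (for Sentinel-2,
    bands B8 and B4)."""
    return (nir - red) / (nir + red)

def class_area_percent(ndvi_map, bins=(0.3, 0.6)):
    """Split the field into low / medium / high vigour classes and
    return the area percentage of each (class breaks are illustrative
    assumptions)."""
    labels = np.digitize(ndvi_map, bins)
    return [float((labels == c).mean() * 100.0) for c in range(len(bins) + 1)]

# Toy 2x2 reflectance grids, for illustration only
nir = np.array([[0.5, 0.6], [0.7, 0.2]])
red = np.array([[0.2, 0.1], [0.1, 0.15]])
v = ndvi(nir, red)
print(class_area_percent(v))  # [25.0, 25.0, 50.0]
```

Area percentages of this kind are the direct input for prescribing different fertilizer rates to different zones of the field.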

Results show that the boundary regions of the field are systematically less vigorous than other parts, so crop production there is inefficient. Conversely, fertilization could be enhanced in the internal parts, which are consistently healthier.

The proposed model is a fast way to analyse field vigour status, and Google Earth Engine capabilities permit its application nearly anywhere in the world. Field variability and the linked variable fertilization are crucial to reduce environmental impact and increase economic benefits, especially in extensive farming.

How to cite: Moretto, J., Giomo, M., Fantinato, L., and Rasera, R.: Application of a semi-automatic tool for field variability assessment on a cultivated field in Northern Italy to evaluate variable fertilization benefits, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-10962, 2022.

EGU22-11589 | Presentations | NH6.1

New advances of the P-SBAS based automatic and unsupervised tool for the co-seismic Sentinel-1 DInSAR products generation 

Fernando Monterroso, Andrea Antonioli, Simone Atzori, Claudio De Luca, Riccardo Lanari, Michele Manunta, Emanuela Valerio, and Francesco Casu

Differential Synthetic Aperture Radar Interferometry (DInSAR) is a key method to estimate, with centimeter accuracy, Earth surface displacements caused by natural events or anthropogenic activities. Furthermore, since 2014 the scientific community has benefited from the huge spaceborne SAR data archives acquired by the Copernicus Sentinel-1 (S1) satellite constellation, which operationally provides SAR data with a free and open data access policy at a nearly global scale. Using the S1 acquisitions, an automatic and unsupervised processing tool that generates co-seismic interferograms and LOS displacement maps has been developed. This tool routinely queries two different earthquake catalogs (USGS and INGV) to trigger, in an automatic way, the S1 data download and the DInSAR processing through the Parallel Small BAseline Subsets (P-SBAS) algorithm. In particular, in order to restrict the processing to earthquakes that may produce ground displacements detectable through DInSAR, the tool starts the SAR data processing for events with a magnitude greater than 4.0 in Europe, and greater than 5.5 at the global scale.
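The magnitude-based triggering logic can be sketched as a simple filter. The bounding box used below to decide whether an event is "in Europe" is an illustrative assumption, not the region definition of the actual tool, and the event list is invented.

```python
def should_trigger(magnitude, lat, lon):
    """Decide whether an earthquake should trigger automatic S1 processing:
    magnitude > 4.0 for events in Europe, > 5.5 elsewhere (global scale)."""
    # Rough bounding box for Europe -- a placeholder assumption for this sketch.
    in_europe = (34.0 <= lat <= 72.0) and (-25.0 <= lon <= 45.0)
    threshold = 4.0 if in_europe else 5.5
    return magnitude > threshold

events = [
    {"mag": 4.3, "lat": 42.5, "lon": 13.2},    # central Italy -> triggered
    {"mag": 4.8, "lat": -33.4, "lon": -70.6},  # Chile -> below global threshold
    {"mag": 6.1, "lat": 35.6, "lon": 140.1},   # Japan -> triggered
]
triggered = [e for e in events if should_trigger(e["mag"], e["lat"], e["lon"])]
```

In the real tool, each triggered event would then launch the S1 download and P-SBAS processing chain.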

We first remark that, in order to optimize the extension of the investigated area, thus reducing the processing time and effectively exploiting the available computing resources, an algorithm for the estimation of the co-seismically affected area has been integrated as the first step of the workflow. More specifically, by considering the moment tensors provided by public catalogs (USGS, INGV, Global CMT project), a forward modelling procedure generates the predicted co-seismic displacement field, which the P-SBAS algorithm uses to optimize some of the DInSAR processing steps. In particular, the phase unwrapping (PhU) algorithm is applied only to the part of the DInSAR interferograms delimited by the area identified through the predicted scenario, and not to the whole S1 scene. In addition, the presented automatic and unsupervised tool has been migrated to a Cloud Computing (CC) environment, specifically Amazon Web Services (AWS). This strategy allows a more efficient management of the needed computing resources, also in emergency scenarios.

The adopted solutions allowed the creation of a worldwide co-seismic maps database. Indeed, benefiting from the last seven years of Sentinel-1 operation, the tool has generated approximately 6500 interferograms and LOS displacement maps, corresponding to a total of 383 investigated earthquakes.

Note also that the generated interferograms and displacement maps have been made available to the scientific community through the EPOS infrastructure and the Geohazards Exploitation Platform, thus helping scientists and researchers to investigate the dynamics of surface deformation in seismic zones around the Earth, even when they do not have specific DInSAR processing capabilities and/or skills available.

How to cite: Monterroso, F., Antonioli, A., Atzori, S., De Luca, C., Lanari, R., Manunta, M., Valerio, E., and Casu, F.: New advances of the P-SBAS based automatic and unsupervised tool for the co-seismic Sentinel-1 DInSAR products generation, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-11589, 2022.

EGU22-11701 | Presentations | NH6.1

Comparative analysis of the role of labelled benchmark datasets for automatic flood mapping using SAR data 

Dibakar Kamalini Ritushree, Mahdi Motagh, Shagun Garg, and Binayak Ghosh

Recent years have witnessed extreme flood events across the world, irrespective of the heterogeneity of geographical contexts. Accurately mapping such events is essential for disaster relief and recovery efforts. Satellite imagery from both optical and radar sensors can greatly benefit this process thanks to its easy interpretability and high resolution. However, the use of optical sensors for flood extent extraction is limited by weather conditions and the presence of clouds. In contrast, SAR sensors have proved to be among the most powerful tools for flood monitoring due to their ability to observe in all-weather/day-night conditions. The exploitation of SAR in conjunction with optical datasets has shown exemplary results in flood monitoring applications.

With the onset of deep learning and big data, data-driven training approaches have shown great potential for automatic flood mapping. To improve the efficiency of deep learning algorithms at a global scale, publicly available labelled benchmark datasets have been introduced. One such dataset is Sen1Floods11, which includes raw Sentinel-1 imagery with classified permanent water and flood water, covering 11 flood events. Each event has Sentinel-1 and Sentinel-2 coverage on the same day or within 2 days of the Sentinel-1 image, from August 2016 to May 2019. The other is WorldFloods, which consists of Sentinel-2 data acquired during 119 flood events from November 2015 to March 2019. In this study, we make a comparative analysis to investigate the efficiency of these labelled benchmark datasets for automatic flood mapping using SAR data. Various types of flooding in different geographic locations in Europe, Australia, India and Iran are selected, and the segmentation networks are evaluated on existing Sentinel-1 images covering these events.
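A common way to score such segmentation networks against reference flood masks is the Intersection over Union (IoU); the abstract does not name its evaluation metric, so this is an illustrative sketch on toy binary masks, not the study's protocol.

```python
import numpy as np

def iou(pred, truth):
    """Intersection over Union between two binary flood masks."""
    pred = np.asarray(pred).astype(bool)
    truth = np.asarray(truth).astype(bool)
    inter = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    return inter / union if union else 1.0

# Toy 3x3 masks: 1 = flooded, 0 = dry. Two pixels agree, two disagree.
truth = np.array([[1, 1, 0], [0, 1, 0], [0, 0, 0]])
pred  = np.array([[1, 1, 0], [0, 0, 0], [0, 0, 1]])
score = iou(pred, truth)  # intersection = 2 pixels, union = 4 pixels
```

Computing this per flood event would allow the benchmark datasets to be compared across the selected regions.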


How to cite: Ritushree, D. K., Motagh, M., Garg, S., and Ghosh, B.: Comparative analysis of the role of labelled benchmark datasets for automatic flood mapping using SAR data, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-11701, 2022.

EGU22-12127 | Presentations | NH6.1

Methodologies for surface deformations analysis at regional scale 

Micol Fumagalli, Alberto Previati, Serena Rigamonti, Paolo Frattini, and Giovanni B. Crosta

Analysis of ground deformation is particularly demanding when displacement rates are in the range of a few mm/yr. This study integrates different statistical techniques to unravel the spatial and temporal patterns of vertical ground deformation in an alluvial basin. Beyond the identification of critical areas, this is also essential to delineate a conceptual model for the uplift and subsidence mechanisms in complex environments such as a layered aquifer subject to strong piezometric oscillations and land use changes due to human activities.

The study area covers about 4000 km² in the Lombardy region (N Italy) and includes the Milan metropolitan area and a part of the Po alluvial plain between the Como and Varese lakes. In this study, Sentinel-1A (C-band) PS-InSAR data with an average revisit time of 6 days and an average PS distance of 20 m, processed by TRE-Altamira, were analysed to investigate different movement styles in the study area.

The PS-InSAR data span 2015 to 2020 and reveal a wide, gently subsiding area oriented in the NW-SE direction (average subsidence rate of nearly -1.5 mm/yr along the line of sight). Principal Component Analysis (PCA) and Independent Component Analysis (ICA) were applied to the ground deformation and piezometric time series, showing analogous spatial patterns of the fluctuation styles. Then, from the correlations between the spatial patterns of ground motion, groundwater level changes and geological data, and between the temporal patterns of rainfall and groundwater abstraction rates, the main causes of ground motion were identified and summarized in a conceptual model.
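PCA on such time series can be sketched with a singular value decomposition: rows of the right singular vectors are shared temporal patterns, and the loadings give each point's spatial weight. The synthetic data below, one seasonal mode plus noise, stand in for the real PS-InSAR and piezometric series.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic ground-motion dataset: 50 measurement points x 120 monthly epochs,
# driven by one shared seasonal signal with point-dependent amplitude plus noise.
t = np.arange(120)
seasonal = np.sin(2 * np.pi * t / 12.0)
amplitudes = rng.uniform(0.5, 2.0, size=(50, 1))
X = amplitudes * seasonal + 0.1 * rng.standard_normal((50, 120))

# PCA: remove each point's temporal mean, then SVD. Rows of Vt are the
# temporal patterns; the fraction of variance per mode comes from s**2.
Xc = X - X.mean(axis=1, keepdims=True)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = s**2 / np.sum(s**2)
```

With one dominant mode, the first component recovers the seasonal fluctuation style shared by all points, mirroring how PCA isolates the common deformation/piezometric signal in the study.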

Finally, after reconstructing the aquifer composition and the geo-hydro-mechanical properties, and by implementing the hydraulic stresses from the conceptual model, a coupled hydro-mechanical FEM numerical model was developed. This allowed the hypotheses to be verified by comparing the simulated and measured ground displacements.

How to cite: Fumagalli, M., Previati, A., Rigamonti, S., Frattini, P., and Crosta, G. B.: Methodologies for surface deformations analysis at regional scale, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-12127, 2022.

EGU22-12269 | Presentations | NH6.1 | Highlight

Time series analysis using global satellite remote sensing data archives for multi-temporal characterization of hazardous surface processes 

Sigrid Roessner, Robert Behling, Mahmud Haghshenas Haghighi, and Magdalena Vassileva

The Earth’s surface hosts a large variety of human habitats that are subject to the simultaneous influence of a wide range of dynamic processes. The resulting dynamics are mainly driven by a complex interplay between geodynamic and hydrometeorological factors in combination with manifold human-induced land use changes and related impacts. The resulting effects on the Earth’s surface pose major threats to the population in these areas, especially under conditions of increasing population pressure and further exploitation of new and remote regions accompanied by ongoing climate change. This situation leads to significant changes in the type and dimension of natural hazards that have not been observed before in many of the affected regions.

This situation has led to an increasing demand for systematic and regular large-area process monitoring, which cannot be achieved by ground-based observations alone. In this context, the potential of satellite remote sensing has long been investigated as an approach for assessing dynamic processes on the Earth’s surface for large areas at different spatial and temporal scales. Until recently, however, these attempts were largely hampered by the limited availability of suitable satellite remote sensing data at a global scale. In recent years, new globally available satellite remote sensing data sources of high spatial and temporal resolution (e.g., Sentinels and Planet) have increased this potential to a large extent.

During the last decade, we have been pursuing extensive methodological developments in remote sensing based time series analysis including optical and radar observations with the goal of performing large area and at the same time detailed spatiotemporal analysis of natural hazard prone regions affected by a variety of processes, such as landslides, floods and subsidence. Our methodological developments include among others large-area automated post-failure landslide detection and mapping as well as assessment of the kinematics of pre- and post-failure slope deformation.  Our combined optical and radar remote sensing approaches aim at an improved understanding of spatiotemporal dynamics and complexities related to the evolution of these hazardous processes at different spatial and temporal scales.  We have been developing and applying our methods in a large variety of natural and societal contexts focusing on Central Asia, China and Germany.

We will present selected methodological approaches and results for a variety of hazardous surface processes investigated by satellite remote sensing based time series analysis. In this, we will focus on the potential of our approaches for supporting the needs and requirements imposed by the disaster management cycle, a widely used conceptual approach for disaster risk reduction and management including rapid response, long-term preparedness and early warning.

How to cite: Roessner, S., Behling, R., Haghshenas Haghighi, M., and Vassileva, M.: Time series analysis using global satellite remote sensing data archives for multi-temporal characterization of hazardous surface processes, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-12269, 2022.

EGU22-12271 | Presentations | NH6.1 | Highlight

Deep learning, remote sensing and visual analytics to support automatic flood detection 

Binayak Ghosh, Shagun Garg, Mahdi Motagh, Daniel Eggert, Mike Sips, Sandro Martinis, and Simon Plank

Floods can have devastating consequences on people, infrastructure, and the ecosystem. Satellite imagery has proven to be an efficient instrument in supporting disaster management authorities during flood events. In contrast to optical remote sensing technology, Synthetic Aperture Radar (SAR) can penetrate clouds, and authorities can use SAR images even in cloudy circumstances. A challenge with SAR is the accurate classification and segmentation of flooded areas from SAR imagery. Recent advancements have demonstrated the potential of deep learning for image segmentation. Our research adopted deep learning algorithms to classify and segment flooded areas in SAR imagery. We used UNet and Feature Pyramid Network (FPN), both based on an EfficientNet-B7 implementation, to detect flooded areas in SAR imagery of Nebraska, North Alabama, Bangladesh, Red River North, and Florence. We evaluated both deep learning methods' predictive accuracy and will present the evaluation results at the conference. In the next step of our research, we are developing an XAI toolbox to support the interpretation of detected flooded areas and the algorithmic decisions of the deep learning methods through interactive visualizations.
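The building blocks of encoder-decoder segmentation networks such as UNet and FPN can be illustrated in miniature: convolution to extract features, pooling to downsample in the encoder, and upsampling in the decoder. This toy NumPy sketch shows the operations themselves, not the actual EfficientNet-B7-based implementation.

```python
import numpy as np

def conv2d(img, kernel):
    """'Valid' 2-D cross-correlation, the basic operation of a CNN layer."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def maxpool2(img):
    """2x2 max pooling: the encoder's downsampling step (crops odd edges)."""
    h, w = img.shape
    return img[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def upsample2(img):
    """Nearest-neighbour upsampling: the decoder's counterpart to pooling."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

# A ramp image: the horizontal-gradient kernel responds with -1 everywhere.
x = np.arange(36, dtype=float).reshape(6, 6)
feat = conv2d(x, np.array([[1.0, -1.0]]))  # (6, 5) feature map
down = maxpool2(feat)                      # (3, 2) encoder output
up = upsample2(down)                       # (6, 4) decoder output
```

Real networks stack many such layers with learned kernels and skip connections; the output resolution is restored by the decoder so that each pixel gets a flood/no-flood label.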

How to cite: Ghosh, B., Garg, S., Motagh, M., Eggert, D., Sips, M., Martinis, S., and Plank, S.: Deep learning, remote sensing and visual analytics to support automatic flood detection, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-12271, 2022.

EGU22-12507 | Presentations | NH6.1

Spatio-temporal analysis of surface displacements in N’Djamena, Chad derived by Persistent Scatter-Interferometric Synthetic Aperture Radar (PS-InSAR) and Small BAseline Subset (SBAS) techniques 

Michelle Rygus, Giulia Tessari, Francesco Holecz, Marie-Louise Vogt, Djoret Daïra, Elisa Destro, Moussa Isseini, Giaime Origgi, Calvin Ndjoh Messina, and Claudia Meisina

High-resolution characterisation of land deformation and its spatio-temporal response to external triggering mechanisms is an important step towards improving geological hazard forecasting and management. The work presented here is part of the ResEau-Tchad project, with a focus on the city of N’Djamena. The extraction of groundwater to sustain this rapidly growing capital city has increased the pressure on water supply and urban sanitation infrastructures which are failing to meet the current water demand. In this study we exploit Synthetic-Aperture Radar (SAR) data acquired by the Sentinel-1 satellite to investigate the temporal variability and spatial extent of land deformation to assist in the development of a sustainable water management program in N’Djamena city.

The objectives of the work are: 1) to analyse the recent evolution of land deformation using two multi-temporal differential interferometry techniques, SBAS and PS-InSAR; and 2) to investigate the land deformation mechanism in order to identify the factors triggering surface movements. The PS-InSAR and SBAS techniques are applied to SAR images acquired in both ascending and descending orbits from April 2015 to May 2021 to generate high-resolution deformation measurements representing the total displacement observed at the surface. While the pattern of displacement indicated by the two datasets is similar, the average velocity values obtained with PS-InSAR tend to be noisier than those derived using the SBAS technique, particularly when the SBAS time series shows non-linear deformation trends.
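The average velocity reported for each measurement point is essentially the slope of a linear fit to its line-of-sight (LOS) displacement time series. A minimal sketch on synthetic data; the trend and noise values are invented stand-ins, not N'Djamena measurements.

```python
import numpy as np

# Synthetic LOS displacement time series (mm), one sample every ~6 days over
# 6 years, with a -1.5 mm/yr linear trend plus measurement noise.
rng = np.random.default_rng(1)
t_years = np.arange(0.0, 6.0, 6.0 / 365.25)
los = -1.5 * t_years + rng.normal(0.0, 1.0, t_years.size)

# Average LOS velocity = slope of a least-squares linear fit (mm/yr).
slope, intercept = np.polyfit(t_years, los, 1)
```

Noisier series (as with the PS-InSAR points) yield larger scatter around such a fit, while non-linear deformation makes a single slope a poorer summary, which is why the SBAS time series are examined directly.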

Characterisation of the subsidence areas by means of statistical analyses is carried out to reveal the surface deformation patterns related to different geo-mechanical processes. The integration of the spatio-temporal distribution of the PS and SBAS InSAR results with geological, hydrological, and hydrogeological data, along with subsurface lithological modelling, shows a relationship between vertical displacements, clay sediments, and surface water accumulation. These areas are located mostly in the surroundings of the urban area. The city centre is observed to be mostly stable, which might be the result of the removal of surface water through the city drainage system. Investigation of the relationship between vertical displacements and seasonal groundwater fluctuations or effects of groundwater withdrawal is limited by the temporally sparse piezometric dataset; however, the recent deformation rates appear to be correlated with the groundwater level trend at some locations.

How to cite: Rygus, M., Tessari, G., Holecz, F., Vogt, M.-L., Daïra, D., Destro, E., Isseini, M., Origgi, G., Ndjoh Messina, C., and Meisina, C.: Spatio-temporal analysis of surface displacements in N’Djamena, Chad derived by Persistent Scatter-Interferometric Synthetic Aperture Radar (PS-InSAR) and Small BAseline Subset (SBAS) techniques, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-12507, 2022.

EGU22-12552 | Presentations | NH6.1

Assessment of global burned area satellite products in the African savannah 

Manuel Arbelo, Jose Rafael García-Lázaro, and Jose Andres Moreno-Ruiz

Africa is the continent with the highest annual burned area, with the African savanna being the most affected ecosystem. This paper presents an assessment of the spatio-temporal accuracy of three of the main global-scale burned area products derived from images from polar-orbiting satellite-borne sensors: 1) Fire_CCI 5.1, of 250 m spatial resolution, developed by the European Space Agency (ESA) and led by the University of Alcalá de Henares; 2) MCD64A1 C6, of 500 m spatial resolution, developed by the University of Maryland; and 3) GABAM (Global Annual Burned Area Map), of 30 m spatial resolution, developed through the Google Earth Engine (GEE) platform by researchers from the Aerospace Information Research Institute of China. The first two products are based on daily images from the MODIS (Moderate-Resolution Imaging Spectroradiometer) sensor onboard NASA's Terra and Aqua satellites, and the third is based on Landsat images available on GEE. The almost total absence of reference burned area data from official sources has made it difficult to assess the spatio-temporal accuracy of these burned area products in Africa. However, the recent creation of the Burned Area Reference Database (BARD), which includes reference datasets from different international projects, opens the possibility for a more detailed assessment. The study focused on a region covering an area of approximately 29.5 million ha located in the southern hemisphere between 10°S and 15°S and bounded longitudinally by the 35°E and 40°E meridians. The results show that the Fire_CCI 5.1, MCD64A1 C6 and GABAM products present an annual distribution of burned area with an irregular pattern in the interval between 7 and 10 million ha per year (around 30% of the whole study area), but there is hardly any correlation between their time series, with correlation coefficients lower than 0.3 for the period 2000-2019. 
The spatio-temporal accuracy analysis was performed for 2005, 2010 and 2016, the only years for which BARD has reference perimeters. The results are highly variable, with values between 1 and 20 million ha per year depending on the product, the year and the reference set used, which does not allow definitive conclusions to be drawn on the accuracy of the burned area estimates. These results indicate that uncertainties persist both in the burned area estimates derived from remote sensing products in these regions and in the reference sets used for their evaluation, which require further research effort.
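The "hardly any correlation" finding rests on pairwise correlation coefficients between the products' annual burned-area time series, which can be sketched directly; the numbers below are invented values in the reported 7-10 Mha range, not the actual product series.

```python
import numpy as np

# Illustrative annual burned-area series (Mha) for three products over the
# same six years -- made-up values, for demonstration only.
fire_cci = np.array([7.2, 9.1, 8.0, 9.8, 7.5, 8.6])
mcd64a1  = np.array([9.4, 7.1, 9.9, 7.3, 8.8, 7.0])
gabam    = np.array([8.1, 8.9, 7.4, 9.2, 9.6, 7.8])

# Pairwise Pearson correlation matrix between the three time series.
r = np.corrcoef(np.vstack([fire_cci, mcd64a1, gabam]))
r_cci_mcd = r[0, 1]  # off-diagonal entries are the product-to-product r values
```

Off-diagonal coefficients below 0.3, as reported for 2000-2019, indicate that the products disagree on the year-to-year variability even when their totals are similar.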

How to cite: Arbelo, M., García-Lázaro, J. R., and Moreno-Ruiz, J. A.: Assessment of global burned area satellite products in the African savannah, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-12552, 2022.

Many satellite images are corrupted by striping; this noise degrades the visual quality of the images and inevitably introduces errors in processing. Thermal and hyperspectral images often suffer from striping. The frequency distribution characteristics of stripe noise make it difficult to remove in the spatial domain; conversely, such noise can be efficiently detected in the frequency domain. Numerous solutions have been proposed to eliminate it using the Fourier transform; however, most are subjective and time-consuming approaches.

The lack of a fast and automated tool on this subject has motivated us to introduce a Convolutional Neural Network-based tool that uses the U-Net architecture in the frequency domain to suppress the anomalies caused by stripe noise. We added synthetic noise to satellite images to train the model and taught the network to mask these anomalies in the frequency domain. The input image dataset was down-sampled to 128 × 128 pixels for a fast training time. However, our results suggest that the output mask can be up-scaled and applied to the original Fourier transform of the image and still achieve satisfying results; this means that the proposed algorithm is applicable to images regardless of their size.
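The frequency-domain reasoning can be illustrated directly: horizontal stripes concentrate their energy in the zero-horizontal-frequency column of the 2-D spectrum, so zeroing that column (the kind of mask the U-Net learns to predict) suppresses them. This is a hand-crafted sketch on synthetic data, not the network itself.

```python
import numpy as np

# Synthetic 128x128 scene with additive horizontal stripe noise.
rng = np.random.default_rng(2)
clean = rng.normal(100.0, 5.0, (128, 128))
stripes = 20.0 * np.sin(2 * np.pi * np.arange(128) / 8.0)[:, None]  # row pattern
noisy = clean + stripes

# A signal constant along rows has all its energy at horizontal frequency 0,
# i.e. in column 0 of the 2-D FFT. Mask that column, keeping the DC term.
F = np.fft.fft2(noisy)
mask = np.ones_like(F)
mask[1:, 0] = 0.0  # zero the vertical-frequency line, preserve DC at (0, 0)
restored = np.real(np.fft.ifft2(F * mask))
```

The residual error after masking comes only from the small part of the true scene that shared that frequency line, which is why learned masks that target the stripe peaks more selectively can do even better.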

After the training step, the U-Net architecture can confidently find the anomalies and create an acceptable bounding mask; the results show that, with enough training data, the proposed procedure can efficiently remove stripe noise from all sorts of images. At this stage, we are trying to further develop the model to detect and suppress more complex synthetic noise. Next, we will focus on removing real stripe noise from satellite images to present a robust tool.

How to cite: Rangzan, M. and Attarchi, S.: Removing Stripe Noise from Satellite Images using Convolutional Neural Networks in Frequency Domain, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-12575, 2022.

EGU22-1294 | Presentations | CR2.8

What determines the location of Antarctic blue ice areas? A deep learning approach 

Veronica Tollenaar, Harry Zekollari, Devis Tuia, Benjamin Kellenberger, Marc Rußwurm, Stef Lhermitte, and Frank Pattyn

The vast majority of the Antarctic ice sheet is covered with snow that compacts under its own weight and transforms into ice below the surface. However, in some areas, this typically blue-colored ice is directly exposed at the surface. These so-called "blue ice areas" represent islands of negative surface mass balance through sublimation and/or melt. Moreover, blue ice areas expose old ice that is easily accessible in large quantities at the surface, and some areas contain ice that extends beyond the time scales of classic deep-drilling ice cores.

Observation and modeling efforts suggest that the location of blue ice areas is related to a specific combination of topographic and meteorological factors. In the literature, these factors are described as (i) enhanced katabatic winds that erode snow, due to an increase of the surface slope or a tunneling effect of topography, (ii) the increased albedo of blue ice (with respect to snow), which enhances ablative processes, and (iii) the presence of nunataks (mountains protruding through the ice) that act as barriers to the ice flow upstream and prevent deposition of blowing snow on the lee side of the mountain. However, it remains largely unknown what role these physical processes play in creating and/or maintaining blue ice at the surface of the ice sheet.

Here, we study how a combination of environmental and topographic factors leads to the observation of blue ice. We also quantify the relevance of the individual processes and build an interpretable model aiming not only at predicting blue ice presence, but also at explaining why it is there. To do so, data are fed into a convolutional neural network, a machine learning algorithm that uses the spatial context of the data to generate a prediction of the presence of blue ice areas. More specifically, we use a U-Net architecture that, through convolutions and linked up-convolutions, allows obtaining a semantic segmentation (i.e., a pixel-level map) of the input data. Ground reference data are obtained from existing products of blue ice area outlines based on multispectral observations. These products contain considerable uncertainties, as (i) the horizontal change from snow to ice is gradual and a single threshold in this transition is not applicable uniformly over the continent, and (ii) the blue ice area extent is known to vary seasonally. Therefore, we train our deep learning model with a loss function whose weight increases toward the center of blue ice areas.
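One simple way to realize a loss weight that grows toward the center of a blue ice mask (the abstract does not specify the exact weighting scheme, so this is an illustrative construction) is to count how many binary erosions each pixel survives: boundary pixels, where the labels are least reliable, get low weight, and interior pixels get high weight.

```python
import numpy as np

def erode(mask):
    """Binary erosion with a 3x3 cross: a pixel survives if it and its four
    neighbours are all inside the mask."""
    p = np.pad(mask, 1)
    return (p[1:-1, 1:-1] & p[:-2, 1:-1] & p[2:, 1:-1]
            & p[1:-1, :-2] & p[1:-1, 2:]).astype(mask.dtype)

def center_weights(mask):
    """Per-pixel loss weights increasing toward the mask interior:
    weight = number of erosions the pixel survives (0 outside the mask)."""
    w = np.zeros(mask.shape, dtype=float)
    current = mask.astype(np.uint8)
    while current.any():
        w += current
        current = erode(current)
    return w

# 7x7 square "blue ice" blob inside a 9x9 tile: the centre pixel gets weight 4,
# the blob's rim gets weight 1, and pixels outside the blob get weight 0.
mask = np.zeros((9, 9), dtype=np.uint8)
mask[1:8, 1:8] = 1
w = center_weights(mask)
```

Multiplying the per-pixel loss by such a map makes the training emphasize confident interior labels over the uncertain snow-to-ice transition zone.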

Our first results indicate that the neural network predicts the location of blue ice relatively well, and that surface elevation data play an important role in determining the location of blue ice. In our ongoing work, we analyze both the predictions and the neural network itself to quantify which factors possess predictive capacity to explain the location of blue ice. Eventually, this information may allow us to answer the simple yet important question of why blue ice areas are located where they are, with potentially important implications for their role as paleoclimate archives and for their evolution under changing climatic conditions.

How to cite: Tollenaar, V., Zekollari, H., Tuia, D., Kellenberger, B., Rußwurm, M., Lhermitte, S., and Pattyn, F.: What determines the location of Antarctic blue ice areas? A deep learning approach, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-1294, 2022.

EGU22-2726 | Presentations | CR2.8 | Highlight

Dissecting Glaciers - Can an Automated Bio-Medical Image Segmentation Tool also Segment Glaciers? 

Nora Gourmelon, Thorsten Seehaus, Matthias Braun, Andreas Maier, and Vincent Christlein

The temporal variability of glacier calving front positions provides essential information about the state of marine-terminating glaciers. These positions can be extracted from Synthetic Aperture Radar (SAR) images throughout the year. To automate this extraction, we apply deep learning techniques that segment the SAR images into different classes: glacier; ocean including ice-melange and sea-ice covered ocean; rock outcrop; and regions with no information like areas outside the SAR swath, layover regions and SAR shadow. The calving front position can be derived from these regions during post-processing.   
A downside of deep learning is that hyper-parameters need to be tuned manually. For this tuning, expert knowledge and experience in deep learning are required. Furthermore, the fine-tuning process takes up much time, and the researcher needs to have programming skills.
In the biomedical imaging domain, a deep learning framework [1] has become increasingly popular for image segmentation. The nnU-Net can be used out-of-the-box. It automatically adapts the U-Net, the state-of-the-art architecture for image segmentation, to different datasets and segmentation tasks. Hence, no more manual tuning is required. The framework outperforms specialized deep learning pipelines in a multitude of public biomedical segmentation competitions.   
We apply the nnU-Net to the task of glacier segmentation, investigating whether the framework is also beneficial in the domain of remote sensing. To this end, we train and test the nnU-Net on CaFFe, a benchmark dataset for automatic calving front detection on SAR images. CaFFe comprises geocoded, orthorectified imagery acquired by the satellite missions RADARSAT-1, ERS-1/2, ALOS PALSAR, TerraSAR-X, TanDEM-X, Envisat, and Sentinel-1, covering the period 1995 - 2020. The ground range resolution varies between 7 and 20 m. The nnU-Net learns from the multi-class "zones" labels provided with the dataset. We adopt the post-processing scheme from Gourmelon et al. [2] to extract the front from the segmented landscape regions. The test set includes images from the Mapple Glacier located on the Antarctic Peninsula and the Columbia Glacier in Alaska. The nnU-Net's calving front predictions for the Mapple Glacier lie close to the ground truth, with a mean distance error of just 125 m. As the Columbia Glacier shows several calving front sections, its segmentation is more difficult than that of the laterally constrained Mapple Glacier. This complexity of the calving fronts is also reflected in the results: predictions for the Columbia Glacier show a mean distance error of 635 m. In conclusion, the results demonstrate that the nnU-Net holds considerable potential for the remote sensing domain, especially for glacier segmentation.
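A mean distance error between a predicted and a reference calving front can be computed as the symmetric average nearest-neighbour distance between the two sets of front pixels. This is a sketch on toy fronts; the exact CaFFe evaluation protocol may differ, and the pixel size used here is illustrative.

```python
import numpy as np

def mean_distance_error(pred_pts, true_pts, pixel_size=1.0):
    """Symmetric mean distance between two fronts given as (row, col) point
    sets: average nearest-neighbour distance, taken in both directions."""
    pred = np.asarray(pred_pts, dtype=float)
    true = np.asarray(true_pts, dtype=float)
    d = np.linalg.norm(pred[:, None, :] - true[None, :, :], axis=2)
    return pixel_size * 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())

# Toy fronts: the prediction is offset from the reference by 2 pixels
# everywhere, so the error equals 2 * pixel_size.
true_front = [(i, 10.0) for i in range(20)]
pred_front = [(i, 12.0) for i in range(20)]
err = mean_distance_error(pred_front, true_front, pixel_size=7.0)
```

Averaging such per-image errors over a glacier's test scenes yields summary numbers like the 125 m and 635 m figures above.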
[1] Isensee, F., Jaeger, P.F., Kohl, S.A.A. et al. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nat Methods 18, 203–211 (2021). 

[2] Gourmelon, N., Seehaus, T., Braun, M., Maier, A., Christlein, V.: Calving Fronts and Where to Find Them: A Benchmark Dataset and Methodology for Automatic Glacier Calving Front Extraction from SAR Imagery, In Prep.

How to cite: Gourmelon, N., Seehaus, T., Braun, M., Maier, A., and Christlein, V.: Dissecting Glaciers - Can an Automated Bio-Medical Image Segmentation Tool also Segment Glaciers?, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-2726, 2022.

EGU22-2904 | Presentations | CR2.8

Automated mapping of Eastern Himalayan glacial lakes using deep learning and multisource remote sensing data 

Saurabh Kaushik, Tejpal Singh, Pawan Kumar Joshi, and Andreas J Dietz

The Himalayan glacierized region has experienced a substantial rise in the number and area of glacial lakes in the past two decades. These glacial lakes directly influence glacier melt, velocity and geometry, and thus the overall response of the glacier to climate change. The sudden release of water from these lakes poses a severe threat to downstream communities and infrastructure. Thereby, regular monitoring and modelling of these lakes is significant for understanding regional climate change and mitigating the anticipated impact of glacial lake outburst floods. Here, we propose an automated scheme for Himalayan glacial lake extent mapping using multisource remote sensing data and a state-of-the-art deep learning technique. A combination of multisource remote sensing data [Synthetic Aperture Radar (SAR) coherence, thermal, visible, near-infrared and shortwave infrared bands, the Advanced Land Observing Satellite (ALOS) DEM, surface slope, and the Normalised Difference Water Index (NDWI)] is used as input to a fully connected feed-forward Convolutional Neural Network (CNN). The CNN is trained on 660 images (300×300×10) collected from 11 sites spread across the Himalaya. The CNN architecture (input size, number of hidden and convolutional layers, filters, and other hyperparameters) is optimized by trial and error. The model performance is evaluated over 3 different sites of the Eastern Himalaya representing heterogeneous landscapes. The novelty of the presented automated scheme lies in its spatio-temporal transferability over a large geographical region (~8477, 10336 and 6013 km²). Future work involves intra-annual lake extent mapping across the High Mountain Asia region in an automated fashion.
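Assembling the 300×300×10 input tensor can be sketched as stacking co-registered layers, with NDWI derived from the green and near-infrared bands as (Green - NIR) / (Green + NIR). The random arrays below are stand-ins for the real rasters (SAR coherence, thermal, VNIR/SWIR bands, DEM, slope), and the layer ordering is an assumption for illustration.

```python
import numpy as np

h, w = 300, 300
rng = np.random.default_rng(3)

# Stand-ins for co-registered input layers; real data would be resampled
# rasters sharing the same grid.
green = rng.uniform(0.0, 1.0, (h, w))
nir = rng.uniform(0.0, 1.0, (h, w))
other_layers = [rng.uniform(0.0, 1.0, (h, w)) for _ in range(7)]

# NDWI highlights open water such as glacial lakes.
ndwi = (green - nir) / (green + nir + 1e-9)

# Stack everything into the 300x300x10 input tensor described in the abstract.
stack = np.dstack([green, nir, ndwi] + other_layers)
```

Each 300×300×10 tile, paired with a lake/no-lake label mask, then forms one training sample for the CNN.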

Keywords: Glacial Lake, convolutional neural network, semantic segmentation, remote sensing, Himalaya, SAR and climate change

How to cite: Kaushik, S., Singh, T., Joshi, P. K., and Dietz, A. J.: Automated mapping of Eastern Himalayan glacial lakes using deep learning and multisource remote sensing data, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-2904, 2022.

EGU22-3446 | Presentations | CR2.8

The AI-CORE Project - Artificial Intelligence for Cold Regions 

Andreas Dietz and Celia Baumhoer and the AI-CORE Team

Artificial Intelligence for Cold Regions (AI-CORE) is a collaborative approach for applying Artificial Intelligence (AI) methods in the field of remote sensing of the cryosphere. Several research institutes (German Aerospace Center, Alfred-Wegener-Institute, Technical University Dresden) bundled their expertise to jointly develop AI-based solutions for pressing geoscientific questions in cryosphere research. The project addresses four geoscientific use cases: the identification of change patterns of outlet glaciers in Greenland, object identification in permafrost areas, the detection of calving fronts in Antarctica, and firn-line detection on glaciers. In this presentation, the final AI-based approaches for each use case will be presented and exemplary results will be shown. Furthermore, all developed AI methods were implemented in three different computing centers, and the lessons learned from deploying several ready-to-use AI tools in different processing infrastructures will be discussed. Finally, a best-practice example for sharing AI implementations between institutes is provided, along with opportunities and challenges faced during the project.

How to cite: Dietz, A. and Baumhoer, C. and the AI-CORE Team: The AI-CORE Project - Artificial Intelligence for Cold Regions, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-3446, 2022.

EGU22-3701 | Presentations | CR2.8 | Highlight

Snow accumulation over the world's glaciers (1981-2021) inferred from climate reanalyses and machine learning 

Matteo Guidicelli, Marco Gabella, Matthias Huss, and Nadine Salzmann

The scarcity and limited accuracy of snow and precipitation observation and estimation in high-mountain regions reduce our understanding of climatic-cryospheric processes. We therefore compared the snow water equivalent (SWE) from winter mass balance observations of 95 glaciers distributed over the Alps, Canada, Central Asia and Scandinavia with the cumulative gridded precipitation data from the ERA5 and MERRA-2 reanalysis products. We propose a machine learning model to downscale the gridded precipitation from the reanalyses to the altitude of the glaciers. The machine learning model is a gradient boosting regressor (GBR), which combines several meteorological variables from the reanalyses (air temperature and relative humidity are also downscaled to the altitude of the glaciers) and topographical parameters. Among the most important variables selected by the GBR model are the downscaled relative humidity and the downscaled air temperature. These GBR-derived estimates are evaluated against the winter mass balance observations by means of a leave-one-glacier-out cross-validation (site-independent GBR) and a leave-one-season-out cross-validation (season-independent GBR). The estimates downscaled by the GBR show lower biases and higher correlations with the winter mass balance observations than downscaled estimates derived with a lapse-rate-based approach. Finally, the GBR estimates are used to derive SWE trends between 1981 and 2021 at high altitudes. The trends obtained from the GBRs are stronger than those obtained from the gridded precipitation of the reanalyses. When the data are regrouped by region, significant trends are observed only for the Alps (positive) and Scandinavia (negative), while significant positive or negative trends are observed in all regions when looking locally at single glaciers and specific elevations.
Positive (negative) SWE trends are typically observed at higher (lower) elevations, where the impact of rising temperatures is less (more) dominating.
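The leave-one-glacier-out validation scheme can be illustrated independently of the actual GBR. The sketch below uses synthetic data and a plain least-squares regressor standing in for the gradient boosting regressor; the six glaciers, the features and the noise level are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: each row belongs to one of six hypothetical glaciers.
n, n_feat = 60, 3
glacier_id = rng.integers(0, 6, n)
X = rng.normal(size=(n, n_feat))            # e.g. precip, temperature, humidity
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=n)   # "winter SWE"

def fit_linear(X, y):
    A = np.column_stack([X, np.ones(len(X))])   # add intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    return np.column_stack([X, np.ones(len(X))]) @ coef

# Leave-one-glacier-out: hold out ALL points of one glacier at a time,
# so the score measures site-independent transferability.
errors = []
for g in np.unique(glacier_id):
    train, test = glacier_id != g, glacier_id == g
    coef = fit_linear(X[train], y[train])
    errors.append(np.mean((predict(coef, X[test]) - y[test]) ** 2))

cv_mse = float(np.mean(errors))
```

Swapping the hold-out unit from glacier to season gives the season-independent variant described above.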

How to cite: Guidicelli, M., Gabella, M., Huss, M., and Salzmann, N.: Snow accumulation over the world's glaciers (1981-2021) inferred from climate reanalyses and machine learning, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-3701, 2022.

EGU22-5317 | Presentations | CR2.8

Point Mass Balance Regression using Deep Neural Networks: A Transfer Learning Approach 

Ritu Anilkumar, Rishikesh Bharti, and Dibyajyoti Chutia

The last few years have seen an increasing number of studies modeling glacier evolution using deep learning. Most of these techniques have focused on artificial neural networks (ANN) capable of providing a regressed value of mass balance from topographic and meteorological input features. The large number of parameters in an ANN demands a large dataset for training the parameter values. This is relatively difficult to achieve for regions with a sparse in-situ measurement setup such as the Himalayas. For example, of the 14326 point mass balance measurements obtained from the Fluctuations of Glaciers database for the period 1950-2020 for glaciers between 60°S and 60°N, a mere 362 points over four glaciers exist for the Himalayan region. These are insufficient to train complex neural network architectures over the region. We attempt to overcome this data hurdle by using transfer learning. Here, the parameters are first trained over the 9584 points in the Alps, following which the weights are used for retraining on the Himalayan data points. Fourteen meteorological variables from the ERA5-Land monthly averaged reanalysis were used as input features for the study. A 70-30 split of the training and testing set was maintained to ensure the authenticity of the accuracy estimates via independent testing. Estimates are assessed on a glacier scale in the temporal domain to assess the feasibility of using deep learning to fill temporal gaps in data. Our method is also compared with other machine learning algorithms such as random forest-based regression and support vector-based regression, and we observe that the complexity of the dataset is better represented by the neural network architecture. With an overall normalized root mean squared error consistently less than 0.09, our results suggest the capability of deep learning to fill temporal data gaps over the glaciers and potentially reduce the spatial gap on a regional scale.
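The transfer-learning idea (pretrain on the data-rich source region, fine-tune on the sparse target region) can be sketched with a toy linear model trained by gradient descent. Everything below is a synthetic stand-in: the "Alps" and "Himalaya" arrays, the model and the normalised-RMSE evaluation are illustrative, not the authors' setup.

```python
import numpy as np

rng = np.random.default_rng(2)

def sgd_fit(X, y, w=None, lr=0.01, epochs=500):
    """Plain gradient descent on MSE; w can warm-start from pretraining."""
    if w is None:
        w = np.zeros(X.shape[1])
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

true_w = np.array([1.5, -2.0, 0.7])

# Large "source" dataset (Alps analogue) and small "target" set (Himalaya analogue).
X_src = rng.normal(size=(1000, 3)); y_src = X_src @ true_w + 0.1 * rng.normal(size=1000)
X_tgt = rng.normal(size=(30, 3));   y_tgt = X_tgt @ true_w + 0.1 * rng.normal(size=30)

w_pre = sgd_fit(X_src, y_src)                             # pretrain on source region
w_ft = sgd_fit(X_tgt, y_tgt, w=w_pre.copy(), epochs=50)   # brief fine-tune on target

# Normalised RMSE on the target data (the abstract reports values below 0.09).
pred = X_tgt @ w_ft
nrmse = float(np.sqrt(np.mean((pred - y_tgt) ** 2)) / (y_tgt.max() - y_tgt.min()))
```

The warm start lets the few target points adjust an already-informative model instead of fitting from scratch, which is the point of transfer learning in data-sparse regions.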

How to cite: Anilkumar, R., Bharti, R., and Chutia, D.: Point Mass Balance Regression using Deep Neural Networks: A Transfer Learning Approach, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-5317, 2022.

EGU22-5612 | Presentations | CR2.8

Retrieving freeze/thaw-cycles using Machine Learning approach in Nunavik (Québec, Canada) 

Yueli Chen, Lingxiao Wang, Monique Bernier, and Ralf Ludwig

In the terrestrial cryosphere, freeze/thaw (FT) state transition plays an important and measurable role for climatic, hydrological, ecological, and biogeochemical processes in permafrost landscapes.

Satellite active and passive microwave remote sensing has shown its principal capacity to provide effective monitoring of landscape FT dynamics. Many algorithms have been developed and evaluated over time in this scope. With the advancement of data science and artificial intelligence methods, the potential of better understanding the cryosphere is emerging.

This work is dedicated to exploring an effective approach to retrieving the FT state from microwave remote sensing data using machine learning methods, which is expected to fill some hidden blind spots of the deterministic algorithms. Time series of remote sensing data will be created as training data. In the initial stage, the work aims to test feasibility and establish a basic neural network based on a small number of training factors. In the advanced stage, we will improve the model in terms of structure, such as adding more complex dense layers and testing optimizers, and in terms of domain knowledge, such as introducing more influencing factors for training. Related parameters, for example land cover types, will be included in the analysis to improve the method and the understanding of FT-related processes.

How to cite: Chen, Y., Wang, L., Bernier, M., and Ludwig, R.: Retrieving freeze/thaw-cycles using Machine Learning approach in Nunavik (Québec, Canada), EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-5612, 2022.

EGU22-5910 | Presentations | CR2.8

Learning and screening of neural networks architectures for sub-grid-scale parametrizations of sea-ice dynamics from idealised twin experiments 

Tobias Finn, Charlotte Durand, Alban Farchi, Marc Bocquet, Yumeng Chen, Alberto Carrassi, and Veronique Dansereau

In this talk, we propose to use neural networks in a hybrid modelling setup to learn sub-grid-scale dynamics of sea ice that cannot be resolved by geophysical models. The multifractal and stochastic nature of sea-ice dynamics creates significant obstacles to representing such dynamics with neural networks. Here, we will introduce and screen specific neural network architectures that might be suited to this kind of task. To prove our concept, we perform idealised twin experiments with a simplified Maxwell-Elasto-Brittle sea-ice model which includes only sea-ice dynamics within a channel-like setup. In our experiments, we use high-resolution runs as a proxy for reality, and we train neural networks to correct the errors of low-resolution forecast runs.

Since we perform the two kinds of runs on different grids, we need to define a projection operator from high to low resolution. In practice, we compare the low-resolution forecasted state at a given time to the projected state of the high-resolution run at the same time. Using a catalogue of these forecasted and projected states, we learn and screen different neural network architectures with supervised training in an offline learning setting. Together with this simplified training, the screening helps us to select appropriate architectures for the representation of multifractality and stochasticity within the sea-ice dynamics. As a next step, these screened architectures have to be scaled to larger and more complex sea-ice models such as neXtSIM.
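A minimal stand-in for the high-to-low-resolution projection operator is block averaging, sketched below with NumPy; the real operator depends on the two model grids, and the "forecast" here is a hypothetical biased field used only to show how the training target (the forecast error) is formed.

```python
import numpy as np

def project_to_low_res(field, factor):
    """Coarse-grain a 2-D high-resolution field by block averaging.

    A simple stand-in for the high-to-low-resolution projection operator."""
    ny, nx = field.shape
    assert ny % factor == 0 and nx % factor == 0
    return field.reshape(ny // factor, factor, nx // factor, factor).mean(axis=(1, 3))

high_res = np.arange(64, dtype=float).reshape(8, 8)   # proxy "truth" state
low_res_proj = project_to_low_res(high_res, 4)        # projected 2x2 coarse field

# The supervised training target is the error of the low-resolution forecast
# relative to the projected high-resolution state.
low_res_forecast = low_res_proj + 0.5                 # hypothetical biased forecast
correction_target = low_res_proj - low_res_forecast
```

Pairs of (forecast state, correction target) form the catalogue used for offline supervised learning.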

How to cite: Finn, T., Durand, C., Farchi, A., Bocquet, M., Chen, Y., Carrassi, A., and Dansereau, V.: Learning and screening of neural networks architectures for sub-grid-scale parametrizations of sea-ice dynamics from idealised twin experiments, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-5910, 2022.

EGU22-6948 | Presentations | CR2.8

Mapping Glacier Basal Sliding with Beamforming and Artificial Intelligence 

Josefine Umlauft, Philippe Roux, Albanne Lecointre, Florent Gimbert, Ugo Nanni, Andrea Walpersdorf, Bertrand Rouet-LeDuc, Claudia Hulbert, Daniel Trugman, and Paul Johnson

The cryosphere is a highly active and dynamic environment that rapidly responds to changing climatic conditions. In particular, the physical processes behind glacial dynamics are poorly understood because they remain challenging to observe. Glacial dynamics are strongly intermittent in time and heterogeneous in space. Thus, monitoring with high spatio-temporal resolution is essential.

In the course of the RESOLVE ('High-resolution imaging in subsurface geophysics: development of a multi-instrument platform for interdisciplinary research') project, continuous seismic observations were obtained using a dense seismic network (100 nodes, Ø 700 m) installed on Glacier d'Argentière (French Alps) in May 2018. This unique data set offers the chance to study targeted processes and dynamics within the cryosphere on a local scale in detail.


To identify seismic signatures of the ice bed in the presence of melt-induced microseismic noise, we applied the supervised ML technique of gradient tree boosting. The approach has previously proven suitable for directly observing the physical state of a tectonic fault. Transferred to glacial settings, seismic surface records could therefore reveal frictional properties of the ice bed, offering completely new means to study the subglacial environment and basal sliding, which are difficult to access with conventional approaches.

We built our ML model as follows: statistical properties of the continuous seismic records (variance, kurtosis and quantile ranges), meteorological data and a seismic source catalogue obtained using beamforming (matched field processing) serve as features, which we fit to measures of the GPS displacement rate of Glacier d'Argentière (labels). Our preliminary results suggest that seismic source activity at the bottom of the glacier strongly correlates with surface displacement rates and hence is directly linked to basal motion. By ranking the importance of our input features, we have learned that, unlike for reasonably long monitoring time series along tectonic faults, statistical properties of the seismic observations alone do not suffice in glacial environments to estimate surface displacement. The additional beamforming features, however, are a rich archive that enhances the ML model performance considerably and allows direct observation of ice dynamics.
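The statistical features named above are straightforward to compute per window; a NumPy sketch (with white noise standing in for a real continuous seismic record, and an arbitrary window length) might look like:

```python
import numpy as np

def window_features(x):
    """Variance, excess kurtosis and 10-90% interquantile range of one window."""
    var = np.var(x)
    kurt = np.mean((x - x.mean()) ** 4) / np.var(x) ** 2 - 3.0   # excess kurtosis
    q_range = np.quantile(x, 0.9) - np.quantile(x, 0.1)
    return np.array([var, kurt, q_range])

rng = np.random.default_rng(3)
trace = rng.normal(size=10000)          # toy continuous seismic record
win = 1000                              # illustrative window length in samples
features = np.array([window_features(trace[i:i + win])
                     for i in range(0, len(trace) - win + 1, win)])
# Each row of `features` would be one training sample, to be concatenated with
# meteorological and beamforming-derived features before fitting to GPS labels.
```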

How to cite: Umlauft, J., Roux, P., Lecointre, A., Gimbert, F., Nanni, U., Walpersdorf, A., Rouet-LeDuc, B., Hulbert, C., Trugman, D., and Johnson, P.: Mapping Glacier Basal Sliding with Beamforming and Artificial Intelligence, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-6948, 2022.

EGU22-8945 | Presentations | CR2.8

Ice Lead Network Analysis 

Julia Kaltenborn, Venkatesh Ramesh, and Thomas Wright

Ice lead analysis is an essential task for evaluating climate change processes in the Arctic. Ice leads are narrow cracks in the sea ice, which form a complex network. While detecting and modeling ice leads has been performed in numerous ways based on airborne images, the dynamics of ice leads over time remain hidden and largely unexplored. These dynamics could be analyzed by interpreting the ice leads as more than just airborne images, but as what they really are: a dynamic network. A lead's start, end, and intersection points can be considered nodes, and the leads themselves edges of a network. As the nodes and edges change over time, the ice lead network is constantly evolving. This new network perspective on ice leads could be of great interest for the cryospheric science community since it opens the door to new methods. For example, adapting common link prediction methods might make data-driven ice lead forecasting and tracking feasible.
To reveal the hidden dynamics of ice leads, we performed a spatio-temporal and network analysis of ice lead networks. The networks used and presented here are based on daily ice lead observations from the Moderate Resolution Imaging Spectroradiometer (MODIS) between 2002 and 2020 by Hoffman et al. [1].
The spatio-temporal analysis of the ice leads exhibits seasonal, annual, and overall trends in the ice lead dynamics. We found that the number of ice leads is decreasing, and the number of width and length outliers is increasing overall. The network analysis of the ice lead graphs reveals unique network characteristics that diverge from those present in common real-world networks. Most notably, current network science methods (1) exploit the information embedded in the connections of the network, e.g., in connection clusters, while (2) nodes remain relatively fixed over time. Ice lead networks, however, (1) embed their relevant information spatially, e.g., in spatial clusters, and (2) shift and change drastically. These differences require improvements and modifications to common graph classification and link prediction methods such as Preferential Attachment and EvolveGCN on the domain of ice lead dynamic networks.
This work is a call for extending existing network analysis toolkits to include a new class of real-world dynamic networks. Utilizing network science techniques will hopefully further our understanding of ice leads and thus of Arctic processes that are key to climate change mitigation and adaptation.
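The node/edge encoding described above can be sketched in a few lines of plain Python; the coordinates below are invented toy lead endpoints, not MODIS-derived data.

```python
from collections import defaultdict

# Toy ice-lead network: nodes are lead endpoints/intersections (labelled here
# by hypothetical grid coordinates), edges are the leads connecting them.
leads = [((0, 0), (0, 5)),
         ((0, 5), (3, 5)),    # leads meeting at the intersection node (0, 5)
         ((0, 5), (-2, 8)),
         ((3, 5), (6, 5))]

adjacency = defaultdict(set)
for a, b in leads:
    adjacency[a].add(b)
    adjacency[b].add(a)

degree = {node: len(nbrs) for node, nbrs in adjacency.items()}
# (0, 5) joins three leads, so it has the highest degree in this toy network.
```

Rebuilding this graph for each daily observation yields the dynamic network whose node set and edge set both change drastically between snapshots, which is exactly the property that challenges standard link prediction methods.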


We would like to thank Prof. Gunnar Spreen, who provided us insights into ice lead detection and possible challenges connected to the project idea. Furthermore, we would like to thank Shenyang Huang and Asst. Prof. David Rolnick for their valuable feedback and support. J.K. was supported in part by the DeepMind scholarship, the Mitacs Globalink Graduate Fellowship, and the German Academic Scholarship Foundation.


[1] Hoffman, J. P., Ackerman, S. A., Liu, Y., and Key, J. R.: The detection and characterization of Arctic sea ice leads with satellite imagers, Remote Sensing, 11(5), 521, 2019.

How to cite: Kaltenborn, J., Ramesh, V., and Wright, T.: Ice Lead Network Analysis, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-8945, 2022.

EGU22-9753 | Presentations | CR2.8

Using LSTM on surface data to reconstruct 3D Temperature & Salinity profiles in the Arctic Ocean 

Mathias Jensen, Casper Bang-Hansen, Ole Baltazar Andersen, Carsten Bjerre Ludwigsen, and Mads Ehrhorn

In recent years, the importance of Arctic Ocean dynamics for climate monitoring and modelling has become evident. Data used for building models often include temperature and salinity profiles. Such profiles in the Arctic region are sparse, and acquiring new data is expensive and time-consuming. Thus, efficient methods of interpolation are necessary to expand regional data. In this project, 3D temperature and salinity profiles are reconstructed from 2D surface measurements from ships, floats and satellites. The technique is based on a stacked Long Short-Term Memory (LSTM) neural network. The goal is to be able to reconstruct the profiles using remotely sensed data.

How to cite: Jensen, M., Bang-Hansen, C., Andersen, O. B., Ludwigsen, C. B., and Ehrhorn, M.: Using LSTM on surface data to reconstruct 3D Temperature & Salinity profiles in the Arctic Ocean, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-9753, 2022.

EGU22-10386 | Presentations | CR2.8

Arctic sea ice dynamics forecasting through interpretable machine learning 

Matteo Sangiorgio, Elena Bianco, Doroteaciro Iovino, Stefano Materia, and Andrea Castelletti

Machine Learning (ML) has become an increasingly popular tool to model the evolution of sea ice in the Arctic region. ML tools produce highly accurate and computationally efficient forecasts on specific tasks. Yet, they generally lack physical interpretability and do not support the understanding of system dynamics and interdependencies among target variables and driving factors.

Here, we present a 2-step framework to model Arctic sea ice dynamics that aims to balance the high performance and accuracy typical of ML with result interpretability. We first use time series clustering to obtain homogeneous subregions of sea ice spatiotemporal variability. Then, we run an advanced feature selection algorithm, called Wrapper for Quasi Equally Informative Subset Selection (W-QEISS), on the barycentric sea ice time series of each cluster. W-QEISS identifies neural predictors (i.e., extreme learning machines) of the future evolution of the sea ice based on past values and returns the most relevant set of input variables describing such evolution.

Monthly output from the Pan-Arctic Ice-Ocean Modeling and Assimilation System (PIOMAS) from 1978 to 2020 is used for the entire Arctic region. Sea ice thickness represents the target of our analysis, while sea ice concentration, snow depth, sea surface temperature and salinity are considered as candidate drivers.

Results show that autoregressive terms play a key role in the short term (with lag times of 1 and 2 months) as well as the long term (i.e., in the previous year); salinity along the Siberian coast is frequently selected as a key driver, especially at a one-year lag; the effect of sea surface temperature is stronger in the clusters with thinner ice; snow depth is relevant only in the short term.

The proposed framework is an efficient support tool to better understand the physical process driving the evolution of sea ice in the Arctic region.
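An extreme learning machine of the kind W-QEISS wraps is easy to sketch: a fixed random hidden layer with a least-squares readout. The data below are synthetic, and the driver names in the comments only echo the candidate inputs listed above; none of this is the authors' configuration.

```python
import numpy as np

rng = np.random.default_rng(4)

def elm_fit(X, y, n_hidden=50):
    """Extreme learning machine: random fixed hidden layer, least-squares readout."""
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)                       # random nonlinear features
    beta, *_ = np.linalg.lstsq(H, y, rcond=None) # only the readout is trained
    return W, b, beta

def elm_predict(model, X):
    W, b, beta = model
    return np.tanh(X @ W + b) @ beta

# Toy task: predict next-month "thickness" from lagged candidate drivers
# (stand-ins for lagged SIT, SIC, SST, salinity and snow depth).
X = rng.normal(size=(200, 5))
y = np.sin(X[:, 0]) + 0.3 * X[:, 1]
model = elm_fit(X[:150], y[:150])
rmse = float(np.sqrt(np.mean((elm_predict(model, X[150:]) - y[150:]) ** 2)))
```

Because only the linear readout is fitted, ELMs train in one least-squares solve, which makes them cheap enough to evaluate repeatedly inside a wrapper-style feature selection loop like W-QEISS.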

How to cite: Sangiorgio, M., Bianco, E., Iovino, D., Materia, S., and Castelletti, A.: Arctic sea ice dynamics forecasting through interpretable machine learning, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-10386, 2022.

EGU22-10637 | Presentations | CR2.8

A deep learning approach for mapping and monitoring glacial lakes from space 

Manu Tom, Holger Frey, and Daniel Odermatt

Climate change intensifies glacier melt, which leads to the formation of numerous new glacial lakes in the overdeepenings of former glacier beds. Additionally, the area of many existing glacial lakes is increasing. More than one thousand glacial lakes have emerged in Switzerland since the Little Ice Age, and hundreds of lakes are expected to form in the 21st century. Rapid deglaciation and the formation of new lakes severely affect downstream ecosystem services, hydropower production and high-alpine hazard situations. Glacial lake inventories for high-alpine terrain are increasingly becoming available to the research community. However, high-frequency mapping and monitoring of these lakes is necessary to assess hazards and to estimate Glacial Lake Outburst Flood (GLOF) risks, especially for lakes with high seasonal variations. One way to achieve this goal is to leverage the possibilities of satellite-based remote sensing, using optical and Synthetic Aperture Radar (SAR) satellite sensors and deep learning.

There are several challenges to be tackled. Mapping glacial lakes using satellite sensors is difficult, due to the very small area of a great majority of these lakes. The inability of the optical sensors (e.g. Sentinel-2) to sense through clouds creates another bottleneck. Further challenges include cast and cloud shadows, and increased levels of lake and atmospheric turbidity. Radar sensors (e.g. Sentinel-1 SAR) are unaffected by cloud obstruction. However, handling cast shadows and natural backscattering variations from water surfaces are hurdles in SAR-based monitoring. Due to these sensor-specific limitations, optical sensors provide generally less ambiguous but temporally irregular information, while SAR data provides lower classification accuracy but without cloud gaps.

We propose a deep learning-based SAR-optical satellite data fusion pipeline that merges the complementary information from both sensors. We use Sentinel-1 SAR and Sentinel-2 L2A imagery as input to a deep network with a Convolutional Neural Network (CNN) backbone. The proposed pipeline fuses information from the two input branches that ingest the heterogeneous satellite data. A shared block learns embeddings (feature representations) invariant to the input satellite type, which are then fused to guide the identification of glacial lakes. Our ultimate aim is to produce geolocated maps of the target regions, where the proposed bottom-up, data-driven methodology classifies each pixel as either lake or background.

This work is part of two major projects: the ESA AlpGlacier project, which targets mapping and monitoring of the glacial lakes in the Swiss (and European) Alps, and the UNESCO (Adaptation Fund) GLOFCA project, which aims to reduce the vulnerability of populations in the Central Asian countries (Kazakhstan, Tajikistan, Uzbekistan, and Kyrgyzstan) to GLOFs in a changing climate. As part of the GLOFCA project, we are developing a Python-based analytical toolbox for the local authorities, which incorporates the proposed deep learning-based pipeline for mapping and monitoring the glacial lakes in the target regions in Central Asia.

How to cite: Tom, M., Frey, H., and Odermatt, D.: A deep learning approach for mapping and monitoring glacial lakes from space, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-10637, 2022.

EGU22-12785 | Presentations | CR2.8

Machine learning tools for pattern recognition in polar climate science 

William Gregory

Over the past four decades, the inexorable growth in technology and subsequently the availability of Earth-observation and model data has been unprecedented. Hidden within these data are the fingerprints of the physical processes that govern climate variability over a wide range of spatial and temporal scales, and it is the task of the climate scientist to separate these patterns from noise. Given the wealth of data now at our disposal, machine learning methods are becoming the tools of choice in climate science for a variety of applications ranging from data assimilation, to sea ice feature detection from space. This talk summarises recent developments in the application of machine learning methods to the study of polar climate, with particular focus on Arctic sea ice. Supervised learning techniques including Gaussian process regression, and unsupervised learning techniques including cluster analysis and complex networks, are applied to various problems facing the polar climate community at present, where each application can be considered an individual component of the larger sea ice prediction problem. These applications include: seasonal sea ice forecasting, improving spatio-temporal data coverage in the presence of sparse satellite observations, and illuminating the spatio-temporal connectivity between climatological processes.

How to cite: Gregory, W.: Machine learning tools for pattern recognition in polar climate science, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-12785, 2022.

EGU22-12882 | Presentations | CR2.8

Inverse modelling techniques for snow and ice thickness retrievals from satellite altimetry  

Joel Perez Ferrer, Michel Tsamados, Matthew Fox, Tudor Suciu, Harry Heorton, and Carmen Nab

We have recently applied an objective-mapping-type approach to merge observations from multiple altimeters, both for enhancing the temporal/spatial resolution of freeboard samples and for analysing crossovers between satellites (Gregory et al., 2021). This mapping provides optimal interpolation of proximal observations to a location in space and time based on the covariance of the observations and an a priori understanding of their spatiotemporal correlation length scales. This offers a best linear estimator and error field for the observation (radar freeboard or snow depth), which can be used to better constrain pan-Arctic uncertainties.
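A one-dimensional toy version of such an objective mapping (optimal interpolation) under an assumed Gaussian covariance can be sketched as follows; the covariance model, length scale and noise level are illustrative assumptions, not the values used in Gregory et al. (2021).

```python
import numpy as np

def gaussian_cov(d, length_scale):
    """Assumed Gaussian covariance as a function of separation distance."""
    return np.exp(-(d / length_scale) ** 2)

def objective_map(x_obs, y_obs, x_grid, length_scale=1.0, noise=0.01):
    """Best linear estimate of scattered observations at target points,
    plus the posterior error variance (the 'error field')."""
    C = gaussian_cov(np.abs(x_obs[:, None] - x_obs[None, :]), length_scale)
    C += noise * np.eye(len(x_obs))                     # observation noise
    c = gaussian_cov(np.abs(x_grid[:, None] - x_obs[None, :]), length_scale)
    weights = c @ np.linalg.inv(C)
    estimate = weights @ y_obs
    error_var = 1.0 - np.sum(weights * c, axis=1)       # prior variance = 1
    return estimate, error_var

x_obs = np.array([0.0, 1.0, 2.5, 4.0])    # toy sample locations
y_obs = np.sin(x_obs)                     # toy "freeboard" signal
x_grid = np.linspace(0, 4, 9)
est, err = objective_map(x_obs, y_obs, x_grid)
```

The error variance grows away from the observations, which is what makes the mapped error field useful for constraining pan-Arctic uncertainties.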


In addition, we will explore here a newly developed inverse modelling framework to synchronously retrieve the snow and ice thickness from bias-corrected or calibrated radar freeboards from multiple satellite retrievals. The radar equations can be rearranged to formulate the joint forward model at the gridded level, relating measured radar freeboards from multiple satellites (and airborne data) to the underlying snow and ice thickness. In doing so we have also introduced a penetration factor correction term for OIB radar freeboard measurements. To solve this inverse problem for snow and ice thickness we use the following two methodologies, inspired by Earth Science applications (i.e. seismology):


Space ‘uncorrelated’ inverse modelling. The method is called ‘space uncorrelated’ inverse modelling as the algorithm is applied locally, for small distinct regions of the Arctic Ocean, multiple times, until the entire Arctic Ocean is covered. To sample the parameter space we use the publicly available Neighbourhood Algorithm (NA), developed originally for seismic tomography of the Earth’s interior and recently applied by us to a sea ice dynamics inversion problem (Heorton et al., 2019).


Space ‘correlated’ inverse modelling. For the second method of inverse modelling, we use what we call a ‘space correlated’ approach. Here the main algorithm is applied over the entire Arctic region, aiming to retrieve the desired parameters at once. In contrast with the previous approach, in this method we take into account positional correlations of the physical parameters when solving the inverse problem, the output being a map of the Arctic composed of a dynamically generated tiling of Voronoi cells. In this way, regions with less accurate observations will be more coarsely resolved, while highly sampled regions will be provided on a finer grid with a smaller uncertainty. The main algorithm used here to calculate the posterior solution is the ‘reversible jump Markov chain Monte Carlo’ (hereafter referred to as rj-MCMC), designed by Peter Green (Green, 1995). Bodin and Sambridge (2009) adapted this algorithm for seismic inversion, and their adaptation is the basis of the algorithm used in this study.


How to cite: Perez Ferrer, J., Tsamados, M., Fox, M., Suciu, T., Heorton, H., and Nab, C.: Inverse modelling techniques for snow and ice thickness retrievals from satellite altimetry, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-12882, 2022.

EGU22-91 | Presentations | NP4.1

The role of teleconnections in complex climate network 

Ruby Saha

A complex network provides a robust framework to statistically investigate the topology of local and long-range connections, i.e., teleconnections, in climate dynamics. The climate network is constructed from a meteorological data set using the linear Pearson correlation coefficient to measure the similarity between two regions. Long-range teleconnections connect remote geographical sites and are crucial for climate networks. In this study, we show that during El Niño Southern Oscillation onset, the teleconnection pattern changes according to the episode's strength. The long-range teleconnections are significant and responsible for the episodes' extreme Oceanic Niño Index (ONI) values, attained gradually after onset. We quantify betweenness centrality and note that the teleconnection distribution pattern and the betweenness measurements agree well.
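A minimal correlation-network construction of this kind, with synthetic data in which three "sites" share a common driving signal (a stand-in for a teleconnection pattern), might look like:

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy "climate" data: 6 grid sites x 500 monthly anomalies; sites 0-2 share a
# common driving signal, mimicking a teleconnection.
common = rng.normal(size=500)
data = rng.normal(size=(6, 500))
data[:3] += 2.0 * common

# Climate network: link two sites when |Pearson correlation| exceeds a threshold
# (the 0.5 threshold here is arbitrary; in practice it is chosen by significance testing).
corr = np.corrcoef(data)
threshold = 0.5
adjacency = (np.abs(corr) > threshold) & ~np.eye(6, dtype=bool)
degree = adjacency.sum(axis=1)
```

On the resulting adjacency matrix, graph measures such as betweenness centrality can then be computed to locate the nodes that mediate long-range connections.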

How to cite: Saha, R.: The role of teleconnections in complex climate network, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-91, 2022.

EGU22-1831 | Presentations | NP4.1

Quantifying space-weather events using dynamical network analysis of Pc waves with global ground based magnetometers. 

Shahbaz Chaudhry, Sandra Chapman, Jesper Gjerloev, Ciaran Beggan, and Alan Thompson

Geomagnetic storms can impact technological systems, on the ground and in space, including damage to satellites and power blackouts. Their impact on ground systems such as power grids depends upon the spatio-temporal extent and time-evolution of the ground magnetic perturbation driven by the storm.

Pc waves are Alfvén wave resonances of closed magnetospheric field lines and are ubiquitous in the inner magnetosphere. They have been extensively studied, in particular since Pc wave power tracks the onset and evolution of geomagnetic storms. We study the spatial and temporal evolution of Pc waves with a network analysis of the 100+ ground-based magnetometer stations collated by the SuperMAG collaboration on a single time-base and calibration.

Network-based analysis of 1 min cadence SuperMAG magnetometer data has been applied to the dynamics of substorm current systems (Dods et al. JGR 2015, Orr et al. GRL 2019) and the magnetospheric response to IMF turnings (Dods et al. JGR 2017). It has the potential to capture the full spatio-temporal response with a few time-dependent network parameters. Now, with the availability of 1 sec data across the entire SuperMAG network we are able for the first time to apply network analysis globally to resolve both the spatial and temporal correlation patterns of the ground signature of Pc wave activity as a geomagnetic storm evolves. We focus on Pc2 (5-10s period) and Pc3 (10-45s period) wave bands. We obtain the time-varying global Pc wave dynamical network over individual space weather events.

To construct the networks, we sample each magnetometer time series with a moving window in the time domain (20 times the Pc period range) and band-pass filter each magnetometer station time series to obtain Pc2 and Pc3 waveforms. We then compute the time-lagged cross-correlation (TLXC) between all stations for each Pc band. Modelling is used to determine a threshold of significant TLXC above which a pair of stations is connected in the network. The TLXC as a function of lag is tested against a criterion for sinusoidal waveforms and then used to calculate the phase difference. Connections with a TLXC peak at non-zero lag form a directed network which characterizes propagation or information flow. Connections with a TLXC peak close to zero lag form an undirected network which characterizes a response that is globally instantaneously coherent.
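The core of this pipeline, the lagged cross-correlation between a station pair and the location of its peak, can be sketched as follows; the signals, noise level and 5-sample delay are synthetic, and the significance modelling described above is omitted.

```python
import numpy as np

def peak_lagged_xc(x, y, max_lag):
    """Normalised cross-correlation over a range of lags; returns (peak, lag)."""
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    lags = list(range(-max_lag, max_lag + 1))
    xc = [np.mean(x[max(0, -l):len(x) - max(0, l)] *
                  y[max(0, l):len(y) - max(0, -l)]) for l in lags]
    k = int(np.argmax(np.abs(xc)))
    return float(xc[k]), lags[k]

rng = np.random.default_rng(6)
t = np.arange(2000)
wave = np.sin(2 * np.pi * t / 30)                  # toy Pc-like oscillation, 30 s period
station_a = wave + 0.2 * rng.normal(size=t.size)
station_b = np.roll(wave, 5) + 0.2 * rng.normal(size=t.size)  # 5 s propagation delay

value, lag = peak_lagged_xc(station_a, station_b, max_lag=10)
# A significant peak at non-zero lag -> directed connection (propagation);
# a peak at ~zero lag -> instantaneously coherent (undirected) connection.
```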

We apply this network analysis to isolated geomagnetic storms. We find that the network connectivity does not simply track Pc wave power; it therefore contains additional information. Geographically short-range connections are prevalent at all times; storm onset marks a transition to a network which exhibits both an enhancement of geographically short-range connections and the growth of geographically long-range, global-scale connections extending spatially over a region exceeding 9 h of MLT. These global-scale connections, indicating a globally coherent Pc wave response, are prevalent throughout the storm with considerable variation (within a few time windows). Since the stations are not uniformly distributed spatially, we treat long-range connections with care to avoid introducing artificial spatial correlation.

How to cite: Chaudhry, S., Chapman, S., Gjerloev, J., Beggan, C., and Thompson, A.: Quantifying space-weather events using dynamical network analysis of Pc waves with global ground based magnetometers., EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-1831, 2022.

EGU22-2014 | Presentations | NP4.1

OBS noise reduction using music information retrieval algorithms 

Zahra Zali, Theresa Rein, Frank Krüger, Matthias Ohrnberger, and Frank Scherbaum

Since the ocean covers 71% of the Earth’s surface, records from ocean bottom seismometers (OBS) are essential for investigating the whole Earth’s structure. However, data from ocean bottom recordings are commonly difficult to analyze due to the high noise level, especially on the horizontal components. In addition, signals of seismological interest, such as earthquake recordings at teleseismic distances, are masked by oceanic noise. Therefore, noise reduction of OBS data is an important task required for the analysis of OBS records. Different approaches have been suggested in previous studies to successfully remove noise from the vertical components; however, noise reduction on the horizontal components has remained problematic. Here we introduce a method based on the harmonic-percussive separation (HPS) algorithms used in Zali et al. (2021), which is able to separate long-lasting narrowband signals from broadband transients in OBS records. In the context of OBS noise reduction using HPS algorithms, percussive components correspond to earthquake signals and harmonic components correspond to noise signals. OBS noise, which forms narrowband horizontal structures in the short-time Fourier transform (STFT) spectrogram, is readily distinguishable from transient, short-duration seismic events, which appear as vertical structures. Through HPS algorithms we separate horizontal from vertical structures in the STFT spectrograms. Using this method we can reduce OBS noise on both the vertical and horizontal components, retrieve clearer broadband earthquake waveforms, and increase the earthquake signal-to-noise ratio. The applicability of the method is demonstrated through tests on synthetic and real data.
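A minimal median-filtering HPS sketch in the spirit of classic harmonic-percussive separation (the Zali et al., 2021 variant used by the authors differs in detail): horizontal STFT structures are enhanced by a median filter along time, vertical ones by a median filter along frequency, and soft masks split the signal. All parameter values are illustrative.

```python
import numpy as np
from scipy.signal import stft, istft, medfilt2d

def hps_separate(x, fs, nperseg=256, kernel=17):
    # Harmonic structures (long-lasting narrowband noise) are horizontal in the
    # STFT magnitude; percussive structures (transient events) are vertical.
    f, t, Z = stft(x, fs=fs, nperseg=nperseg)
    mag = np.abs(Z)
    H = medfilt2d(mag, kernel_size=(1, kernel))   # median along time
    P = medfilt2d(mag, kernel_size=(kernel, 1))   # median along frequency
    eps = 1e-12
    mask_h = H ** 2 / (H ** 2 + P ** 2 + eps)     # soft (Wiener-like) masks
    mask_p = P ** 2 / (H ** 2 + P ** 2 + eps)
    _, xh = istft(Z * mask_h, fs=fs, nperseg=nperseg)  # "noise" estimate
    _, xp = istft(Z * mask_p, fs=fs, nperseg=nperseg)  # "event" estimate
    return xh, xp

# Long-lasting narrowband "noise" plus one short broadband "event"
fs = 100.0
t = np.arange(0, 60, 1 / fs)
narrowband = np.sin(2 * np.pi * 3.0 * t)
event = np.zeros_like(t)
event[3000:3040] = np.hanning(40) * 5.0
xh, xp = hps_separate(narrowband + event, fs)
```

The transient should end up almost entirely in the "percussive" output, the narrowband signal in the "harmonic" one.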

How to cite: Zali, Z., Rein, T., Krüger, F., Ohrnberger, M., and Scherbaum, F.: OBS noise reduction using music information retrieval algorithms, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-2014, 2022.

EGU22-2097 | Presentations | NP4.1 | Highlight

Medium- to long-term forecast of sea surface temperature using EEMD-STEOF-LSTM hybrid model 

Rixu Hao, Yuxin Zhao, Xiong Deng, Di Zhou, Dequan Yang, and Xin Jiang

Sea surface temperature (SST) is a vitally important variable of the global ocean, which profoundly affects climate and marine ecosystems. Forecasting of oceanic variables has traditionally relied on numerical models, which solve discretized dynamical and physical ocean equations. However, numerical models suffer from limitations such as short forecast timeliness, complex physical processes, and excessive computational cost. Machine learning, in contrast, has been shown to capture spatial and temporal information without these limitations, but previous research on multi-scale feature extraction and evolutionary forecasting under spatiotemporal integration is still inadequate. To fill this gap, a multi-scale spatiotemporal forecast model is developed combining ensemble empirical mode decomposition (EEMD) and the spatiotemporal empirical orthogonal function (STEOF) with long short-term memory (LSTM), referred to as EEMD-STEOF-LSTM. Specifically, EEMD is applied for adaptive multi-scale analysis; the STEOF is adopted to decompose the spatiotemporal processes of different scales into a sum of products of spatiotemporal basis functions and corresponding coefficients, which captures the evolution of spatial and temporal processes simultaneously; and the LSTM is employed to achieve medium- to long-term forecasts of the STEOF-derived spatiotemporal coefficients. A case study of the daily average SST in the South China Sea shows that the proposed hybrid EEMD-STEOF-LSTM model consistently outperforms the optimal climatic normal (OCN), STEOF, and STEOF-LSTM, and can accurately forecast the characteristics of oceanic eddies. Statistical analysis of the case study demonstrates that this model has great potential for practical applications in medium- to long-term forecasting of oceanic variables.
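The STEOF step can be sketched as an SVD of flattened spatiotemporal sequences, so that each basis function is jointly spatial and temporal; the EEMD and LSTM stages are omitted here. This is a toy illustration under assumed array shapes, not the authors' implementation.

```python
import numpy as np

def steof(samples, n_modes):
    # samples: (n_samples, window_len, n_space) spatiotemporal sequences.
    # Flattening each sequence before the SVD makes every EOF a joint
    # space-time basis function, so its coefficient encodes evolution.
    n, w, s = samples.shape
    X = samples.reshape(n, w * s)
    mean = X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    basis = Vt[:n_modes]            # spatiotemporal basis functions
    coeffs = (X - mean) @ basis.T   # one coefficient per mode and sample
    return mean, basis, coeffs

# Toy "SST" field: a travelling wave (rank 2) plus weak noise
rng = np.random.default_rng(0)
n_samples, window, n_space = 60, 30, 40
tt = np.arange(window)[:, None]
xx = np.arange(n_space)[None, :]
data = np.stack([np.sin(0.2 * tt + 0.3 * xx + p)
                 for p in rng.uniform(0, 2 * np.pi, n_samples)])
data += 0.05 * rng.standard_normal(data.shape)
mean, basis, coeffs = steof(data, n_modes=2)
recon = coeffs @ basis + mean
err = np.abs(recon - data.reshape(n_samples, -1)).mean()
```

Two modes suffice here because a travelling wave is rank two; in the hybrid model the LSTM would forecast the coefficient series rather than reconstruct them.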

How to cite: Hao, R., Zhao, Y., Deng, X., Zhou, D., Yang, D., and Jiang, X.: Medium- to long-term forecast of sea surface temperature using EEMD-STEOF-LSTM hybrid model, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-2097, 2022.

EGU22-2560 | Presentations | NP4.1

The IMFogram: a new time-frequency representation algorithm for nonstationary signals 

Antonio Cicone

In this presentation, we introduce the IMFogram method (pronounced like "infogram"), a new, fast, local, and reliable time-frequency representation (TFR) method for nonstationary signals. The technique is based on the Intrinsic Mode Function (IMF) decomposition produced by a decomposition method, such as the Empirical Mode Decomposition-based techniques, Iterative Filtering-based algorithms, or any equivalent method developed so far. We present the mathematical properties of the IMFogram and prove that this method is a generalization of the spectrogram. We conclude the presentation with some applications, as well as a comparison of its performance with other existing TFR techniques.

How to cite: Cicone, A.: The IMFogram: a new time-frequency representation algorithm for nonstationary signals, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-2560, 2022.

EGU22-2922 | Presentations | NP4.1

Constraining the uncertainty in CO2 seasonal cycle metrics by residual bootstrapping. 

Theertha Kariyathan, Wouter Peters, Julia Marshall, Ana Bastos, and Markus Reichstein

The analysis of long, high-quality time series of atmospheric greenhouse gas measurements helps to quantify their seasonal to interannual variations and their impact on global climate. These discrete measurement records contain, however, gaps and at times noisy data, influenced by local fluxes or synoptic-scale events; hence appropriate filtering and curve-fitting techniques are often used to smooth and gap-fill the atmospheric time series. Previous studies have shown that there is an inherent uncertainty associated with curve-fitting processes, which introduces biases based on the choice of mathematical method used for data processing and can lead to scientific misinterpretation of the signal. Furthermore, the uncertainties in curve fitting can be propagated onto the metrics estimated from the fitted curve, which could significantly influence the quantification of the metrics and their interpretation. In this context we present a novel methodology for constraining the uncertainty arising from fitting a smooth curve to the CO2 dry air mole fraction time series, and propagate this uncertainty onto commonly used metrics of the CO2 seasonal cycle. We generate an ensemble of fitted curves from the data using residual bootstrap sampling with loess-fitted residuals, which is representative of the inherent uncertainty in applying the curve-fitting method to the discrete data. The spread of the selected CO2 seasonal cycle metrics across the bootstrap time series provides an estimate of the inherent uncertainty in fitting a curve to the discrete data. Further, we show that the approach can be extended to other curve-fitting methods by generating multiple bootstrap samples from residuals obtained with the CCGCRV filtering method widely used by the atmospheric greenhouse gas measurement community.
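A minimal sketch of the residual-bootstrap idea, substituting a simple harmonic-plus-trend least-squares fit for the loess/CCGCRV fits used in the study; the metric (peak-to-trough amplitude of the annual harmonic) and all constants are illustrative.

```python
import numpy as np

def fit_curve(t, y):
    # Quadratic trend plus two annual harmonics, by least squares (t in years)
    X = np.column_stack([np.ones_like(t), t, t ** 2,
                         np.sin(2 * np.pi * t), np.cos(2 * np.pi * t),
                         np.sin(4 * np.pi * t), np.cos(4 * np.pi * t)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, X

def bootstrap_amplitude(t, y, n_boot=200, seed=0):
    # Resample fit residuals with replacement, refit, and collect the metric:
    # peak-to-trough amplitude of the annual harmonic
    rng = np.random.default_rng(seed)
    beta, X = fit_curve(t, y)
    fit = X @ beta
    resid = y - fit
    amps = np.empty(n_boot)
    for b in range(n_boot):
        y_b = fit + rng.choice(resid, size=resid.size, replace=True)
        beta_b, _ = fit_curve(t, y_b)
        amps[b] = 2 * np.hypot(beta_b[3], beta_b[4])
    return amps

# Synthetic CO2-like record: trend + 6 ppm peak-to-trough seasonal cycle + noise
rng = np.random.default_rng(1)
t = np.arange(0, 10, 1 / 52.0)                  # ten years, weekly sampling
y = 380 + 2.2 * t + 3.0 * np.sin(2 * np.pi * t) + 0.5 * rng.standard_normal(t.size)
amps = bootstrap_amplitude(t, y)
uncertainty = amps.std()                        # spread = curve-fitting uncertainty
```

The spread of `amps` across the bootstrap ensemble plays the role of the metric uncertainty described above.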

How to cite: Kariyathan, T., Peters, W., Marshall, J., Bastos, A., and Reichstein, M.: Constraining the uncertainty in CO2 seasonal cycle metrics by residual bootstrapping., EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-2922, 2022.

EGU22-4795 | Presentations | NP4.1

Robust Causal Inference for Irregularly Sampled Time Series: Applications in Climate and Paleoclimate Data Analysis 

Aditi Kathpalia, Pouya Manshour, and Milan Paluš

Predicting climate and determining its major drivers has become even more important as climate change poses a great challenge to humankind and our planet. Different studies employ correlation or causality methods, or modelling approaches, to study the interaction between climate and climate forcing variables (anthropogenic or natural). This includes the study of the interaction between global surface temperatures and CO2, and between rainfall in different locations and the El Niño–Southern Oscillation (ENSO) phenomenon. The results produced by different studies have been found to differ and are debatable, presenting an ambiguous situation. In this work, we develop and apply a novel, robust causality estimation technique for time series data (to estimate causal influence between given observables) that can help to resolve the ambiguity. The discrepancies in existing results arise due to challenges with the acquired data and limitations of the causal inference/modelling approaches. Our novel approach combines the use of a recently proposed causality method, Compression-Complexity Causality (CCC) [1], and ordinal/permutation pattern-based coding [2]. CCC estimates have been shown to be robust for bivariate systems with low temporal resolution, missing samples, long-term memory and finite length data [1]. The use of ordinal patterns helps to extend bivariate CCC to the multivariate case by capturing the multidimensional dynamics of the given variables’ systems in the symbolic temporal sequence of a single variable. The methodology is tested on dynamical systems data which are short in length and have been corrupted with missing samples or subsampled to different levels. The superior performance of ‘Permutation CCC’ on such data relative to other causality estimation methods strengthens our trust in the method. 
We apply the method to study the interaction between CO2 and temperature recordings on three different time scales, CH4 and temperature on the paleoclimate scale, ENSO and the South Asian monsoon on monthly and yearly time scales, and the North Atlantic Oscillation and surface temperature on daily and monthly time scales. These datasets are either short in length, sampled irregularly, contain missing samples, or exhibit a combination of these factors. Our results validate some existing studies while contradicting others. In addition, the development of the novel Permutation CCC approach opens the possibility of its application for making useful inferences on other challenging climate datasets.
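The ordinal-pattern coding step [2] can be sketched as follows (CCC itself is not implemented here); `dim` and `tau` are the usual embedding dimension and delay.

```python
import numpy as np
from itertools import permutations
from math import factorial

def ordinal_sequence(x, dim=3, tau=1):
    # Replace each delay vector by the index of its ordinal (permutation) pattern
    index = {p: i for i, p in enumerate(permutations(range(dim)))}
    n = len(x) - (dim - 1) * tau
    return np.array([index[tuple(np.argsort(x[i:i + dim * tau:tau]))]
                     for i in range(n)])

def permutation_entropy(x, dim=3, tau=1):
    # Normalised Shannon entropy of the pattern distribution (Bandt & Pompe [2])
    seq = ordinal_sequence(x, dim, tau)
    _, counts = np.unique(seq, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum() / np.log(factorial(dim)))

rng = np.random.default_rng(0)
h_noise = permutation_entropy(rng.standard_normal(5000))   # near 1: white noise
h_trend = permutation_entropy(np.arange(5000.0))           # 0: monotonic series
```

In the Permutation CCC approach, a symbolic sequence of this kind (rather than the entropy) would feed the compression-complexity estimator.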

This study is supported by the Czech Science Foundation, Project No. GA19-16066S, and by the Czech Academy of Sciences, Praemium Academiae awarded to M. Paluš.

[1] Kathpalia, A., & Nagaraj, N. (2019). Data-based intervention approach for Complexity-Causality measure. PeerJ Computer Science, 5, e196.
[2] Bandt, C., & Pompe, B. (2002). Permutation entropy: a natural complexity measure for time series. Physical review letters, 88(17), 174102.

How to cite: Kathpalia, A., Manshour, P., and Paluš, M.: Robust Causal Inference for Irregularly Sampled Time Series: Applications in Climate and Paleoclimate Data Analysis, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-4795, 2022.

EGU22-6014 | Presentations | NP4.1

Combining variational mode decomposition and recurrent neural network to predict rainfall time series and evaluating prediction performance by universal multifractals 

H. Zhou, D. Schertzer, and I. Tchiguirinskaia

Rainfall time series prediction is crucial for geoscientific system monitoring, but it is challenging due to the extreme variability of rainfall. To improve prediction accuracy, a hybrid deep learning model (VMD-RNN) is proposed. In this study, variational mode decomposition (VMD) is first applied to decompose the original rainfall time series into several sub-sequences according to the frequency domain. Different recurrent neural network (RNN) models are then used to predict the individual sub-sequences, and the final prediction is reconstructed by summing the sub-sequence predictions. The RNN models are long short-term memory (LSTM), gated recurrent unit (GRU), bidirectional LSTM (BiLSTM) and bidirectional GRU (BiGRU), which are well suited to sequence prediction. The root mean square error (RMSE) of the predictions is then used to select the best RNN model for each sub-sequence. In addition to RMSE, the framework of universal multifractals (UM) is introduced to evaluate prediction performance, which enables characterization of the extreme variability of the predicted rainfall time series. The study employs two rainfall datasets from 2001 to 2020 in Paris, with daily and hourly resolutions. The results show that, compared to directly predicting the original time series, the proposed hybrid VMD-RNN model improves the prediction of high or extreme values for the daily dataset, but does not significantly enhance the prediction of zero or low values. The VMD-RNN model also outperforms deep learning models without decomposition on the hourly dataset when evaluated by RMSE, while universal multifractal analyses point out limitations. 
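A toy version of the decompose-select-sum logic, with a moving-average split standing in for VMD and persistence/AR(1) predictors standing in for the RNNs; only the per-sub-sequence RMSE-based model selection mirrors the design described above.

```python
import numpy as np

def rmse(a, b):
    return np.sqrt(np.mean((a - b) ** 2))

def split_series(x, window=24):
    # Stand-in for VMD: a slow component (moving average) and its residual
    slow = np.convolve(x, np.ones(window) / window, mode="same")
    return [slow, x - slow]

def fit_predict(train, test, model):
    # One-step-ahead forecasts over the test horizon
    if model == "persistence":                       # x_hat[t] = x[t-1]
        return np.r_[train[-1], test[:-1]]
    a = np.dot(train[1:], train[:-1]) / np.dot(train[:-1], train[:-1])
    return a * np.r_[train[-1], test[:-1]]           # "ar1": x_hat[t] = a x[t-1]

def hybrid_forecast(x, n_test, n_val=200, models=("persistence", "ar1")):
    # Per sub-sequence: pick the model with lowest RMSE on a validation slice,
    # forecast the test slice, then sum the sub-sequence forecasts
    preds = []
    for sub in split_series(x):
        train, test = sub[:-n_test], sub[-n_test:]
        tr, val = train[:-n_val], train[-n_val:]
        best = min(models, key=lambda m: rmse(fit_predict(tr, val, m), val))
        preds.append(fit_predict(train, test, best))
    return np.sum(preds, axis=0)

rng = np.random.default_rng(0)
t = np.arange(2000)
x = np.sin(2 * np.pi * t / 100) + 0.3 * rng.standard_normal(t.size)
pred = hybrid_forecast(x, n_test=200)
err_hybrid = rmse(pred, x[-200:])
err_naive = rmse(np.r_[x[1799], x[1800:1999]], x[-200:])
```

Even with these crude predictors, forecasting the decomposed components separately beats naive persistence on the raw series in this toy setup.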

How to cite: Zhou, H., Schertzer, D., and Tchiguirinskaia, I.: Combining variational mode decomposition and recurrent neural network to predict rainfall time series and evaluating prediction performance by universal multifractals, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-6014, 2022.

EGU22-6281 | Presentations | NP4.1

Application of information theoretical measures for improved machine learning modelling of the outer radiation belt 

Constantinos Papadimitriou, Georgios Balasis, Ioannis A. Daglis, and Simon Wing

In the past ten years, Artificial Neural Networks (ANNs) and other machine learning methods have been used in a wide range of models and predictive systems to capture and even predict the onset and evolution of various types of phenomena. These applications typically require large datasets, composed of many variables and parameters, the number of which can make the analysis cumbersome and prohibitively time-consuming, especially when the interplay of all these parameters is taken into consideration. Fortunately, information-theoretical measures can be used not only to reduce the dimensionality of the input space of such a system, but also to improve its efficiency. In this work, we present such a case, where differential electron fluxes from the Magnetic Electron Ion Spectrometer (MagEIS) on board the Van Allen Probes satellites are modelled by a simple ANN, using solar wind parameters and geomagnetic activity indices as inputs, and we illustrate how the proper use of information theory measures can improve the efficiency of the model by minimizing the number of input parameters and shifting them in time to their proper time-lagged versions.
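One common way to choose time-lagged inputs is to scan candidate lags for maximal mutual information between a driver and the response. This histogram-based sketch is illustrative, not the authors' procedure; all names and values are assumptions.

```python
import numpy as np

def mutual_info(x, y, bins=16):
    # Histogram estimate of I(X;Y) in nats
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def best_lag(driver, response, max_lag):
    # Shift the driver forward in time and keep the lag maximising MI
    n = len(driver)
    mis = [mutual_info(driver[:n - lag], response[lag:]) if lag else
           mutual_info(driver, response) for lag in range(max_lag + 1)]
    return int(np.argmax(mis)), mis

# Toy system: the response follows the driver with a 5-step delay
rng = np.random.default_rng(0)
drv = rng.standard_normal(20000)
resp = np.r_[np.zeros(5), drv[:-5]] + 0.2 * rng.standard_normal(20000)
lag, mis = best_lag(drv, resp, max_lag=10)
```

Inputs would then be fed to the ANN at their MI-maximising lags, discarding drivers whose MI never rises above the noise floor.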

How to cite: Papadimitriou, C., Balasis, G., Daglis, I. A., and Wing, S.: Application of information theoretical measures for improved machine learning modelling of the outer radiation belt, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-6281, 2022.

EGU22-7256 | Presentations | NP4.1

Identifying patterns of teleconnections, a curvature-based network analysis 

Jakob Schlör, Felix M. Strnad, Christian Fröhlich, and Bedartha Goswami

Representing spatio-temporal climate variables as complex networks allows uncovering nontrivial structure in the data. Although various tools for detecting communities in climate networks have been used to group nodes (spatial locations) with similar climatic conditions, we are often interested in identifying important links between communities. Of particular interest are methods to detect teleconnections, i.e. links over large spatial distances mediated by atmospheric processes.

We propose to use a recently developed network measure based on Ricci-curvature to visualize teleconnections in climate networks. Ricci-curvature allows one to distinguish between-community and within-community links in networks. Applied to networks constructed from surface temperature anomalies, we show that Ricci-curvature separates spatial scales. We use Ricci-curvature to study differences in the global teleconnection patterns of different types of El Niño events, namely the Eastern Pacific (EP) and Central Pacific (CP) types. Our method reveals a global picture of teleconnection patterns, showing confinement of teleconnections to the tropics under EP conditions but teleconnections to the tropics and both the Northern and Southern Hemisphere under CP conditions. The obtained teleconnections corroborate previously reported impacts of EP and CP events.
Our results suggest that Ricci-curvature is a promising visual-analytics-tool to study the topology of climate systems with potential applications across observational and model data.
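As a simple illustration of how edge curvature separates between- and within-community links, the sketch below uses the augmented Forman-Ricci curvature, a computationally cheap relative of the Ollivier-Ricci-type measures; the abstract does not specify which notion of Ricci-curvature the authors use, so this is an assumption for illustration only.

```python
def forman_curvature(adj):
    # Augmented Forman-Ricci curvature of each edge of an unweighted graph:
    # F(u,v) = 4 - deg(u) - deg(v) + 3 * (#triangles containing the edge).
    # Strongly negative edges act as bridges between communities.
    deg = {u: len(nbrs) for u, nbrs in adj.items()}
    curv = {}
    for u in adj:
        for v in adj[u]:
            if u < v:
                curv[(u, v)] = 4 - deg[u] - deg[v] + 3 * len(adj[u] & adj[v])
    return curv

# Two 4-cliques ("communities") joined by one "teleconnection" edge (3, 4)
edges = [(a, b) for grp in ([0, 1, 2, 3], [4, 5, 6, 7])
         for i, a in enumerate(grp) for b in grp[i + 1:]] + [(3, 4)]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)
curv = forman_curvature(adj)
```

The bridge edge gets strongly negative curvature while intra-clique edges are positive, which is the separation exploited when highlighting teleconnections.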

How to cite: Schlör, J., Strnad, F. M., Fröhlich, C., and Goswami, B.: Identifying patterns of teleconnections, a curvature-based network analysis, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-7256, 2022.

EGU22-8399 | Presentations | NP4.1

Using neural networks to detect coastal hydrodynamic phenomena in high-resolution tide gauge data 

Felix Soltau, Sebastian Niehüser, and Jürgen Jensen

Tide gauges are exposed to various kinds of influences that can affect water level measurements significantly and lead to time series containing different phenomena and artefacts. These influences can be natural or anthropogenic; both lead to actual changes of the water level. In contrast, technical malfunction of measuring devices, as another kind of influence, causes non-physical water level data. Both actual and non-physical data need to be detected and classified consistently, and possibly corrected, to enable the supply of adequate water level information. However, no automatically working detection algorithm exists yet. Only obvious or frequent technical malfunctions like gaps can be detected automatically, but they have to be corrected manually by trained staff. Consequently, there is no consistently defined data pre-processing before, for example, statistical analyses are performed or water level information for navigation is passed on.

In the research project DePArT*, we focus on detecting natural phenomena like standing waves, meteotsunamis, or inland flood events, as well as anthropogenic artefacts like operating storm surge barriers and sluices, in water level time series with one-minute resolution. To this end, we train artificial neural networks (ANNs) on water level sequences of phenomena and artefacts, as well as redundant data, to recognize them in other data sets. We use convolutional neural networks (CNNs), as they have already been applied successfully in, for example, object detection and speech and language processing (Gu et al., 2018). However, CNNs need to be trained with large numbers of sample sequences. Hence, as a next step, the idea is to synthesize rarely observed phenomena and artefacts to obtain enough training data. The trained CNNs can then be used to detect unnoticed phenomena and artefacts in past and recent time series. Depending on sequence characteristics and the results of synthesizing, we may be able to detect certain events as they occur and therefore provide pre-checked water level information in real time.

In a later stage of this study, we will implement the developed algorithms in an operational test mode while cooperating closely with the officials to benefit from the mutual feedback. In this way, the study contributes to a future consistent pre-processing and helps to increase the quality of water level data. Moreover, the results are able to reduce uncertainties from the measuring process and improve further calculations based on these data.

* DePArT (Detektion von küstenhydrologischen Phänomenen und Artefakten in minütlichen Tidepegeldaten; engl. Detection of coastal hydrological phenomena and artefacts in minute-by-minute tide gauge data) is a research project, funded by the German Federal Ministry of Education and Research (BMBF) through the project management of Projektträger Jülich PTJ under the grant number 03KIS133.

Gu, Wang, Kuen, Ma, Shahroudy, Shuai, Liu, Wang, Wang, Cai, Chen (2018): Recent advances in convolutional neural networks. In: Pattern Recognition, Vol. 77, Pages 354–377.

How to cite: Soltau, F., Niehüser, S., and Jensen, J.: Using neural networks to detect coastal hydrodynamic phenomena in high-resolution tide gauge data, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-8399, 2022.

EGU22-8899 | Presentations | NP4.1

Body wave extraction by using sparsity-promoting time-frequency filtering 

Bahare Imanibadrbani, Hamzeh Mohammadigheymasi, Ahmad Sadidkhouy, Rui Fernandes, Ali Gholami, and Martin Schimmel

Different phases of seismic waves generated by earthquakes carry considerable information about subsurface structures as they propagate within the Earth. Depending on the scope and objective of an investigation, various types of seismic phases are studied. Surface waves image shallow and large-scale subsurface features, while body waves provide high-resolution images at greater depths, which cannot be resolved by surface waves. The most challenging aspect of studying body waves is extracting the low-amplitude P and S phases, which are predominantly masked by high-amplitude, low-attenuation surface waves overlapping them in time and frequency. Although body waves generally contain higher frequencies than surface waves, the overlapping frequency spectra of body and surface waves limit the application of elementary signal processing methods such as conventional filtering. Advanced signal processing tools are required to work around this problem. Recently, the Sparsity-Promoting Time-Frequency Filtering (SP-TFF) method was developed as a signal processing tool for discriminating between different phases of seismic waves based on their high-resolution polarization information in the time-frequency (TF) domain (Mohammadigheymasi et al., 2022). SP-TFF extracts different phases of seismic waves by incorporating this information and utilizing a combination of amplitude, directivity, and rectilinearity filters. This study implements SP-TFF by properly defining a filter combination set for the specific extraction of body waves masked by high-amplitude surface waves. Examinations on synthetic and real data are conducted for the Mw=7.5 earthquake that occurred in November 2021 in Northern Peru and was recorded by 58 stations of the United States National Seismic Network (USNSN). 
The results show the remarkable performance of SP-TFF in extracting the P and SV phases on the vertical and radial components, and the SH phase on the transverse component, masked by high-amplitude Rayleigh and Love waves, respectively. A range of S/N levels is tested, indicating the algorithm’s robustness at different noise levels. This research contributes to the FCT-funded SHAZAM (Ref. PTDC/CTA-GEO/31475/2017) and IDL (Ref. FCT/UIDB/50019/2020) projects. It also uses computational resources provided by C4G (Collaboratory for Geosciences) (Ref. PINFRA/22151/2016).

Mohammadigheymasi, H., P. Crocker, M. Fathi, E. Almeida, G. Silveira, A. Gholami, and M. Schimmel, 2022, Sparsity-promoting approach to polarization analysis of seismic signals in the time-frequency domain: IEEE Transactions on Geoscience and Remote Sensing, 1–1.

How to cite: Imanibadrbani, B., Mohammadigheymasi, H., Sadidkhouy, A., Fernandes, R., Gholami, A., and Schimmel, M.: Body wave extraction by using sparsity-promoting time-frequency filtering, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-8899, 2022.

EGU22-9626 | Presentations | NP4.1

A Recurrence Flow based Approach to Attractor Reconstruction 

Tobias Braun, K. Hauke Kraemer, and Norbert Marwan

In the study of nonlinear observational time series, reconstructing the system’s state space represents the basis for many widely-used analyses. From the perspective of dynamical systems theory, Takens’ theorem states that under generic conditions, the reconstructed state space preserves the most fundamental properties of the real, unknown system’s attractor. Through many applications, time delay embedding (TDE) has established itself as the most popular approach for state space reconstruction [1]. However, standard TDE cannot account for multiscale properties of the system, and many of the more sophisticated approaches either require heuristic choices for a high number of parameters, fail when the signals are corrupted by noise, or obstruct analysis due to their very high complexity.

We present a novel semi-automated, recurrence-based method for the problem of attractor reconstruction. The proposed method is based on recurrence plots (RPs), a computationally simple yet effective 2D representation of a univariate time series. In a recent study, the quantification of RPs was extended by transferring the well-known box-counting algorithm to recurrence analysis [2]. We build on this novel formalism by introducing another box-counting measure that was originally put forward by B. Mandelbrot, namely succolarity [3]. Succolarity quantifies how well a fluid can permeate a binary texture [4]. We employ this measure by flooding a RP with a (fictional) fluid along its diagonals and computing succolarity as a measure of diagonal flow through the RP. Since a non-optimal choice of embedding parameters impedes the formation of diagonal lines in the RP and generally results in spurious patterns that block the fluid, the attractor reconstruction problem can be formulated as a maximization of diagonal recurrence flow.

The proposed state space reconstruction algorithm allows for non-uniform embedding delays to account for multiscale dynamics. It is conceptually and computationally simple and (nearly) parameter-free. Even in presence of moderate to high noise intensity, reliable results are obtained. We compare the method’s performance to existing techniques and showcase its effectiveness in applications to paradigmatic examples and nonlinear geoscientific time series.
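A bare-bones recurrence-plot sketch with a crude diagonal-line fraction (akin to the determinism measure, DET) standing in for the succolarity-based recurrence flow, which is more involved; the embedding parameters and threshold are illustrative.

```python
import numpy as np

def embed(x, dim, tau):
    # Time-delay embedding: each row is one reconstructed state vector
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau:i * tau + n] for i in range(dim)])

def recurrence_plot(x, dim, tau, eps):
    # R[i, j] = 1 where states i and j are closer than eps
    X = embed(x, dim, tau)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return (d < eps).astype(int)

def diagonal_fraction(R, lmin=4):
    # Fraction of recurrence points on diagonals of length >= lmin: a crude
    # proxy for how freely "flow" passes along the RP's diagonals
    n = R.shape[0]
    on_long = 0
    for k in range(-(n - lmin), n - lmin + 1):
        run = 0
        for v in np.r_[np.diagonal(R, offset=k), 0]:
            if v:
                run += 1
            else:
                if run >= lmin:
                    on_long += run
                run = 0
    return on_long / max(R.sum(), 1)

t = np.linspace(0, 20 * np.pi, 800)
rng = np.random.default_rng(0)
R_sine = recurrence_plot(np.sin(t), dim=2, tau=10, eps=0.3)
R_noise = recurrence_plot(rng.standard_normal(800), dim=2, tau=10, eps=0.3)
det_sine = diagonal_fraction(R_sine)
det_noise = diagonal_fraction(R_noise)
```

A well-embedded deterministic signal yields long diagonals (high fraction), while a poor embedding or noise blocks them, which is the quantity the recurrence-flow maximization targets.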



[1] Packard, N. H., Crutchfield, J. P., Farmer, J. D., & Shaw, R. S. (1980). Geometry from a time series. Physical Review Letters, 45(9), 712.

[2] Braun, T., Unni, V. R., Sujith, R. I., Kurths, J., & Marwan, N. (2021). Detection of dynamical regime transitions with lacunarity as a multiscale recurrence quantification measure. Nonlinear Dynamics, 1-19.

[3] Mandelbrot, B. B. (1982). The Fractal Geometry of Nature (Vol. 1). New York: W. H. Freeman.

[4] de Melo, R. H., & Conci, A. (2013). How succolarity could be used as another fractal measure in image analysis. Telecommunication Systems, 52(3), 1643-1655.

How to cite: Braun, T., Kraemer, K. H., and Marwan, N.: A Recurrence Flow based Approach to Attractor Reconstruction, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-9626, 2022.

EGU22-11064 | Presentations | NP4.1

The Objective Deformation Component of a Velocity Field 

Bálint Kaszás, Tiemo Pedergnana, and George Haller

According to a fundamental axiom of continuum mechanics, material response should be objective, i.e., indifferent to the observer. In the context of geophysical fluid dynamics, fluid-transporting vortices must satisfy this axiom and hence different observers should come to the same conclusion about the location and size of these vortices. As a consequence, only objectively defined extraction methods can provide reliable results for material vortices.

As velocity fields are inherently non-objective, they render most Eulerian flow-feature detection methods non-objective. To resolve this issue, we discuss a general decomposition of a velocity field into an objective deformation component and a rigid-body component. We obtain this decomposition as the solution of a physically motivated extremum problem for the closest rigid-body velocity of a general velocity field.

This extremum problem turns out to have a unique,  physically interpretable,  closed-form solution. Subtracting this solution from the velocity field then gives an objective deformation velocity field that is also physically observable. As a consequence, all common Eulerian feature detection schemes, as well as the momentum, energy, vorticity, enstrophy, and helicity of the flow, become objective when computed from the deformation velocity component. We illustrate the use of this deformation velocity field on several velocity data sets.
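In a discrete setting, the closest rigid-body field can be obtained by least squares; this 2D numpy sketch (translation plus spin) is an analogue of the closed-form solution described above, not the authors' formula, and all sample fields are illustrative.

```python
import numpy as np

def rigid_body_fit(points, velocities):
    # Least-squares closest rigid-body field u(x, y) = (a1 - omega*y, a2 + omega*x)
    x, y = points[:, 0], points[:, 1]
    n = len(x)
    A = np.zeros((2 * n, 3))
    A[:n, 0] = 1.0
    A[:n, 2] = -y
    A[n:, 1] = 1.0
    A[n:, 2] = x
    b = np.r_[velocities[:, 0], velocities[:, 1]]
    (a1, a2, omega), *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.array([a1, a2]), omega

def deformation_component(points, velocities):
    # Objective deformation velocity: subtract the closest rigid-body field
    a, omega = rigid_body_fit(points, velocities)
    rigid = a + omega * np.column_stack([-points[:, 1], points[:, 0]])
    return velocities - rigid

# Pure rotation (omega = 2) plus a weak pure-shear strain field
rng = np.random.default_rng(0)
pts = rng.uniform(-1, 1, (500, 2))
rot = 2.0 * np.column_stack([-pts[:, 1], pts[:, 0]])
strain = 0.1 * np.column_stack([pts[:, 0], -pts[:, 1]])
a_fit, om_fit = rigid_body_fit(pts, rot + strain)
deform = deformation_component(pts, rot + strain)
```

The fit recovers the rotation, and the remaining deformation field approximates the strain, so feature detection on `deform` is unaffected by the observer's rigid-body motion.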

How to cite: Kaszás, B., Pedergnana, T., and Haller, G.: The Objective Deformation Component of a Velocity Field, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-11064, 2022.

EGU22-11118 | Presentations | NP4.1

Explainable community detection of extreme rainfall events using the tangles algorithmic framework 

Merle Kammer, Felix Strnad, and Bedartha Goswami

Climate networks have helped to uncover complex structures in climatic observables from large time series data sets. For instance, climate networks were used to reduce rainfall data to relevant patterns that can be linked to geophysical processes. However, the identification of regions that show similar behavior with respect to the timing and spatial distribution of extreme rainfall events (EREs) remains challenging. 
To address this, we apply a recently developed algorithmic framework based on tangles [1] to discover community structures in the spatial distribution of EREs and to obtain inherently interpretable communities as an output. First, we construct a climate network using time-delayed event synchronization and create a collection of cuts (bipartitions) from the EREs data. By using these cuts, the tangles algorithmic framework allows us both to exploit the climate network structure and to incorporate prior knowledge from the data. Applying tangles enables us to create a hierarchical tree representation of communities, including the likelihood that spatial locations belong to a community. Each tree layer can be associated with an underlying cut, thus making the division into different communities transparent.
Applied to global precipitation data, we show that tangles is a promising tool to quantify community structures and to reveal underlying geophysical processes leading to these structures.


[1] S. Klepper, C. Elbracht, D. Fioravanti, J. Kneip, L. Rendsburg, M. Teegen, and U. von Luxburg. Clustering with Tangles: Algorithmic Framework and Theoretical Guarantees. CoRR, abs/2006.14444v2, 2021.

How to cite: Kammer, M., Strnad, F., and Goswami, B.: Explainable community detection of extreme rainfall events using the tangles algorithmic framework, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-11118, 2022.

EGU22-11667 | Presentations | NP4.1

Spurious Behaviour in Networks from Spatio-temporal Data 

Moritz Haas, Bedartha Goswami, and Ulrike von Luxburg

Network-based analyses of dynamical systems have become increasingly popular in climate science. Rather than focussing on the chaotic-systems aspect, we come from a statistical perspective and highlight the often ignored fact that the calculated correlation values are only empirical estimates. We find that the uncertainty stemming from the estimation procedure alone has a major impact on network characteristics. Using isotropic random fields on the sphere, we observe spurious behaviour in commonly constructed networks from finite samples. When the data have a locally coherent correlation structure, even spurious link-bundle teleconnections have to be expected. We reevaluate the outcome and robustness of existing studies based on their design choices and null hypotheses.
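The basic effect can be demonstrated in a few lines: thresholding empirical correlations of truly independent records still yields links, all of which are spurious. The node count, record length, and threshold below are illustrative, not values from the study.

```python
import numpy as np

def correlation_network(data, threshold):
    # Threshold the empirical correlation matrix into links (upper triangle)
    c = np.corrcoef(data)
    iu = np.triu_indices_from(c, k=1)
    links = int((np.abs(c[iu]) >= threshold).sum())
    return links, iu[0].size

# Truly independent "grid point" records: every detected link is spurious
rng = np.random.default_rng(0)
n_nodes, n_time = 200, 100          # short records, as is common in practice
data = rng.standard_normal((n_nodes, n_time))
links, pairs = correlation_network(data, threshold=0.25)
spurious_density = links / pairs    # nonzero purely from estimation noise
```

With only 100 samples per record, the sampling standard deviation of a correlation estimate is about 0.1, so a threshold of 0.25 still lets a nontrivial fraction of purely spurious links through.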

How to cite: Haas, M., Goswami, B., and von Luxburg, U.: Spurious Behaviour in Networks from Spatio-temporal Data, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-11667, 2022.

EGU22-12351 | Presentations | NP4.1

VAE4OBS: Denoising ocean bottom seismograms using variational autoencoders 

Maria Tsekhmistrenko, Ana Ferreira, Kasra Hosseini, and Thomas Kitching

Data from ocean-bottom seismometers (OBS) are inherently more challenging than their land counterparts because of the noisy ocean environment. Primary and secondary microseismic noise corrupts the recorded time series. Additionally, anthropogenic noise (e.g., ships) and animal noise (e.g., whales) contribute to a complex noise field that can make it challenging to use traditional filtering methods (e.g., broadband or Gabor filters) to clean and extract information from these seismograms.

OBS deployments are laborious, expensive, and time-consuming. The data from these deployments are crucial for investigating and covering the "blind spots" where station coverage is lacking. It therefore becomes vital to remove the noise and retrieve the earthquake signals recorded on these seismograms.

We propose analysing and processing such unique and challenging data with Machine Learning (ML), particularly Deep Learning (DL) techniques, where conventional methods fail. We present a variational autoencoder (VAE) architecture to denoise seismic waveforms, with the aim of extracting more information than previously possible. We argue that, compared to other fields, seismology is well-positioned to use ML and DL techniques thanks to the massive datasets recorded by seismometers.

In the first step, we use synthetic seismograms (generated with Instaseis) and white noise to train a deep neural network. We vary the signal-to-noise ratio during training. Such synthetic datasets have two advantages. First, we know the signal and noise (as we have injected the noise ourselves). Second, we can generate large training and validation datasets, one of the prerequisites for high-quality DL models.
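The noise-injection step can be sketched as follows; a Ricker wavelet stands in for an Instaseis synthetic, and the SNR values and wavelet parameters are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)

def make_training_pair(signal, snr_db):
    """Return a (noisy, clean) waveform pair at a prescribed SNR in dB.

    White Gaussian noise is scaled so that the signal-to-noise power
    ratio matches snr_db; varying snr_db across pairs mimics the
    training strategy described above.
    """
    p_signal = np.mean(signal ** 2)
    p_noise = p_signal / (10.0 ** (snr_db / 10.0))
    noise = rng.standard_normal(signal.shape) * np.sqrt(p_noise)
    return signal + noise, signal

# toy clean waveform: a Ricker wavelet standing in for an Instaseis synthetic
t = np.linspace(-1.0, 1.0, 1000)
f0 = 5.0
clean = (1.0 - 2.0 * (np.pi * f0 * t) ** 2) * np.exp(-((np.pi * f0 * t) ** 2))

# vary the signal-to-noise ratio across training examples
pairs = [make_training_pair(clean, snr_db) for snr_db in (20.0, 10.0, 0.0, -5.0)]
```

Because the noise is injected explicitly, each noisy input comes with its exact clean target, which is the first advantage mentioned above.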

Next, we increase the complexity of the input data by adding real noise, sampled from land and OBS stations, to the synthetic seismograms. Finally, we apply the trained model to real OBS data recorded during the RHUM-RUM experiment.

We present the workflow, the neural network architecture, our training strategy, and the usefulness of our trained models compared to traditional methods.

How to cite: Tsekhmistrenko, M., Ferreira, A., Hosseini, K., and Kitching, T.: VAE4OBS: Denoising ocean bottom seismograms using variational autoencoders, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-12351, 2022.

EGU22-13053 | Presentations | NP4.1

Causal Diagnostics for Observations - Experiments with the L63 system 

Nachiketa Chakraborty and Javier Amezcua

Study of cause-and-effect relationships (causality) is central to identifying mechanisms that cause the phenomena we observe. In non-linear dynamical systems, we wish to understand these mechanisms as they unfold over time. In areas within the physical sciences, such as geosciences and astrophysics, there are numerous competing causes that drive the system in complicated ways that are hard to disentangle. Hence, it is important to demonstrate how causal attribution works with relatively simple systems for which we have physical intuition. Furthermore, in the earth and atmospheric sciences and meteorology, we have a plethora of observations that are used both in understanding the underlying science behind the phenomena and in forecasting. However, doing this requires optimally combining models (theoretical/numerical) with observations through data assimilation, a challenging and computationally intensive task. Therefore, understanding the impact of observations and the required cadence is very useful. Here, we present experiments in causal inference and attribution with the Lorenz 63 (L63) system, a long-studied system. We first test the causal relations between the variables characterising the model, and then simulate observations using perturbed versions of the model to test the impact of the cadence of observations for each combination of the three variables.
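A minimal version of such an experimental setup, integrating L63 with a Runge-Kutta scheme and subsampling "observations" at a chosen cadence, might look like the sketch below; the parameter values are the classical ones, and the observation cadence is an arbitrary assumption:

```python
import numpy as np

def lorenz63_step(state, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """One 4th-order Runge-Kutta step of the Lorenz 1963 system."""
    def f(s):
        x, y, z = s
        return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])
    k1 = f(state)
    k2 = f(state + 0.5 * dt * k1)
    k3 = f(state + 0.5 * dt * k2)
    k4 = f(state + dt * k3)
    return state + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

def simulate(n_steps, state=np.array([1.0, 1.0, 1.0]), obs_every=25):
    """Integrate L63 and return the trajectory plus sparse 'observations'."""
    traj = np.empty((n_steps, 3))
    for i in range(n_steps):
        state = lorenz63_step(state)
        traj[i] = state
    return traj, traj[::obs_every]

traj, obs = simulate(2000)
```

Varying `obs_every` (and which of the three variables are observed) is one simple way to probe the impact of observation cadence described above.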

How to cite: Chakraborty, N. and Amezcua, J.: Causal Diagnostics for Observations - Experiments with the L63 system, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-13053, 2022.

An accurate understanding of dynamical similarities and dissimilarities in geomagnetic variability between quiet and disturbed periods has the potential to vastly improve Space Weather diagnosis. In recent years, several approaches rooted in dynamical system theory have demonstrated their great potential for characterizing the instantaneous level of complexity in geomagnetic activity and solar wind variations, and for revealing indications of intermittent large-scale coupling and generalized synchronization phenomena in the Earth’s electromagnetic environment. In this work, we focus on two complementary approaches based on the concept of recurrences in phase space, both of which quantify subtle geometric properties of the phase space trajectory instead of taking an explicit temporal variability perspective. We first quantify the local (instantaneous) and global fractal dimensions and associated local stability properties of a suite of low (SYM-H, ASY-H) and high latitude (AE, AL, AU) geomagnetic indices and discuss similarities and dissimilarities of the obtained patterns for one year of observations during a solar activity maximum. Subsequently, we proceed with studying bivariate extensions of both approaches, and demonstrate their capability of tracing different levels of interdependency between low and high latitude geomagnetic variability during periods of magnetospheric quiescence and during perturbations associated with geomagnetic storms and magnetospheric substorms, respectively. Ultimately, we investigate the effect of time scale on the level of dynamical organization of fluctuations by studying iterative reconstructions of the index values based on intrinsic mode functions obtained from univariate and multivariate versions of empirical mode decomposition.
Our results open new perspectives on the nonlinear dynamics and (likely intermittent) mutual entanglement of different parts of the geospace electromagnetic environment, including the equatorial and westward auroral electrojets, depending on the overall state of the geospace system as affected by temporary variations of the solar wind forcing. In addition, they contribute to a better understanding of the potential and limitations of two contemporary approaches of nonlinear time series analysis in the field of space physics.
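The extreme-value-theory ingredient can be sketched as follows: distances from a reference state are log-transformed, and exceedances over a high quantile yield the local dimension. This is a simplified Faranda-style estimator assuming an exponential tail; the circle example and the quantile choice are illustrative assumptions:

```python
import numpy as np

def local_dimension(traj, ref_idx, q=0.98):
    """Instantaneous (local) dimension at traj[ref_idx] from extremes.

    Distances to the reference state are log-transformed; exceedances of
    a high quantile are approximately exponential with scale 1/d, so d
    is estimated as the reciprocal mean excess.
    """
    dist = np.linalg.norm(traj - traj[ref_idx], axis=1)
    dist = np.delete(dist, ref_idx)      # exclude the reference point itself
    g = -np.log(dist)
    u = np.quantile(g, q)
    exc = g[g > u] - u
    return 1.0 / exc.mean()

# sanity check on a trajectory filling a circle (true dimension 1)
theta = np.linspace(0.0, 2.0 * np.pi, 20000, endpoint=False)
circle = np.column_stack([np.cos(theta), np.sin(theta)])
d = local_dimension(circle, ref_idx=0)   # close to 1
```

Sweeping `ref_idx` along a geomagnetic-index trajectory would give the instantaneous dimension time series discussed above.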

How to cite: Donner, R., Alberti, T., and Faranda, D.: Instantaneous fractal dimensions and stability properties of geomagnetic indices based on recurrence networks and extreme value theory, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-13342, 2022.

EGU22-986 | Presentations | HS3.6

Quantifying solute transport numerical dispersion in integrated surface-subsurface hydrological modeling 

Beatrice Gatto, Claudio Paniconi, Paolo Salandin, and Matteo Camporese

Numerical dispersion is a well-known problem that affects solute transport in groundwater simulations and can lead to erroneous results, such as overestimation of the plume path and overprediction of contaminant dispersion. Numerical dispersion is generally introduced through stabilization techniques aimed at preventing oscillations, with the side effect of increasing mass spreading. Even though this issue has long been investigated in subsurface hydrology, little is known about its possible impacts on integrated surface–subsurface hydrological models (ISSHMs). In this study, we analyze numerical dispersion in the CATchment HYdrology (CATHY) model. In CATHY, a robust and computationally efficient time-splitting technique is implemented for the solution of the subsurface transport equation, whereby the advective part is solved on elements with an explicit finite volume scheme and the dispersive part is solved on nodes with an implicit finite element scheme. Taken alone, the advection and dispersion solvers provide accurate results. However, when coupled, the continuous transfer of concentration from elements to nodes, and vice versa, gives rise to a particular form of numerical dispersion. We assess the nature and impact of this artificial spreading through two sets of synthetic experiments. In the first set, the subsurface transport of a nonreactive tracer in two soil column test cases is simulated and compared with known analytical solutions. Different input dispersion coefficients and mesh discretizations are tested in order to quantify the numerical error and define a criterion for its containment. In the second set of experiments, fully coupled surface–subsurface processes are simulated using two idealized hillslopes, one concave and one convex, and we examine how the additional subsurface dispersion affects the representation of pre-event water contribution to the streamflow hydrograph.
Overall, we show that the numerical dispersion in CATHY that is caused by the transfer of information between elements and nodes can be kept under control if the grid Péclet number is less than 1. It is also suggested that the test cases used in this study can be useful benchmarks for integrated surface–subsurface hydrological models, for which thus far only flow benchmarks have been proposed.
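The Pe < 1 containment criterion can be checked with a trivial helper; the velocity and dispersion values below are illustrative, not taken from the CATHY test cases:

```python
def grid_peclet(velocity, dx, dispersion):
    """Grid Peclet number Pe = v * dx / D for a 1D transport discretization."""
    return velocity * dx / dispersion

def max_dx(velocity, dispersion, pe_limit=1.0):
    """Largest grid spacing keeping Pe below the prescribed limit."""
    return pe_limit * dispersion / velocity

# illustrative values: pore velocity 1 m/day, dispersion 0.05 m^2/day
v, D = 1.0, 0.05
print(grid_peclet(v, 0.1, D))  # 2.0 -> too coarse for the Pe < 1 criterion
print(max_dx(v, D))            # 0.05 -> dx must stay below 0.05 m
```

In practice the advection-dominated (high-velocity) zones of a watershed mesh dictate the refinement, since Pe scales linearly with both velocity and spacing.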

How to cite: Gatto, B., Paniconi, C., Salandin, P., and Camporese, M.: Quantifying solute transport numerical dispersion in integrated surface-subsurface hydrological modeling, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-986, 2022.

EGU22-1210 | Presentations | HS3.6

An alternative strategy for combining likelihood values in Bayesian calibration to improve model predictions 

Michelle Viswanathan, Tobias K. D. Weber, and Anneli Guthke

Conveying uncertainty in model predictions is essential, especially when these predictions are used for decision-making. Models are not only expected to achieve the best possible fit to available calibration data but also to capture future observations within realistic uncertainty intervals. Model calibration using Bayesian inference facilitates the tuning of model parameters based on existing observations, while accounting for uncertainties. The model is tested against observed data through the likelihood function, which defines the probability of the data being generated by the given model and its parameters. Inference of the most plausible parameter values is influenced by the method used to combine likelihood values from different observation data sets. In the classical method of combining likelihood values, referred to here as the AND calibration strategy, it is inherently assumed that the given model is true (error-free), and that observations in different data sets are similarly informative for the inference problem. However, practically every model applied to real-world case studies suffers from model-structural errors that are typically dynamic, i.e., they vary over time. Requiring the imperfect model to fit all data sets simultaneously will inevitably lead to an underestimation of uncertainty due to a collapse of the resulting posterior parameter distributions. Additionally, biased 'compromise solutions' to the parameter estimation problem result in large prediction errors that impair subsequent conclusions.
We present an alternative AND/OR calibration strategy which provides a formal framework to relax posterior predictive intervals and minimize posterior collapse by incorporating knowledge about similarities and differences between data sets. As a case study, we applied this approach to calibrate a plant phenology model (SPASS) to observations of silage maize grown at five sites in southwestern Germany between 2010 and 2016. We compared model predictions of phenology obtained using the classical AND calibration strategy with those from two scenarios (OR and ANDOR) of the AND/OR strategy for combining likelihoods from the different data sets. The OR scenario represents an extreme contrast to the AND strategy: all data sets are assumed to be distinct, and the model is allowed to find individually good fits to each period, adjusting to the individual type and strength of model error. The ANDOR scenario acts as an intermediate solution between the two extremes by accounting for known similarities and differences between data sets, and hence grouping them according to the anticipated type and strength of model error.
We found that the OR scenario led to lower precision but higher accuracy of prediction results as compared to the classical AND calibration. The ANDOR scenario led to higher accuracy as compared to the AND strategy and higher precision as compared to the OR scenario. Our proposed approach has the potential to improve the prediction capability of dynamic models in general, by considering the effect of model error when calibrating to different data sets.
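The contrast between the strategies can be reproduced in a deliberately minimal setting: a one-parameter model fit to two conflicting data sets, with the OR case approximated as an equal-weight mixture of per-data-set posteriors. All of this is an illustrative assumption, not the SPASS setup:

```python
import numpy as np

rng = np.random.default_rng(1)

def log_lik(theta, data, sigma=1.0):
    """Gaussian log-likelihood of the trivial one-parameter model y = theta."""
    return -0.5 * np.sum((data - theta) ** 2) / sigma ** 2

# two data sets whose best-fitting theta disagrees (mimicking dynamic model error)
d1 = rng.normal(0.0, 1.0, 50)
d2 = rng.normal(2.0, 1.0, 50)

theta = np.linspace(-2.0, 4.0, 601)
# AND: one joint fit -> likelihoods multiply (log-likelihoods add)
ll_and = np.array([log_lik(t, d1) + log_lik(t, d2) for t in theta])
post_and = np.exp(ll_and - ll_and.max()); post_and /= post_and.sum()

# OR: each data set keeps its own posterior; the mixture keeps both modes
p1 = np.array([log_lik(t, d1) for t in theta]); p1 = np.exp(p1 - p1.max()); p1 /= p1.sum()
p2 = np.array([log_lik(t, d2) for t in theta]); p2 = np.exp(p2 - p2.max()); p2 /= p2.sum()
post_or = 0.5 * (p1 + p2)

# AND collapses onto a narrow compromise near theta = 1; OR spans both modes
mean_and = np.sum(theta * post_and)
std_and = np.sqrt(np.sum((theta - mean_and) ** 2 * post_and))
mean_or = np.sum(theta * post_or)
std_or = np.sqrt(np.sum((theta - mean_or) ** 2 * post_or))
```

The collapsed AND posterior illustrates the lower accuracy / higher precision trade-off reported above, while the wide OR mixture illustrates the opposite extreme.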

How to cite: Viswanathan, M., Weber, T. K. D., and Guthke, A.: An alternative strategy for combining likelihood values in Bayesian calibration to improve model predictions, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-1210, 2022.

EGU22-1459 | Presentations | HS3.6

Modelling decisions: a quantification of their influence on model results 

Janneke Remmers, Ryan Teuling, and Lieke Melsen

Scientific hydrological modellers make multiple decisions during the modelling process, e.g. related to the calibration period and temporal resolution. These decisions affect the model results. Modelling decisions can refer to several steps in the modelling process; in this study, they refer to the decisions made during the whole modelling process, beyond the definition of the model structure. This study is based on an analysis of interviews with scientific hydrological modellers, thus taking actual practices into account. Six modelling decisions, mainly motivated by personal and team experience, were identified from the interviews: calibration method, calibration period, parameters to calibrate, pre-processing of input data, spin-up period, and temporal resolution. Different options for these six decisions, as encountered in the interviews, were implemented and evaluated in a controlled modelling environment, in our case the modular modelling framework Raven, to quantify their impact on model output. The variation in the results is analysed using three hydrological signatures to determine which decisions affect the results and how. Each model output is a hypothesis about reality: an interpretation of the real system underpinned by scientific reasoning and/or expert knowledge. Currently, there is a lack of knowledge and understanding about which modelling decisions are taken and why. Consequently, the influence of modelling decisions is unknown. Quantifying this influence, as done in this study, can raise awareness among scientists. This study pinpoints which aspects are important to consider in studying modelling decisions, and can be an incentive to clarify and improve modelling procedures.

How to cite: Remmers, J., Teuling, R., and Melsen, L.: Modelling decisions: a quantification of their influence on model results, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-1459, 2022.

EGU22-1639 | Presentations | HS3.6

Rigorous Exploration of Complex Environmental Models to Advance Scientific Understanding 

Robert Reinecke, Francesca Pianosi, and Thorsten Wagener

Environmental models are central to advancing science, increasingly serving as digital twins of the Earth and its components. They allow us to conduct experiments to test hypotheses and understand dominant processes, which would be infeasible in the real world. To deepen our knowledge, we build increasingly complex models in the hope that they become more complete and realistic images of the real world. However, we believe that our scientific progress is slowed down because methods for the rigorous exploration of these models, in the face of unavoidable data and epistemic uncertainties, do not evolve in a similar manner.

Based on an extensive literature review, we show that even though methods for such rigorous exploration of model responses, e.g., global sensitivity analysis methods, are well established, there is an upper bound on the level of model complexity at which they are applied today. Still, we claim that the potential for their utilization in a wider context is significant.

We argue here that a key issue to consider in this context is the framing of the sensitivity analysis problem. We show, using published examples, how problem framing defines the outcome of a sensitivity analysis in the context of scientific advancement. Without appropriate framing, sensitivity analysis of complex models reduces to a diagnostic analysis of the model, with only limited transferability of the conclusions to the real-world system.

How to cite: Reinecke, R., Pianosi, F., and Wagener, T.: Rigorous Exploration of Complex Environmental Models to Advance Scientific Understanding, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-1639, 2022.

We propose a method to analyse, classify and compare dynamical systems of arbitrary dimension by two key features: uncertainty and complexity. It starts by subdividing the system’s time trajectory into a number of time slices. For all values in a time slice, the Shannon information entropy is calculated, measuring within-slice variability. System uncertainty is then expressed by the mean entropy of all time slices. We define system complexity as “uncertainty about uncertainty”, and express it by the entropy of the entropies of all time slices. Calculating and plotting uncertainty u and complexity c for many different numbers of time slices yields the c-u-curve. Systems can be analysed, compared and classified by the c-u-curve in terms of i) its overall shape, ii) mean and maximum uncertainty, iii) mean and maximum complexity, and iv) its characteristic time scale, expressed by the width of the time slice for which maximum complexity occurs. We demonstrate the method on both synthetic and real-world time series (constant, random noise, Lorenz attractor, precipitation and streamflow) and show that conclusions drawn from the c-u-curve are in accordance with expectations. The method is based on unit-free probabilities and therefore permits application to and comparison of arbitrary data. It naturally expands from single- to multivariate systems, and from deterministic to probabilistic value representations, allowing e.g. application to ensemble model predictions.
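A compact sketch of the recipe described above; the bin count and the slice numbers are arbitrary choices for illustration:

```python
import numpy as np

def shannon_entropy(values, bins, range_):
    """Shannon entropy (in bits) of values histogrammed into fixed bins."""
    counts, _ = np.histogram(values, bins=bins, range=range_)
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log2(p))

def c_u_point(series, n_slices, bins=10):
    """Uncertainty u and complexity c for one slice width.

    u = mean within-slice entropy; c = entropy of the slice entropies
    ('uncertainty about uncertainty'), following the recipe above.
    """
    slices = np.array_split(series, n_slices)
    value_range = (series.min(), series.max())
    entropies = np.array([shannon_entropy(s, bins, value_range) for s in slices])
    u = entropies.mean()
    c = shannon_entropy(entropies, bins, (0.0, np.log2(bins)))
    return u, c

# c-u-curve: sweep the number of time slices
series = np.random.default_rng(3).standard_normal(1024)
curve = [c_u_point(series, n) for n in (2, 4, 8, 16, 32)]
```

A constant series yields u = c = 0, while pure noise yields high u but low c, matching the expectations stated in the abstract.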

How to cite: Ehret, U. and Dey, P.: c-u-curve: A method to analyze, classify and compare dynamical systems by uncertainty and complexity, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-1742, 2022.

EGU22-1870 | Presentations | HS3.6

Inference of (geostatistical) hyperparameters with the correlated pseudo-marginal method 

Lea Friedli, Niklas Linde, David Ginsbourger, Alejandro Fernandez Visentini, and Arnaud Doucet

We consider non-linear Bayesian inversion problems to infer the (geostatistical) hyperparameters of a random field describing (hydro)geological or geophysical properties by inversion of hydrogeological or geophysical data. This problem is of particular importance in the non-ergodic setting as no analytical upscaling relationships exist linking the data (resulting from a specific field realization) to the hyperparameters specifying the spatial distribution of the underlying random field (e.g., mean, standard deviation, and integral scales). Jointly inferring the hyperparameters and the "true" realization of the field (typically involving many thousands of unknowns) brings important computational challenges, such that in practice, simplifying model assumptions (such as homogeneity or ergodicity) are made. To prevent the errors resulting from such simplified assumptions while circumventing the burden of high-dimensional full inversions, we use a pseudo-marginal Metropolis-Hastings algorithm that treats the random field as a latent variable. In this random effect model, the intractable likelihood of the data given the hyperparameters is estimated by Monte Carlo averaging over realizations of the random field. To increase the efficiency of the method, low-variance approximations of the likelihood ratio are ensured by correlating the samples used in the proposed and current steps of the Markov chain and by using importance sampling. We assess the performance of this correlated pseudo-marginal method on the problem of inferring the hyperparameters of fracture aperture fields using borehole ground-penetrating radar (GPR) reflection data. We demonstrate that the correlated pseudo-marginal method bypasses the computational challenges of a very high-dimensional target space while avoiding the strong bias and overly narrow uncertainty ranges obtained when employing simplified model assumptions.
These advantages also apply when using the posterior of the hyperparameters describing the aperture field to predict its effective hydraulic transmissivity.
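The core mechanics (a Monte Carlo likelihood estimate over latent realizations, with correlated auxiliary variables between MCMC steps) can be sketched in a deliberately scalar toy setting; the latent "field" here is one number, and all densities and parameter values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)

def estimate_log_lik(theta, y, u):
    """Monte Carlo estimate of the intractable log-likelihood log p(y | theta).

    theta = (mean, std) are hyperparameters of a latent Gaussian 'field'
    z (reduced to a scalar for illustration); y is observed with unit
    noise. The auxiliary standard normals u are passed in so that the
    current and proposed estimates can share correlated random numbers.
    """
    mean, std = theta
    z = mean + std * u                                  # latent realizations
    lw = -0.5 * (y - z) ** 2 - 0.5 * np.log(2 * np.pi)  # log p(y | z)
    m = lw.max()
    return m + np.log(np.mean(np.exp(lw - m)))          # log of the MC average

def correlated_refresh(u, rho=0.99):
    """Crank-Nicolson update: fresh auxiliaries correlated with the old ones."""
    return rho * u + np.sqrt(1.0 - rho ** 2) * rng.standard_normal(u.shape)

y = 1.5
u = rng.standard_normal(200)
ll_cur = estimate_log_lik((0.0, 1.0), y, u)
u_prop = correlated_refresh(u)
ll_prop = estimate_log_lik((0.5, 1.0), y, u_prop)
# a Metropolis-Hastings step would accept with prob min(1, exp(ll_prop - ll_cur))
```

Because `u` and `u_prop` are strongly correlated, much of the Monte Carlo noise cancels in the likelihood ratio, which is the variance-reduction idea exploited above.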

How to cite: Friedli, L., Linde, N., Ginsbourger, D., Fernandez Visentini, A., and Doucet, A.: Inference of (geostatistical) hyperparameters with the correlated pseudo-marginal method, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-1870, 2022.

This study proposes a new approach for quantitatively assessing the importance of precipitation features in space and time for predicting streamflow discharge (and, hence, for assessing sensitivity). For this, we combine well-performing deep-learning (DL) models with interpretability tools.

The DL models are composed of convolutional neural networks (CNNs) and long short-term memory (LSTM) networks. Their input is precipitation data distributed over the watershed and extending back in time (other inputs, such as meteorological and watershed properties, can also be included). Their output is streamflow discharge at a present or future time. Interpretability tools allow learning about the modeled system. We used the Integrated Gradients method, which provides a level of importance (IG value) for each space-time precipitation feature for a given streamflow prediction. We applied the models and interpretability tools to several watersheds in the US and India.
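The attribution step can be sketched with a generic Integrated Gradients implementation; the tiny differentiable model below stands in for the CNN-LSTM, and the baseline choice and step count are illustrative assumptions:

```python
import numpy as np

def integrated_gradients(grad_f, x, baseline, n_steps=200):
    """Integrated Gradients attribution via a Riemann-sum approximation.

    grad_f returns the model gradient; attribution_i =
    (x_i - b_i) * mean over alpha of dF/dx_i along the straight path b -> x.
    """
    alphas = (np.arange(n_steps) + 0.5) / n_steps      # midpoint rule
    path = baseline + alphas[:, None] * (x - baseline)
    grads = np.array([grad_f(p) for p in path])
    return (x - baseline) * grads.mean(axis=0)

# toy differentiable 'model' standing in for the CNN-LSTM: F(x) = w . tanh(x)
w = np.array([1.0, -2.0, 0.5])
f = lambda x: w @ np.tanh(x)
grad_f = lambda x: w * (1.0 - np.tanh(x) ** 2)

x = np.array([1.0, 0.5, -1.0])
baseline = np.zeros(3)
ig = integrated_gradients(grad_f, x, baseline)

# completeness axiom: attributions sum to F(x) - F(baseline)
print(ig.sum(), f(x) - f(baseline))
```

In the abstract's setting, `x` would be the space-time precipitation grid and the resulting IG values would be mapped back onto the watershed and lag times.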

To understand the importance of precipitation features for flood generation, we compared spatial and temporal patterns of IG for high flows vs. low and medium flows. Our results so far indicate some similar patterns for the two categories of flows, but others are distinctly different. For example, common IG modes exist at short times before the discharge, but modes differ substantially when considered further back in time. Similarly, some spatial cores of high IG appear in both flow categories, but other watershed cores are featured only for high flows. These differences in IG patterns over time and space are presumably associated with slow and fast flow paths and threshold-runoff mechanisms.

There are several advantages to the proposed approach: 1) recent studies have shown DL models to outperform standard process-based hydrological models; 2) given data availability and quality, DL models are much easier to train and validate than process-based hydrological models, so many watersheds can be included in the analysis; 3) DL models do not explicitly represent hydrological processes, and thus sensitivities derived in this approach are assured to represent patterns arising from the data. The main disadvantage of the proposed approach is its limitation to gauged watersheds only; however, large data sets are publicly available to explore sensitivities of gauged streamflow.

It should be stressed that learning about hydrological sensitivities with DL models is proposed here as a complementary approach to analyzing process-based hydrological models. Even though DL models are considered black boxes, together with interpretability tools they can highlight sensitivities that are hard or impossible to resolve with standard models.

How to cite: Morin, E., Rojas, R., and Wiesel, A.: Quantifying space-time patterns of precipitation importance for flood generation via interpretability of deep-learning models, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-1907, 2022.

EGU22-2220 | Presentations | HS3.6

Inversion of Hydraulic Tomography Data from the Grimsel Test Site with a Discrete Fracture Network Model 

Lisa Maria Ringel, Mohammadreza Jalali, and Peter Bayer

This study aims at the stochastic characterization of fractured rocks with a low-permeability matrix based on transient data from hydraulic tomography experiments. In such rocks, fractures function as the main flowpaths. Therefore, adequate insight into the distribution and properties of fractures is essential for many applications such as groundwater remediation, constructing nuclear waste repositories or developing enhanced geothermal systems. At the Grimsel test site in Switzerland, multiple hydraulic tests have been conducted to investigate the hydraulic properties and structure of the fracture network between two shear zones. We present results from combined stochastic inversion of these tests to infer the fracture network of the studied crystalline rock formation.

Data from geological mapping at Grimsel and the hydraulic tomography experiments that were undertaken as part of in-situ stimulation and circulation experiments provide the prior knowledge for the model inversion. This information is used for setting up a site-specific conceptual model, defining the boundary and initial conditions of the groundwater flow model, and configuring the inversion problem. The pressure signals we apply for the inversion stem from cross-borehole constant rate injection tests recorded at different depths, with the different intervals isolated by packer systems.

In the forward model, the fractures are represented explicitly as a three-dimensional (3D) discrete fracture network (DFN). The geometric and hydraulic properties of the DFN are inferred within a Bayesian framework, by sampling iteratively from the posterior density function according to the reversible jump Markov chain Monte Carlo sampling strategy. The goal of this inversion is to provide DFN realizations that minimize the error between the simulated and observed pressure signals and that are consistent with the prior information. During the course of the inversion, the number of fractures is iteratively adjusted by adding or deleting a fracture. Furthermore, the parameters of the DFN are adapted by moving a fracture and by changing the fracture length or hydraulic properties. Thereby, the algorithm switches between updates that change the number of parameters and updates that keep the number of parameters but adjust their values. The inversion results reveal the main structural and hydraulic characteristics of the DFN, the preferential flowpaths, and the uncertainty of the estimated model parameters.

How to cite: Ringel, L. M., Jalali, M., and Bayer, P.: Inversion of Hydraulic Tomography Data from the Grimsel Test Site with a Discrete Fracture Network Model, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-2220, 2022.

EGU22-2388 | Presentations | HS3.6

Estimation of simulation parameters for steady and transient 3D flow modeling at watershed scale 

Gillien Latour, Pierre Horgue, François Renard, Romain Guibert, and Gérald Debenest
Unsaturated water flows at the watershed scale or Darcy scale are generally described by the Richardson-Richards equation. This equation is highly non-linear, and simulation domains are limited by computational costs. The porousMultiphaseFoam toolbox is a finite volume tool capable of modeling multiphase flows in porous media, including the solution of the Richardson-Richards equation. As it has been developed in the OpenFOAM environment, the software is natively fully parallelized and can be used on supercomputers. Using experimental data from a real site with geographical information and piezometric values, an iterative algorithm is set up to solve an inverse problem in order to evaluate an adequate permeability field. This procedure is initially implemented using a simplified aquifer model with a 2D saturated modeling approach. A similar procedure using a full 3D model of the actual site is then performed (handling both saturated and unsaturated areas). The results of the two approaches (2D and 3D) are compared for steady simulations, and new post-processing tools are introduced to spatialize the error between the two models and identify the areas in which their behaviour differs. In a second part, an optimization of the van Genuchten parameters is performed to reproduce transient experimental data. The 3D numerical results at the watershed scale are also compared to reference simulations using a 1D unsaturated + 2D saturated modeling approach.

How to cite: Latour, G., Horgue, P., Renard, F., Guibert, R., and Debenest, G.: Estimation of simulation parameters for steady and transient 3D flow modeling at watershed scale, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-2388, 2022.

EGU22-2782 | Presentations | HS3.6

Global Sensitivity Analysis of an integrated parallel hydrologic model: ParFlow-CLM 

Wei Qu, Heye Bogena, Christoph Schüth, Harry Vereecken, and Stephan Schulz

An integrated parallel hydrologic model (ParFlow-CLM) was constructed to predict water and energy transport between the subsurface, land surface, and atmosphere for a synthetic study using basic physical properties of the Stettbach headwater catchment, Germany. Based on this model, a global sensitivity analysis was performed using the Latin hypercube (LH) sampling strategy followed by the One-factor-At-a-Time (OAT) method to identify the most influential and interacting parameters affecting the main hydrologic processes. In addition, the sensitivity analysis was carried out for different assumed slopes and meteorological conditions to show the transferability of the results to regions with other topographies and climates. Our results show that the simulated energy fluxes, i.e. latent heat flux, sensible heat flux and soil heat flux, are most sensitive to the parameters of wilting point, leaf area index, and stem area index, especially for steep slope and subarctic climate conditions. The simulated water fluxes, i.e. evaporation, transpiration, infiltration, and runoff, are most sensitive to soil porosity, the van Genuchten parameter n, wilting point, and leaf area index. The subsurface water storage and groundwater storage are most sensitive to soil porosity, while the surface water storage is most sensitive to the Manning's n parameter. For the different slope and climate conditions, the rank order of input parameter sensitivity was consistent, but the magnitude of parameter sensitivity differed considerably. The strongest deviation in parameter sensitivity occurred for sensible heat flux under different slope conditions and for transpiration under different climate conditions. This study provides an efficient method for identifying the most important input parameters of the model and for attributing variation in the model output to variations of its input factors.
The results help to better understand the model's process representation and reduce the computational cost of running large numbers of simulations.
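The LH sampling plus OAT perturbation recipe can be sketched generically; the toy linear "model" and all numerical choices below are assumptions for illustration, not the ParFlow-CLM setup:

```python
import numpy as np

rng = np.random.default_rng(11)

def latin_hypercube(n_samples, n_params):
    """Latin hypercube sample in the unit hypercube (one stratum per point)."""
    strata = np.tile(np.arange(n_samples), (n_params, 1))
    u = rng.permuted(strata, axis=1).T + rng.random((n_samples, n_params))
    return u / n_samples

def lh_oat(model, lo, hi, n_samples=50, frac=0.05):
    """LH-OAT sensitivity: OAT perturbations around each LH base point."""
    base = lo + latin_hypercube(n_samples, len(lo)) * (hi - lo)
    s = np.zeros(len(lo))
    for x in base:
        y0 = model(x)
        for j in range(len(lo)):
            xp = x.copy()
            xp[j] += frac * (hi[j] - lo[j])   # one-at-a-time perturbation
            s[j] += abs(model(xp) - y0)       # partial effect of parameter j
    return s / n_samples

# toy 'hydrologic response': strongly controlled by p0, weakly by p2
model = lambda p: 10.0 * p[0] + 2.0 * p[1] + 0.1 * p[2]
sens = lh_oat(model, lo=np.zeros(3), hi=np.ones(3))
```

Ranking `sens` gives the parameter importance ordering; averaging the OAT effects over LH base points is what makes the measure global rather than local.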

How to cite: Qu, W., Bogena, H., Schüth, C., Vereecken, H., and Schulz, S.: Global Sensitivity Analysis of an integrated parallel hydrologic model: ParFlow-CLM, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-2782, 2022.

EGU22-3691 | Presentations | HS3.6

Hydrogeological inference by adaptive sequential Monte Carlo with geostatistical resampling model proposals 

Macarena Amaya, Niklas Linde, and Eric Laloy

For strongly non-linear inverse problems, Markov chain Monte Carlo (MCMC) methods may fail to properly explore the posterior probability density function (PDF). Particle methods are well suited for parallelization and offer an alternative approach whereby the posterior PDF is approximated using the states and weights of a population of evolving particles. In addition, they provide reliable estimates of the evidence (marginal likelihood) needed for Bayesian model selection at essentially no extra cost. We consider adaptive sequential Monte Carlo (ASMC), an extension of annealed importance sampling (AIS). In these methods, importance sampling is performed over a sequence of intermediate distributions, known as power posteriors, linking the prior to the posterior PDF. The main advantages of ASMC over AIS are that it adaptively tunes the tempering between neighboring distributions and resamples particles when the variance of the particle weights becomes too large. We consider a challenging synthetic groundwater transport inverse problem with a categorical channelized 2D hydraulic conductivity field designed such that the posterior facies distribution includes two distinct modes with equal probability. The model proposals are obtained by iteratively re-simulating a fraction of the current model using conditional multi-point statistics (MPS) simulations. We focus here on the ability of ASMC to explore the posterior PDF and compare it with previously published results obtained with parallel tempering (PT), a state-of-the-art MCMC inversion approach that runs multiple interacting chains targeting different power posteriors. For a similar computational budget involving 24 particles for ASMC and 24 chains for PT, the ASMC implementation outperforms PT: the models fit the data better, and the reference likelihood value is contained in the ASMC sampled likelihood range, which is not the case for the PT range.
Moreover, we show that ASMC recovers both reference modes, while neither is recovered by PT. With 24 particles, however, one of the modes carries a higher weight than the other; the approximation improves as the number of particles increases. As a future development, we suggest that including fast surrogate modeling (e.g., polynomial chaos expansion) within ASMC for the MCMC steps used to evolve the particles in-between importance sampling steps would strongly reduce the computational cost while still ensuring results of similar quality, since the importance sampling steps could still be performed using the regular, more costly forward solver.
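A minimal sketch of the sequential Monte Carlo machinery the abstract builds on (tempered importance sampling with ESS-triggered resampling and Metropolis move steps), on a toy Gaussian problem where the evidence is known in closed form. The tempering schedule here is fixed, whereas ASMC tunes it adaptively, and all numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def logsumexp(a):
    m = a.max()
    return m + np.log(np.exp(a - m).sum())

# toy problem with a tractable answer: prior N(0, 3^2), likelihood N(2, 0.5^2),
# so the exact log-evidence is log N(2 | 0, 3^2 + 0.5^2)
def log_prior(t): return -0.5 * (t / 3.0) ** 2 - np.log(3.0 * np.sqrt(2 * np.pi))
def log_like(t):  return -0.5 * ((t - 2.0) / 0.5) ** 2 - np.log(0.5 * np.sqrt(2 * np.pi))

n = 1000
theta = rng.normal(0.0, 3.0, n)        # particles start as prior draws (beta = 0)
log_w = np.full(n, -np.log(n))         # normalised log-weights
log_Z = 0.0
betas = np.linspace(0.0, 1.0, 31)      # fixed schedule; ASMC adapts it on the fly

for b0, b1 in zip(betas[:-1], betas[1:]):
    inc = (b1 - b0) * log_like(theta)             # importance-sampling increment
    log_Z += logsumexp(log_w + inc)               # running evidence estimate
    log_w = log_w + inc
    log_w -= logsumexp(log_w)                     # renormalise
    ess = 1.0 / np.exp(logsumexp(2.0 * log_w))    # effective sample size
    if ess < n / 2:                               # resample when weights degenerate
        w = np.exp(log_w); w /= w.sum()
        theta = theta[rng.choice(n, n, p=w)]
        log_w = np.full(n, -np.log(n))
    # one random-walk Metropolis move per particle, targeting the power posterior
    prop = theta + rng.normal(0.0, 0.5, n)
    log_acc = log_prior(prop) + b1 * log_like(prop) - log_prior(theta) - b1 * log_like(theta)
    theta = np.where(np.log(rng.random(n)) < log_acc, prop, theta)

post_mean = np.sum(np.exp(log_w) * theta)
```

In the hydrogeological setting the Gaussian proposal is replaced by conditional MPS re-simulation of a fraction of the current model, and the forward solve inside `log_like` is the expensive part.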

How to cite: Amaya, M., Linde, N., and Laloy, E.: Hydrogeological inference by adaptive sequential Monte Carlo with geostatistical resampling model proposals, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-3691, 2022.

EGU22-3782 | Presentations | HS3.6

Uncertainty assessment and data-worth evaluation for estimating soil hydraulic parameters and recharge fluxes from lysimeter data 

Marleen Schübl, Christine Stumpp, and Giuseppe Brunetti

Transient measurements from lysimeters are frequently coupled with Richards-based solvers to inversely estimate soil hydraulic parameters (SHPs) and numerically describe vadose zone water fluxes, such as recharge. To reduce model predictive uncertainty, the lysimeter experiment should be designed to maximize the information content of observations. However, in practice, this is generally done by relying on the a priori expertise of the scientist/user, without exploiting the advantages of model-based experimental design. Thus, the main aim of this study is to demonstrate how model-based experimental design can be used to maximize the information content of observations in multiple scenarios encompassing different soil textural compositions and climatic conditions. The hydrological model HYDRUS is coupled with a Nested Sampling estimator to calculate the parameters' posterior distributions and the Kullback-Leibler divergences. Results indicate that the combination of seepage flow, soil water content, and soil matric potential measurements generally leads to highly informative designs, especially for fine-textured soils, while results for coarse soils are generally affected by higher uncertainty. Furthermore, soil matric potential proves to be more informative than soil water content measurements. Additionally, propagating parameter uncertainties into a contrasting (dry) climate scenario strongly increased prediction uncertainties for sandy soil, not only in the cumulative amount and peak magnitude of the seepage flow, but also in its temporal variability.

How to cite: Schübl, M., Stumpp, C., and Brunetti, G.: Uncertainty assessment and data-worth evaluation for estimating soil hydraulic parameters and recharge fluxes from lysimeter data, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-3782, 2022.

EGU22-6882 | Presentations | HS3.6 | Highlight

A review of conceptual model uncertainty in groundwater research 

Okke Batelaan, Trine Enemark, Luk Peeters, and Dirk Mallants

For more than a century, the strong advice in geology has been to rely on multiple working hypotheses. However, in groundwater research as supported by modelling, a stepwise approach with respect to complexity is often promoted and preferred by many. Defining a hypothesis, let alone multiple hypotheses, and testing these via groundwater models is rarely done. The so-called 'conceptual model' is generally considered the starting point of our beloved modelling method. A conceptual model summarises our current knowledge about a groundwater system, describing the hydrogeology and the dominant processes. Conceptual model development should involve formulating hypotheses that lead to modelling choices which steer the model predictions. As many conceptual models can explain the available data, multiple hypotheses allow assessing the conceptual, or structural, uncertainty.

This presentation aims to review key ideas from 125 years of research on (not) handling conceptual hydrogeological uncertainty, identify current approaches, unify scattered insights, and develop a systematic methodology for hydrogeological conceptual model development and testing. We advocate a systematic model development approach based on a mutually exclusive, collectively exhaustive range of hypotheses, even though this is not fully achievable. We provide examples of this approach and the consequent model testing. It is argued that by following this scientific recipe of refuting alternative models, we will learn more from our research, reduce the risk of conceptual surprises, and improve the robustness of groundwater assessments. We conclude that acknowledging and explicitly accounting for conceptual uncertainty goes a long way towards producing more reproducible groundwater research. Hypothesis testing is essential to increase system understanding by analyzing and refuting alternative conceptual models. It also provides more confidence in groundwater model predictions, leading to improved groundwater management, which is more important than ever.

How to cite: Batelaan, O., Enemark, T., Peeters, L., and Mallants, D.: A review of conceptual model uncertainty in groundwater research, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-6882, 2022.

EGU22-7774 | Presentations | HS3.6

Efficient inversion with complex geostatistical priors using normalizing flows and variational inference 

Shiran Levy, Eric Laloy, and Niklas Linde

We propose an approach for solving geophysical inverse problems which significantly reduces computational costs compared to Markov chain Monte Carlo (MCMC) methods while providing enhanced uncertainty quantification compared to efficient gradient-based deterministic methods. The proposed approach relies on variational inference (VI), which seeks to approximate the unnormalized posterior distribution parametrically for a given family of distributions by solving an optimization problem. Although prone to bias if the family of distributions is too limited, VI provides a computationally efficient approach that scales well to high-dimensional problems. To enhance the expressiveness of the parameterized posterior in the context of geophysical inverse problems, we use a combination of VI and inverse autoregressive flows (IAF), a type of normalizing flow that has been shown to be efficient for machine learning tasks. The IAF consists of invertible neural transport maps transforming an initial density of random variables into a target density, in which the mapping of each instance is conditioned on previous ones. In the combined VI-IAF routine, the approximate distribution is parameterized by the IAF; therefore, the potential expressiveness of the unnormalized posterior is determined by the architecture of the network. The parameters of the IAF are learned by minimizing the Kullback-Leibler divergence between the approximated posterior, which is obtained from samples drawn from a standard normal distribution that are pushed forward through the IAF, and the target posterior distribution. We test this approach on problems in which complex geostatistical priors are described by latent variables within a deep generative model (DGM) of the adversarial type. Previous studies have concluded that inversion based on gradient-based optimization techniques performs poorly in this setting because of the high nonlinearity of the generator.
Preliminary results involving linear physics suggest that the VI-IAF routine can recover the true model and provides high-quality uncertainty quantification at a low computational cost. As a next step, we will consider cases where the forward model is nonlinear and include comparisons against standard MCMC sampling. As most of the inverse problem nonlinearity arises from the DGM generator, we do not expect significant differences in the quality of the approximations with respect to the linear physics case.
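The reverse-KL objective underlying VI can be illustrated with the simplest possible member of a flow family: a single Gaussian q(x) = N(mu, sigma^2) trained with reparameterised Monte Carlo gradients. The unnormalised target below is a made-up stand-in, and in the authors' setting an IAF network replaces the Gaussian.

```python
import numpy as np

rng = np.random.default_rng(0)

# unnormalised target: proportional to N(3, 0.8^2); only its score is needed
def grad_log_p(x):
    return -(x - 3.0) / 0.8 ** 2

mu, log_sig = 0.0, 0.0
lr, n_mc = 0.05, 256
for _ in range(500):
    eps = rng.normal(size=n_mc)
    x = mu + np.exp(log_sig) * eps            # reparameterisation trick
    g = grad_log_p(x)
    mu += lr * g.mean()                                           # d ELBO / d mu
    log_sig += lr * ((g * np.exp(log_sig) * eps).mean() + 1.0)    # + entropy gradient
```

Maximising the ELBO is equivalent to minimising KL(q || p); with a richer, invertible map in place of the affine transform `mu + sigma * eps`, the same gradient estimator trains a flow.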

How to cite: Levy, S., Laloy, E., and Linde, N.: Efficient inversion with complex geostatistical priors using normalizing flows and variational inference, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-7774, 2022.

EGU22-8583 | Presentations | HS3.6

Quantifying transport ability of hindcast and forecast ocean models 

Makrina Agaoglou, Guillermo García-Sánchez, Amaia Marcano Larrinaga, Gabriel Mouttapa, and Ana M. Mancho

In recent years, there has been much interest in uncertainty quantification involving trajectories in ocean data sets. As more and more oceanic data become available, assessing the quality of ocean models for transport problems such as oil spills or the transport of chemicals and plastics becomes of vital importance. In our work we use two types of ocean models, hindcast and forecast, in a specific domain of the North Atlantic where drifter trajectory data were available. The hindcast approach requires running ocean (or atmospheric) models for a past period, usually spanning several decades. The forecast approach, on the other hand, predicts future states. Both ocean products are provided by CMEMS. Hindcast data include extra observational data that were time-delayed and therefore unavailable to the original forecast run. This means that, in principle, hindcast data are more accurate than archived forecast data. In this work, we focus on the comparison of the transport capacity of hindcast and forecast products in the Gulf Stream and the Atlantic Ocean, based on the structures of the dynamical systems describing the underlying transport problem, in the spirit of [1]. We go a step further by quantifying the transport performance of each model against observed drifters using tools developed in [2].
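One family of tools for comparing the dynamical structures of two velocity products is the Lagrangian descriptor M of [1]: the arc length of each trajectory over a forward-backward time window. A minimal sketch on an analytic solid-body rotation field, standing in for gridded hindcast or forecast currents:

```python
import numpy as np

def m_function(x0, y0, tau=2.0, dt=0.001):
    """Lagrangian descriptor M: arc length of the trajectory through (x0, y0),
    integrated forward and backward in time over [-tau, tau]."""
    def arc(x, y, sign):
        s = 0.0
        for _ in range(int(tau / dt)):
            u, v = -y, x                              # steady solid-body rotation
            x, y = x + sign * dt * u, y + sign * dt * v
            s += dt * np.hypot(u, v)
        return s
    return arc(x0, y0, +1.0) + arc(x0, y0, -1.0)

# for solid-body rotation the speed equals the radius, so M ~ 2 * tau * r
```

Evaluating M on a grid for each velocity product, and along observed drifters, gives the kind of side-by-side diagnostic of transport structures that the comparison above builds on.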


MA acknowledges support from the grant CEX2019-000904-S and IJC2019-040168-I funded by: MCIN/AEI/ 10.13039/501100011033, AMM and GGS acknowledge support from CSIC PIE grant Ref. 202250E001.


[1] C. Mendoza, A. M. Mancho, and S. Wiggins, Lagrangian descriptors and the assessment of the predictive capacity of oceanic data sets, Nonlin. Processes Geophys., 21, 677–689, 2014, doi:10.5194/npg-21-677-2014

[2] G. García-Sánchez, A. M. Mancho, and S. Wiggins, A bridge between invariant dynamical structures and uncertainty quantification, Commun. Nonlinear Sci. Numer. Simulat., 104, 106016, 2022, doi:10.1016/j.cnsns.2021.106016

How to cite: Agaoglou, M., García-Sánchez, G., Marcano Larrinaga, A., Mouttapa, G., and Mancho, A. M.: Quantifying transport ability of hindcast and forecast ocean models, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-8583, 2022.

Conceptual models are indispensable tools for hydrology. To use them for probabilistic predictions, they need to be equipped with an adequate error model, which, for ease of inference, is traditionally formulated as an additive error on the output (discharge). However, the main sources of uncertainty in hydrological modelling are typically not found in the output, but in the input (rain) and in the model structure. Therefore, more reliable error models and probabilistic predictions can be obtained by incorporating those uncertainties directly where they arise, that is, into the model. This, however, leads to stochastic models, which render traditional inference algorithms such as the Metropolis algorithm infeasible due to their expensive likelihood functions. Thanks to recent advances in algorithms and computing power, however, full-fledged Bayesian inference with stochastic models is no longer off-limits for hydrological applications. We demonstrate this with a case study from urban hydrology, for which we employ a highly efficient Hamiltonian Monte Carlo inference algorithm with a time-scale separation.
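A minimal Hamiltonian Monte Carlo sketch (leapfrog integration plus a Metropolis accept/reject step) on a standard-normal target illustrates the algorithmic core; the hydrological application wraps the same mechanics around a far more expensive stochastic-model likelihood, and the step size, path length and target here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

# target: standard normal, so U(q) = q^2 / 2 and grad U = q
def grad_U(q):
    return q

def hmc_step(q, eps=0.2, L=20):
    p0 = rng.normal()                         # resample momentum
    q_new, p = q, p0
    p -= 0.5 * eps * grad_U(q_new)            # leapfrog: initial half step
    for i in range(L):
        q_new += eps * p
        if i < L - 1:
            p -= eps * grad_U(q_new)
    p -= 0.5 * eps * grad_U(q_new)            # leapfrog: final half step
    h_old = 0.5 * q ** 2 + 0.5 * p0 ** 2      # Hamiltonian before/after
    h_new = 0.5 * q_new ** 2 + 0.5 * p ** 2
    return q_new if np.log(rng.random()) < h_old - h_new else q

samples = np.empty(5000)
q = 0.0
for k in range(samples.size):
    q = hmc_step(q)
    samples[k] = q
```

Because the leapfrog integrator nearly conserves the Hamiltonian, long trajectories are accepted with high probability, which is what makes HMC attractive when each likelihood evaluation is costly.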

How to cite: Ulzega, S. and Albert, C.: Bayesian parameter inference in hydrological modelling using a Hamiltonian Monte Carlo approach with a stochastic rain model, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-8729, 2022.

In this work we introduce hydroMOPSO, a novel multi-objective R package that combines two search mechanisms to maintain the diversity of the population and accelerate its convergence towards the Pareto-optimal set: Particle Swarm Optimisation (PSO) and genetic operations. hydroMOPSO is model-independent, allowing any model code to be interfaced with the calibration engine, including models available in R (e.g., TUWmodel, airGR, topmodel) as well as more complex models that can be run from the system console (e.g., SWAT+, Raven, WEAP). In addition, hydroMOPSO is platform-independent, running on GNU/Linux, macOS and Windows systems, among others.

Considering the long execution time of some real-world models, we used three benchmark functions to search for a configuration that reaches the Pareto-optimal front with a low number of model evaluations, analysing different combinations of: i) the swarm size in PSO, ii) the maximum number of particles in the external archive, and iii) the maximum number of genetic operations in the external archive. The selected configuration was then evaluated against other state-of-the-art multi-objective optimisation algorithms (MMOPSO, NSGA-II, NSGA-III). Finally, hydroMOPSO was used to calibrate a GR4J-CemaNeige hydrological model implemented in the Raven modelling framework, using two goodness-of-fit functions: i) the modified Kling-Gupta efficiency (KGE') and ii) the Nash-Sutcliffe efficiency with inverted flows (iNSE).

Our results showed that the configuration selected for hydroMOPSO makes it highly competitive with, or even superior to, MMOPSO, NSGA-II and NSGA-III in terms of the number of function evaluations required to achieve stabilisation of the Pareto front. They also showed some advantages of using a compromise solution instead of a single-objective one for estimating hydrological model parameters.
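Stabilisation of the Pareto front is assessed on the set of non-dominated solutions. A minimal dominance check, with objectives written as costs to be minimised (e.g. 1 − KGE' and 1 − iNSE; the numbers below are made up):

```python
import numpy as np

def pareto_mask(costs):
    """True for non-dominated rows of `costs` (all objectives minimised).
    Row j dominates row i if it is no worse everywhere and better somewhere."""
    n = costs.shape[0]
    mask = np.ones(n, dtype=bool)
    for i in range(n):
        others = np.delete(costs, i, axis=0)
        dominated = np.any(np.all(others <= costs[i], axis=1) &
                           np.any(others < costs[i], axis=1))
        mask[i] = not dominated
    return mask

# e.g. columns = (1 - KGE', 1 - iNSE) for five candidate parameter sets
costs = np.array([[0.10, 0.40],
                  [0.20, 0.20],
                  [0.40, 0.10],
                  [0.30, 0.30],
                  [0.40, 0.40]])
front = pareto_mask(costs)
```

A "compromise" solution is then picked from the rows where `front` is True, for instance the one closest to the ideal point, rather than the best solution of either single objective.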

How to cite: Marinao-Rivas, R. and Zambrano-Bigiarini, M.: hydroMOPSO: A versatile Particle Swarm Optimization R package for multi-objective calibration of environmental and hydrological models, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-9902, 2022.

EGU22-10431 | Presentations | HS3.6

Consistency and variability of spatial and temporal patterns of parameter dominance on four simulated hydrological variables in mHM in a large basin study 

Björn Guse, Stefan Lüdtke, Oldrich Rakovec, Stephan Thober, Thorsten Wagener, and Luis Samaniego

Model parameters are implemented in hydrological models to represent hydrological processes as accurately as possible under different catchment conditions. In the case of the mesoscale Hydrological Model (mHM), parameters are estimated via transfer functions and scaling rules using the Multiscale Parameter Regionalization (MPR) approach [1]. In this way, one consistent parameter set is selected for the entire model domain. To understand the impact of model parameters on simulated variables under different hydrological conditions, the spatio-temporal variability of parameter dominance and its relationship to the corresponding processes need to be investigated.

In this study, mHM is applied to more than a hundred German basins, including headwater areas in neighboring countries. To analyze the relevance of model parameters, a temporally resolved parameter sensitivity analysis using the FAST algorithm [2] is applied to derive the dominant model parameters for each day. These daily sensitivities were further aggregated to monthly and seasonal averages. By analyzing a large number of basins, not only the temporal but also the spatial variability in parameter relevance could be assessed. Four hydrological variables were used as target variables for the sensitivity analysis: runoff, actual evapotranspiration, soil moisture and groundwater recharge.

The analysis of temporal parameter sensitivity shows that the dominant parameters vary in space and time and across target variables. Soil material parameters are most dominant for runoff and recharge. A switch in parameter dominance between seasons was detected for an infiltration and an evapotranspiration parameter, which dominate soil moisture in winter and summer, respectively. The opposite seasonal dominance pattern of these two parameters was identified for actual evapotranspiration. Further, each parameter shows high sensitivities for either high or low values of one or more hydrological variables. The parameter estimation approach leads to spatially consistent patterns of parameter dominance. Spatial differences and similarities in parameter sensitivities could be explained by catchment variability.

The results improve the understanding of how model parameters control the simulated processes in mHM. This information could be useful for more efficient parameter identification, model calibration and improved MPR transfer functions.
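What a variance-based sensitivity analysis like FAST measures is the first-order index S_i = Var(E[Y | X_i]) / Var(Y). FAST estimates it through frequency analysis; the crude binning estimator below, applied to a made-up two-parameter "runoff" model, conveys the quantity itself rather than the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(7)

def first_order_index(x, y, n_bins=20):
    """Crude estimate of S_i = Var(E[Y | X_i]) / Var(Y) by binning X_i."""
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1))
    idx = np.clip(np.digitize(x, edges[1:-1]), 0, n_bins - 1)
    cond_means = np.array([y[idx == b].mean() for b in range(n_bins)])
    return cond_means.var() / y.var()

# hypothetical model: "runoff" dominated by a soil parameter, barely by routing
n = 20000
soil, routing = rng.random(n), rng.random(n)
runoff = 4.0 * soil ** 2 + 0.2 * routing
s_soil = first_order_index(soil, runoff)
s_routing = first_order_index(routing, runoff)
```

Evaluating such indices on each day's output, as in the temporally resolved analysis above, is what yields time series of parameter dominance.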



[1] Samaniego et al. (2010, WRR).

[2] Reusser et al. (2011, WRR).

How to cite: Guse, B., Lüdtke, S., Rakovec, O., Thober, S., Wagener, T., and Samaniego, L.: Consistency and variability of spatial and temporal patterns of parameter dominance on four simulated hydrological variables in mHM in a large basin study, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-10431, 2022.

EGU22-10654 | Presentations | HS3.6 | Highlight

Uncertainty assessment with Bluecat: Recognising randomness as a fundamental component of physics 

Alberto Montanari and Demetris Koutsoyiannis

We present a new method for simulating and predicting hydrologic variables, in particular river flows, which is rooted in probability theory and conceived to provide a reliable quantification of uncertainty for operational applications. Recent practical experience during extreme events has shown that simulation and prediction uncertainty is essential information for decision makers and the public. A reliable and transparent uncertainty assessment has also been shown to be essential for gaining public and institutional trust in real science. Our approach, termed "Bluecat", assumes that randomness is a fundamental component of physics and results from a theoretical and numerical development. Bluecat is conceived to make transparent and intuitive use of uncertain observations, which in turn mirror the observed reality. Bluecat therefore combines rigorous theory with the concept that environmental resources should be managed by making the best use of empirical evidence and experience, recognising randomness as an intrinsic property of hydrological systems. We provide open and user-friendly software to apply the method to the simulation and prediction of river flows and test Bluecat's reliability for operational applications.
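The core idea, conditioning observed values on similar simulated values to obtain an empirical predictive distribution, can be sketched as follows. This is a simplified illustration in the spirit of Bluecat, not the authors' implementation; the neighbourhood size `k`, the quantiles and the synthetic data are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(11)

def empirical_band(sim_cal, obs_cal, sim_new, k=200, q=(0.05, 0.5, 0.95)):
    """For each new simulated value, take the k calibration time steps with the
    closest simulated values and return empirical quantiles of the matching
    observations: a data-driven predictive distribution around the simulation."""
    order = np.argsort(np.abs(sim_cal[None, :] - sim_new[:, None]), axis=1)[:, :k]
    neighbours = obs_cal[order]
    return np.quantile(neighbours, q, axis=1)

# synthetic "model vs reality": observations with heteroscedastic error
sim = rng.uniform(1.0, 10.0, 5000)
obs = sim + rng.normal(0.0, 0.1 * sim)
lo, med, hi = empirical_band(sim, obs, np.array([2.0, 8.0]))
```

Because the band is built directly from observed model-versus-reality discrepancies, it widens automatically where the model has historically been less reliable, with no distributional assumption on the error.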

How to cite: Montanari, A. and Koutsoyiannis, D.: Uncertainty assessment with Bluecat: Recognising randomness as a fundamental component of physics, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-10654, 2022.

EGU22-11794 | Presentations | HS3.6

Effect of regional heterogeneities on inversion stability and estimated hydraulic properties field 

Hervé Jourde, Mohammed Aliouache, Pierre Fischer, Xiaoguang Wang, and Gerard Massonnat

Hydraulic tomography (HT) has shown great potential for estimating the spatial distribution of heterogeneous aquifer properties over the last decade. Though the method performs well in synthetic studies, the transition from synthetic models to real field applications is often associated with numerical instabilities. Inversion techniques can also suffer from ill-posedness and non-uniqueness of the estimates, since several solutions might correctly mimic the observed hydraulic data. In this work, we investigate the origin of the instabilities observed when performing HT with real field drawdown data. We first identify the cause of these instabilities and then apply different approaches, including a newly proposed one, to restore inverse model stability, which also allows hydraulic property fields to be estimated at local and regional scales. Results show that ill-posed models can lead to inversion instability, while the different approaches that limit these instabilities may lead to different estimates. The study also shows that the late-time hydraulic responses are strongly linked to the boundary conditions and thus to the regional heterogeneity. Accordingly, the use of these late-time data in inversion might require a larger dimension of the inverted domain, so it is recommended to position the boundary conditions of the forward model far away from the wells. The proposed technique provides an effective tool to obtain a satisfactory fit to the observations and to assess both the site-scale heterogeneity and the surrounding variability.

How to cite: Jourde, H., Aliouache, M., Fischer, P., Wang, X., and Massonnat, G.: Effect of regional heterogeneities on inversion stability and estimated hydraulic properties field, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-11794, 2022.

EGU22-11844 | Presentations | HS3.6

Benchmarking Automatically Identified Model Structures with a Large Model Ensemble 

Diana Spieler, Kan Lei, and Niels Schütze

Recent studies have introduced methods to simultaneously calibrate model structure choices and parameter values to identify an appropriate (conceptual) model structure for a given catchment. This can be done through mixed-integer optimization to identify the graph structure that links dominant flow processes (Spieler et al., 2020) or, likewise, by continuous optimization of weights when blending multiple flux equations to describe flow processes within a model (Chlumsky et al., 2021). Here, we use the combination of the mixed-integer optimization algorithm DDS and the modular modelling framework RAVEN and refer to it as Automatic Model Structure Identification (AMSI) framework.

This study validates the AMSI framework by comparing the performance of the identified AMSI model structures to two benchmark ensembles. The first ensemble consists of the best model structures from a brute-force calibration of all possible structures included in the AMSI model space (7488+). The second ensemble consists of 35+ MARRMoT structures, representing a structurally more diverse set of models than currently implemented in the AMSI framework. These structures stem from the MARRMoT toolbox introduced by Knoben et al. (2019), which provides established conceptual model structures from the hydrologic literature.

We analyze whether the model structures AMSI identifies are identical to the best-performing structures of the brute-force calibration and comparable in performance to the MARRMoT ensemble. We conclude that model structures identified with the AMSI framework can compete with the structurally more diverse MARRMoT ensemble. In fact, we were surprised to see how well a simple two-storage structure performs over the 12 tested MOPEX catchments (Duan et al., 2006). We aim to discuss several emerging questions, such as the selection of a robust model structure, equifinality in model structures, and the role of structural complexity.


Spieler et al. (2020).

Chlumsky et al. (2021).

Knoben et al. (2019).

Duan et al. (2006).

How to cite: Spieler, D., Lei, K., and Schütze, N.: Benchmarking Automatically Identified Model Structures with a Large Model Ensemble, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-11844, 2022.

Pearson's correlation is usually used as a criterion for the presence or absence of a relationship between time series, but it is not always informative for nonlinear systems such as the climate. Therefore, we implement a method from nonlinear dynamics to detect connections in the Sun-climate system. Here we estimate the causal relationship between Total Solar Irradiance (TSI) and ocean climate indices over the past few decades using the method of conditional dispersions (Cenys et al., 1991). We use a conceptual ocean-atmosphere model (Jin, 1997) with TSI added as a forcing to calibrate the method. We show that the method yields the expected connection between TSI and the model temperature. Adding Gaussian noise to the model data decreases the detectable causality as the noise amplitude increases, and a similar effect occurs in the empirical data. Moreover, for the empirical data, we show that the method can be used to independently estimate the uncertainties of ocean climate indices.
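The method of conditional dispersions measures how much the spread of one series shrinks when the other is confined to a small neighbourhood. A minimal sketch on synthetic coupled and independent pairs (the series and the neighbourhood size `eps` are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)

def conditional_dispersion(x, y, eps):
    """Mean variance of y over neighbourhoods defined by closeness in x.
    If x drives y, this dispersion drops well below the full variance of y."""
    disp, count = 0.0, 0
    for i in range(len(x)):
        mask = np.abs(x - x[i]) < eps
        if mask.sum() > 5:
            disp += y[mask].var()
            count += 1
    return disp / count

# coupled pair: y is a (noisy) function of x; independent pair as control
t = np.linspace(0.0, 20.0 * np.pi, 2000)
x = np.sin(t)
y_coupled = np.sin(t) ** 2 + 0.05 * rng.normal(size=t.size)
y_indep = rng.normal(size=t.size)

d_coupled = conditional_dispersion(x, y_coupled, 0.05) / y_coupled.var()
d_indep = conditional_dispersion(x, y_indep, 0.05) / y_indep.var()
```

A ratio well below one signals a functional (possibly nonlinear) dependence that Pearson's correlation can miss entirely; here the coupled pair has zero linear correlation by symmetry.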

How to cite: Skakun, A. and Volobuev, D.: Ocean climate indices and Total Solar Irradiance: causality over the past few decades and revision of indices uncertainties, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-12691, 2022.

EGU22-20 | Presentations | ITS2.6/AS5.1

PRECISIONPOP: a multi-scale monitoring system for poplar plantations integrating field, aerial and satellite remote sensing 

Francesco Chianucci, Francesca Giannetti, Clara Tattoni, Nicola Puletti, Achille Giorcelli, Carlo Bisaglia, Elio Romano, Massimo Brambilla, Piermario Chiarabaglio, Massimo Gennaro, Giovanni d'Amico, Saverio Francini, Walter Mattioli, Domenico Coaloa, Piermaria Corona, and Gherardo Chirici

Poplar (Populus spp.) plantations are widespread across the Northern Hemisphere and provide a wide range of benefits and products, including timber, carbon sequestration and phytoremediation. Because of poplars' specific features (fast growth, short rotation), the information needs require frequent updates, which exceed the traditional scope of National Forest Inventories and imply the need for ad-hoc monitoring solutions.

Here we present a regional-level multi-scale monitoring system for poplar plantations, based on the integration of remotely-sensed information at different spatial scales, developed for the Lombardy region (Northern Italy). The system is based on three levels of information: 1) at plot scale, terrestrial laser scanning (TLS) was used to develop non-destructive tree stem volume allometries at calibration sites; the resulting allometries were then used to estimate plot-level stand parameters from field inventory, and additional canopy structure attributes were derived using field digital cover photography; 2) at farm level, unmanned aerial vehicles (UAVs) equipped with multispectral sensors were used to upscale the results obtained from field data; 3) finally, both field and UAV estimates were used to calibrate a regional-scale supervised continuous monitoring system based on multispectral Sentinel-2 imagery, implemented and updated in a Google Earth Engine platform.

The combined use of multi-scale information allows effective management and monitoring of poplar plantations. From a top-down perspective, the continuous satellite monitoring system allows the detection of early-warning signals of poplar stress, which are suitable for variable-rate irrigation and fertilization scheduling. From a bottom-up perspective, the spatially explicit nature of TLS measurements allows better integration with remotely sensed data, enabling a multiscale assessment of poplar plantation structure with different levels of detail, enhancing conventional tree inventories, and supporting effective management strategies. Finally, UAVs are key in poplar plantations, as their spatial resolution is suited to calibrating metrics from coarser remotely-sensed products, reducing or avoiding the need for ground measurements, with a significant reduction of time and costs.
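At the satellite level, a per-pixel anomaly test of the kind such a continuous monitoring system can run on each new acquisition can be sketched as follows. The NDVI stack, patch size and z-score threshold below are hypothetical stand-ins for the actual Sentinel-2 pipeline.

```python
import numpy as np

rng = np.random.default_rng(9)

def stress_mask(ndvi_history, ndvi_now, z_thresh=-2.0):
    """Flag pixels whose current NDVI falls well below their own history:
    a simple per-pixel early-warning test for vegetation stress."""
    mu = ndvi_history.mean(axis=0)
    sd = ndvi_history.std(axis=0) + 1e-6          # guard against zero variance
    return (ndvi_now - mu) / sd < z_thresh

# synthetic stack standing in for 24 past scenes of a 50x50 plantation block
hist = rng.normal(0.8, 0.02, (24, 50, 50))
now = hist.mean(axis=0).copy()
now[10:20, 10:20] -= 0.15                         # an artificially stressed patch
mask = stress_mask(hist, now)
```

Flagged patches can then drive variable-rate irrigation or fertilization scheduling, with the per-pixel baseline absorbing stand-to-stand differences in vigour.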

How to cite: Chianucci, F., Giannetti, F., Tattoni, C., Puletti, N., Giorcelli, A., Bisaglia, C., Romano, E., Brambilla, M., Chiarabaglio, P., Gennaro, M., d'Amico, G., Francini, S., Mattioli, W., Coaloa, D., Corona, P., and Chirici, G.: PRECISIONPOP: a multi-scale monitoring system for poplar plantations integrating field, aerial and satellite remote sensing, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-20, 2022.

EGU22-124 | Presentations | ITS2.6/AS5.1

Unsupervised machine learning driven Prospectivity analysis of REEs in NE India 

Malcolm Aranha and Alok Porwal

Traditional mineral prospectivity modelling for mineral exploration and targeting relies heavily on manual data filtering and processing to extract desirable geologic features based on expert knowledge. It involves the integration of geological predictor maps that are manually derived by time-consuming and labour-intensive pre-processing of primary geoscientific data to serve as spatial proxies of mineralisation processes. Moreover, the selection of these spatial proxies is guided by conceptual genetic modelling of the targeted deposit type, which may be biased by the subjective preferences of an expert geologist. This study applies Self-Organising Maps (SOM), a neural-network-based unsupervised machine learning clustering algorithm, to gridded geophysical and topographical datasets in order to identify and delineate regional-scale exploration targets for carbonatite-alkaline-complex-related REE deposits in northeast India. The study did not utilise interpreted, processed or manually generated data, such as surface or bedrock geological maps or fault traces, relying instead on the algorithm to identify crucial features and delineate prospective areas. The results were then compared with those obtained from a previous supervised, knowledge-driven prospectivity analysis and found to be comparable. Unsupervised machine learning algorithms are therefore reliable tools for automating the manual process of mineral prospectivity modelling and are robust, time-saving alternatives to knowledge-driven or supervised data-driven prospectivity modelling. These methods would be instrumental in unexplored terrains for which little or no geological knowledge is available.
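A minimal Self-Organising Map in plain NumPy illustrates the clustering step; the grid size, learning-rate and neighbourhood schedules are illustrative choices, and the two synthetic "anomaly" populations stand in for pixels of stacked geophysical grids.

```python
import numpy as np

rng = np.random.default_rng(2)

def train_som(data, grid=(6, 6), n_iter=3000, lr0=0.5, sigma0=3.0):
    """Minimal 2-D Self-Organising Map: each input vector pulls its
    best-matching unit (and, early on, its grid neighbours) towards itself."""
    rows, cols = grid
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], float)
    weights = rng.random((rows * cols, data.shape[1]))
    for t in range(n_iter):
        lr = lr0 * (1.0 - t / n_iter)                 # decaying learning rate
        sigma = sigma0 * (1.0 - t / n_iter) + 0.5     # shrinking neighbourhood
        x = data[rng.integers(len(data))]
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
        dist2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
        h = np.exp(-dist2 / (2.0 * sigma ** 2))       # neighbourhood kernel
        weights += lr * h[:, None] * (x - weights)
    return weights

# two synthetic populations standing in for pixels of stacked geophysical layers
a = rng.normal(0.2, 0.05, (300, 3))
b = rng.normal(0.8, 0.05, (300, 3))
data = np.vstack([a, b])
w = train_som(data)
qe = np.mean([np.sqrt(((w - x) ** 2).sum(axis=1).min()) for x in data])  # quantisation error
```

In a prospectivity workflow, each pixel is assigned to its best-matching unit, and nodes whose prototypes resemble the signatures of known deposits delineate the target areas.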

How to cite: Aranha, M. and Porwal, A.: Unsupervised machine learning driven Prospectivity analysis of REEs in NE India, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-124, 2022.

EGU22-654 | Presentations | ITS2.6/AS5.1

On the derivation of data-driven models for partially observed systems 

Said Ouala, Bertrand Chapron, Fabrice Collard, Lucile Gaultier, and Ronan Fablet

When considering the modeling of dynamical systems, the increasing interest in machine learning, artificial intelligence and, more generally, data-driven representations, together with the increasing availability of data, has motivated the exploration and definition of new identification techniques. These data-driven representations aim to address modern questions regarding the modeling, the prediction and, ultimately, the understanding of complex systems such as the ocean, the atmosphere and the climate.

In this work, we focus on one question regarding the ability to define a (deterministic) dynamical model from a sequence of observations. We focus on sea surface observations and show that these observations typically relate to some, but not all, components of the underlying state space, making the derivation of a deterministic model in the observation space impossible. In this context, we formulate the identification problem as the definition, from data, of an embedding of the observations, parameterized by a differential equation. When compared to state-of-the-art techniques based on delay embedding and linear decomposition of the underlying operators, the proposed approach benefits from all the advances in machine learning and dynamical systems theory in order to define, constrain and tune the reconstructed state space and the approximate differential equation. Furthermore, the proposed embedding methodology naturally extends to cases in which a dynamical prior (derived, for example, using physical principles) is known, leading to relevant physics-informed data-driven models.
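For context, the delay-embedding baseline mentioned above can be illustrated with a minimal Takens embedding of a scalar observable. The series and the embedding parameters below are toy choices for illustration, not the authors' data or settings:

```python
import numpy as np

def delay_embed(obs, dim, tau):
    """Takens delay embedding: map a scalar series to vectors
    [x(t), x(t-tau), ..., x(t-(dim-1)*tau)]."""
    n = len(obs) - (dim - 1) * tau
    return np.stack([obs[(dim - 1 - k) * tau : (dim - 1 - k) * tau + n]
                     for k in range(dim)], axis=1)

# Observe only one component of a partially observed system.
t = np.linspace(0, 20 * np.pi, 4000)
x = np.sin(t) + 0.5 * np.sin(3.1 * t)   # toy scalar observable

emb = delay_embed(x, dim=3, tau=15)     # reconstructed (embedded) states
```

Each row of `emb` is a reconstructed state; the approach discussed in the abstract instead learns the embedding jointly with a differential equation.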

How to cite: Ouala, S., Chapron, B., Collard, F., Gaultier, L., and Fablet, R.: On the derivation of data-driven models for partially observed systems, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-654, 2022.

EGU22-1255 | Presentations | ITS2.6/AS5.1

A Deep Learning approach to de-bias Air Quality forecasts, using heterogeneous Open Data sources as reference 

Antonio Pérez, Mario Santa Cruz, Johannes Flemming, and Miha Razinger

The degradation of air quality is a challenge that policy-makers face all over the world. According to the World Health Organisation, air pollution causes an estimated 7 million premature deaths every year. In this context, air quality forecasts are crucial tools for decision- and policy-makers to reach data-informed decisions.

Global forecasts, such as those of the Copernicus Atmosphere Monitoring Service (CAMS) model, usually exhibit biases: systematic deviations from observations. Adjusting these biases is typically the first step towards obtaining actionable air quality forecasts. It is especially relevant for health-related decisions, where the metrics of interest depend on specific thresholds.

AQ (Air Quality) Bias Correction was a project funded by the ECMWF Summer of Weather Code (ESOWC) 2021 whose aim was to improve CAMS forecasts of air quality variables (NO2, O3, PM2.5), using as reference the in-situ observations provided by OpenAQ. The adjustment, based on machine learning methods, was performed over a set of locations of interest provided by ECMWF, for the period June 2019 to March 2021.

The machine learning approach uses three different deep-learning-based models, plus an extra neural network that combines the outputs of the three. Of the three DL-based models, two are independent and share the same structure built upon the InceptionTime module: they use both meteorological and air quality variables to exploit the temporal variability and to extract the most meaningful features of the past [t-24h, t-23h, …, t-1h] and future [t, t+1h, …, t+23h] CAMS predictions. The third model uses the static station attributes (longitude, latitude and elevation), which are processed by a multilayer perceptron. The features extracted by these three models are fed into another multilayer perceptron to predict the upcoming errors at hourly resolution [t, t+1h, …, t+23h]. As a final step, 5 different initializations are ensembled with equal weights to obtain a more stable regressor.
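The past/future windowing of hourly CAMS predictions can be sketched as follows. The `make_windows` helper and the toy series are illustrative stand-ins for the project's feature pipeline, not its actual code:

```python
import numpy as np

def make_windows(series, past=24, future=24):
    """For each forecast time t, stack the past [t-24h, ..., t-1h] and
    future [t, ..., t+23h] hourly values into one feature vector."""
    n = len(series)
    rows = []
    for t in range(past, n - future + 1):
        rows.append(np.concatenate([series[t - past:t],       # past window
                                    series[t:t + future]]))   # future window
    return np.asarray(rows)

# Toy hourly CAMS NO2 forecast series.
cams_no2 = np.random.default_rng(1).normal(20, 5, size=200)
X = make_windows(cams_no2)
```

In the described architecture, such windows would feed the InceptionTime-style feature extractors, while station attributes enter through a separate multilayer perceptron.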

Prior to this modelling, CAMS forecasts of air quality variables were biased regardless of the location of interest and the variable (on average: biasNO2 = -22.76, biasO3 = 44.30, biasPM2.5 = 12.70). In addition, the skill of the model, measured by the Pearson correlation, did not reach 0.5 for any of the variables, with remarkably low values for NO2 and O3 (on average: pearsonNO2 = 0.10, pearsonO3 = 0.14).

The AQ-BiasCorrection model properly corrects these biases. Overall, the number of stations whose biases improve in both the train and test sets is 52 out of 61 (85%) for NO2, 62 out of 67 (92%) for O3, and 80 out of 102 (78%) for PM2.5. The biases decline by -1.1%, -9.7% and -13.9% for NO2, O3 and PM2.5, respectively. In addition, the model skill measured through the Pearson correlation increases, with overall improvements in variable skill in the range of 100-400%.
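The two headline metrics used above, bias and Pearson correlation, are straightforward to compute. The snippet below is a self-contained toy example with synthetic data, not the study's evaluation code; the mean-subtraction "correction" is the simplest possible stand-in for the learned model:

```python
import numpy as np

def bias(forecast, obs):
    """Mean systematic deviation of the forecast from observations."""
    return np.mean(forecast - obs)

def pearson(forecast, obs):
    """Pearson correlation, the skill measure quoted above."""
    f, o = forecast - forecast.mean(), obs - obs.mean()
    return np.sum(f * o) / np.sqrt(np.sum(f ** 2) * np.sum(o ** 2))

rng = np.random.default_rng(0)
obs = rng.normal(30, 10, 500)                  # toy hourly O3 observations
raw = obs + 44.3 + rng.normal(0, 25, 500)      # raw forecast with a large bias
corrected = raw - bias(raw, obs)               # simplest possible correction
```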

How to cite: Pérez, A., Santa Cruz, M., Flemming, J., and Razinger, M.: A Deep Learning approach to de-bias Air Quality forecasts, using heterogeneous Open Data sources as reference, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-1255, 2022.

EGU22-1992 | Presentations | ITS2.6/AS5.1

Approximating downward short-wave radiation flux using all-sky optical imagery using machine learning trained on DASIO dataset. 

Vasilisa Koshkina, Mikhail Krinitskiy, Nikita Anikin, Mikhail Borisov, Natalia Stepanova, and Alexander Osadchiev

Solar radiation is the main source of energy on Earth, and cloud cover is the main physical factor limiting the downward short-wave radiation flux. In modern climate and weather forecast models, physical schemes describing the passage of radiation through clouds may be used; this is a computationally extremely expensive way of estimating downward radiation fluxes. Instead, one may use parameterizations, which are simplified schemes for approximating environmental variables. The purpose of this work is to improve the accuracy of existing parametrizations of the downward shortwave radiation flux. We approach the problem with various machine learning (ML) models that approximate the downward shortwave radiation flux from all-sky optical imagery, assuming that an all-sky photo contains complete information about the downward shortwave radiation. We examine several types of ML models trained on a dataset of all-sky imagery accompanied by short-wave radiation flux measurements. The Dataset of All-Sky Imagery over the Ocean (DASIO) was collected in the Indian, Atlantic and Arctic oceans during several oceanic expeditions from 2014 to 2021. The quality of the best classic ML model exceeds that of existing parameterizations known from the literature. We will show the results of our study regarding classic ML models as well as an end-to-end ML approach involving convolutional neural networks. Our results suggest that one may acquire downward shortwave radiation fluxes directly from all-sky imagery. We will also cover some downsides and limitations of the presented approach.
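A "classic ML" baseline of this type can be sketched as hand-crafted image features plus a linear regression. Everything below (the feature set, the synthetic images and the assumed flux relation) is an illustrative assumption rather than the parameterization actually trained on DASIO:

```python
import numpy as np

rng = np.random.default_rng(0)

def image_features(img):
    """Toy hand-crafted features of an all-sky image: mean brightness,
    brightness variance, and fraction of bright (cloud/sun) pixels."""
    return np.array([img.mean(), img.var(), (img > 0.7).mean()])

# Synthetic stand-in for all-sky photos with co-located flux measurements.
images = rng.random(size=(300, 32, 32))
flux = 800 * images.mean(axis=(1, 2)) + rng.normal(0, 10, 300)  # toy SW flux

X = np.stack([image_features(im) for im in images])
A = np.column_stack([X, np.ones(len(X))])          # add intercept term
coef, *_ = np.linalg.lstsq(A, flux, rcond=None)    # least-squares fit
pred = A @ coef
```

An end-to-end approach would instead feed the raw pixels to a convolutional network, as mentioned in the abstract.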

How to cite: Koshkina, V., Krinitskiy, M., Anikin, N., Borisov, M., Stepanova, N., and Osadchiev, A.: Approximating downward short-wave radiation flux using all-sky optical imagery using machine learning trained on DASIO dataset., EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-1992, 2022.

EGU22-2058 | Presentations | ITS2.6/AS5.1

Deep learning for ensemble forecasting 

Rüdiger Brecht and Alexander Bihlo

Ensemble prediction systems are an invaluable tool for weather prediction. Practically, ensemble predictions are obtained by running several perturbed numerical simulations. However, these systems are associated with a high computational cost and often involve statistical post-processing steps to improve their quality.
Here we propose a deep-learning-based algorithm that learns the statistical properties of a given ensemble prediction system, so that the system is no longer needed to simulate future ensemble forecasts. This way, the high computational cost of the ensemble prediction system can be avoided while the statistical properties are still obtained from a single deterministic forecast. We show preliminary results demonstrating the ensemble prediction properties for a shallow-water unstable jet simulation on the sphere.
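The idea of recovering ensemble statistics from a single deterministic forecast can be illustrated with a toy regression that predicts ensemble spread from the deterministic state. The linear spread relation below is a made-up assumption for illustration only, not the paper's deep-learning model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training set: deterministic forecasts and matching ensemble spreads.
det = rng.normal(size=(400, 10))                           # deterministic states
spread = 0.3 * np.abs(det) + 0.05 * rng.random((400, 10))  # toy ensemble std

# Learn a pointwise mapping: deterministic state -> expected spread.
a, b = np.polyfit(np.abs(det).ravel(), spread.ravel(), 1)

# At inference time, emulate the spread without running the ensemble.
new_det = rng.normal(size=10)
emulated_spread = a * np.abs(new_det) + b
```

The deep-learning version replaces the linear fit with a network acting on the full forecast fields.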

How to cite: Brecht, R. and Bihlo, A.: Deep learning for ensemble forecasting, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-2058, 2022.

EGU22-2095 | Presentations | ITS2.6/AS5.1

A Deep Learning Bias Correction Approach for Rainfall Numerical Prediction

L. Ma

Numerical weather prediction (NWP) models are widely used for operational weather forecasting in meteorological centers. NWP models describe fluid flow through a set of governing equations, physical parameterization schemes, and initial and boundary conditions; they therefore often face prediction biases due to insufficient data assimilation and to assumptions or approximations in the dynamical and physical processes. To produce gridded rainfall forecasts with high confidence, we present a data-driven deep learning model for correcting NWP rainfall, consisting mainly of a confidence network and a combinatorial network. A focal loss is introduced to handle the long-tailed distribution of rainfall; it is expected to alleviate the impact of the large span of rainfall magnitudes by transforming the regression problem into several binary classification problems. The deep learning model is used to correct gridded rainfall forecasts from the European Centre for Medium-Range Weather Forecasts Integrated Forecasting System global model (ECMWF-IFS) at forecast lead times of 24 h to 240 h in Eastern China. First, the rainfall forecast correction problem is treated as an image-to-image translation problem. Second, ECMWF-IFS forecasts and rainfall observations from recent years are used as training, validation, and testing datasets. Finally, the correction performance of the new model is evaluated and compared with several classical machine learning algorithms. A set of rainfall forecast error correction experiments shows that the new model can effectively forecast rainfall over the East China region during the flood season of 2020, and that it generally performs better in bias correction of rainfall prediction than most classical machine learning approaches.
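A binary focal loss of the kind referred to above can be sketched as follows. The rainfall threshold and the toy values are illustrative, not the paper's configuration:

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss: down-weights easy examples so that the rare
    heavy-rain class is not swamped by the many dry/light-rain cases."""
    p = np.clip(p, 1e-7, 1 - 1e-7)
    pt = np.where(y == 1, p, 1 - p)          # probability of the true class
    w = np.where(y == 1, alpha, 1 - alpha)   # class-balancing weight
    return np.mean(-w * (1 - pt) ** gamma * np.log(pt))

# Rainfall regression recast as a binary problem: "rain exceeds threshold?"
rain = np.array([0.0, 0.2, 1.5, 12.0, 45.0])   # toy rainfall amounts (mm)
y = (rain > 10.0).astype(int)                  # one of several thresholds
p = np.array([0.1, 0.2, 0.3, 0.8, 0.9])        # predicted probabilities
loss = focal_loss(p, y)
```

Repeating this over several thresholds turns the long-tailed regression target into a stack of binary classification problems, as the abstract describes.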

How to cite: Ma, L.: A Deep Learning Bias Correction Approach for Rainfall Numerical Prediction, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-2095, 2022.

EGU22-2893 | Presentations | ITS2.6/AS5.1 | Highlight

Bias Correction of Operational Storm Surge Forecasts Using Neural Networks 

Paulina Tedesco, Jean Rabault, Martin Lilleeng Sætra, Nils Melsom Kristensen, Ole Johan Aarnes, Øyvind Breivik, and Cecilie Mauritzen

Storm surges can give rise to extreme floods in coastal areas. The Norwegian Meteorological Institute (MET Norway) produces 120-hour regional operational storm surge forecasts along the coast of Norway based on the Regional Ocean Modeling System (ROMS). Despite advances in the development of models and computational capability, forecast errors remain large enough to impact response measures and issued alerts, in particular, during the strongest storm events. Reducing these errors will positively impact the efficiency of the warning systems while minimizing efforts and resources spent on mitigation.

Here, we investigate how forecasts can be improved with residual learning, i.e., training data-driven models to predict, and correct, the error in the ROMS output. For this purpose, sea surface height data from stations around Norway were collected and compared with the ROMS output.

We develop two different residual learning frameworks that can be applied on top of the ROMS output. In the first one, we bin the model error, conditioned on pressure, wind, and waves. Clear error patterns are visible when the error conditioned on the wind is plotted in a polar plot for each station. These error maps can be stored as correction lookup tables to be applied to the ROMS output. However, since wind, pressure, and waves are correlated, we cannot simultaneously correct the error associated with each variable using this method. To overcome this limitation, we develop a second method, which resorts to Neural Networks (NNs) to perform nonlinear modeling of the error pattern obtained at each station.
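The first, binning-based framework can be sketched as a wind-conditioned lookup table. The synthetic error model, bin widths and station data below are illustrative assumptions, not the operational setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy station data: wind direction (deg), wind speed (m/s), surge error (m).
wdir = rng.uniform(0, 360, 5000)
wspd = rng.uniform(0, 25, 5000)
err = 0.02 * wspd * np.cos(np.deg2rad(wdir - 200)) + rng.normal(0, 0.02, 5000)

# Bin the error by wind direction and speed -> polar correction lookup table.
dir_edges = np.arange(0, 361, 30)
spd_edges = np.arange(0, 26, 5)
table = np.full((len(dir_edges) - 1, len(spd_edges) - 1), np.nan)
for i in range(len(dir_edges) - 1):
    for j in range(len(spd_edges) - 1):
        m = ((wdir >= dir_edges[i]) & (wdir < dir_edges[i + 1]) &
             (wspd >= spd_edges[j]) & (wspd < spd_edges[j + 1]))
        if m.any():
            table[i, j] = err[m].mean()

def correct(raw_surge, d, s):
    """Subtract the mean conditional error stored in the lookup table."""
    i = min(int(d // 30), table.shape[0] - 1)
    j = min(int(s // 5), table.shape[1] - 1)
    return raw_surge - table[i, j]
```

The NN method replaces this table with a nonlinear model of the error that can use all correlated predictors at once.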

The residual NN method strongly outperforms the error map method, and is a promising direction for correcting storm surge models operationally. Indeed, i) this method is applied on top of the existing model and requires no changes to it, ii) all predictors used for NN inference are available operationally, iii) prediction by the NN is very fast, typically a few seconds per station, and iv) the NN correction can be provided to a human expert who gets to inspect it, compare it with the ROMS output, and see how much correction is brought by the NN. Using this NN residual error correction method, the RMS error in the Oslofjord is reduced by typically 7% for lead times of 24 hours, 17% for 48 hours, and 35% for 96 hours.

How to cite: Tedesco, P., Rabault, J., Sætra, M. L., Kristensen, N. M., Aarnes, O. J., Breivik, Ø., and Mauritzen, C.: Bias Correction of Operational Storm Surge Forecasts Using Neural Networks, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-2893, 2022.

EGU22-3977 | Presentations | ITS2.6/AS5.1 | Highlight

Learning quasi-geostrophic turbulence parametrizations from a posteriori metrics 

Hugo Frezat, Julien Le Sommer, Ronan Fablet, Guillaume Balarac, and Redouane Lguensat

Machine learning techniques are now ubiquitous in the geophysical science community. They have been applied in particular to the prediction of subgrid-scale parametrizations using data that describe small-scale dynamics from large-scale states. However, these models are then used to predict temporal trajectories, which is not covered by this instantaneous mapping. Following the model trajectory during training can be done using an end-to-end approach, where temporal integration is performed using a neural network. As a consequence, the approach is shown to optimize a posteriori metrics, whereas classical instantaneous training is limited to a priori ones. When applied to a specific energy backscatter problem, found in quasi-geostrophic turbulent flows, the strategy demonstrates long-term stability and high-fidelity statistical performance, without any increase in computational complexity during rollout. These improvements may call into question the future development of realistic subgrid-scale parametrizations in favor of differentiable solvers, as required by the a posteriori strategy.

How to cite: Frezat, H., Le Sommer, J., Fablet, R., Balarac, G., and Lguensat, R.: Learning quasi-geostrophic turbulence parametrizations from a posteriori metrics, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-3977, 2022.

EGU22-4062 | Presentations | ITS2.6/AS5.1

Climatological Ocean Surface Wave Projections using Deep Learning 

Peter Mlakar, Davide Bonaldo, Antonio Ricchi, Sandro Carniel, and Matjaž Ličer

We present a numerically cheap machine-learning model which accurately emulates the performance of the surface wave model Simulating WAves Nearshore (SWAN) in the Adriatic basin (north-east Mediterranean Sea).

A ResNet50-inspired deep network architecture with customized spatio-temporal attention layers was used. The network was trained on a 1970-1997 dataset of time-dependent features based on wind fields retrieved from the COSMO-CLM regional climate model (the authors thank Dr. Edoardo Bucchignani, Meteorology Laboratory, Centro Italiano Ricerche Aerospaziali (CIRA), Capua, Italy, for providing the COSMO-CLM wind fields), with SWAN surface wave model outputs for the same 1970-1997 period used as labels. The period 1998-2000 is used for cross-validation, showing that the network very accurately reproduces SWAN surface wave features (i.e. significant wave height, mean wave period, mean wave direction) at several locations in the Adriatic basin.

After successful cross-validation, a series of projections of ocean surface wave properties based on climate model projections for the end of the 21st century (under the RCP 8.5 scenario) is performed, and shifts in the emulated wave field properties are discussed.

How to cite: Mlakar, P., Bonaldo, D., Ricchi, A., Carniel, S., and Ličer, M.: Climatological Ocean Surface Wave Projections using Deep Learning, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-4062, 2022.

EGU22-4493 | Presentations | ITS2.6/AS5.1 | Highlight

Semi-automatic tuning procedure for a GCM targeting continental surfaces: a first experiment using in situ observations 

Maëlle Coulon--Decorzens, Frédérique Cheruy, and Frédéric Hourdin

The tuning, or calibration, of General Circulation Models (GCMs) is an essential stage in obtaining proper model behavior. The need for the best possible climate projections in the regions where we live calls for tuning the models with particular attention to the land surface, bearing in mind that the interactions between the atmosphere and the land surface remain a key source of uncertainty in regional-scale climate projections [1].

For a long time, this tuning has been done by hand, based on scientific expertise, and has not been sufficiently documented [2]. Recent tuning tools offer the possibility of accelerating climate model development, providing a real tuning formalism as well as a new way to understand climate models. High-Tune Explorer is one such statistical tuning tool, involving machine learning and based on uncertainty quantification. It aims to reduce the range of free parameters that allow realistic model behaviour [3]. A new automatic tuning experiment was developed with this tool for the atmospheric component of the IPSL GCM, LMDZ. The model was first tuned at the process level, using several single-column test cases compared to large-eddy simulations, and then at the global level by targeting radiative metrics at the top of the atmosphere [4].

We propose to add a new step to this semi-automatic tuning procedure, targeting atmosphere and land-surface interactions. The first aspect of our proposal is to compare coupled atmosphere-continent simulations (here running LMDZ-ORCHIDEE) with in situ observations from the SIRTA observatory located southwest of Paris. In situ observations provide hourly, jointly colocated data with strong potential for understanding the processes at stake and their representation in the model. These data are also subject to much lower uncertainties than satellite retrievals of surface observations. In order to fully benefit from the site observations, the model winds are nudged towards reanalyses. This forces the simulations to follow the actual meteorological sequence, thus allowing comparison between simulations and observations at the process time scale. Removing the errors arising from the representation of large-scale dynamics focuses the tuning on the representation of physical processes "at a given meteorological situation". Finally, the model grid is zoomed in on the SIRTA observatory in order to reduce the computational cost of the simulations while preserving a fine mesh around the observatory.

We show the results of this new tuning step, which succeeds in reducing the domain of acceptable free parameters as well as the dispersion of the simulations. This method, which is less computationally costly than global tuning, is therefore a good way to precondition the latter. It allows the joint tuning of atmospheric and land surface models, traditionally tuned separately [5], and has the advantage of remaining close to the processes, thus improving their understanding.
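The uncertainty-quantification step that shrinks the acceptable parameter space can be illustrated with a history-matching-style implausibility measure in one dimension. All numbers below are toy values, not the LMDZ-ORCHIDEE setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def implausibility(sim, obs, sim_var, obs_var):
    """Standardised distance between a simulated metric and its
    observed target, as used in history-matching-style tuning."""
    return np.abs(sim - obs) / np.sqrt(sim_var + obs_var)

# Toy 1-D free parameter and a toy target metric (e.g. a surface flux).
theta = np.linspace(0.0, 1.0, 200)                           # candidates
sim_metric = 10.0 * theta + rng.normal(0, 0.2, theta.size)   # emulated output
obs_metric, obs_var, emu_var = 6.0, 0.25, 0.04

I = implausibility(sim_metric, obs_metric, emu_var, obs_var)
kept = theta[I < 3.0]   # retain the not-yet-ruled-out parameter space
```

Iterating this over successive waves of simulations progressively reduces the domain of acceptable free parameters, which is the effect reported above.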


[1] Cheruy et al., 2014

[2] Hourdin et al., 2017

[3] Couvreux et al., 2021

[4] Hourdin et al., 2021

[5] Cheruy et al., 2020

How to cite: Coulon--Decorzens, M., Cheruy, F., and Hourdin, F.: Semi-automatic tuning procedure for a GCM targeting continental surfaces: a first experiment using in situ observations, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-4493, 2022.

EGU22-4923 | Presentations | ITS2.6/AS5.1

Constrained Generative Adversarial Networks for Improving Earth System Model Precipitation 

Philipp Hess, Markus Drüke, Stefan Petri, Felix Strnad, and Niklas Boers

The simulation of precipitation in numerical Earth system models (ESMs) involves various processes on a wide range of scales, requiring high temporal and spatial resolution for realistic simulations. This can lead to biases in computationally efficient ESMs that have a coarse resolution and limited model complexity. Traditionally, these biases are corrected by relating the distributions of historical simulations with observations [1]. While these methods successfully improve the modelled statistics, unrealistic spatial features that require a larger spatial context are not addressed.

Here we apply generative adversarial networks (GANs) [2] to transform precipitation of the CM2Mc-LPJmL ESM [3] into a bias-corrected and more realistic output. Feature attribution shows that the GAN has correctly learned to identify spatial regions with the largest bias during training. Our method presents a general bias correction framework that can be extended to a wider range of ESM variables to create highly realistic but computationally inexpensive simulations of future climates. We also discuss the generalizability of our approach to projections from CMIP6, given that the GAN is only trained on historical data.
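The traditional distribution-based correction cited as [1] is quantile mapping, which can be sketched in a few lines. The gamma-distributed toy data below stand in for precipitation; this is the baseline the GAN approach goes beyond, not the authors' GAN itself:

```python
import numpy as np

def quantile_map(model, obs, new_model):
    """Empirical quantile mapping: replace each new model value by the
    observed value at the same quantile of the historical distributions."""
    q = np.interp(new_model, np.sort(model),
                  np.linspace(0, 1, len(model)))   # model empirical CDF
    return np.interp(q, np.linspace(0, 1, len(obs)),
                     np.sort(obs))                 # inverse observed CDF

rng = np.random.default_rng(0)
obs_hist = rng.gamma(2.0, 2.0, 3000)          # toy observed precipitation
mod_hist = rng.gamma(2.0, 2.0, 3000) + 1.0    # model with a wet bias
corrected = quantile_map(mod_hist, obs_hist, mod_hist)
```

Quantile mapping fixes pointwise distributions but, as noted above, cannot repair unrealistic spatial structure; that is what the GAN is for.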

[1] A.J. Cannon et al. "Bias correction of GCM precipitation by quantile mapping: How well do methods preserve changes in quantiles and extremes?." Journal of Climate 28.17 (2015): 6938-6959.

[2] I. Goodfellow et al. "Generative adversarial nets." Advances in neural information processing systems 27 (2014).

[3] M. Drüke et al. "CM2Mc-LPJmL v1.0: Biophysical coupling of a process-based dynamic vegetation model with managed land to a general circulation model." Geoscientific Model Development 14.6 (2021): 4117–4141.

How to cite: Hess, P., Drüke, M., Petri, S., Strnad, F., and Boers, N.: Constrained Generative Adversarial Networks for Improving Earth System Model Precipitation, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-4923, 2022.

EGU22-5219 | Presentations | ITS2.6/AS5.1 | Highlight

Neural Partial Differential Equations for Atmospheric Dynamics 

Maximilian Gelbrecht and Niklas Boers

When predicting complex systems such as parts of the Earth system, one typically relies on differential equations, which can often be incomplete, missing unknown influences or higher-order effects. Using the universal differential equations framework, we can augment the equations with artificial neural networks that compensate for these deficiencies. We show that this can be used to predict the dynamics of high-dimensional spatiotemporally chaotic partial differential equations, such as those describing atmospheric dynamics. In a first step towards a hybrid atmospheric model, we investigate the Marshall Molteni quasigeostrophic model in the form of a neural partial differential equation. We use it in synthetic examples where parts of the governing equations are replaced with artificial neural networks (ANNs) and demonstrate how the ANNs can recover those terms.
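The universal-differential-equation idea, known physics plus a neural-network term inside the right-hand side, can be sketched as follows. The "physics" (a damped rotation) and the untrained toy network are illustrative stand-ins for the Marshall Molteni model and its learned components:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny randomly initialised MLP standing in for the learned correction term.
W1, b1 = rng.normal(0, 0.1, (16, 2)), np.zeros(16)
W2, b2 = rng.normal(0, 0.1, (2, 16)), np.zeros(2)

def nn_term(u):
    return W2 @ np.tanh(W1 @ u + b1) + b2

def rhs(u):
    """Hybrid right-hand side: known physics (a damped rotation here)
    plus a neural-network term for the unresolved part."""
    physics = np.array([-0.1 * u[0] - u[1], u[0] - 0.1 * u[1]])
    return physics + nn_term(u)

# Explicit Euler rollout of the hybrid (neural) differential equation.
u, dt, traj = np.array([1.0, 0.0]), 0.01, []
for _ in range(1000):
    u = u + dt * rhs(u)
    traj.append(u.copy())
traj = np.array(traj)
```

In the actual framework, the network weights are trained by differentiating through such a rollout so that the hybrid trajectory matches reference data.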

How to cite: Gelbrecht, M. and Boers, N.: Neural Partial Differential Equations for Atmospheric Dynamics, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-5219, 2022.

EGU22-5631 | Presentations | ITS2.6/AS5.1

Autonomous Assessment of Source Area Distributions for Sections in Lagrangian Particle Release Experiments 

Carola Trahms, Patricia Handmann, Willi Rath, Matthias Renz, and Martin Visbeck

Lagrangian experiments for particle tracing in atmosphere or ocean models, and their analysis, are a cornerstone of Earth-system studies. They cover diverse study objectives such as the identification of pathways or source regions. Data for Lagrangian studies are generated by releasing virtual particles in one or multiple locations of interest and simulating their advective-diffusive behavior backwards or forwards in time. Identifying the main pathways connecting two regions of interest is often done by counting the trajectories that reach both regions. Here, the exact source and target regions must be defined manually by a researcher. Manually defining the importance and exact location of these regions introduces a highly subjective perspective into the analysis. Additionally, to investigate all major target regions, each of them must be defined manually and the data analyzed accordingly. This human element slows down and complicates large-scale analyses with many different sections and possible source areas.

We propose to significantly reduce this manual effort by automating the process. To this end, we combine methods from different areas of machine learning and pattern mining into a sequence of steps. First, unsupervised methods, i.e., clustering, identify possible source areas on a randomized subset of the data. In a second step, supervised learning, i.e., classification, labels the positions along the trajectories according to their most probable source area, using the previously and automatically identified clusters as labels. The results of this approach can then be compared quantitatively to those of analyses based on manual definition of source areas and border-hitting-based labeling of the trajectories. Preliminary findings suggest that this approach could indeed help greatly to objectify and speed up the analysis of Lagrangian particle release experiments.
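The cluster-then-classify sequence can be sketched with a minimal k-means step followed by nearest-centroid labelling. The toy particle positions and the nearest-centroid "classifier" are simplifying assumptions for illustration, not the authors' pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(X, k, n_iter=50):
    """Minimal k-means: the unsupervised step that proposes source areas."""
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        assign = np.argmin(((X[:, None] - centroids) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (assign == j).any():
                centroids[j] = X[assign == j].mean(axis=0)
    return centroids, assign

# Toy particle end positions (lon, lat) around two source regions.
src_a = rng.normal([-40.0, 55.0], 1.0, size=(200, 2))
src_b = rng.normal([-20.0, 60.0], 1.0, size=(200, 2))
ends = np.vstack([src_a, src_b])
centroids, cluster_labels = kmeans(ends, k=2)

def classify(positions):
    """Supervised-style second step: label trajectory positions by the
    nearest automatically identified source-area centroid."""
    return np.argmin(((positions[:, None] - centroids) ** 2).sum(-1), axis=1)

traj_positions = rng.normal([-40.0, 55.0], 1.0, size=(50, 2))
labels = classify(traj_positions)
```

In the real workflow the classifier would be a trained model rather than a nearest-centroid rule, and the clusters would come from actual trajectory data.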

How to cite: Trahms, C., Handmann, P., Rath, W., Renz, M., and Visbeck, M.: Autonomous Assessment of Source Area Distributions for Sections in Lagrangian Particle Release Experiments, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-5631, 2022.

EGU22-5632 | Presentations | ITS2.6/AS5.1

Data-Driven Sentinel-2 Based Deep Feature Extraction to Improve Insect Species Distribution Models 

Joe Phillips, Ce Zhang, Bryan Williams, and Susan Jarvis

Despite being a vital part of ecosystems, insects are dying out at unprecedented rates across the globe. To help address this in the UK, the UK Centre for Ecology & Hydrology (UKCEH) is creating a tool that uses insect species distribution models (SDMs) to better facilitate future conservation efforts via volunteer-led insect tracking procedures. Based on these SDMs, we explored the inclusion of additional covariate information via the 10-20 m resolution bands of temporally aggregated Sentinel-2 data taken over the North of England in 2017, to improve predictive performance. Here, we matched the 10-20 m resolution of the satellite data to the coarse 100 m insect observation data via four methodologies of increasing complexity. First, we considered standard pixel-based approaches, performing aggregation by taking both the mean and standard deviation over the 10 m pixels. Second, we explored object-based approaches to address the modifiable areal unit problem by applying the SNIC superpixels algorithm over the extent, with the mean and standard deviation of the pixels taken within each segment. The resulting dataset was then re-projected to a resolution of 100 m by taking the modal values of the 10 m pixels, which were provided with the aggregated values of their parent segment. Third, we took the UKCEH-created 2017 Land Cover Map (LCM) dataset and sampled 42,000 random 100 m areas, evenly distributed over their modal land cover classes. We trained a U-Net deep learning model using the Sentinel-2 satellite images and LCM classes, by which data-driven features were extracted from the network over each 100 m extent. Finally, as with the second approach, we used the superpixel segments as the units of analysis, sampling 21,000 segments and taking the smallest bounding box around each. An attention-based U-Net was then adopted to mask each segment from its background and extract deep features. In a similar fashion to the second approach, we then re-projected the resulting dataset to a resolution of 100 m, taking the modal segment values accordingly. Using cross-validated AUCs over various species of moths and butterflies, we found that the object-based deep learning approach achieved the best accuracy when used with the SDMs. As such, we conclude that the novel approach of spatially aggregating satellite data via object-based deep feature extraction has the potential to benefit similar model-based aggregation needs and catalyse a step-change in ecological and environmental applications in the future.
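The modal re-projection from the fine to the coarse grid used in the approaches above can be sketched as a block-mode aggregation. The raster sizes and label values below are toy assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def modal_aggregate(fine, factor=10):
    """Re-project a categorical fine-resolution raster to a coarser grid by
    taking the modal (most frequent) value in each factor x factor block."""
    h, w = fine.shape
    coarse = np.zeros((h // factor, w // factor), dtype=fine.dtype)
    for i in range(coarse.shape[0]):
        for j in range(coarse.shape[1]):
            block = fine[i * factor:(i + 1) * factor,
                         j * factor:(j + 1) * factor].ravel()
            vals, counts = np.unique(block, return_counts=True)
            coarse[i, j] = vals[np.argmax(counts)]
    return coarse

# Toy 10 m segment-label raster aggregated to a 100 m grid.
segments = rng.integers(0, 5, size=(100, 100))
coarse = modal_aggregate(segments)
```

Each coarse cell then carries the features of its dominant segment, matching the resolution of the insect observation data.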

How to cite: Phillips, J., Zhang, C., Williams, B., and Jarvis, S.: Data-Driven Sentinel-2 Based Deep Feature Extraction to Improve Insect Species Distribution Models, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-5632, 2022.

EGU22-5681 | Presentations | ITS2.6/AS5.1

AtmoDist as a new pathway towards quantifying and understanding atmospheric predictability 

Sebastian Hoffmann, Yi Deng, and Christian Lessig

The predictability of the atmosphere is a classical problem that has received much attention from both theoretical and practical points of view. In this work, we propose to use a purely data-driven method based on a neural network to revisit the problem. The analysis is built upon the recently introduced AtmoDist network, which has been trained on high-resolution reanalysis data to provide a probabilistic estimate of the temporal difference between given atmospheric fields, represented by vorticity and divergence. We define the skill of the network for this task as a new measure of atmospheric predictability, hypothesizing that the network's prediction of temporal differences will be more susceptible to errors when the atmospheric state is intrinsically less predictable. Preliminary results show that for short timescales (3-48 hours) one sees enhanced predictability in the warm season compared to the cool season over the northern midlatitudes, and lower predictability over ocean than over land. These findings support the hypothesis that across short timescales, AtmoDist relies on the recurrence of mesoscale convection with coherent spatiotemporal structures to connect spatial evolutions to temporal differences. For example, the prevalence of mesoscale convective systems (MCSs) over the central US in the boreal warm season can explain the increase of mesoscale predictability there, and the oceanic zones marked by greater predictability correspond well to regions of elevated convective activity such as the Pacific ITCZ. Given the dependence of atmospheric predictability on geographic location, season and, most importantly, timescale, we further apply the method to synoptic scales (2-10 days), where the excitation and propagation of large-scale disturbances such as Rossby wave packets are expected to provide the connection between temporal and spatial differences. The design of the AtmoDist network is adapted to the prediction range accordingly; for example, the size of the local patches that serve as input to AtmoDist is chosen based on the spatiotemporal atmospheric scales that provide the expected time and space connections.

By providing to the community a powerful, purely data-driven technique for quantifying, evaluating, and interpreting predictability, our work lays the foundation for efficiently detecting the existence of sub-seasonal to seasonal (S2S) predictability and, by further analyzing the mechanism of AtmoDist, understanding its physical origins, which bears major scientific and socioeconomic significance.

How to cite: Hoffmann, S., Deng, Y., and Lessig, C.: AtmoDist as a new pathway towards quantifying and understanding atmospheric predictability, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-5681, 2022.

EGU22-5746 | Presentations | ITS2.6/AS5.1

Model Output Statistics (MOS) and Machine Learning applied to CAMS O3 forecasts: trade-offs between continuous and categorical skill scores 

Hervé Petetin, Dene Bowdalo, Pierre-Antoine Bretonnière, Marc Guevara, Oriol Jorba, Jan Mateu armengol, Margarida Samso Cabre, Kim Serradell, Albert Soret, and Carlos Pérez García-Pando

Air quality (AQ) forecasting systems are usually built upon physics-based numerical models that are affected by a number of uncertainty sources. In order to reduce forecast errors, first and foremost the bias, they are often coupled with Model Output Statistics (MOS) modules. MOS methods are statistical techniques used to correct raw forecasts at surface monitoring station locations, where AQ observations are available. In this study, we investigate to what extent AQ forecasts can be improved using a variety of MOS methods, including persistence (PERS), moving average (MA), quantile mapping (QM), Kalman Filter (KF), analogs (AN), and gradient boosting machine (GBM). We apply our analysis to the Copernicus Atmospheric Monitoring Service (CAMS) regional ensemble median O3 forecasts over the Iberian Peninsula during 2018–2019. A key aspect of our study is the evaluation, which is performed using a very comprehensive set of continuous and categorical metrics at various time scales (hourly to daily), along different lead times (1 to 4 days), and using different meteorological input data (forecast vs reanalyzed).

Our results show that O3 forecasts can be substantially improved using such MOS corrections and that this improvement goes well beyond the correction of the systematic bias. Although the improvement typically holds at all lead times, some MOS methods degrade more strongly with lead time than others. When considering MOS methods relying on meteorological information and comparing the results obtained with IFS forecasts and ERA5 reanalysis, the relative deterioration brought by the use of IFS is minor, which paves the way for their use in operational MOS applications. Importantly, our results also clearly show the trade-offs between continuous and categorical skills and their dependence on the MOS method. The most sophisticated MOS methods reproduce O3 mixing ratios best overall, with the lowest errors and highest correlations. However, they are not necessarily the best at predicting the highest O3 episodes, for which simpler MOS methods can give better results. Although the complex impact of MOS methods on the distribution and variability of raw forecasts can only be comprehended through an extended set of complementary statistical metrics, our study shows that optimally implementing MOS in AQ forecast systems crucially requires selecting the appropriate skill score to be optimized for the forecast application of interest.
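To illustrate the simpler end of the MOS spectrum, a minimal quantile-mapping correction can be sketched as follows (our own toy implementation, not the one used in the study; the function and argument names are illustrative):

```python
import numpy as np

def quantile_map(forecast, fc_train, obs_train, n_q=99):
    """Map each new forecast value through the forecast -> observation
    quantile correspondence estimated on a training period."""
    qs = np.linspace(1, 99, n_q)
    fc_q = np.percentile(fc_train, qs)    # forecast quantiles (training)
    obs_q = np.percentile(obs_train, qs)  # observed quantiles (training)
    # a forecast value is replaced by the observed value at the same quantile
    return np.interp(forecast, fc_q, obs_q)
```

For example, if the raw model systematically doubled the observed O3 mixing ratios during training, the learned mapping halves any new forecast value.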

Petetin, H., Bowdalo, D., Bretonnière, P.-A., Guevara, M., Jorba, O., Armengol, J. M., Samso Cabre, M., Serradell, K., Soret, A., and Pérez García-Pando, C.: Model Output Statistics (MOS) applied to CAMS O3 forecasts: trade-offs between continuous and categorical skill scores, Atmos. Chem. Phys. Discuss. [preprint], in review, 2021.

How to cite: Petetin, H., Bowdalo, D., Bretonnière, P.-A., Guevara, M., Jorba, O., Mateu armengol, J., Samso Cabre, M., Serradell, K., Soret, A., and Pérez García-Pando, C.: Model Output Statistics (MOS) and Machine Learning applied to CAMS O3 forecasts: trade-offs between continuous and categorical skill scores, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-5746, 2022.

With the goal of developing a data-driven parameterization of unresolved gravity waves (GW) momentum transport for use in general circulation models (GCMs), we investigate neural network architectures that emulate the Alexander-Dunkerton 1999 (AD99) scheme, an existing physics-based GW parameterization. We analyze the distribution of errors as functions of shear-related metrics in an effort to diagnose the disparity between online and offline performance of the trained emulators, and develop a sampling algorithm to treat biases on the tails of the distribution without adversely impacting mean performance. 

It has been shown in previous efforts [1] that stellar offline performance does not necessarily guarantee adequate online performance, or even stability. Error analysis reveals that the majority of the samples are learned quickly, while some stubborn samples remain poorly represented. We find that the more error-prone samples are those with wind profiles that have large shears; this is consistent with physical intuition, as gravity waves encounter a wider range of critical levels when experiencing large shear, and parameterizing gravity waves for these samples is therefore a more difficult, complex task. To remedy this, we develop a sampling strategy that performs a parameterized histogram equalization, a concept borrowed from 1D optimal transport.

The sampling algorithm uses a linear mapping from the original histogram to a more uniform histogram parameterized by $t \in [0,1]$, where $t=0$ recovers the original distribution and $t=1$ enforces a completely uniform distribution. A given value of $t$ assigns each bin a new probability, which we then use to sample from each bin. If the new probability is smaller than the original, we invoke sampling without replacement, limited to a reduced number of samples consistent with the new probability. If the new probability is larger than the original, we repeat all the samples in the bin up to some predetermined maximum repeat value (a threshold to avoid extreme oversampling at the tails). We optimize this sampling algorithm with respect to $t$, the maximum repeat value, and the number and distribution (uniform or not) of the histogram bins. The ideal combination of these parameters yields errors that are closer to a constant function of the shear metrics while maintaining high accuracy over the whole dataset. Although we study the performance of this algorithm in the context of training a gravity wave parameterization emulator, the strategy can be used for learning from datasets with long-tailed distributions where the rare samples are associated with low accuracy. Such datasets are prevalent in Earth system dynamics: the launching of gravity waves and extreme events like hurricanes and heat waves are just a few examples.
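A minimal version of the parameterized histogram equalization described above might look as follows (a sketch under our own naming; the actual implementation may differ in details such as bin handling):

```python
import numpy as np

rng = np.random.default_rng(0)

def equalized_sample(values, n_bins=10, t=0.5, max_repeat=3):
    """Resample indices of `values` so that their histogram moves toward
    uniform as t goes from 0 (original) to 1 (fully equalized)."""
    n = len(values)
    counts, edges = np.histogram(values, bins=n_bins)
    p_new = (1 - t) * counts / n + t / n_bins        # linear map between histograms
    bin_of = np.clip(np.digitize(values, edges[1:-1]), 0, n_bins - 1)
    chosen = []
    for b in range(n_bins):
        members = np.flatnonzero(bin_of == b)
        if members.size == 0:
            continue
        target = int(round(p_new[b] * n))
        if target <= members.size:
            # undersample: draw without replacement
            chosen.extend(rng.choice(members, size=target, replace=False))
        else:
            # oversample: repeat members, capped at max_repeat copies each
            target = min(target, max_repeat * members.size)
            chosen.extend(np.tile(members, max_repeat)[:target])
    return np.array(chosen)
```

With $t=1$ a rare tail bin is repeated up to the cap while a dominant bin is thinned, so the resampled training set represents the tail far better than the raw data.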

[1] Espinosa, Z. I., A. Sheshadri, G. R. Cain, E. P. Gerber, and K. J. DallaSanta, 2021: A Deep Learning Parameterization of Gravity Wave Drag Coupled to an Atmospheric Global Climate Model, Geophys. Res. Lett., in review.

How to cite: Yang, L. and Gerber, E.: Sampling strategies for data-driven parameterization of gravity wave momentum transport, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-5766, 2022.

EGU22-5980 | Presentations | ITS2.6/AS5.1 | Highlight

Probabilistic forecasting of heat waves with deep learning 

George Miloshevich, Valerian Jacques-Dumas, Pierre Borgnat, Patrice Abry, and Freddy Bouchet
Extreme events such as storms, floods, cold spells and heat waves are expected to have an increasing societal impact with climate change. However, the study of rare events is complicated by the computational cost of highly complex models and the lack of observations. With the help of machine learning, synthetic models for forecasting can be constructed and cheaper resampling techniques can be developed. Consequently, this may also clarify the more regional impacts of climate change.

In this work, we perform a detailed analysis of how deep neural networks (DNNs) can be used in intermediate-range forecasting of prolonged heat waves with a duration of several weeks over synoptic spatial scales. In particular, we train a convolutional neural network (CNN) on 7200 years of a climate model simulation. As such, we are interested in probabilistic prediction (the committor function of transition path theory). Thus we discuss proper forecasting scores such as the Brier skill score, which is popular in weather prediction, and the cross-entropy skill, which is based on information-theoretic considerations. They allow us to measure the success of various architectures and to investigate more efficient pipelines for extracting predictions from physical observables such as geopotential, temperature and soil moisture. A priori, the committor is hard to visualize, as it is a high-dimensional function of its inputs, the grid points of the climate model for a given field. Fortunately, we can construct composite maps conditioned on its values, which reveal that the CNN is likely relying on global teleconnection patterns of geopotential. The soil moisture signal, on the other hand, is more localized, with predictive capability over much longer times into the future (at least a month). The latter fact relates to soil-atmosphere interactions. One expects the performance of DNNs to greatly improve with more data, and we provide a quantitative assessment of this fact. In addition, we offer more details on how the undersampling of negative events affects the knowledge of the committor function. We show that transfer learning helps ensure that the committor is a smooth function along the trajectory. This will be an important quality when such a committor is applied in rare event algorithms for importance sampling.
While DNNs are universal function approximators, the issue of extrapolation can be somewhat problematic. To address this question, we train a CNN on a dataset generated from a simulation without a diurnal cycle, where the feedbacks between soil moisture and heat waves appear to be significantly stronger. Nevertheless, when the CNN with the given weights is validated on a dataset generated from a simulation with a diurnal cycle, the predictions generalize relatively well, despite a small reduction in skill. This generality validates the approach.
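The Brier skill score used above can be computed generically as follows (a standard definition referenced to the climatological base rate; names are our own):

```python
import numpy as np

def brier_skill_score(p_forecast, outcomes):
    """Brier skill score of probabilistic forecasts of binary events,
    referenced to the climatological (base-rate) forecast."""
    bs = np.mean((p_forecast - outcomes) ** 2)   # Brier score of the forecast
    clim = np.mean(outcomes)                     # climatological probability
    bs_ref = np.mean((clim - outcomes) ** 2)     # Brier score of climatology
    return 1.0 - bs / bs_ref                     # 1 = perfect, 0 = no skill
```

A perfect probabilistic forecast scores 1, while always issuing the climatological frequency scores 0, which makes the metric well suited to rare events such as prolonged heat waves.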

How to cite: Miloshevich, G., Jacques-Dumas, V., Borgnat, P., Abry, P., and Bouchet, F.: Probabilistic forecasting of heat waves with deep learning, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-5980, 2022.

EGU22-6479 | Presentations | ITS2.6/AS5.1

Parameter inference and uncertainty quantification for an intermediate complexity climate model 

Benedict Roeder, Jakob Schloer, and Bedartha Goswami

Well-adapted parameters in climate models are essential to make accurate predictions for future projections. In climate science, the record of precise and comprehensive observational data is rather short, and parameters of climate models are often hand-tuned or learned from artificially generated data. Due to limited and noisy data, one wants to use Bayesian models to have access to uncertainties of the inferred parameters. Most popular algorithms for learning parameters from observational data, like the Kalman inversion approach, only provide point estimates of parameters.

In this work, we compare two Bayesian parameter inference approaches applied to the intermediate complexity model for the El Niño-Southern Oscillation by Zebiak & Cane: i) the "Calibrate, Emulate, Sample" (CES) approach, an extension of the ensemble Kalman inversion which allows posterior inference by emulating the model via Gaussian Processes and thereby enables efficient sampling; ii) the simulation-based inference (SBI) approach, where the approximate posterior distribution is learned from simulated model data and observational data using neural networks.

We evaluate the performance of both approaches by comparing their run times and the number of required model evaluations, assess the scalability with respect to the number of inference parameters, and examine their posterior distributions.
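The core idea behind simulation-based inference, draw parameters from a prior, simulate, and keep draws that reproduce the observations, can be illustrated with a simple rejection scheme (our own toy sketch; the study itself uses neural density estimators rather than rejection, and the simulator below is a stand-in):

```python
import numpy as np

rng = np.random.default_rng(1)

def simulator(theta, n=200):
    """Toy stand-in for the climate model: data whose mean is the parameter."""
    return rng.normal(theta, 1.0, size=n)

def rejection_posterior(observed, n_draws=5000, eps=0.05):
    """Approximate the posterior by keeping prior draws whose simulated
    summary statistic falls close to the observed one."""
    obs_stat = observed.mean()
    prior_draws = rng.uniform(-5, 5, size=n_draws)   # uniform prior
    return np.array([th for th in prior_draws
                     if abs(simulator(th).mean() - obs_stat) < eps])
```

Unlike Kalman-inversion point estimates, the retained draws approximate the full posterior distribution, so parameter uncertainties come for free.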

How to cite: Roeder, B., Schloer, J., and Goswami, B.: Parameter inference and uncertainty quantification for an intermediate complexity climate model, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-6479, 2022.

EGU22-6553 | Presentations | ITS2.6/AS5.1

Can simple machine learning methods predict concentrations of OH better than state of the art chemical mechanisms? 

Sebastian Hickman, Paul Griffiths, James Weber, and Alex Archibald

Concentrations of the hydroxyl radical, OH, control the lifetime of methane, carbon monoxide and other atmospheric constituents.  The short lifetime of OH, coupled with the spatial and temporal variability in its sources and sinks, makes accurate simulation of its concentration particularly challenging. To date, machine learning (ML) methods have been infrequently applied to global studies of atmospheric chemistry.

We present an assessment of the use of ML methods for the challenging case of simulation of the hydroxyl radical at the global scale, and show that several approaches are indeed viable.  We use observational data from the recent NASA Atmospheric Tomography Mission to show that machine learning methods are comparable in skill to state of the art forward chemical models and are capable, if appropriately applied, of simulating OH to within observational uncertainty.  

We show that a simple ridge regression model is a better predictor of OH concentrations in the remote atmosphere than a state of the art chemical mechanism implemented in a forward box model. Our work shows that machine learning may be an accurate emulator of chemical concentrations in atmospheric chemistry, which would allow a significant speed-up in climate model runtime due to the speed and efficiency of simple machine learning methods. Furthermore, we show that relatively few predictors are required to simulate OH concentrations, suggesting that the variability in OH can be quantitatively accounted for by a few observables, with the potential to simplify the numerical simulation of atmospheric levels of key species such as methane.
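The kind of simple ridge regression referred to here has a closed form (a generic sketch on synthetic data; the actual predictors in the study are ATom observables, and the ones named in the comment are only plausible examples):

```python
import numpy as np

def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression: w = (X'X + alpha*I)^{-1} X'y."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

# Synthetic example: an "OH-like" target driven by a few predictors
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))           # e.g. scaled photolysis rate, H2O, O3 (illustrative)
y = X @ np.array([1.0, -2.0, 0.5])      # linear truth, no noise
w = ridge_fit(X, y, alpha=1e-6)
```

The fit and prediction cost is a single small linear solve, which is the source of the runtime advantage over integrating a full chemical mechanism.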

How to cite: Hickman, S., Griffiths, P., Weber, J., and Archibald, A.: Can simple machine learning methods predict concentrations of OH better than state of the art chemical mechanisms?, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-6553, 2022.

EGU22-6674 | Presentations | ITS2.6/AS5.1

The gravity wave parameterization calibration problem: A 1D QBO model testbed 

Ofer Shamir, L. Minah Yang, David S. Connelly, and Edwin P. Gerber

An essential step in implementing any new parameterization is calibration, where the parameterization is adjusted to work with an existing model and yield some desired improvement. In the context of gravity wave (GW) momentum transport, calibration is necessitated by the facts that: (i) Some GWs are always at least partially resolved by the model, and hence a parameterization should only account for the missing waves. Worse, the parameterization may need to correct for the misrepresentation of under-resolved GWs, i.e., coarse vertical resolution can bias GW breaking level, leading to erroneous momentum forcing. (ii) The parameterized waves depend on the resolved solution for both their sources and dissipation, making them susceptible to model biases. Even a "perfect" parameterization could then yield an undesirable result, e.g., an unrealistic Quasi-Biennial Oscillation (QBO).  While model-specific calibration is required, one would like a general "recipe" suitable for most models. From a practical point of view, the adoption of a new parameterization will be hindered by a too-demanding calibration process. This issue is of particular concern in the context of data-driven methods, where the number of tunable degrees of freedom is large (possibly in the millions). Thus, more judicious ways for addressing the calibration step are required. 

To address the above issues, we develop a 1D QBO model, where the "true" gravity wave momentum deposition is determined from a source distribution and critical level breaking, akin to a traditional physics-based GW parameterization. The control parameters associated with the source consist of the total wave flux (related to the total precipitation for convectively generated waves) and the spectrum width (related to the depth of convection). These parameters can be varied to mimic the variability in GW sources between different models, i.e., biases in precipitation variability. In addition, the model’s explicit diffusivity and vertical advection can be varied to mimic biases in model numerics and circulation, respectively. The model thus allows us to assess the ability of a data-driven parameterization to (i) extrapolate, capturing the response of GW momentum transport to a change in the model parameters and (ii) be calibrated, adjusted to maintain the desired simulation of the QBO in response to a change in the model parameters. The first property is essential for a parameterization to be used for climate prediction, the second, for a parameterization to be used at all. We focus in particular on emulators of the GW momentum transport based on neural network and regression trees, contrasting their ability to satisfy both of these goals.  
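The testbed's "truth", a source spectrum deposited by critical-level breaking, can be caricatured in a few lines (our own minimal sketch, not the actual model code):

```python
import numpy as np

def critical_level_forcing(u, phase_speeds, fluxes):
    """Deposit each wave's momentum flux at its critical level: the first
    height where the background wind u(z) crosses the wave's phase speed c."""
    forcing = np.zeros_like(u)
    for c, f in zip(phase_speeds, fluxes):
        s0 = np.sign(u[0] - c)                   # sign of (u - c) at the launch level
        for k in range(1, len(u)):
            if np.sign(u[k] - c) != s0:          # u == c crossed below this level
                forcing[k] += f
                break                            # wave is absorbed; stop ascending
    return forcing
```

Varying the total flux (sum of `fluxes`) and the spread of `phase_speeds` then mimics the precipitation-related source parameters described above.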


How to cite: Shamir, O., Yang, L. M., Connelly, D. S., and Gerber, E. P.: The gravity wave parameterization calibration problem: A 1D QBO model testbed, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-6674, 2022.

All oceanic general circulation models (GCMs) include parametrizations of the unresolved subgrid-scale (eddy) effects on the large-scale motions, even at the (so-called) eddy-permitting resolutions. Among the many problems associated with the development of accurate and efficient eddy parametrizations, one problem is a reliable decomposition of a turbulent flow into resolved and unresolved (subgrid) scale components. Finding an objective way to separate eddies is a fundamental, critically important and unresolved problem. 
Here, a statistically consistent correlation-based flow decomposition method (CBD), which employs a Gaussian filtering kernel with geographically varying topology consistent with the observed local spatial correlations, achieves the desired scale separation. CBD is demonstrated for an eddy-resolving solution of the classical midlatitude double-gyre quasigeostrophic (QG) circulation, which possesses two asymmetric gyres of opposite circulation and a strong meandering eastward jet, analogous to the Gulf Stream in the North Atlantic and the Kuroshio in the North Pacific. CBD facilitates a comprehensive analysis of the feedbacks of eddies on the large-scale flow via the transient part of the eddy forcing. A 'product integral' based on the time-lagged correlation between the diagnosed eddy forcing and the evolving large-scale flow uncovers a robust 'eddy backscatter' mechanism. Data-driven augmentation of a non-eddy-resolving ocean model by stochastically emulated eddy fields allows one to restore the missing eddy-driven features, such as the merging western boundary currents, their eastward extension and the low-frequency variability of the gyres.
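The basic idea of filter-based scale separation can be sketched as follows (a fixed-width Gaussian kernel on a periodic domain for simplicity; the CBD method itself uses a geographically varying kernel fitted to local correlations):

```python
import numpy as np

def gaussian_filter_2d(psi, sigma):
    """Gaussian smoothing on a doubly periodic grid via the FFT."""
    ny, nx = psi.shape
    ky = np.fft.fftfreq(ny) * 2 * np.pi
    kx = np.fft.fftfreq(nx) * 2 * np.pi
    k2 = ky[:, None] ** 2 + kx[None, :] ** 2
    # Gaussian transfer function exp(-sigma^2 k^2 / 2) in spectral space
    return np.fft.ifft2(np.fft.fft2(psi) * np.exp(-0.5 * sigma**2 * k2)).real

def scale_decompose(psi, sigma):
    """Split a 2D field into large-scale (filtered) and eddy (residual) parts."""
    large = gaussian_filter_2d(psi, sigma)
    return large, psi - large
```

By construction the two parts sum back to the original field, and the diagnosed eddy forcing is then the contribution of the eddy part to the tendency of the large-scale flow.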

  • N. Argawal, Ryzhov, E.A., Kondrashov, D., and P.S. Berloff, 2021: Correlation-based flow decomposition and statistical analysis of the eddy forcing, Journal of Fluid Mechanics, 924, A5. doi:10.1017/jfm.2021.604

  • N. Argawal, Kondrashov, D., Dueben, P., Ryzhov, E.A., and P.S. Berloff, 2021: A comparison of data-driven approaches to build low-dimensional ocean models, Journal of Advances in Modeling Earth Systems, doi:10.1029/2021MS002537


How to cite: Kondrashov, D.: Towards physics-informed stochastic parametrizations of subgrid physics in ocean models, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-6859, 2022.

EGU22-7044 | Presentations | ITS2.6/AS5.1

Seismic Event Characterization using Manifold Learning Methods 

Yuri Bregman, Yochai Ben Horin, Yael Radzyner, Itay Niv, Maayan Kahlon, and Neta Rabin

Manifold learning is a branch of machine learning that focuses on compactly representing complex data-sets based on their fundamental intrinsic parameters. One such method is diffusion maps, which reduces the dimension of the data while preserving its geometric structure. In this work, diffusion maps are applied to several seismic event characterization tasks. The first task is automatic earthquake-explosion discrimination, which is an essential component of nuclear test monitoring. We also use this technique to automatically identify mine explosions and aftershocks following large earthquakes. Identification of such events helps to lighten the analysts’ burden and allow for timely production of reviewed seismic bulletins.

The proposed methods begin with a pre-processing stage in which a time–frequency representation is extracted from each seismogram while capturing common properties of seismic events and overcoming magnitude differences. Then, diffusion maps are used in order to construct a low-dimensional model of the original data. In this new low-dimensional space, classification analysis is carried out.
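In skeletal form, the diffusion-maps step can be implemented like this (a generic textbook version with our own names; the study's pipeline adds the seismogram-specific pre-processing described above):

```python
import numpy as np

def diffusion_map(X, eps, n_coords=2):
    """Embed the rows of X into n_coords diffusion coordinates."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    K = np.exp(-d2 / eps)                                # Gaussian affinities
    d = K.sum(axis=1)
    A = K / np.sqrt(np.outer(d, d))                      # symmetric conjugate of the Markov matrix
    vals, vecs = np.linalg.eigh(A)                       # eigenvalues in ascending order
    phi = vecs / np.sqrt(d)[:, None]                     # back to Markov-matrix eigenvectors
    # drop the trivial leading eigenvector, weight by the eigenvalues
    return phi[:, -2:-2 - n_coords:-1] * vals[-2:-2 - n_coords:-1]
```

On data containing two well-separated event families, the first diffusion coordinate already separates them, which is the basis for the subsequent classification analysis.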

The algorithm’s discrimination performance is demonstrated on several seismic data sets. For instance, using the seismograms from EIL station, we identify arrivals that were caused by explosions at the nearby Eshidiya mine in Jordan. The model provides a visualization of the data, organized by its intrinsic factors. Thus, along with the discrimination results, we provide a compact organization of the data that characterizes the activity patterns in the mine.

Our results demonstrate the potential and strength of the manifold learning based approach, which may be suitable for other geophysics domains as well.

How to cite: Bregman, Y., Ben Horin, Y., Radzyner, Y., Niv, I., Kahlon, M., and Rabin, N.: Seismic Event Characterization using Manifold Learning Methods, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-7044, 2022.

Accurate streamflow forecasts can provide guidance for reservoir management, which can regulate river flows, manage water resources and mitigate flood damages. One popular way to forecast streamflow is to use bias-corrected meteorological forecasts to drive a calibrated hydrological model. But for cascade reservoirs, such approaches suffer significant deficiencies because of the difficulty of simulating reservoir operations with a physical approach and the uncertainty of meteorological forecasts over small catchments. Another popular way is to forecast streamflow with machine learning methods, which can fit a statistical model without inputs like reservoir operating rules. Thus, we integrate meteorological forecasts, a land surface hydrological model and machine learning to forecast hourly streamflow over the Yantan catchment, which is one of the cascade reservoirs in the Hongshui River, with streamflow influenced by both the upstream reservoir water release and the rainfall-runoff process within the catchment.

Before evaluating the streamflow forecast system, it is necessary to investigate the skill by means of a series of specific hindcasts that isolate potential sources of predictability, like the meteorological forcing and the initial condition (IC). Here, we use the ensemble streamflow prediction (ESP)/reverse ESP (revESP) method to explore the impact of the IC on hourly streamflow prediction. Results show that the effect of the IC on runoff prediction persists for 16 hours. In the next step, we evaluate the hourly streamflow hindcasts during the rainy seasons of 2013-2017 performed by the forecast system. We use European Centre for Medium-Range Weather Forecasts perturbed forecast forcing from the THORPEX Interactive Grand Global Ensemble (TIGGE-ECMWF) as meteorological input to perform the hourly streamflow hindcasts. Compared with the ESP, the hydrometeorological ensemble forecast approach reduces probabilistic and deterministic forecast errors by 6% during the first 7 days. After integrating the long short-term memory (LSTM) deep learning method into the system, the deterministic forecast error can be further reduced by 6% in the first 72 hours. We also use historically observed streamflow to drive another LSTM model to perform an LSTM-only streamflow forecast. Results show that its skill drops sharply after the first 24 hours, which indicates that the meteorology-hydrology modeling approach can improve the streamflow forecast.

How to cite: Liu, J. and Yuan, X.: Reservoir inflow forecast by combining meteorological ensemble forecast, physical hydrological simulation and machine learning, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-7093, 2022.

EGU22-7113 | Presentations | ITS2.6/AS5.1 | Highlight

Coupling regional air quality simulations of EURAD-IM with street canyon observations - a machine learning approach 

Charlotte Neubacher, Philipp Franke, Alexander Heinlein, Axel Klawonn, Astrid Kiendler-Scharr, and Anne-Caroline Lange

State-of-the-art atmospheric chemistry transport models on regional scales, such as EURAD-IM (EURopean Air pollution Dispersion-Inverse Model), simulate physical and chemical processes in the atmosphere to predict the dispersion of air pollutants. With EURAD-IM's 4D-var data assimilation application, detailed analyses of air quality can be conducted. These analyses allow for improvements of the atmospheric chemistry forecast as well as assessments of emission source strengths. Simulations of EURAD-IM can be nested to a spatial resolution of 1 km, which still does not resolve the urban scale. Thus, inner-city street canyon observations cannot be exploited, since anthropogenic pollution varies strongly over scales of 100 m or less.

We address this issue by implementing a machine learning (ML) module into EURAD-IM, forming a hybrid model that enables bridging the representativeness gap between model resolution and inner-city observations. The data assimilation of EURAD-IM is thus strengthened by additional observations in urban regions. Our approach for the ML module is based on a neural network (NN) with relevant environmental information on street architecture, traffic density, meteorology, and atmospheric pollutant concentrations from EURAD-IM, as well as the street canyon observations of pollutants, as input features. The NN then maps the observed concentration from the street canyon scale to larger spatial scales.

We are currently working with a fully controllable test environment created from EURAD-IM forecasts of the years 2020 and 2021 at different spatial resolutions. Here, the ML model maps the high-resolution hourly NO2 concentration to the concentration on the low-resolution model grid. It turns out that it is very difficult for NNs to learn the hourly concentrations with equal accuracy across the diurnal cycle of pollutant concentrations. Thus, we develop a model that uses an independent NN for each hour of the day to support time-of-day learning. This reduces the training error by a factor of 10². As a proof of concept, we trained the ML model in an overfitting regime, where the mean squared training error reduces to 0.001% for each hour. Furthermore, by optimizing the hyperparameters and introducing regularization terms to reduce the overfitting, we achieved a validation error of 9-12% during the night and 9-16% during the day.
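The per-hour design, one independent model per hour of day, can be sketched generically as follows (a ridge regression stands in for the NNs here; all names are illustrative):

```python
import numpy as np

def fit_hourly(X, y, hours, alpha=1e-8):
    """Fit an independent ridge model for each hour of the day."""
    models = {}
    for h in range(24):
        m = hours == h
        if m.any():
            Xh, yh = X[m], y[m]
            models[h] = np.linalg.solve(Xh.T @ Xh + alpha * np.eye(X.shape[1]),
                                        Xh.T @ yh)
    return models

def predict_hourly(models, X, hours):
    """Route each sample to the model of its hour."""
    return np.array([x @ models[h] for x, h in zip(X, hours)])
```

Training per hour removes the burden of learning the diurnal cycle itself: each model only has to capture the within-hour relationship between inputs and concentration.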

How to cite: Neubacher, C., Franke, P., Heinlein, A., Klawonn, A., Kiendler-Scharr, A., and Lange, A.-C.: Coupling regional air quality simulations of EURAD-IM with street canyon observations - a machine learning approach, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-7113, 2022.

EGU22-7135 | Presentations | ITS2.6/AS5.1 | Highlight

How to calibrate a climate model with neural network based physics? 

Blanka Balogh, David Saint-Martin, and Aurélien Ribes

Unlike the traditional subgrid scale parameterizations used in climate models, current neural network (NN) parameterizations are only tuned offline, by minimizing a loss function on outputs from high resolution models. This approach often leads to numerical instabilities and long-term biases. Here, we propose a method to design tunable NN parameterizations and calibrate them online. The calibration of the NN parameterization is achieved in two steps. First, some model parameters are included within the NN model input. This NN model is fitted at once for a range of values of the parameters, using an offline metric. Second, once the NN parameterization has been plugged into the climate model, the parameters included among the NN inputs are optimized with respect to an online metric quantifying errors on long-term statistics. We illustrate our method with two simple dynamical systems. Our approach significantly reduces long-term biases of the climate model with NN based physics.
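The two-step procedure can be illustrated on a toy system (entirely schematic: a polynomial emulator stands in for the NN, a 1D stochastic equation for the climate model, and the long-run variance for the online metric; nothing here is the authors' actual setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1 (offline): fit the emulator f(x, theta) with the tunable
# parameter theta included among its inputs, over a range of theta.
thetas = rng.uniform(0.5, 2.0, 2000)
xs = rng.uniform(-1.0, 1.0, 2000)
targets = -thetas * xs                              # "high-resolution" truth
basis = np.stack([xs, thetas * xs, thetas**2 * xs], axis=1)
w, *_ = np.linalg.lstsq(basis, targets, rcond=None)

def emulator(x, theta):
    return w[0] * x + w[1] * theta * x + w[2] * theta**2 * x

# Step 2 (online): run the "climate model" with the emulator plugged in,
# and pick the theta whose long-term statistic matches a reference value.
def long_run_variance(theta, steps=4000, dt=0.1):
    x, traj = 1.0, np.empty(steps)
    for i in range(steps):
        x += dt * emulator(x, theta) + 0.1 * rng.normal()
        traj[i] = x
    return traj.var()

ref_var = 0.05                                      # statistic from a reference run (assumed)
candidates = np.linspace(0.5, 2.0, 16)
theta_star = min(candidates, key=lambda t: (long_run_variance(t) - ref_var) ** 2)
```

The offline fit never has to be repeated: tuning reduces to adjusting the scalar inputs theta of the already-trained emulator against the online metric.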

How to cite: Balogh, B., Saint-Martin, D., and Ribes, A.: How to calibrate a climate model with neural network based physics?, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-7135, 2022.

EGU22-8279 | Presentations | ITS2.6/AS5.1

Using deep learning to improve the spatial resolution of the ocean model 

Ihor Hromov, Georgy Shapiro, Jose Ondina, Sanjay Sharma, and Diego Bruciaferri

For ocean models, an increase of spatial resolution is a matter of significant importance and thorough research. Computational resources limit our ability to increase model resolution. This constraint is especially true for traditional dynamical models, for which an increase of a factor of two in the horizontal resolution results in simulation times increased approximately tenfold. One potential method to relax this limitation is to use Artificial Intelligence methods, such as Neural Networks (NNs). In this research, NNs are applied to ocean circulation modelling. More specifically, a NN is used on data output from the dynamical model to increase the spatial resolution of the model output. The main dataset being used is Sea Surface Temperature data at 0.05- and 0.02-degree horizontal resolution for the Irish Sea.

Several NN architectures were applied to address the task: Generative Adversarial Networks (GANs), Convolutional Neural Networks (CNNs) and Multi-level Wavelet CNNs. These have been used in other fields for problems related to the increase of resolution. The work will contrast and compare the methods and present a provisional assessment of the efficiency of each.

How to cite: Hromov, I., Shapiro, G., Ondina, J., Sharma, S., and Bruciaferri, D.: Using deep learning to improve the spatial resolution of the ocean model, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-8279, 2022.

EGU22-8334 | Presentations | ITS2.6/AS5.1

Information theory solution approach for air-pollution sensors' location-allocation problem 

Barak Fishbain, Ziv Mano, and Shai Kendler

Urbanization and industrialization processes are accompanied by adverse environmental effects, such as air pollution. The first step in reducing air pollution is the detection of its source(s), which is achievable through monitoring. When deploying a sensor array, one must balance the array's cost against its performance. This optimization problem is known as the location-allocation problem. Here, a new solution approach, which draws its foundation from information theory, is presented. The core of the method is the set of air-pollution levels computed by a dispersion model under various meteorological conditions. The sensors are then placed in the locations that information theory identifies as the most uncertain. The method is compared with two other heuristics typically applied to the location-allocation problem: in the first, sensors are randomly deployed; in the second, the sensors are placed according to the maximal cumulative pollution levels (i.e., hot spots). For the comparison, two simulated scenes were evaluated, one containing point sources and buildings, and the other also containing line sources (i.e., roads). The comparison shows that the entropy method results in a superior sensor deployment compared to the other two approaches in terms of source apportionment and dense pollution field reconstruction from the sensor network's measurements.
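A schematic version of the entropy-based placement might look as follows (our own minimal sketch: scenario-wise dispersion-model outputs per candidate location, with Shannon entropy as the uncertainty measure):

```python
import numpy as np

def entropy_placement(levels, n_sensors, n_bins=8):
    """Pick the candidate locations whose simulated pollution levels,
    across meteorological scenarios, have the highest Shannon entropy."""
    n_scen, n_loc = levels.shape
    H = np.empty(n_loc)
    for j in range(n_loc):
        counts, _ = np.histogram(levels[:, j], bins=n_bins)
        p = counts[counts > 0] / n_scen
        H[j] = -(p * np.log(p)).sum()        # Shannon entropy (nats)
    return np.argsort(-H)[:n_sensors]        # most uncertain locations first
```

The intuition: a location whose level never changes carries no information, while a location whose level varies widely across meteorological scenarios is the most informative one to instrument.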

How to cite: Fishbain, B., Mano, Z., and Kendler, S.: Information theory solution approach for air-pollution sensors' location-allocation problem, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-8334, 2022.

EGU22-8719 | Presentations | ITS2.6/AS5.1

Multi-station Multivariate Multi-step Convection Nowcasting with Deep Neural Networks 

Sandy Chkeir, Aikaterini Anesiadou, and Riccardo Biondi

Extreme weather nowcasting has always been a challenging task in meteorology. Many research studies have been conducted to accurately forecast extreme weather events, related to rain rates and/or wind speed thresholds, at various spatio-temporal scales. Over the decades, this field has gained attention in the artificial intelligence community, which aims to create more accurate models using the latest algorithms and methods.

In this work, within the H2020 SESAR ALARM project, we aim to nowcast rain and wind speed as target features using different input configurations of the available sources such as weather stations, lightning detectors, radar, GNSS receivers, radiosonde and radio occultation data. This nowcasting task was first conducted at 14 local stations around Milano Malpensa Airport as short-term temporal multi-step forecasting. In a second step, all stations will be combined, so that the forecasting becomes a spatio-temporal problem. Concretely, we want to investigate the predicted rain and wind speed values using the different inputs for two scenarios: each station separately, and all stations joined together.

The chaotic nature of the atmosphere, e.g. the non-stationarity of each weather feature's driving series, makes predictions unreliable and inaccurate, so dealing with these data is a very delicate task. For this reason, we have devoted considerable work to cleaning, feature engineering and preparing the raw data before feeding them into the model architectures. We have managed to preprocess large amounts of data for local stations around the airport, and studied the feasibility of nowcasting rain and wind speed targets using the different data sources altogether. The temporal multivariate driving series have high dimensionality, and we have made multi-step predictions for the defined target functions.
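Framing a multivariate driving series for multi-step prediction typically amounts to sliding-window sampling: a fixed number of past steps of all features as input, several future steps of the target features as output. The sketch below is a generic, hypothetical version of such preprocessing (shapes and feature names are illustrative, not the project's actual pipeline).

```python
import numpy as np

def make_windows(series, n_in, n_out, target_cols):
    """Frame a multivariate time series (T, n_features) as samples for
    multi-step forecasting: n_in past steps of all features as input,
    n_out future steps of the target columns as output."""
    X, Y = [], []
    T = series.shape[0]
    for t in range(T - n_in - n_out + 1):
        X.append(series[t:t + n_in])
        Y.append(series[t + n_in:t + n_in + n_out, target_cols])
    return np.stack(X), np.stack(Y)

# Hypothetical station record: 100 time steps, 5 features
# (e.g. pressure, temperature, humidity, rain, wind speed);
# rain and wind speed (columns 3 and 4) are the targets.
data = np.random.rand(100, 5)
X, Y = make_windows(data, n_in=12, n_out=6, target_cols=[3, 4])
print(X.shape, Y.shape)  # (83, 12, 5) (83, 6, 2)
```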

We study and test different machine learning architectures, from simple multi-layer perceptrons to convolutional models and Recurrent Neural Networks (RNN), for temporal and spatio-temporal nowcasting. The Long Short-Term Memory (LSTM) encoder-decoder architecture outperforms the other models, achieving more accurate predictions for each station separately. Furthermore, to predict the targets on a spatio-temporal scale, we will deploy a 2-layer stacked spatio-temporal LSTM model consisting of independent LSTM models per location in the first layer, and a second LSTM layer to predict targets multiple steps ahead. The results obtained with the different architectures applied to a dense network of sensors will be reported.

How to cite: Chkeir, S., Anesiadou, A., and Biondi, R.: Multi-station Multivariate Multi-step Convection Nowcasting with Deep Neural Networks, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-8719, 2022.

EGU22-8852 | Presentations | ITS2.6/AS5.1

Time-dependent Hillshades: Dispelling the Shadow Curse of Machine Learning Applications in Earth Observation 

Freddie Kalaitzis, Gonzalo Mateo-Garcia, Kevin Dobbs, Dolores Garcia, Jason Stoker, and Giovanni Marchisio

We show that machine learning models learn and perform better when they know where to expect shadows, through hillshades modeled to the time of imagery acquisition.

Shadows are detrimental to all machine learning applications on satellite imagery. Prediction tasks like semantic/instance segmentation, object detection, and the counting of rivers, roads, buildings and trees all rely on crisp edges and colour gradients that are confounded by the presence of shadows in passive optical imagery, which relies on the sun’s illumination for reflectance values.

Hillshading is a standard technique for enriching a mapped terrain with relief effects, which is done by emulating the shadow caused by steep terrain and/or tall vegetation. A hillshade that is modeled to the time of day and year can be easily derived through a basic form of ray tracing on a Digital Terrain Model (DTM) (also known as a bare-earth DEM) or Digital Surface Model (DSM) given the sun's altitude and azimuth angles. In this work, we use lidar-derived DSMs. A DSM-based hillshade conveys a lot more information on shadows than a bare-earth DEM alone, namely any non-terrain vertical features (e.g. vegetation, buildings) resolvable at a 1-m resolution. The use of this level of fidelity of DSM for hillshading and its input to a machine learning model is novel and the main contribution of our work. Any uncertainty over the angles can be captured through a composite multi-angle hillshade, which shows the range where shadows can appear throughout the day.
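A time-dependent hillshade of the kind described can be derived from an elevation grid with one standard illumination formula, given the sun's altitude and azimuth. The numpy sketch below illustrates that textbook computation (it is not the authors' code, and aspect conventions vary between GIS packages):

```python
import numpy as np

def hillshade(dsm, sun_azimuth_deg, sun_altitude_deg, cell_size=1.0):
    """Hillshade of a DSM/DEM for a given sun position, using
    illumination = cos(zenith)*cos(slope)
                 + sin(zenith)*sin(slope)*cos(azimuth - aspect)."""
    az = np.radians(sun_azimuth_deg)
    zen = np.radians(90.0 - sun_altitude_deg)   # zenith = 90° - altitude
    # Surface gradients from the elevation grid (axis 0 = y, axis 1 = x).
    dy, dx = np.gradient(dsm, cell_size)
    slope = np.arctan(np.hypot(dx, dy))
    aspect = np.arctan2(-dx, dy)
    shade = (np.cos(zen) * np.cos(slope)
             + np.sin(zen) * np.sin(slope) * np.cos(az - aspect))
    return np.clip(shade, 0.0, 1.0)

# Toy 1-m DSM: a flat plane is uniformly lit by a sun at 45° altitude.
flat = np.zeros((50, 50))
print(round(hillshade(flat, 315.0, 45.0).mean(), 3))  # 0.707
```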

We show the utility of time-dependent hillshades in the daily mapping of rivers from Very High Resolution (VHR) passive optical and lidar-derived terrain data [1]. Specifically, we leverage the acquisition timestamps within a daily 3m PlanetScope product over a 2-year period. Given a datetime and geolocation, we model the sun’s azimuth and elevation relative to that geolocation at that time of day and year. We can then generate a time-dependent hillshade and therefore locate shadows at any given time within that 2-year period. In our ablation study we show that, out of all the lidar-derived products, the time-dependent hillshades contribute an 8-9% accuracy improvement in the semantic segmentation of rivers. This indicates that a semantic segmentation machine learning model is less prone to errors of commission (false positives), by better disambiguating shadows from dark water.

Time-dependent hillshades are not currently used in ML for EO use-cases, yet they can be useful. All that is needed to produce them is access to high-resolution bare-earth DEMs, like that of the US National 3D Elevation Program covering the entire continental U.S. at 1-meter resolution, or creation of DSMs from the lidar point cloud data itself. As the coverage of DSM and/or DEM products expands to more parts of the world, time-dependent hillshades could become as commonplace as cloud masks in EO use cases.

[1] Garcia, D., Mateo-Garcia, G., Bernhardt, H., Hagensieker, R., Lopez-Francos, I. G., Stock, J., Schumann, G., Dobbs, K., and Kalaitzis, F.: Pix2Streams: Dynamic Hydrology Maps from Satellite-LiDAR Fusion. AI for Earth Sciences Workshop, NeurIPS 2020.

How to cite: Kalaitzis, F., Mateo-Garcia, G., Dobbs, K., Garcia, D., Stoker, J., and Marchisio, G.: Time-dependent Hillshades: Dispelling the Shadow Curse of Machine Learning Applications in Earth Observation, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-8852, 2022.

EGU22-9348 | Presentations | ITS2.6/AS5.1

Data-driven modelling of soil moisture: mapping organic soils 

Doran Khamis, Matt Fry, Hollie Cooper, Ross Morrison, and Eleanor Blyth

Improving our understanding of soil moisture and hydraulics is crucial for flood prediction, smart agriculture, modelling nutrient and pollutant spread and evaluating the role of land as a sink or source of carbon and other greenhouse gases. State of the art land surface models rely on poorly-resolved soil textural information to parametrise arbitrarily layered soil models; soils rich in organic matter – key to understanding the role of the land in achieving net zero carbon – are not well modelled. Here, we build a predictive data-driven model of soil moisture using a neural network composed of transformer layers to process time series data from point-sensors (precipitation gauges and sensor-derived estimates of potential evaporation) and convolutional layers to process spatial atmospheric driving data and contextual information (topography, land cover and use, location and catchment behaviour of water bodies). We train the model using data from the COSMOS-UK sensor network and soil moisture satellite products and compare the outputs with JULES to investigate where and why the models diverge. Finally, we predict regions of high peat content and propose a way to combine theory with our data-driven approach to move beyond the sand-silt-clay modelling framework.

How to cite: Khamis, D., Fry, M., Cooper, H., Morrison, R., and Blyth, E.: Data-driven modelling of soil moisture: mapping organic soils, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-9348, 2022.

EGU22-9452 | Presentations | ITS2.6/AS5.1

Eddy identification from along track altimeter data using deep learning: EDDY project 

Adili Abulaitijiang, Eike Bolmer, Ribana Roscher, Jürgen Kusche, Luciana Fenoglio, and Sophie Stolzenberger

Eddies are circular rotating water masses, which are usually generated near the large ocean currents, e.g., Gulf Stream. Monitoring eddies and gaining knowledge on eddy statistics over a large region are important for fishery, marine biology studies, and testing ocean models.

At the mesoscale, eddies are observed in radar altimetry, and methods have been developed to identify, track and classify them in gridded maps of sea surface height derived from multi-mission data sets. However, this procedure has drawbacks, since much information is lost in the gridded maps: inevitably, the spatial and temporal resolution of the original altimetry data degrades during the gridding process. Moreover, identifying eddies has so far been a post-analysis process on the gridded dataset, which is not suitable for near-real-time applications or forecasts. In the EDDY project at the University of Bonn, we aim to develop methods for identifying eddies directly from along-track altimetry data via a machine (deep) learning approach.

At the early stage of the project, we started with gridded altimetry maps to set up and test the machine learning algorithm. The gridded datasets are not limited to multi-mission gridded maps from AVISO, but also include a high-resolution (~6 km) ocean modeling simulation dataset (e.g., FESOM, the Finite Element Sea ice Ocean Model). Later, the gridded maps are sampled along the real altimetry ground tracks to obtain single-track altimetry data. Reference data, as the training set for machine learning, will be produced by an open-source geometry-based approach (e.g., py-eddy-tracker, Mason et al., 2014) with additional constraints like the Okubo-Weiss parameter and Sea Surface Temperature (SST) profile signatures.

In this presentation, we introduce the EDDY project and show results from the machine learning approach based on gridded datasets for the Gulf Stream area for 2017, as well as first results of single-track eddy identification in the region.

How to cite: Abulaitijiang, A., Bolmer, E., Roscher, R., Kusche, J., Fenoglio, L., and Stolzenberger, S.: Eddy identification from along track altimeter data using deep learning: EDDY project, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-9452, 2022.

EGU22-9578 | Presentations | ITS2.6/AS5.1

A multivariate convolutional autoencoder to reconstruct satellite data with an error estimate based on non-gridded observations: application to sea surface height 

A. Barth, A. Alvera-Azcárate, C. Troupin, and J.-M. Beckers

DINCAE (Data INterpolating Convolutional Auto-Encoder) is a neural network to reconstruct missing data (e.g. obscured by clouds or gaps between tracks) in satellite data. Contrary to standard image reconstruction (in-painting) with neural networks, this application requires a method to handle missing data (or data with variable accuracy) already in the training phase. Instead of using a cost function based on the mean square error, the neural network (a U-Net type of network) is optimized by minimizing the negative log likelihood assuming a Gaussian distribution (characterized by a mean and a variance). As a consequence, the neural network also provides an expected error variance of the reconstructed field (per pixel and per time instance).
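The masked Gaussian negative log likelihood described above can be written compactly. The numpy sketch below is a simplified stand-in for the actual DINCAE loss: a per-pixel mean and variance are scored against the truth, and missing pixels (e.g. clouds) are excluded via a mask, which is how missing data can be handled already during training.

```python
import numpy as np

def masked_gaussian_nll(y_true, mean, var, mask):
    """Mean negative log likelihood of the observed pixels under a
    per-pixel Gaussian N(mean, var); pixels with mask == 0 do not
    contribute to the cost."""
    nll = 0.5 * (np.log(2 * np.pi * var) + (y_true - mean) ** 2 / var)
    return np.sum(nll * mask) / np.sum(mask)

# Toy 4x4 field with one cloud-obscured pixel.
truth = np.ones((4, 4))
mask = np.ones((4, 4))
mask[0, 0] = 0                      # missing observation
good = masked_gaussian_nll(truth, np.ones((4, 4)),
                           np.full((4, 4), 0.1), mask)
bad = masked_gaussian_nll(truth, np.zeros((4, 4)),
                          np.full((4, 4), 0.1), mask)
print(good < bad)  # an accurate predicted mean gives the lower loss
```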


In this updated version DINCAE 2.0, the code was rewritten in Julia and a new type of skip connection has been implemented which showed superior performance with respect to the previous version. The method has also been extended to handle multivariate data (an example will be shown with sea-surface temperature, chlorophyll concentration and wind fields). The improvement of this network is demonstrated in the Adriatic Sea. 


Convolutional networks usually work with gridded data as input. This is, however, a limitation for some data types used in oceanography and in Earth Sciences in general, where observations are often irregularly sampled. The first layer of the neural network and the cost function have been modified so that unstructured data can also be used as input to obtain gridded fields as output. To demonstrate this, the neural network is applied to along-track altimetry data in the Mediterranean Sea. Results from a 20-year reconstruction are presented and validated. Hyperparameters are determined using Bayesian optimization, minimizing the error relative to a development dataset.

How to cite: Barth, A., Alvera-Azcárate, A., Troupin, C., and Beckers, J.-M.: A multivariate convolutional autoencoder to reconstruct satellite data with an error estimate based on non-gridded observations: application to sea surface height, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-9578, 2022.

EGU22-9734 | Presentations | ITS2.6/AS5.1

High Impact Weather Forecasts in Southern Brazil using Ensemble Precipitation Forecasts and Machine Learning 

Cesar Beneti, Jaqueline Silveira, Leonardo Calvetti, Rafael Inouye, Lissette Guzman, Gustavo Razera, and Sheila Paz

In South America, the southern parts of Brazil, Paraguay and northeast Argentina are regions particularly prone to high-impact weather (intense lightning activity, high precipitation, hail, flash floods and occasional tornadoes), mostly associated with extra-tropical cyclones, frontal systems and Mesoscale Convective Systems. In the south of Brazil, agriculture and electrical power generation are the main economic activities. This region is responsible for 35% of all hydro-power energy production in the country, with long transmission lines to the main consumer regions, which are severely affected by these extreme weather conditions; it also ranks among the regions in Brazil with the highest annual lightning incidence. Intense precipitation events are a common cause of electricity outages in southern Brazil, and accurate precipitation forecasts can mitigate this kind of problem. Despite improvements in precipitation estimates and forecasts, some difficulties remain in increasing accuracy, mainly related to the temporal and spatial location of the events. Although several options are available, it is difficult to identify which deterministic forecast is the best or most reliable. Probabilistic products from large ensemble prediction systems provide a guide to forecasters on how confident they should be about the deterministic forecast, and one approach is post-processing with machine learning (ML), which has been used to identify patterns in historical data and correct systematic ensemble biases.

In this paper, we present a study in which we used 20 members from the Global Ensemble Forecast System (GEFS) and 50 members from the European Centre for Medium-Range Weather Forecasts (ECMWF) during 2019-2021, for seven daily precipitation thresholds: 0-1.0mm, 1.0mm-15mm, 15mm-40mm, 40mm-55mm, 55mm-105mm, 105mm-155mm and over 155mm. An ML algorithm was developed for each day, up to 15 days of forecasts, and several skill scores were calculated for these daily precipitation thresholds. Initially, a gradient boosting algorithm was applied to select the best members of the ensembles, in order to improve the skill of the model and reduce processing time. After preprocessing the data, a random forest classifier was used to train the model. Based on hyperparameter sensitivity tests, the random forest required 500 trees, a maximum tree depth of 12 levels, at least 20 samples per leaf node, and the minimization of entropy for splits. To evaluate the models, we used cross-validation on a limited data sample; the procedure has a single parameter, the number of groups that a given data sample is split into, and we created a twenty-six-fold cross-validation with 30 days per fold to verify the forecasts. The results obtained by the RF were evaluated by comparing estimated with observed values. Over the forecast range, we found precision above 75% in the first 3 days and around 68% in the following days. Recall was also around 80% throughout the entire forecast range, with promising results for applying this technique operationally, which is our intent in the near future.
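The classifier configuration stated above maps directly onto a standard implementation. The sketch below uses scikit-learn with the quoted hyperparameters on synthetic stand-in data (a simplified binary target rather than the study's seven precipitation classes; all data here are invented for illustration):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for selected ensemble-member precipitation
# forecasts: 1500 days x 20 members.
rng = np.random.default_rng(42)
X = rng.gamma(2.0, 5.0, size=(1500, 20))
y = (X.mean(axis=1) > np.median(X.mean(axis=1))).astype(int)

# Hyperparameters as reported: 500 trees, max depth 12, at least
# 20 samples per leaf node, entropy as the split criterion.
clf = RandomForestClassifier(n_estimators=500, max_depth=12,
                             min_samples_leaf=20, criterion="entropy",
                             random_state=0)
scores = cross_val_score(clf, X, y, cv=5)   # k-fold cross-validation
print(round(scores.mean(), 2))
```

The real study uses a twenty-six-fold split of 30 days per fold rather than the generic 5-fold split shown here.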

How to cite: Beneti, C., Silveira, J., Calvetti, L., Inouye, R., Guzman, L., Razera, G., and Paz, S.: High Impact Weather Forecasts in Southern Brazil using Ensemble Precipitation Forecasts and Machine Learning, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-9734, 2022.

EGU22-9833 | Presentations | ITS2.6/AS5.1

Deep learning for laboratory earthquake prediction and autoregressive forecasting of fault zone stress 

Laura Laurenti, Elisa Tinti, Fabio Galasso, Luca Franco, and Chris Marone

Earthquake forecasting and prediction have long, and in some cases sordid, histories, but recent work has rekindled interest in this area based on advances in short-term early warning, hazard assessment for human-induced seismicity and the successful prediction of laboratory earthquakes.

In the lab, frictional stick-slip events provide an analog for the full seismic cycle, and such experiments have played a central role in understanding the onset of failure and the dynamics of earthquake rupture. Lab earthquakes are also ideal targets for machine learning (ML) techniques because they can be produced in long sequences under a wide range of controlled conditions. Indeed, recent work shows that labquakes can be predicted from fault zone acoustic emissions (AE). Here, we generalize these results and explore additional ML and deep learning (DL) methods for labquake prediction. Key questions include whether improved ML/DL methods can outperform existing models, including prediction based on limited training, or if such methods can successfully forecast beyond a single seismic cycle for aperiodic failure. We describe significant improvements to existing methods of labquake prediction using simple AE statistics (variance) and DL models such as Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN) models. We demonstrate: 1) that LSTMs and CNNs predict labquakes under a variety of conditions, including pre-seismic creep, aperiodic events and alternating slow and fast events, and 2) that fault zone stress can be predicted with fidelity (accuracy in terms of R2 > 0.92), confirming that acoustic energy is a fingerprint of the fault zone stress. We also predict the time to the start of failure (TTsF) and the time to the end of failure (TTeF). Interestingly, TTeF is successfully predicted in all seismic cycles, while the TTsF prediction varies with the amount of fault creep before an event. We also report on a novel autoregressive forecasting method to predict future fault zone states, focusing on shear stress. This forecasting model is distinct from existing predictive models, which predict only the current state. We compare three modern approaches in a sequence modeling framework: LSTM, Temporal Convolution Network (TCN) and Transformer Network (TF). Results are encouraging for autoregressive forecasting of shear stress at long future horizons. Our ML/DL prediction models outperform the state of the art, and our autoregressive model represents a novel forecasting framework that could enhance current methods of earthquake forecasting.
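The autoregressive scheme, predicting one step, appending it to the input window and feeding it back to reach longer horizons, can be sketched generically. In the toy below the stand-in "model" is a trivial trend rule, not the LSTM/TCN/Transformer networks used in the work; names are hypothetical.

```python
import numpy as np

def autoregressive_forecast(model, history, horizon):
    """Roll a one-step-ahead model forward: each prediction is appended
    to the input window and fed back, yielding a multi-step forecast of
    a future state (e.g. shear stress)."""
    window = list(history)
    out = []
    for _ in range(horizon):
        nxt = model(np.array(window))
        out.append(nxt)
        window = window[1:] + [nxt]   # slide the window forward
    return np.array(out)

# Stand-in "model": persistence plus the last observed trend.
model = lambda w: w[-1] + (w[-1] - w[-2])
hist = [1.0, 2.0, 3.0, 4.0]
print(autoregressive_forecast(model, hist, horizon=3))  # [5. 6. 7.]
```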

How to cite: Laurenti, L., Tinti, E., Galasso, F., Franco, L., and Marone, C.: Deep learning for laboratory earthquake prediction and autoregressive forecasting of fault zone stress, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-9833, 2022.

EGU22-10157 | Presentations | ITS2.6/AS5.1

How land cover changes affect ecosystem productivity 

Andreas Krause, Phillip Papastefanou, Konstantin Gregor, Lucia Layritz, Christian S. Zang, Allan Buras, Xing Li, Jingfeng Xiao, and Anja Rammig

Historically, many forests worldwide were cut down and replaced by agriculture. While this substantially reduced terrestrial carbon storage, the impacts of land-use change on ecosystem productivity have not been adequately resolved yet.

Here, we apply the machine learning algorithm Random Forests to predict the potential gross primary productivity (GPP) of forests, grasslands, and croplands around the globe using high-resolution datasets of satellite-derived GPP, land cover, and 20 environmental predictor variables.

With a mean potential GPP of around 2.0 kg C m-2 yr-1, forests are the most productive land cover on two thirds of the global suitable area, while grasslands and croplands are on average 23% and 9% less productive, respectively. These findings are robust against alternative input datasets and algorithms, even though the results are somewhat sensitive to the underlying land cover map.

Combining our potential GPP maps with a land-use reconstruction from the Land-Use Harmonization project (LUH2) we estimate that historical agricultural expansion reduced global GPP by around 6.3 Gt C yr-1 (4.4%). This reduction in GPP induced by land cover changes is amplified in some future scenarios as a result of ongoing deforestation but partly reversed in other scenarios due to agricultural abandonment.

Finally, we compare our potential GPP maps to simulations from eight CMIP6 Earth System Models with an explicit representation of land management. While the mean GPP values of the ESM ensemble show reasonable agreement with our estimates, individual Earth System Models simulate large deviations both in terms of mean GPP values of different land cover types as well as in their spatial variations. Reducing these model biases would lead to more reliable simulations concerning the potential of land-based mitigation policies.

How to cite: Krause, A., Papastefanou, P., Gregor, K., Layritz, L., Zang, C. S., Buras, A., Li, X., Xiao, J., and Rammig, A.: How land cover changes affect ecosystem productivity, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-10157, 2022.

EGU22-10519 | Presentations | ITS2.6/AS5.1 | Highlight

Adaptive Bias Correction for Improved Subseasonal Forecasting 

Soukayna Mouatadid, Paulo Orenstein, Genevieve Flaspohler, Miruna Oprescu, Judah Cohen, Franklyn Wang, Sean Knight, Maria Geogdzhayeva, Sam Levang, Ernest Fraenkel, and Lester Mackey

Improving our ability to forecast the weather and climate is of interest to all sectors of the economy and government agencies from the local to the national level. In fact, weather forecasts 0-10 days ahead and climate forecasts seasons to decades ahead are currently used operationally in decision-making, and the accuracy and reliability of these forecasts has improved consistently in recent decades. However, many critical applications require subseasonal forecasts with lead times in between these two timescales. Subseasonal forecasting—predicting temperature and precipitation 2-6 weeks ahead—is indeed critical for effective water allocation, wildfire management, and drought and flood mitigation. Yet, accurate forecasts for the subseasonal regime are still lacking due to the chaotic nature of weather.

While short-term forecasting accuracy is largely sustained by physics-based dynamical models, these deterministic methods have limited subseasonal accuracy due to chaos. Indeed, subseasonal forecasting has long been considered a “predictability desert” due to its complex dependence on both local weather and global climate variables. Nevertheless, recent large-scale research efforts have advanced the subseasonal capabilities of operational physics-based models, while parallel efforts have demonstrated the value of machine learning and deep learning methods in improving subseasonal forecasting.

To counter the systematic errors of dynamical models at longer lead times, we introduce an adaptive bias correction (ABC) method that combines state-of-the-art dynamical forecasts with observations using machine learning. We evaluate our adaptive bias correction method in the contiguous U.S. over the years 2011-2020 and demonstrate consistent improvement over standard meteorological baselines, state-of-the-art learning models, and the leading subseasonal dynamical models, as measured by root mean squared error and uncentered anomaly correlation skill. When applied to the United States’ operational climate forecast system (CFSv2), ABC improves temperature forecasting skill by 20-47% and precipitation forecasting skill by 200-350%. When applied to the leading subseasonal model from the European Centre for Medium-Range Weather Forecasts (ECMWF), ABC improves temperature forecasting skill by 8-38% and precipitation forecasting skill by 40-80%.
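At its core, adaptive bias correction combines a dynamical forecast with an observation-derived correction. The numpy sketch below shows the simplest instance of the idea, a per-grid-cell mean-error subtraction over a recent window; the actual ABC method learns its correction with machine learning, so this is only an illustrative baseline with invented toy data.

```python
import numpy as np

def debias(forecasts, observations, new_forecast, window=30):
    """Estimate the dynamical model's recent mean error per grid cell
    and subtract it from a new forecast (additive bias correction)."""
    bias = (forecasts[-window:] - observations[-window:]).mean(axis=0)
    return new_forecast - bias

# Toy 8x8 temperature grid: the model runs ~2 degrees too warm.
rng = np.random.default_rng(1)
obs = rng.normal(15.0, 3.0, size=(60, 8, 8))
fcst = obs + 2.0 + rng.normal(0.0, 0.5, size=obs.shape)
corrected = debias(fcst, obs, fcst[-1])
# The corrected field has the smaller mean error.
print(abs((corrected - obs[-1]).mean()) < abs((fcst[-1] - obs[-1]).mean()))
```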

Overall, we find that de-biasing dynamical forecasts with our learned adaptive bias correction method yields an effective and computationally inexpensive strategy for generating improved subseasonal forecasts and building the next generation of subseasonal forecasting benchmarks. To facilitate future subseasonal benchmarking and development, we release our model code through the subseasonal_toolkit Python package and our routinely updated SubseasonalClimateUSA dataset through the subseasonal_data Python package.

How to cite: Mouatadid, S., Orenstein, P., Flaspohler, G., Oprescu, M., Cohen, J., Wang, F., Knight, S., Geogdzhayeva, M., Levang, S., Fraenkel, E., and Mackey, L.: Adaptive Bias Correction for Improved Subseasonal Forecasting, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-10519, 2022.

EGU22-10711 | Presentations | ITS2.6/AS5.1

A new approach toward integrated inversion of reflection seismic and gravity datasets using deep learning 

Mahtab Rashidifard, Jeremie Giraud, Mark Jessell, and Mark Lindsay

Reflection seismic data, although sparsely distributed due to the high cost of acquisition, is the only type of data that can provide high-resolution images of the crust, revealing deep subsurface structures and the architectural complexity that may vector attention to minerally prospective regions. However, these datasets are not commonly considered in integrated geophysical inversion approaches due to computationally expensive forward modeling and inversion. Common inversion techniques for reflection seismic images were mostly developed for basin studies and have very limited application to hard-rock studies. Post-stack acoustic impedance inversions, for example, rely heavily on petrophysical information extracted along boreholes for depth-correction purposes, which is not necessarily available. Furthermore, the available techniques do not allow simple, automatic integration of seismic inversion with other geophysical datasets.


We introduce a new methodology that allows the utilization of seismic images within a gravity inversion technique for the purpose of 3D boundary parametrization of the subsurface. The proposed workflow is a novel approach for incorporating seismic images into integrated inversion techniques, relying on the image-ray method for depth-to-time domain conversion of seismic datasets. The algorithm uses a convolutional neural network to iterate over seismic images in the time and depth domains; this iterative process serves to compensate for the low depth resolution of the gravity datasets. We use a generalized level-set technique for gravity inversion to link the interfaces of the units with the depth-converted seismic images. The algorithm has been tested on realistic synthetic datasets generated from scenarios corresponding to different deformation histories. The preliminary results of this study suggest that post-stack seismic images can be utilized in integrated geophysical inversion algorithms without the need to run computationally expensive full-waveform inversions.

How to cite: Rashidifard, M., Giraud, J., Jessell, M., and Lindsay, M.: A new approach toward integrated inversion of reflection seismic and gravity datasets using deep learning, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-10711, 2022.

EGU22-11043 | Presentations | ITS2.6/AS5.1

Framework for the deployment of DNNs in remote sensing inversion algorithms applied to Copernicus Sentinel-4 (S4) and TROPOMI/Sentinel-5 Precursor (S5P) 

Fabian Romahn, Victor Molina Garcia, Ana del Aguila, Ronny Lutz, and Diego Loyola

In remote sensing, the quantities of interest (e.g. the composition of the atmosphere) are usually not directly observable but can only be inferred indirectly via the measured spectra. To solve these inverse problems, retrieval algorithms are applied that usually depend on complex physical models, so-called radiative transfer models (RTMs). RTMs are very accurate but also computationally very expensive, and therefore often not compatible with the strict time requirements of operational processing of satellite measurements. With the advances in machine learning, the methods of this field, especially deep neural networks (DNN), have become very promising for accelerating and improving classical remote sensing retrieval algorithms. However, their application is not straightforward; it is quite challenging, as there are many aspects to consider and parameters to optimize in order to achieve satisfying results.

In this presentation we show a general framework for replacing the RTM, used in an inversion algorithm, with a DNN that offers sufficient accuracy while at the same time increases the processing performance by several orders of magnitude. The different steps, sampling and generation of the training data, the selection of the DNN hyperparameters, the training and finally the integration of the DNN into an operational environment are explained in detail. We will also focus on optimizing the efficiency of each step: optimizing the generation of training samples through smart sampling techniques, accelerating the training data generation through parallelization and other optimizations of the RTM, application of tools for the DNN hyperparameter optimization as well as the use of automation tools (source code generation) and appropriate interfaces for the efficient integration in operational processing systems.
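One common "smart sampling" choice for generating surrogate-model training data is Latin hypercube sampling, which stratifies each input dimension so the parameter space is covered evenly with far fewer RTM runs than a dense grid. The numpy sketch below illustrates the generic technique; the parameter names and ranges are hypothetical, not the operational S4/S5P configuration.

```python
import numpy as np

def latin_hypercube(n_samples, bounds, rng=None):
    """Latin hypercube sample of an input space: each parameter's range
    is split into n_samples strata and each stratum is hit exactly once,
    covering the space more evenly than plain random draws."""
    if rng is None:
        rng = np.random.default_rng()
    bounds = np.asarray(bounds, dtype=float)     # (n_params, 2)
    n_params = bounds.shape[0]
    # One point per stratum per dimension, jittered within the stratum.
    u = (np.arange(n_samples)[:, None]
         + rng.random((n_samples, n_params))) / n_samples
    for j in range(n_params):
        rng.shuffle(u[:, j])                     # decouple the dimensions
    return bounds[:, 0] + u * (bounds[:, 1] - bounds[:, 0])

# Hypothetical RTM inputs: cloud optical thickness, cloud-top height
# (km), solar zenith angle (degrees).
samples = latin_hypercube(100, [[0, 50], [1, 15], [0, 80]],
                          rng=np.random.default_rng(0))
print(samples.shape)  # (100, 3)
```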

This procedure has been continuously developed throughout the last years and as a use case, it will be shown how it has been applied in the operational retrieval of cloud properties for the Copernicus satellite sensors Sentinel-4 (S4) and TROPOMI/Sentinel-5 Precursor (S5P).

How to cite: Romahn, F., Molina Garcia, V., del Aguila, A., Lutz, R., and Loyola, D.: Framework for the deployment of DNNs in remote sensing inversion algorithms applied to Copernicus Sentinel-4 (S4) and TROPOMI/Sentinel-5 Precursor (S5P), EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-11043, 2022.

EGU22-11420 | Presentations | ITS2.6/AS5.1

History Matching for the tuning of coupled models: experiments on the Lorenz 96 model 

Redouane Lguensat, Julie Deshayes, and Venkatramani Balaji

The process of relying on experience and intuition to find good sets of parameters, commonly referred to as "parameter tuning", continues to play a central role in the roadmaps followed by dozens of modeling groups involved in community efforts such as the Coupled Model Intercomparison Project (CMIP).

In this work, we study a tool from the Uncertainty Quantification community that has recently begun to draw attention in climate modeling: History Matching, also referred to as "Iterative Refocussing". The core idea of History Matching is to run several simulations with different sets of parameters and then use observed data to rule out any parameter settings which are "implausible". Since climate simulation models are computationally heavy and do not allow testing every possible parameter setting, we employ an emulator as a cheap and accurate replacement; here a machine learning algorithm, namely Gaussian Process Regression, is used for the emulation step. History Matching is thus a good example of how recent advances in machine learning can be of high interest to climate modeling.

One objective of this study is to evaluate the potential of history matching to tune a climate system with multi-scale dynamics. Using a toy climate model, namely the Lorenz 96 model, and producing experiments in a perfect-model setting, we explore different types of applications of HM and highlight the strengths and challenges of using such a technique.

How to cite: Lguensat, R., Deshayes, J., and Balaji, V.: History Matching for the tuning of coupled models: experiments on the Lorenz 96 model, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-11420, 2022.

EGU22-11465 | Presentations | ITS2.6/AS5.1

Quantile machine learning models for predicting European-wide, high resolution fine-mode Aerosol Optical Depth (AOD) based on ground-based AERONET and satellite AOD data 

Zhao-Yue Chen, Raul Méndez-Turrubiates, Hervé Petetin, Aleks Lacima, Albert Soret Miravet, Carlos Pérez García-Pando, and Joan Ballester

Air pollution is a major environmental risk factor for human health. Among the different air pollutants, Particulate Matter (PM) arises as the most prominent one, with increasing health effects over the last decades. According to the Global Burden of Disease, PM contributed to 4.14 million premature deaths globally in 2019, over twice as much as in 1990 (2.04 million). With these numbers in mind, the assessment of ambient PM exposure becomes a key issue in environmental epidemiology. However, the limited number of ground-level sites measuring daily PM values is a major constraint for the development of large-scale, high-resolution epidemiological studies.

In the last five years, there has been a growing number of initiatives estimating ground-level PM concentrations from satellite Aerosol Optical Depth (AOD) data, representing a low-cost alternative with higher spatial coverage compared to ground-level measurements. At present, the most popular AOD product is NASA’s MODIS (Moderate Resolution Imaging Spectroradiometer), but the data it provides is restricted to Total Aerosol Optical Depth (TAOD). Compared with TAOD, Fine-mode Aerosol Optical Depth (FAOD) better describes the distribution of small-diameter particles (e.g. PM10 and PM2.5), which are generally those associated with anthropogenic activity. Complementarily, AERONET (AErosol RObotic NETwork), a network of ground-based sun photometers, additionally provides Fine- and Coarse-mode Aerosol Optical Depth (FAOD and CAOD) products based on the Spectral Deconvolution Algorithm (SDA).

Within the framework of the ERC project EARLY-ADAPT, which aims to disentangle the association between human health, climate variability and air pollution to better estimate the early adaptation response to climate change, here we develop quantile machine learning models to further investigate the association between AERONET FAOD and satellite AOD over Europe during the last two decades. Due to the large amount of missing data in satellite estimations, we also included AOD estimates from ECMWF’s Copernicus Atmosphere Monitoring Service Global Reanalysis (CAMSRA) and NASA’s Modern-Era Retrospective Analysis for Research and Applications v2 (MERRA-2), together with atmosphere, land and ocean variables such as boundary layer height, downward UV radiation and cloud cover from ECMWF’s ERA5-Land.

The models were thoroughly validated with spatial cross-validation. Preliminary results show that the R2 of the three AOD estimates (TAOD, FAOD and CAOD) predicted with quantile machine learning models ranges between 0.61 and 0.78, and the RMSE between 0.02 and 0.03. The predicted FAOD shows the highest Pearson correlation with ground-level PM2.5 (0.38), compared with 0.18, 0.11 and 0.09 for satellite, MERRA-2 and CAMSRA AOD, respectively. This study provides three useful indicators for further estimating PM, which could improve our understanding of air pollution in Europe and open new avenues for large-scale, high-resolution environmental epidemiology studies.

How to cite: Chen, Z.-Y., Méndez-Turrubiates, R., Petetin, H., Lacima, A., Soret Miravet, A., Pérez García-Pando, C., and Ballester, J.: Quantile machine learning models for predicting European-wide, high resolution fine-mode Aerosol Optical Depth (AOD) based on ground-based AERONET and satellite AOD data, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-11465, 2022.

EGU22-11924 | Presentations | ITS2.6/AS5.1

Automated detection and classification of synoptic scale fronts from atmospheric data grids 

Stefan Niebler, Peter Spichtinger, Annette Miltenberger, and Bertil Schmidt

Automatic determination of fronts from atmospheric data is an important task for weather prediction as well as for research on synoptic-scale phenomena. We developed a deep neural network to detect and classify fronts from multi-level ERA5 reanalysis data. Model training and prediction are evaluated using two different regions covering Europe and North America, with data from two weather services. Due to a label deformation step performed during training, we are able to directly generate frontal lines with no further thinning during post-processing. Our network compares well against the weather service labels, with a Critical Success Index higher than 66.9% and an Object Detection Rate of more than 77.3%. Additionally, the frontal climatologies generated from our network's output are highly correlated (greater than 77.2%) with climatologies created from weather service data. Evaluation of cross-sections of our detection results provides further insight into the characteristics of the predicted fronts and shows that our network's classifications are physically plausible.
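
The two skill scores quoted above follow from a standard contingency table of matched frontal objects (hits, misses, false alarms). This is a generic sketch with hypothetical counts; the abstract does not describe the object-matching procedure itself.

```python
def critical_success_index(hits, misses, false_alarms):
    """CSI (threat score): fraction of predicted-or-observed events that
    were correctly predicted. It ignores correct negatives, which would
    otherwise dominate on gridded front-detection fields."""
    return hits / (hits + misses + false_alarms)

def object_detection_rate(hits, misses):
    """Fraction of labelled frontal objects matched by a prediction,
    i.e. the probability of detection at the object level."""
    return hits / (hits + misses)

# e.g. 700 matched fronts, 150 missed, 200 spurious detections
csi = critical_success_index(700, 150, 200)   # 700/1050 ≈ 0.667
odr = object_detection_rate(700, 150)         # 700/850 ≈ 0.824
```

Note that ODR ≥ CSI always holds, since false alarms only enter the CSI denominator.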

How to cite: Niebler, S., Spichtinger, P., Miltenberger, A., and Schmidt, B.: Automated detection and classification of synoptic scale fronts from atmospheric data grids, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-11924, 2022.

EGU22-12043 | Presentations | ITS2.6/AS5.1

A Domain-Change Approach to the Semantic Labelling of Remote Sensing Images 

Chandrabali Karmakar, Gottfried Schwartz, Corneliu Octavian Dumitru, and Mihai Datcu

For many years, image classification – mainly based on pixel brightness statistics – has been among the most popular remote sensing applications. However, in recent years, many users have become more and more interested in the application-oriented semantic labelling of remotely sensed objects depicted in given images.

In parallel, the development of deep learning algorithms has led to several powerful image classification and annotation tools that became popular in the remote sensing community. In most cases, these publicly available tools combine efficient algorithms with expert knowledge and/or external information ingested during an initia