ESSI – Earth & Space Science Informatics

ESSI1.2 – Spatio-temporal Data Science: Theoretical Advances and Applications in AI and ML

The real world does not live on a regular grid. The observations with the best spatiotemporal resolution are generally irregularly distributed over space and time, even though, as data, they are generally stored as arrays in files. Storing the diverse data types of Earth science, including grid, swath, and point-based spatiotemporal distributions, in separate files leaves the computer-native array layouts on disk or in working memory with little or no connection to the spatiotemporal layout of the observations themselves. For integrative analysis, data must be co-aligned both spatiotemporally and in computer memory, a process called data harmonization. For data harmonization to be scalable in both diversity and volume, data movement must be minimized. The SpatioTemporal Adaptive Resolution Encoding (STARE) is a hierarchical, recursively subdivided indexing scheme for harmonizing diverse data at scale.

STARE indices are integers embedded with spatiotemporal attributes key to efficient spatiotemporal analysis. As a more computationally efficient alternative to conventional floating-point spatiotemporal references, STARE indices apply uniformly to all spatiotemporal data regardless of their geometric layouts. Through this unified reference, STARE harmonizes diverse data in their native states to enable integrative analysis without requiring homogenization of the data by interpolating them to a common grid first.
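
The core idea can be illustrated with a toy sketch. The code below is not the STARE algorithm itself (STARE recursively subdivides spherical triangles, and reference implementations such as the pystare Python package exist); it is a simplified quadtree on a longitude-latitude square, showing how a location plus a resolution level can be packed into one integer whose bit prefix supports fast containment tests, the property that replaces floating-point coordinate comparisons.

```python
# Illustrative only: STARE itself recursively subdivides spherical triangles;
# here a plain quadtree on a longitude-latitude square stands in for the same
# idea of packing "location + resolution" into one integer whose bit prefix
# encodes spatial containment.

def encode(lon, lat, level):
    """Pack a point and a resolution level into a single integer index."""
    x = (lon + 180.0) / 360.0          # normalise to [0, 1)
    y = (lat + 90.0) / 180.0
    code = 0
    for _ in range(level):             # two bits per subdivision level
        x *= 2.0
        y *= 2.0
        qx, qy = int(x), int(y)
        code = (code << 2) | (qx << 1) | qy
        x -= qx
        y -= qy
    return (code << 5) | level         # keep the resolution level in the low bits

def contains(coarse, fine):
    """True if the cell encoded by 'coarse' spatially contains the cell 'fine'."""
    lc, lf = coarse & 31, fine & 31
    if lc > lf:
        return False
    # compare the leading 2*lc bits of both location codes
    return (coarse >> 5) == (fine >> 5) >> (2 * (lf - lc))

a = encode(12.5, 47.3, 8)     # a coarse cell containing the point
b = encode(12.5, 47.3, 14)    # a finer cell at the same location
assert contains(a, b)
```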

The current implementation of STARE supports solid angle indexing, i.e. longitude-latitude, and time. To fully support Earth science applications, STARE must be extended to index the radial dimension for a full 4D spatiotemporal indexing. As STARE’s scalability is based on having a universal encoding scheme mapping spatiotemporal volumes to integers, the variety of existing approaches to encoding the radial dimension in Earth science raises complex design issues for applying STARE’s principles. For example, the radial dimension can be usefully expressed via length (altitude) or pressure coordinates. Both length and pressure raise the question of what reference surface should be used. As STARE’s goal is to harmonize different kinds of data, we must determine whether it is better to have separate radial encodings for length and pressure, or a single radial encoding accompanied by tools for translating between the various (radial) coordinate systems. The questions become more complex when we consider the wide range of Earth science data and applications, including, for example, model simulation output, lidar point clouds, spacecraft swath data, aircraft in-situ measurements, vertical or oblique parameter retrievals, and earthquake-induced movement detection.

In this work, we will review STARE’s unifying principle and the unique nature of the radial dimension. We will discuss the challenges of enabling scalable Earth science data harmonization in both diversity and volume, particularly in the context of the detection, cataloging, and statistical study of fully 4D hierarchical phenomena, such as extratropical cyclone events. With the twin challenges of exascale computing and increasing model simulation resolutions opening new views into physical processes, scalable methods for bringing best-resolution observations and simulations together, like STARE, are becoming increasingly important.

How to cite: Rilee, M. and Kuo, K.-S.: Design Considerations for the 3rd Spatial Dimension of the Spatiotemporal Adaptive Resolution Encoding (STARE), EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-10799, https://doi.org/10.5194/egusphere-egu22-10799, 2022.

EGU22-1346 | Presentations | ESSI1.2

Enhance pluvial flood risk assessment using spatio-temporal machine learning models

Andrea Critto, Marco Zanetti, Elena Allegri, Anna Sperotto, and Silvia Torresan

Extreme weather events (e.g., heavy rainfall) are natural hazards that pose increasing threats to many sectors and sub-regions worldwide (IPCC, 2014), exposing people and assets to damaging effects. In order to predict pluvial flood risk under different spatio-temporal conditions, three generalized Machine Learning models were developed and applied to the Metropolitan City of Venice: Logistic Regression, Neural Networks and Random Forest. The models considered 60 historical pluvial flood events that occurred in the timeframe 1995-2020. The historical events helped to identify and prioritize sub-areas that are more likely to be affected by pluvial flood risk due to heavy precipitation. In addition, while developing the model, 13 triggering factors were selected and assessed: aspect, curvature, distance to river, distance to road, distance to sea, elevation, land use, NDVI, permeability, precipitation, slope, soil and texture. A forward feature selection method, based on the AUC score, was applied to understand which features best counteract spatio-temporal overfitting in pluvial flood prediction. Results of the analysis showed that the most accurate models were obtained with the Logistic Regression approach, which was used to provide pluvial flood risk maps for each of the 60 major historical events that occurred in the case study area. The model showed high accuracy, and most of the events that occurred in the Metropolitan City of Venice were properly predicted, demonstrating that Machine Learning could substantially improve and speed up disaster risk assessment and mapping, helping to overcome the most common bottlenecks of physically-based simulations, such as computational complexity and the need for large datasets of high-resolution information.
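
As an illustration of the modelling step described above, the sketch below shows a forward feature selection loop driven by cross-validated AUC around a logistic regression classifier, using scikit-learn and synthetic stand-in data. The feature names mirror the 13 triggering factors, but the data, stopping rule and settings are illustrative, not those of the study.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# synthetic stand-in: samples x 13 candidate triggering factors, y = flooded or not
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 13))
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=500) > 0).astype(int)
features = ["aspect", "curvature", "dist_river", "dist_road", "dist_sea",
            "elevation", "land_use", "ndvi", "permeability", "precipitation",
            "slope", "soil", "texture"]

selected, remaining, best_auc = [], list(range(X.shape[1])), 0.0
while remaining:
    scores = {}
    for j in remaining:
        cols = selected + [j]
        auc = cross_val_score(LogisticRegression(max_iter=1000),
                              X[:, cols], y, cv=5, scoring="roc_auc").mean()
        scores[j] = auc
    j_best = max(scores, key=scores.get)
    if scores[j_best] <= best_auc:        # stop when AUC no longer improves
        break
    best_auc = scores[j_best]
    selected.append(j_best)
    remaining.remove(j_best)

print("selected:", [features[j] for j in selected], "AUC =", round(best_auc, 3))
```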

How to cite: Critto, A., Zanetti, M., Allegri, E., Sperotto, A., and Torresan, S.: Enhance pluvial flood risk assessment using spatio-temporal machine learning models, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-1346, https://doi.org/10.5194/egusphere-egu22-1346, 2022.

EGU22-10823 | Presentations | ESSI1.2

Scalable Feature Extraction and Tracking (SCAFET): A general framework for feature extraction from large climate datasets

Arjun Nellikkattil, June-Yi Lee, and Axel Timmermann

The study describes a generalized framework to extract and track features from large climate datasets. Unlike other feature extraction algorithms, Scalable Feature Extraction and Tracking (SCAFET) is independent of any physical thresholds, making it more suitable for comparing features from different datasets. Features of interest are extracted by segmenting the data on the basis of a scale-independent bounded variable called the shape index (Si). Si gives a quantitative measurement of the local shape of the field with respect to its surroundings. To illustrate the capabilities of the method, we have employed it in the extraction of different types of features. Cyclones and atmospheric rivers are extracted from the ERA5 reanalysis dataset to show how the algorithm extracts points as well as surfaces from climate datasets. Extraction of sea surface temperature fronts depicts how SCAFET handles unstructured grids. Lastly, the 3D structures of jet streams are extracted to demonstrate that the algorithm can extract 3D features too. The detection algorithm is implemented as a Jupyter notebook [https://colab.research.google.com/drive/1D0rWNQZrIfLEmeUYshzqyqiR7QNS0Hm-?usp=sharing], so that anyone can test it.
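
A minimal sketch of a shape-index computation of the kind SCAFET builds on is shown below: the index is derived from the eigenvalues of the local Hessian of a 2-D field (Koenderink's formulation), giving a bounded value that distinguishes bowls, saddles and domes. The sign convention and any smoothing or normalization used in SCAFET itself may differ.

```python
import numpy as np

def shape_index(field, dx=1.0, eps=1e-12):
    """Koenderink-style shape index of a 2-D scalar field, bounded in [-1, 1].
    Under this Hessian-eigenvalue convention: ~ +1 bowl-like minima,
    ~ 0 saddles, ~ -1 dome-like maxima."""
    # first derivatives (axis 0 = y, axis 1 = x)
    fy, fx = np.gradient(field, dx)
    # second derivatives forming the Hessian [[fxx, fxy], [fxy, fyy]]
    fxy, fxx = np.gradient(fx, dx)
    fyy, _ = np.gradient(fy, dx)
    # eigenvalues of the symmetric 2x2 Hessian (k1 >= k2)
    mean = 0.5 * (fxx + fyy)
    diff = np.sqrt(0.25 * (fxx - fyy) ** 2 + fxy ** 2)
    k1, k2 = mean + diff, mean - diff
    return (2.0 / np.pi) * np.arctan2(k1 + k2, k1 - k2 + eps)

# toy example: a paraboloid bowl should score close to +1 at its centre
yy, xx = np.mgrid[-3:3:121j, -3:3:121j]
bowl = xx ** 2 + yy ** 2
si = shape_index(bowl)
print(round(float(si[60, 60]), 2))   # ~ 1.0
```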

How to cite: Nellikkattil, A., Lee, J.-Y., and Timmermann, A.: Scalable Feature Extraction and Tracking (SCAFET): A general framework for feature extraction from large climate datasets, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-10823, https://doi.org/10.5194/egusphere-egu22-10823, 2022.

EGU22-3855 | Presentations | ESSI1.2

CGC: an open-source Python module for geospatial data clustering

Ou Ku, Francesco Nattino, Meiert Grootes, Emma Izquierdo-Verdiguier, Serkan Girgin, and Raul Zurita-Milla

With the growing ubiquity of large multi-dimensional geodata cubes, clustering techniques have become essential to extracting patterns and creating insights from data cubes. Aiming to meet this increasing need, we present Clustering Geodata Cubes (CGC): an open-source Python package designed for partitional clustering of geospatial data. CGC provides efficient clustering methods to identify groups of similar data. In contrast to traditional techniques, which act on a single dimension, CGC is able to perform both co-clustering (clustering across two dimensions, e.g., spatial and temporal) and tri-clustering (clustering across three dimensions, e.g., spatial, temporal, and thematic), as well as to subsequently refine the identified clusters. CGC also provides scalable approaches that suit both small and large datasets, and it can be efficiently deployed on a range of computational infrastructures, from single machines to computing clusters. As a case study, we present an analysis of spring onset indicator datasets at continental scale.
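
To make the co-clustering idea concrete, the sketch below alternates k-means between the spatial and temporal dimensions of a toy space x time matrix. This is only an illustration of what co-clustering produces (groups of cells sharing temporal behaviour over groups of time steps), not CGC's own implementation or API.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy space x time matrix with a hidden 3 (spatial) x 2 (temporal) block structure.
rng = np.random.default_rng(42)
data = np.kron(rng.uniform(1, 5, size=(3, 2)), np.ones((40, 24)))
data += rng.normal(scale=0.2, size=data.shape)          # 120 cells x 48 time steps

n_row, n_col = 3, 2
row_lab = KMeans(n_clusters=n_row, n_init=10, random_state=0).fit_predict(data)
col_lab = KMeans(n_clusters=n_col, n_init=10, random_state=0).fit_predict(data.T)

for _ in range(5):   # alternate: summarise one dimension, re-cluster the other
    col_means = np.stack([data[:, col_lab == c].mean(axis=1) for c in range(n_col)], axis=1)
    row_lab = KMeans(n_clusters=n_row, n_init=10, random_state=0).fit_predict(col_means)
    row_means = np.stack([data[row_lab == r].mean(axis=0) for r in range(n_row)], axis=0)
    col_lab = KMeans(n_clusters=n_col, n_init=10, random_state=0).fit_predict(row_means.T)

# each (row group, column group) pair is a co-cluster with a characteristic mean
for r in range(n_row):
    for c in range(n_col):
        block = data[np.ix_(row_lab == r, col_lab == c)]
        print(f"co-cluster ({r},{c}): mean = {block.mean():.2f}")
```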

How to cite: Ku, O., Nattino, F., Grootes, M., Izquierdo-Verdiguier, E., Girgin, S., and Zurita-Milla, R.: CGC: an open-source Python module for geospatial data clustering, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-3855, https://doi.org/10.5194/egusphere-egu22-3855, 2022.

EGU22-3940 | Presentations | ESSI1.2

The Analysis of the Aftershock Sequence of the Recent Mainshock in Arkalochori, Crete Island Greece

Alexandra Moshou, Antonios Konstantaras, and Panagiotis Argyrakis

Forecasting the evolution of natural hazards is a critical problem in the natural sciences. Earthquake forecasting is one such example and is a difficult task due to the complexity of earthquake occurrence. To date, earthquake prediction has relied mainly on empirical methods applied before the occurrence of the main earthquake, and specifically on the seismic history of a given area. The analysis and processing of this seismicity play a critical role in modern statistical seismology. In this work, a first attempt is made to study and draw safe conclusions regarding the prediction of the seismic sequence, specifically using appropriate statistical methods such as Bayesian prediction, taking into account the uncertainties of the model parameters. The above theory was applied to the recent seismic sequence in the area of Arkalochori on Crete Island, Greece (2021, Mw 6.0). The rich seismic sequence that took place immediately after the main 5.6 R earthquake, with approximately 4,000 events of magnitude ML > 1 over the following three months, allowed us to calculate the probability of the largest expected earthquake occurring within a given time, as well as the probability that the largest aftershock expected after a major earthquake exceeds a certain magnitude.
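
The abstract does not specify the exact statistical model, but a common baseline for such aftershock probabilities combines the Gutenberg-Richter and modified Omori laws (the Reasenberg-Jones formulation). The sketch below computes the Poisson probability of at least one aftershock above a chosen magnitude in a given time window; the generic parameter values are illustrative and are not fitted to the Arkalochori sequence.

```python
import numpy as np

def expected_aftershocks(m_main, m_min, t1, t2, a=-1.67, b=0.91, c=0.05, p=1.08):
    """Expected number of aftershocks with M >= m_min in the window [t1, t2] days
    after a mainshock of magnitude m_main (Reasenberg & Jones-type rate model).
    The default parameters are generic illustrative values, not fitted ones."""
    k = 10.0 ** (a + b * (m_main - m_min))
    if np.isclose(p, 1.0):
        integral = np.log((t2 + c) / (t1 + c))
    else:
        integral = ((t2 + c) ** (1.0 - p) - (t1 + c) ** (1.0 - p)) / (1.0 - p)
    return k * integral

def prob_at_least_one(m_main, m_min, t1, t2, **kw):
    """Poisson probability of observing at least one such aftershock."""
    return 1.0 - np.exp(-expected_aftershocks(m_main, m_min, t1, t2, **kw))

# e.g. probability of an M >= 5 aftershock within 30 days of an Mw 6.0 mainshock
print(round(prob_at_least_one(6.0, 5.0, 0.01, 30.0), 2))
```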

References:

  • Ganas, A., Fassoulas, C., Moshou, A., Bozionelos, G., Papathanassiou, G., Tsimi, C., & Valkaniotis, S. (2017). Geological and seismological evidence for NW-SE crustal extension at the southern margin of Heraklion basin, Crete. Bulletin of the Geological Society of Greece, 51, 52-75. doi: https://doi.org/10.12681/bgsg.15004
  • Konstantaras, A.J. (2016). Expert knowledge-based algorithm for the dynamic discrimination of interactive natural clusters. Earth Science Informatics. 9 (1), 95-100.
  • Konstantaras, A. (2020). Deep learning and parallel processing spatio-temporal clustering unveil new Ionian distinct seismic zone. Informatics. 7 (4), 39.
  • Moshou, A., Papadimitriou, E., Drakatos, G., Evangelidis, C., Karakostas, V., Vallianatos, F., & Makropoulos, K. (2014, May). Focal Mechanisms at the convergent plate boundary in Southern Aegean, Greece. In EGU General Assembly Conference Abstracts (p. 12185)
  • Moshou, A., Argyrakis, P., Konstantaras, A., Daverona, A.C. & Sagias, N.C. (2021). Characteristics of Recent Aftershocks Sequences (2014, 2015, 2018) Derived from New Seismological and Geodetic Data on the Ionian Islands, Greece. 6 (2), 8.
  • C.B., Nolet. G., 1997. P and S velocity structure of the Hellenic area obtained by robust nonlinear inversion of travel times. J. Geophys. Res. 102 (8). 349–367

How to cite: Moshou, A., Konstantaras, A., and Argyrakis, P.: The Analysis of the Aftershock Sequence of the Recent Mainshock in Arkalochori, Crete Island Greece, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-3940, https://doi.org/10.5194/egusphere-egu22-3940, 2022.

EGU22-5487 | Presentations | ESSI1.2

3D Mapping of Active Underground Faults Enabled by Heterogeneous Parallel Processing Spatio-Temporal Proximity and Clustering Algorithms

Alexandra Moshou, Antonios Konstantaras, Nikitas Menounos, and Panagiotis Argyrakis

Underground faults act as storage elements for the strain energy accumulated in border areas of active tectonic plates. Particularly along the southern front of the Hellenic seismic arc, a steady yearly accumulation of strain energy results from the constant rate of motion at which the African plate subducts beneath the Eurasian plate. Partial release of the energy stored in a particular underground fault manifests in the form of an earthquake once it reaches the surface of the Earth’s crust. The information obtained for each recorded earthquake includes, among others, the surface location and the estimated hypocentre depth. Considering that hundreds of thousands of earthquakes have been recorded in that particular area, the accumulated hypocentre depths provide a most valuable source of information regarding the in-depth extent of the seismically active parts of the underground faults. This research work applies expert-knowledge spatio-temporal clustering to previously reported distinct seismic cluster zones, aiming to associate each individual main earthquake, along with its recorded foreshocks and aftershocks, with a single underground fault in existing two-dimensional mappings. This process is enabled by heterogeneous parallel processing algorithms encompassing both proximity and agglomerative density-based clustering, applied to main seismic events only. Once a main earthquake is associated with a particular known underground fault, the fault point with maximum proximity to the earthquake’s hypocentre appends its location parameters, additionally incorporating the dimension of depth into the initial planar dimensions of latitude and longitude. The ranges of depth variations provide a notable indication of the in-depth extent of the seismically active part(s) of underground faults, enabling their 3D model mapping.

Indexing terms: spatio-temporal proximity and clustering algorithms, heterogeneous parallel processing, CUDA, 3D underground fault mapping
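
A simplified, serial illustration of the clustering-and-association step is sketched below using scikit-learn's DBSCAN on toy hypocentres and a nearest-fault assignment; the authors' pipeline relies on expert-knowledge rules and CUDA-parallel implementations, so this only stands in for the general idea.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# toy catalogue: lon, lat, depth (km) of hypocentres around two seismic sources
rng = np.random.default_rng(1)
cluster_a = rng.normal([25.10, 34.60, 12.0], [0.03, 0.03, 2.0], size=(200, 3))
cluster_b = rng.normal([25.45, 34.85, 20.0], [0.03, 0.03, 3.0], size=(150, 3))
events = np.vstack([cluster_a, cluster_b])

# crude scaling: ~111 km per degree horizontally, depth already in km
xyz = np.column_stack([events[:, 0] * 111.0, events[:, 1] * 111.0, events[:, 2]])
labels = DBSCAN(eps=5.0, min_samples=10).fit_predict(xyz)   # -1 = noise

# mapped 2-D fault traces (illustrative coordinates), one representative point each
faults = {"fault_A": np.array([25.11 * 111.0, 34.61 * 111.0]),
          "fault_B": np.array([25.44 * 111.0, 34.86 * 111.0])}

for lab in sorted(set(labels) - {-1}):
    members = xyz[labels == lab]
    centre = members.mean(axis=0)
    nearest = min(faults, key=lambda f: np.linalg.norm(faults[f] - centre[:2]))
    print(f"cluster {lab}: {len(members)} events -> {nearest}, "
          f"depth range {members[:, 2].min():.1f}-{members[:, 2].max():.1f} km")
```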

References

Axaridou A., I. Chrysakis, C. Georgis, M. Theodoridou, M. Doerr, A. Konstantaras, and E. Maravelakis. 3D-SYSTEK: Recording and exploiting the production workflow of 3D-models in cultural heritage. IISA 2014 - 5th International Conference on Information, Intelligence, Systems and Applications, 51-56, 2014.

Konstantaras A. Deep learning and parallel processing spatio-temporal clustering unveil new Ionian distinct seismic zone. Informatics. 7 (4), 39, 2020.

Konstantaras A.J. Expert knowledge-based algorithm for the dynamic discrimination of interactive natural clusters. Earth Science Informatics. 9 (1), 95-100, 2016.

Konstantaras A.J., E. Katsifarakis, E. Maravelakis, E. Skounakis, E. Kokkinos and E. Karapidakis. Intelligent spatial-clustering of seismicity in the vicinity of the Hellenic Seismic Arc. Earth Science Research 1 (2), 1-10, 2012.

Konstantaras A., F. Valianatos, M.R. Varley, J.P. Makris. Soft-Computing modelling of seismicity in the southern Hellenic Arc. IEEE Geoscience and Remote Sensing Letters, 5 (3), 323-327, 2008.

Konstantaras A., M.R. Varley, F. Valianatos, G. Collins and P. Holifield. Recognition of electric earthquake precursors using neuro-fuzzy methods: methodology and simulation results. Proc. IASTED Int. Conf. Signal Processing, Pattern Recognition and Applications (SPPRA 2002), Crete, Greece, 303-308, 2002.

Maravelakis E., A. Konstantaras, K. Kabassi, I. Chrysakis, C. Georgis and A. Axaridou. 3DSYSTEK web-based point cloud viewer. IISA 2014 - 5th International Conference on Information, Intelligence, Systems and Applications, 262-266, 2014.

How to cite: Moshou, A., Konstantaras, A., Menounos, N., and Argyrakis, P.: 3D Mapping of Active Underground Faults Enabled by Heterogeneous Parallel Processing Spatio-Temporal Proximity and Clustering Algorithms, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-5487, https://doi.org/10.5194/egusphere-egu22-5487, 2022.

As the interpretability and explainability of artificial intelligence decisions have been gaining attention, novel approaches are needed to develop diagnostic tools that account for the unique challenges of geospatial and environmental data, including spatial dependence and high dimensionality, which are addressed in this contribution. Building upon the geostatistical tradition of distance-based measures, spatial prediction error profiles (SPEPs) and spatial variable importance profiles (SVIPs) are introduced as novel model-agnostic assessment and interpretation tools that explore the behavior of models at different prediction horizons. Moreover, to address the challenges of interpreting the joint effects of strongly correlated or high-dimensional features, often found in environmental modeling and remote sensing, a model-agnostic approach is developed that distills aggregated relationships from complex models. The utility of these techniques is demonstrated in two case studies representing a regionalization task in an environmental-science context, and a classification task from multitemporal remote sensing of land use. In these case studies, SPEPs and SVIPs successfully highlight differences and surprising similarities of geostatistical methods, linear models, random forest, and hybrid algorithms. With 64 correlated features in the remote-sensing case study, the transformation-based interpretation approach successfully summarizes high-dimensional relationships in a small number of diagrams.

The novel diagnostic tools enrich the toolkit of geospatial data science, and may improve machine-learning model interpretation, selection, and design.
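
One simplified way to approximate a spatial prediction error profile, in the spirit of SPEPs (though not necessarily the exact construction used in this contribution), is to hold out individual locations and exclude all training points within a growing radius, so that the error can be reported as a function of prediction distance:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# synthetic stand-in data: locations, two covariates, and a spatially trended target
rng = np.random.default_rng(0)
coords = rng.uniform(0, 100, size=(300, 2))                # sample locations (km)
X = np.column_stack([coords, rng.normal(size=(300, 2))])   # coordinates + covariates
y = 0.05 * coords[:, 0] + X[:, 2] + rng.normal(scale=0.3, size=300)

def error_profile(X, y, coords, radii, n_eval=100):
    """RMSE as a function of prediction distance via spatial leave-one-out."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    profile = []
    for r in radii:
        errs = []
        for i in rng.choice(len(y), size=n_eval, replace=False):
            train = d[i] > r                 # keep only points farther than r
            model = RandomForestRegressor(n_estimators=50, random_state=0)
            model.fit(X[train], y[train])
            errs.append(y[i] - model.predict(X[i:i + 1])[0])
        profile.append(np.sqrt(np.mean(np.square(errs))))
    return profile

for r, rmse in zip([1, 10, 25, 50], error_profile(X, y, coords, [1, 10, 25, 50])):
    print(f"prediction distance >= {r:>2} km: RMSE = {rmse:.3f}")
```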

How to cite: Brenning, A.: Novel approaches to model assessment and interpretation in geospatial machine learning, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-6955, https://doi.org/10.5194/egusphere-egu22-6955, 2022.

EGU22-8648 | Presentations | ESSI1.2

A graph-based fractality index to characterize complexity of urban form using deep graph convolutional neural networks

Lei Ma, Stefan Seipel, S. Anders Brandt, and Ding Ma

Inspection of the complexity of urban morphology facilitates understanding of human behaviors in urban space, leading to better conditions for the sustainable design of future cities. Fractal indicators, such as fractal dimension, ht-index, and the cumulative rate of growth (CRG) index, have been proposed as measures of such complexity. However, these major fractal indicators are statistical rather than spatial, which means they fail to characterize the spatial complexity of urban morphology, such as building footprints. To overcome this problem, a graph-based fractality index (GFI), based on a hybrid of fractal theories and deep learning techniques, is proposed in this paper. To quantify spatial complexity, several fractal variants were synthesized to train a deep graph convolutional neural network. Building footprints of London were used to test the method, and the results show that the proposed framework performs better than traditional indices. Moreover, the possibility of bridging fractal theories and deep learning techniques on complexity issues opens up new possibilities for data-driven GIScience.
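
For readers unfamiliar with graph convolutions, the sketch below implements a single Kipf-Welling-style layer in plain PyTorch and applies it to a toy graph of building footprints; the actual GFI network, its fractal training targets and its pooling scheme are more elaborate and are not reproduced here.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One graph-convolution layer, H' = ReLU(D^-1/2 (A+I) D^-1/2 H W),
    purely to illustrate the kind of operation a deep graph convolutional
    network applies to node features."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, H, A):
        A_hat = A + torch.eye(A.size(0))                 # add self-loops
        d_inv_sqrt = torch.diag(A_hat.sum(dim=1).pow(-0.5))
        return torch.relu(d_inv_sqrt @ A_hat @ d_inv_sqrt @ self.linear(H))

# toy graph: 6 building footprints (nodes) with 3 geometric features each
A = torch.tensor([[0, 1, 1, 0, 0, 0],
                  [1, 0, 1, 0, 0, 0],
                  [1, 1, 0, 1, 0, 0],
                  [0, 0, 1, 0, 1, 1],
                  [0, 0, 0, 1, 0, 1],
                  [0, 0, 0, 1, 1, 0]], dtype=torch.float32)
H = torch.rand(6, 3)                                     # e.g. area, perimeter, compactness

layer1, layer2 = GCNLayer(3, 8), GCNLayer(8, 2)
embeddings = layer2(layer1(H, A), A)                     # per-node embeddings
print(embeddings.mean(dim=0))                            # mean-pooled graph-level descriptor
```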

How to cite: Ma, L., Seipel, S., Brandt, S. A., and Ma, D.: A graph-based fractality index to characterize complexity of urban form using deep graph convolutional neural networks, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-8648, https://doi.org/10.5194/egusphere-egu22-8648, 2022.

EGU22-7529 | Presentations | ESSI1.2

Global maps from local data: Towards globally applicable spatial prediction models

Marvin Ludwig, Álvaro Moreno Martínez, Norbert Hölzel, Edzer Pebesma, and Hanna Meyer

Global-scale maps are an important tool to provide ecologically relevant environmental variables to researchers and decision makers. Usually, these maps are created by training a machine learning algorithm on field-sampled reference data and applying the resulting model to associated information from satellite imagery or globally available environmental predictors. However, field samples are often sparse and clustered in geographic space, representing only parts of the global environment. Machine learning models are therefore prone to overfit to the specific environments they are trained on, especially when a large set of predictor variables is utilized. Consequently, model validations have to include an analysis of the model's transferability to regions where no training samples are available, e.g. by computing the Area of Applicability (AOA, Meyer and Pebesma 2021) of the prediction models.

Here we reproduce three recently published global environmental maps (soil nematode abundances, potential tree cover and specific leaf area) and assess their AOA. We then present a workflow to increase the AOA (i.e. transferability) of the machine learning models. The workflow utilizes spatial variable selection in order to train generalized models that include only the predictors most suitable for predictions in regions without training samples. We compared the results to the three original studies in terms of prediction performance and AOA. Results indicate that reducing the predictors to those relevant for spatial prediction leads to a significant increase in model transferability without a significant decrease in prediction quality in areas with high sampling density.
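
The AOA itself has a compact definition, sketched below following Meyer and Pebesma (2021): a dissimilarity index given by the distance to the nearest training point in standardized predictor space, scaled by the mean pairwise training distance, with the AOA bounded by the upper whisker of the training dissimilarity distribution. The original method additionally weights predictors by importance and respects cross-validation folds, which this simplified version omits.

```python
import numpy as np
from scipy.spatial.distance import cdist

def aoa_mask(X_train, X_new, iqr_factor=1.5):
    """Simplified Area of Applicability: True where new predictor combinations
    are similar enough to the training data (no importance weighting here)."""
    mu, sd = X_train.mean(axis=0), X_train.std(axis=0) + 1e-12
    A, B = (X_train - mu) / sd, (X_new - mu) / sd

    d_tt = cdist(A, A)
    n = len(A)
    mean_dist = d_tt.sum() / (n * (n - 1))           # average pairwise training distance

    np.fill_diagonal(d_tt, np.inf)
    di_train = d_tt.min(axis=1) / mean_dist          # DI of each training point
    q1, q3 = np.percentile(di_train, [25, 75])
    threshold = q3 + iqr_factor * (q3 - q1)          # upper whisker of training DI

    di_new = cdist(B, A).min(axis=1) / mean_dist
    return di_new <= threshold, di_new

# toy check: predictions far outside the training predictor range fall outside the AOA
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 5))
X_inside, X_outside = rng.normal(size=(5, 5)), rng.normal(loc=8.0, size=(5, 5))
inside_mask, _ = aoa_mask(X_train, np.vstack([X_inside, X_outside]))
print(inside_mask)    # expected: mostly True for the first 5, False for the last 5
```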

Meyer, H. & Pebesma, E. (2021): Predicting into unknown space? Estimating the area of applicability of spatial prediction models. Methods in Ecology and Evolution, doi:10.1111/2041-210X.13650.

How to cite: Ludwig, M., Moreno Martínez, Á., Hölzel, N., Pebesma, E., and Meyer, H.: Global maps from local data: Towards globally applicable spatial prediction models, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-7529, https://doi.org/10.5194/egusphere-egu22-7529, 2022.

EGU22-8891 | Presentations | ESSI1.2

Infilling Spatial Precipitation Recordings with a Memory-Assisted CNN

Johannes Meuer, Laurens Bouwer, Étienne Plésiat, Roman Lehmann, Markus Hoffmann, Thomas Ludwig, Wolfgang Karl, and Christopher Kadow

Missing climate data is a widespread problem in climate science and leads to uncertainty in prediction models that rely on these data resources. So far, existing approaches for infilling missing precipitation data are mostly numerical or statistical techniques that require considerable computational resources and are not suitable for large regions with missing data. Most recently, there have been several approaches to infill missing climate data with machine learning methods such as convolutional neural networks or generative adversarial networks. They have proven to perform well on infilling missing temperature or satellite data. However, these techniques consider only spatial variability in the data, whereas precipitation data is much more variable in both space and time, and rainfall extremes with high amplitudes play an important role. We propose a convolutional inpainting network that additionally incorporates a memory module. One approach investigates the temporal variability in the missing data regions using a long short-term memory (LSTM). An attention-based module has also been added to the model to consider further atmospheric variables provided by reanalysis data. The model was trained and evaluated on the RADOLAN data set, which is based on radar precipitation recordings and weather station measurements. With this method we are able to complete gaps in this high-quality, highly resolved spatial precipitation data set over Germany. In conclusion, we compare our approach to statistical techniques for infilling precipitation data as well as other state-of-the-art machine learning techniques. This combination of computer science and atmospheric research components will be presented as a dedicated climate service component and data set.
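
A toy PyTorch sketch of the general architecture idea, a per-time-step CNN encoder, an LSTM memory over the time dimension, and a decoder reconstructing the masked field, is given below; the layer sizes and masking scheme are illustrative and much simpler than the partial-convolution and attention modules described above.

```python
import torch
import torch.nn as nn

class MemoryInpaintingNet(nn.Module):
    """Toy sketch: CNN encoder per time step, LSTM memory over time,
    CNN decoder reconstructing the (masked) last frame."""

    def __init__(self, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(2, 16, 3, stride=2, padding=1), nn.ReLU(),   # input: field + mask
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(8), nn.Flatten())                 # -> 32*8*8 features
        self.memory = nn.LSTM(input_size=32 * 8 * 8, hidden_size=hidden, batch_first=True)
        self.decoder = nn.Sequential(
            nn.Linear(hidden, 32 * 8 * 8), nn.Unflatten(1, (32, 8, 8)),
            nn.Upsample(scale_factor=4), nn.Conv2d(32, 16, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=2), nn.Conv2d(16, 1, 3, padding=1))

    def forward(self, frames, masks):
        # frames, masks: (batch, time, H, W); missing pixels are zeroed in frames
        b, t, h, w = frames.shape
        x = torch.stack([frames, masks], dim=2).reshape(b * t, 2, h, w)
        feats = self.encoder(x).reshape(b, t, -1)
        out, _ = self.memory(feats)
        return self.decoder(out[:, -1])           # reconstruct the last frame

net = MemoryInpaintingNet()
frames = torch.rand(4, 6, 64, 64)                 # 6-step precipitation sequences
masks = (torch.rand(4, 6, 64, 64) > 0.3).float()  # 1 = observed, 0 = missing
pred = net(frames * masks, masks)
print(pred.shape)                                 # torch.Size([4, 1, 64, 64])
```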

How to cite: Meuer, J., Bouwer, L., Plésiat, É., Lehmann, R., Hoffmann, M., Ludwig, T., Karl, W., and Kadow, C.: Infilling Spatial Precipitation Recordings with a Memory-Assisted CNN, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-8891, https://doi.org/10.5194/egusphere-egu22-8891, 2022.

With the far-reaching impact of Artificial Intelligence (AI) becoming more widely acknowledged across various dimensions and industries, the Geomatics scientific community has reasonably turned to automated (in some cases, autonomous) solutions while looking to efficiently extract and communicate patterns in high-dimensional geographic data. This, in turn, has led to a range of AI platforms providing grounds for cutting-edge technologies such as data mining, image processing and predictive/prescriptive modelling. Meanwhile, coastal management bodies around the world are striving to harness the power of AI and Machine Learning (ML) applications to act upon the wealth of coastal information emanating from disparate data sources (e.g., geodesy, hydrography, bathymetry, mapping, remote sensing, and photogrammetry). The cross-disciplinarity of stakeholder engagement calls for thorough risk assessment and coastal defence strategies (e.g., erosion/flooding control), consistent with the emerging need for participatory and integrated policy analyses. This paper addresses the issue of seeking techno-centric solutions in human-understandable language, for holistic knowledge engineering (from acquisition to dissemination) in a spatiotemporal context; namely, the benefits of setting up a unified Visual Analytics (VA) system, which allows for real-time monitoring and Online Analytical Processing (OLAP) operations on demand, via role-based access. Working from an all-encompassing data model could create seamlessly collaborative workspaces that support multiple programming languages (packaging ML libraries designed to interoperate) and enable heterogeneous user communities to visualize Big Data at different granularities, as well as perform task-specific queries with little, or no, programming skill. The proposed solution is an integrated coastal management dashboard, built natively for the cloud (i.e. leveraging batch and stream processing), to dynamically host live Key Performance Indicators (KPIs) whilst ensuring wide adoption and sustainable operation. The results reflect the value of effectively collecting and consolidating coastal (meta-)data into open repositories, to jointly produce actionable insight in an efficient manner.

How to cite: Anthis, Z.: Reading Between the (Shore)Lines: Real-Time Analytical Processing to Monitor Coastal Erosion, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-13102, https://doi.org/10.5194/egusphere-egu22-13102, 2022.

EGU22-8323 | Presentations | ESSI1.2

Multi-attribute geolocation inference from tweets

Umair Qazi, Ferda Ofli, and Muhammad Imran

Geotagged social media messages, especially from Twitter, can have a substantial impact on decision-making processes during natural hazards and disasters. For example, such geolocation information can be used to enhance natural hazard detection systems where real-time geolocated tweets can help identify the critical human-centric hotspots of an emergency where urgent help is required.

Our work can extract geolocation information from tweets by making use of five meta-data attributes provided by Twitter. Three of these are free-form text, namely tweet text, user profile description, and user location. The other two attributes are GPS coordinates and place tags.

Tweet text may or may not contain relevant information for extracting geolocation. In cases where location information is available within the tweet text, we perform toponym extraction from the text using Named Entity Recognition and Classification (NERC). The extracted toponyms are then used to obtain geolocation information using Nominatim (the open-source geocoding software that powers OpenStreetMap) at various levels such as country, state, county, and city.

A similar process is followed for the user profile description, where only location toponyms identified by NERC are stored and then geocoded using Nominatim at various levels.

The user location field, which is also free-form text, can mention multiple locations, such as USA and UK. To extract a location from this field, a heuristic algorithm based on a ranking mechanism is adopted, which resolves it to a single location that can then be mapped at various levels such as country, state, county, and city.

GPS coordinates provide the exact longitude and latitude of the device's location. We perform reverse geocoding to obtain additional location details, e.g., street, city, or country the GPS coordinates belong to. For this purpose, we use Nominatim’s reverse API endpoint to extract city, county, state, and country information.

The place tag provides a bounding box, an exact longitude and latitude, or the name of the location tagged by the user. The place field contains several location attributes, and we extract location information from these attributes using different algorithms; Nominatim’s search API endpoint is used to extract city, county, state, and country names from the Nominatim response when available.

Our geo-inference pipeline is designed to be used as a plug-in component. The system spans an Elasticsearch cluster with six nodes for efficient and fast querying and insertion of records. It has already been tested on geolocating more than two billion COVID-related tweets. The system is able to handle high insertion and query loads. We have implemented smart caching mechanisms to avoid repetitive Nominatim calls, since these are expensive operations. The caches are available both for free-form text (Nominatim’s search API) and for exact latitude and longitude (Nominatim’s reverse API). These caches reduce the load on Nominatim and give quick access to the most commonly queried terms.
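
For illustration, the snippet below shows the kind of Nominatim search and reverse calls, plus caching, that such a pipeline relies on, here against the public Nominatim endpoints with a simple in-process cache; the system described above uses Elasticsearch-backed caches and its own deployment, so the endpoint, rate limits and cache design differ.

```python
import requests
from functools import lru_cache

NOMINATIM = "https://nominatim.openstreetmap.org"
HEADERS = {"User-Agent": "geo-inference-demo"}    # Nominatim requires a User-Agent

@lru_cache(maxsize=100_000)                        # cache repeated free-form queries
def geocode(text):
    """Resolve a free-form location string (user location field, NERC toponym)."""
    r = requests.get(f"{NOMINATIM}/search",
                     params={"q": text, "format": "json",
                             "addressdetails": 1, "limit": 1},
                     headers=HEADERS, timeout=10)
    hits = r.json()
    return hits[0]["address"] if hits else None

@lru_cache(maxsize=100_000)                        # cache repeated GPS coordinates
def reverse_geocode(lat, lon):
    """Resolve exact GPS coordinates to city/county/state/country."""
    r = requests.get(f"{NOMINATIM}/reverse",
                     params={"lat": lat, "lon": lon, "format": "json"},
                     headers=HEADERS, timeout=10)
    return r.json().get("address")

addr = geocode("Doha, Qatar")
if addr:
    print({k: addr.get(k) for k in ("city", "county", "state", "country")})
```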

With this effort, we hope to provide the necessary means for researchers and practitioners who intend to explore social media data for geo-applications.

How to cite: Qazi, U., Ofli, F., and Imran, M.: Multi-attribute geolocation inference from tweets, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-8323, https://doi.org/10.5194/egusphere-egu22-8323, 2022.

EGU22-3131 | Presentations | ESSI1.2

Language model for Earth science for semantic search

Rahul Ramachandran, Muthukumaran Muthukumaran Ramasubramanian, Prasanna Koirala, Iksha Gurung, and Manil Maskey

Recent advances in technology have transformed the Natural Language Technology (NLT) landscape, specifically the use of transformers to build language models such as BERT and GPT-3. Furthermore, it has been shown that the quality and domain-specificity of the input corpus to language models can improve downstream application results. However, minimal effort in Earth science research has focused on building and using a domain-specific language model.

We utilize a transfer learning solution that takes an existing language model trained for general science (SciBERT) and fine-tunes it using abstracts and full text extracted from various Earth science journals to create BERT-E (BERT for Earth Science). The training process used 270k+ Earth science articles comprising almost 6 million paragraphs. We used Masked Language Modeling (MLM) to train the transformer model. MLM works by masking random words in a paragraph and optimizing the model to predict the masked words correctly. BERT-E was evaluated by performing a downstream keyword classification task, and the performance was compared against classification results using the original SciBERT language model. The SciBERT-based model attained an accuracy of 89.99, whereas the BERT-E-based model attained an accuracy of 92.18, showing an improvement in overall performance.
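
The MLM fine-tuning step can be sketched with the Hugging Face transformers library as below, starting from the public SciBERT checkpoint and masking 15% of tokens; the two stand-in paragraphs replace the roughly 6 million Earth-science paragraphs actually used, and the training arguments are illustrative rather than those of BERT-E.

```python
from datasets import Dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "allenai/scibert_scivocab_uncased"          # SciBERT starting checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForMaskedLM.from_pretrained(base)

# stand-in for the Earth-science paragraphs used to train BERT-E
paragraphs = ["Atmospheric rivers transport large amounts of water vapor poleward.",
              "Soil moisture retrievals from L-band radiometry improve drought monitoring."]
ds = Dataset.from_dict({"text": paragraphs}).map(
    lambda x: tokenizer(x["text"], truncation=True, max_length=256),
    batched=True, remove_columns=["text"])

collator = DataCollatorForLanguageModeling(tokenizer, mlm=True, mlm_probability=0.15)
trainer = Trainer(model=model,
                  args=TrainingArguments(output_dir="bert-e-demo",
                                         per_device_train_batch_size=2,
                                         num_train_epochs=1, report_to=[]),
                  train_dataset=ds, data_collator=collator)
trainer.train()   # masks 15% of tokens and optimises the model to recover them
```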

We also investigate employing language models to provide new semantic search capabilities for unstructured text such as papers. This search capability requires a knowledge graph generated from Earth science corpora, together with a language model and graph convolutions, to surface latent and related sentences for a natural language query. The sentences in the papers are modeled in the graph as nodes, and these nodes are connected through entities. The language model is used to give sentences a numeric representation. Graph convolutions are then applied to the sentence embeddings to obtain a vector representation of each sentence along with a combined representation of the surrounding graph structure. This approach utilizes both the power of adjacency inherently encoded in graph structures and the latent knowledge captured in the language model. Our initial proof-of-concept prototype used the SimCSE training algorithm (and the TinyBERT architecture) as the embedding model. This framework has demonstrated an improved ability to surface relevant, latent information based on the input query. We plan to show new results using the domain-specific BERT-E model.

How to cite: Ramachandran, R., Muthukumaran Ramasubramanian, M., Koirala, P., Gurung, I., and Maskey, M.: Language model for Earth science for semantic search, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-3131, https://doi.org/10.5194/egusphere-egu22-3131, 2022.

ESSI1.4 – Novel Methods and Applications of Satellite and Aerial Imagery

EGU22-9012 | Presentations | ESSI1.4 | Highlight

Mapping the World at 10 m: A Novel Deep-Learning Land Use Land Cover Product and Beyond

Dawn Wright, Steve Brumby, Sean Breyer, Abigail Fitzgibbon, Dan Pisut, Zoe Statman-Weil, Mark Hannel, Mark Mathis, and Caitlin Kontgis

Land use / land cover (LULC) maps provide critical information to governments, land use planners, and decision-makers about the spatial layout of the environment and how it is changing.  While a variety of LULC products exist, they are often coarse in resolution, not updated regularly, or require manual editing to be useful.  In partnership, Esri, Microsoft Planetary Computer, and Impact Observatory created the world’s first publicly available 10-m LULC map by automating and sharing a deep-learning model that was run on over 450,000 Sentinel-2 scenes.  The resulting map, released freely on Esri’s Living Atlas in June 2021, displays ten classes across the globe: built area, trees, scrub/shrub, cropland, bare ground, flooded vegetation, water, grassland, permanent snow/ice, clouds.  Here, we discuss key findings from the resulting map, including a quantitative analysis of how 10-m resolution allows us to assess small, low density urban areas compared to other LULC products, including the Copernicus CGLS-LC100 100-m resolution global map.  We will also share how we support project-based, on-demand LULC mapping and will present preliminary findings from a new globally consistent 2017-2021 annual LULC dataset across the entire Sentinel-2 archive.

How to cite: Wright, D., Brumby, S., Breyer, S., Fitzgibbon, A., Pisut, D., Statman-Weil, Z., Hannel, M., Mathis, M., and Kontgis, C.: Mapping the World at 10 m: A Novel Deep-Learning Land Use Land Cover Product and Beyond, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-9012, https://doi.org/10.5194/egusphere-egu22-9012, 2022.

EGU22-4300 | Presentations | ESSI1.4

Synergetic use of Sentinel-1 and Sentinel-2 data for large-scale Land Use/Land Cover Mapping

Melanie Brandmeier, Maximilian Hell, Eya Cherif, and Andreas Nüchter

One of the largest threats to the vast ecosystem of the Brazilian Amazon Forest is deforestation and forest degradation caused by human activity. The possibility to continuously monitor these degradation events has recently become more feasible through the use of freely available satellite remote sensing data and machine learning algorithms suited for big datasets.

A fundamental challenge of such large-scale monitoring tasks is the automatic generation of reliable and correct land use and land cover (LULC) maps. This is achieved by the development of robust deep learning models that generalize well on new data. However, these approaches require large amounts of labeled training data. We use the latest results of the MapBiomas project as the ‘ground-truth’ for developing new algorithms. In this project, Souza et al. [1] used yearly composites of USGS Landsat imagery to classify the LULC for the whole of Brazil. The latest iteration of their work became available for the years 1985–2020 as Collection 6 (https://mapbiomas.org). However, this reference data cannot be considered real ground truth, as it is itself generated from machine learning models; novel approaches suited to such problems of weakly supervised learning are therefore required.

As tropical regions are often covered by clouds, radar data is better suited for continuous mapping than optical imagery, due to its cloud-penetrating capabilities. In a preliminary study, we combined data from ESA’s Sentinel-1 (radar) and Sentinel-2 (multispectral) missions for developing algorithms suited to act on multi-modal and -temporal data to obtain accurate LULC maps. The best performing proposed deep learning network, DeepForestM2, employed a seven-month radar time series combined with a single optical scene. This model configuration reached an overall accuracy of 75.0% on independent test data. A state-of-the-art (SotA) DeepLab model, trained on the very same data, reached an overall accuracy of 69.9%.

Currently, we are further developing this approach of fusing multi-modal data with a temporal aspect to improve LULC classification. Larger amounts of more recent data, both Sentinel-1 and Sentinel-2 from 2020, are included in the training experiments. Additional deep learning networks and approaches to deal with weakly supervised learning [2] are developed and tested on the data. The need for weakly supervised methods arises from the reference data, which is both inaccurate and inexact, i.e., it has a coarser spatial resolution than the training data. We aim to improve the classification results qualitatively as well as quantitatively compared to SotA methods, especially with respect to generalizing well on new datasets. The resulting deep learning methods, together with the trained weights, will also be made accessible through a geoprocessing tool in Esri’s ArcGIS Pro for users without a coding background.

  • Carlos M. Souza et al. “Reconstructing Three Decades of Land Use and Land Cover Changes in Brazilian Biomes with Landsat Archive and Earth Engine”. In: Remote Sensing 12.17 (2020), p. 2735. DOI: 10.3390/rs12172735.
  • Zhi-Hua Zhou. “A brief introduction to weakly supervised learning”. In: National Science Review 5.1 (Jan. 2018), pp. 44–53. ISSN: 2095-5138. DOI: 10.1093/nsr/nwx106.

How to cite: Brandmeier, M., Hell, M., Cherif, E., and Nüchter, A.: Synergetic use of Sentinel-1 and Sentinel-2 data for large-scale Land Use/Land Cover Mapping, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-4300, https://doi.org/10.5194/egusphere-egu22-4300, 2022.

EGU22-11946 | Presentations | ESSI1.4

A GEOBIA-based approach for mapping Urban Green Spaces using PlanetScope imagery: the case of Athens

Evangelos Dosiadis, Dimitris Triantakonstantis, Ana-Maria Popa, Spyridon E. Detsikas, Ionut Sandric, George P. Petropoulos, Diana Onose, and Christos Chalkias

The technological developments in geoinformatics in recent decades have allowed the inclusion of geospatial data and analysis techniques in a wide range of scientific disciplines. One such field is the study of urban green spaces (UGS). These are defined as open, undeveloped areas that provide residents with recreational space, improving the aesthetic and environmental quality of the neighboring areas. Accurately mapping their spatial extent is an essential requirement in urban planning, and their preservation and expansion in metropolitan areas are of high importance for protecting the environment and public health.

 
The objective of this study is to explore the use of high spatial resolution satellite imagery from PlanetScope combined with the Geographic Object-Based Image Analysis (GEOBIA) classification approach to map UGS in Athens, Greece. For the UGS retrieval, an object-based classification (GEOBIA) method was developed utilizing multispectral PlanetScope imagery acquired in June 2020. Accuracy assessment was performed with a confusion matrix, using a set of randomly selected control points within the image derived from field visits and image photo-interpretation. In addition, the obtained UGS were compared against independent estimates of the Green Urban Areas from the Urban Atlas operational product. All the geospatial data analysis was conducted in a GIS environment (ArcGIS Pro).


The results demonstrated the usefulness of the GEOBIA technique combined with very high spatial resolution satellite imagery from PlanetScope in mapping UGS, as shown by the high accuracy obtained in the statistical comparisons. With the technological evolution in Earth Observation data acquisition and image processing techniques, mapping UGS has been optimized and facilitated, and this study contributes in this direction.

KEYWORDS: Urban Green Spaces, Athens, PlanetScope, Earth Observation, GEOBIA

How to cite: Dosiadis, E., Triantakonstantis, D., Popa, A.-M., Detsikas, S. E., Sandric, I., Petropoulos, G. P., Onose, D., and Chalkias, C.: A GEOBIA-based approach for mapping Urban Green Spaces using PlanetScope imagery: the case of Athens, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-11946, https://doi.org/10.5194/egusphere-egu22-11946, 2022.

The definition of urbanized areas, both regionally and globally, is an important basis for urban development monitoring and management, as well as an important condition for studying social policies, economics, culture and the environment.

Thanks to the development of science and technology, urban areas are expanding rapidly. Methods for extracting urbanized areas quickly and accurately have therefore become a focus of research.

In the 1970s, with the beginning of the Defense Meteorological Satellite Program (DMSP), nighttime light images emerged, providing a new method for the extraction of urbanized areas.

However, due to the limits of spatial resolution and spectral range, there are shortcomings in urbanized area extraction based on DMSP-OLS nightlight images.

In recent years, with the development of remote sensing technology, remote sensing data with a higher resolution emerged, providing an effective and applicable data source for urban planning monitoring.

We hypothesize that higher-resolution nightlight images achieve greater precision than the older ones in the extraction of urbanized areas.

This work uses nightlight images (NPP-VIIRS and Luojia1-01) and urbanized area data (FROM-GLC 2017) to construct a logistic regression model in order to evaluate and compare the accuracy of the two nightlight image sources in extracting urbanized areas.

The case study is the Barcelona metropolitan area, Spain (636 km², 3.3 million inhabitants).

How to cite: Zheng, Q. and Roca, J.: The extraction of urbanized areas based on the high-resolution night lights images: A case study in Barcelona, Spain, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-8019, https://doi.org/10.5194/egusphere-egu22-8019, 2022.

EGU22-4678 | Presentations | ESSI1.4

Lithology Mapping with Satellite, Fieldwork-based Spectral data, and Machine Learning: the case study of Beiras Group (Central Portugal)

João Pereira, Alcides J.S.C. Pereira, Artur Gil, and Vasco M. Mantas

The lack of cartography aggravates the problem of poor knowledge of geological resources and land management in regions that could benefit greatly from this information. Remote sensing has been an invaluable means of obtaining data to perform geological mapping objectively and with high scientific accuracy. In Portugal, there is a large gap in cartographic information at the 1:50 000 scale throughout the territory, so this work intends to address this problem through a set of techniques and methodologies applied to a study of a region of the Beiras Group (Grupo das Beiras).

Spectral databases serve as an initial tool for any methodology involving spectral analysis, namely for the development of cartography methods and quick characterization of rock samples.

To address these issues, a multispectral analysis of January and July 2015 scenes with low cloud cover and atmospheric correction (Level 2) was carried out using Landsat 8 (LS8) data. Statistical tests such as ANOVA and Tukey's test were applied to both images to determine whether significant differences exist between lithologies.
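
The per-lithology band statistics test can be sketched as follows with SciPy and statsmodels on synthetic reflectance samples; the grouping, band choice and sample values are illustrative, not the study's data.

```python
import numpy as np
import pandas as pd
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# toy surface-reflectance samples of one Landsat 8 band for three mapped units
rng = np.random.default_rng(3)
df = pd.DataFrame({
    "reflectance": np.concatenate([rng.normal(0.12, 0.02, 300),    # metasediments
                                   rng.normal(0.18, 0.02, 300),    # granite
                                   rng.normal(0.13, 0.02, 300)]),  # soil
    "lithology": ["metasediment"] * 300 + ["granite"] * 300 + ["soil"] * 300})

groups = [g["reflectance"].values for _, g in df.groupby("lithology")]
print(f_oneway(*groups))                                      # overall between-group test
print(pairwise_tukeyhsd(df["reflectance"], df["lithology"]))  # which pairs of units differ
```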

For the hyperspectral analysis, two sampling campaigns were carried out, with the collection of rock samples of metasediments and granites as well as soil samples. The analysis was performed on fresh samples, crushed samples (2 mm - 500 μm; 500 μm - 125 μm; <125 μm) and soil samples, demonstrating significantly different spectral behavior in the hyperspectral signatures among the various particle sizes and between fresh and crushed samples. X-ray fluorescence (XRF) was used to obtain geochemical data on major elements to validate the spectral results obtained. As a result, correspondences were identified between the obtained hyperspectral data, the spectral databases and the literature, meaning that the spectral signatures of this research are consistent with the studied samples.

Machine learning models are an emerging tool for cartography, and LS8 reflectance data were used to build them in this work. In this context, the models proved to be useful and successful for image classification using algorithms assigned to this function.

How to cite: Pereira, J., Pereira, A. J. S. C., Gil, A., and Mantas, V. M.: Lithology Mapping with Satellite, Fieldwork-based Spectral data, and Machine Learning: the case study of Beiras Group (Central Portugal), EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-4678, https://doi.org/10.5194/egusphere-egu22-4678, 2022.

EGU22-9265 | Presentations | ESSI1.4

Application of unsupervised machine learning techniques for lithological and soil mapping in Ossa-Morena Zone

Marcelo Silva, Pedro Nogueira, Renato Henriques, and Mário Gonçalves

Unsupervised methods are a good entry point for satellite image classification, requiring little to no input and outputting an analysis, in the form of a thematic map, that may act as a guide for more user-input-intensive methods. For this work, we use K-means methods to classify satellite and drone imagery covering the Ossa-Morena Zone (OMZ), in Portugal, and assess their capacity for lithological and soil mapping. The drone is equipped with a High Precision NDVI Single Sensor and was flown over the ancient mines of Mociços, Mostardeira and Santa Eulália. The OMZ is a tectonostratigraphic domain shared between Portugal and Spain, divided into sectors, and extraordinarily rich and diverse from a lithological, stratigraphical, and structural point of view. For this work, we focus on the Estremoz-Barrancos sector, comprising a Neoproterozoic to Devonian metasedimentary succession with low-grade metamorphism in greenschist facies, and the Santa Eulália Plutonic Complex (SEPC), an elliptic late-Variscan granitic massif that crosscuts the Alter do Chão-Elvas Sector and the Blastomylonitic belt and is constituted by two granitic facies, a few small mafic bodies, and some roof pendants that belong to the Alter do Chão-Elvas Sector.

The imagery used corresponds to high-level satellite imagery products gathered between 2004 and 2006 (ASTER) and between 2017 and 2021 (Landsat 8 and Sentinel-2), and to drone imagery captured on May 6 and August 31, 2021.

K-means was applied to a variable number of selected bands, including band ratios, and tested with different numbers of initial clusters and different distance algorithms (Minimum Distance and Spectral Angle Mapping). Afterwards, its ability to outline and classify different geological structures was assessed by comparing the results to the geological map of the OMZ.

The obtained thematic maps point towards poorer results when using a larger selection of bands, for instance ASTER bands 1 to 9 (with bands 1 to 3N resampled to 30 m), due to the interspersion of different classes. When using band ratio combinations, such as 4/2 and 6/(5+7) (ASTER), the produced map successfully classifies the major geological features present in the region, with increased sharpness between contacts as the number of classes increases.

Results show that K-means, when used under the correct conditions and parameters, has the potential for lithological and soil mapping through image classification, both for satellite and drone imagery.
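
A minimal version of the band-ratio K-means workflow is sketched below with scikit-learn on a synthetic ASTER-like cube; note that scikit-learn's KMeans uses Euclidean (minimum-distance) assignment only, so the Spectral Angle Mapping variant tested in the study would need a custom distance.

```python
import numpy as np
from sklearn.cluster import KMeans

# toy ASTER-like cube: 9 bands resampled to a common grid (rows x cols x bands)
rng = np.random.default_rng(7)
cube = rng.uniform(0.05, 0.45, size=(200, 200, 9))

# stack band-ratio combinations used for classification, e.g. 4/2 and 6/(5+7)
r1 = cube[..., 3] / cube[..., 1]
r2 = cube[..., 5] / (cube[..., 4] + cube[..., 6])
features = np.stack([r1, r2], axis=-1).reshape(-1, 2)

# K-means with a chosen number of initial clusters; labels reshape back to a map
labels = KMeans(n_clusters=6, n_init=10, random_state=0).fit_predict(features)
thematic_map = labels.reshape(200, 200)
print(np.bincount(labels))   # pixel count per class of the resulting thematic map
```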

Future work will focus on the integration of a pre-processing step for band selection using ML techniques, such as through Principal Component Analysis, Minimum Noise Fraction and Random Forest.

The authors acknowledge the funding provided by FCT through the Institute of Earth Sciences (ICT) with the reference UIDB/GEO/04683/2020.

How to cite: Silva, M., Nogueira, P., Henriques, R., and Gonçalves, M.: Application of unsupervised machine learning techniques for lithological and soil mapping in Ossa-Morena Zone, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-9265, https://doi.org/10.5194/egusphere-egu22-9265, 2022.

EGU22-5333 | Presentations | ESSI1.4

Remote sensing – based analysis of the islands dynamics in the Lower Danube River

Marina Virghileanu and Gabriela Ioana-Toroimac

River islands are important components of river morpho-dynamics that can provide essential information on fluvial processes, as well as on sediment and flow regimes. At the same time, river islands play an essential role from political, environmental and socio-cultural points of view. Thus, understanding the temporal dynamics of river islands is a required task for channel navigation safety, port functionality, agricultural production and biodiversity. The aim of this study is to analyse the spatial and temporal changes of the river islands during the last 40 years, based on satellite remotely sensed images. The study focuses on the Lower Danube River, downstream of the Iron Gates dams that alter the flow and sediment load, a reach which also suffers from dredging for navigation. The islands of the Lower Danube River generate major impacts on the relationship between riparian states, interfere with port activity and EU investments (as is the case of Rast port in Romania), or are the subject of ecological restoration. Multispectral satellite data, including Landsat and Sentinel-2 images, were used for river island mapping at different temporal moments, with a medium spatial resolution (up to 15 m for Landsat pansharpened data and 10 m for Sentinel-2). Spectral indices, such as NDVI and NDWI, allowed the automatic extraction of island boundaries and land cover information. On these, two processes were carried out: 1) the characterization of river island morphology, and 2) the quantification of the spatial and temporal changes over time. The resulting data are connected with in-situ measurements of flow regime and sediment supply, as well as with flood events and human activities, in order to identify the potential drivers of change. The results demonstrate a strong correlation between river island dynamics and flood events in the Lower Danube River, as the major flood event of 2006 significantly modified the islands' size and shape. This research can allow the identification of the evolutionary model of the Danube River.
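
The index-based extraction step can be sketched in a few lines: NDVI and NDWI are computed from the red, near-infrared and green bands and thresholded to separate vegetated island surfaces from water; the thresholds below are illustrative and scene-dependent, not the ones used in the study.

```python
import numpy as np

def extract_islands(red, nir, green, ndvi_min=0.2, ndwi_max=0.0):
    """Rough island/water separation from Sentinel-2-like reflectance bands.
    NDVI = (NIR - Red) / (NIR + Red); NDWI = (Green - NIR) / (Green + NIR).
    Pixels that are vegetated (high NDVI) and not water-like (low NDWI)
    are flagged as island surface."""
    eps = 1e-6
    ndvi = (nir - red) / (nir + red + eps)
    ndwi = (green - nir) / (green + nir + eps)
    return (ndvi > ndvi_min) & (ndwi < ndwi_max)

# toy reflectance patches: water (low NIR) in the top row, vegetated island below
red   = np.array([[0.03, 0.04], [0.05, 0.06]])
nir   = np.array([[0.02, 0.03], [0.40, 0.45]])
green = np.array([[0.06, 0.07], [0.08, 0.09]])
print(extract_islands(red, nir, green))   # expected: False for water, True for island
```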

 

This research work was conducted as part of the project PCE 164/2021 “State, Communities and Nature of the Lower Danube Islands: An Environmental History (1830-2020)”, financed by the UEFISCDI.

How to cite: Virghileanu, M. and Ioana-Toroimac, G.: Remote sensing – based analysis of the islands dynamics in the Lower Danube River, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-5333, https://doi.org/10.5194/egusphere-egu22-5333, 2022.

EGU22-10163 | Presentations | ESSI1.4

Utilizing hyperspectral imagery for burnt area mapping in a Greek setting

Christina Lekka, Spyridon E. Detsikas, George P. Petropoulos, Petros Katsafados, Dimitris Triantakonstantis, and Prashant K. Srivastava

Earth observation (EO), particularly from hyperspectral imagers, is gaining increasing interest in wildfire mapping as it offers prompt, accurate and low-cost delineation of a burnt area. A key hyperspectral orbital sensor with over 20 years of operational life is the Compact High Resolution Imaging Spectrometer (CHRIS), onboard ESA’s PROBA platform. This sensor collects spectral data in the VNIR range (400 - 1050 nm) simultaneously at 5 viewing angles and at spatial resolutions of 17 m and 34 m, which contain 19 and 63 spectral bands respectively. The present study focuses on exploring the use of CHRIS PROBA legacy data combined with machine learning (ML) algorithms to obtain a burnt area cartography. In this context, a further objective of the study has been to examine the contribution of the sensor's multi-angle capabilities to enhancing burn scar detection. As a case study, a wildfire that occurred during the summer of 2007 on the island of Evvoia, in central Greece, was selected, for which imagery from the CHRIS PROBA archive shortly after the fire outbreak was available. For the accuracy assessment of the derived burnt area estimate, error matrix statistics were calculated in ENVI. Burnt area estimates were also further validated against the operational product developed in the framework of ESA’s Global Monitoring for Environment and Security/Service Element. The study results evidenced the added value of satellite hyperspectral imagery combined with ML classifiers as a cost-effective and robust approach to estimating burnt area extent, and particularly of the multi-angle capability in this case. All in all, the study findings can also provide important insights towards the exploitation of hyperspectral imagery acquired from current missions (e.g. HySIS, PRISMA, CHRIS, DESIS) as well as upcoming ones (e.g. EnMAP, SHALOM, HyspIRI and CHIME).

KEYWORDS: CHRIS-PROBA, hyperspectral, machine learning, burnt area mapping

How to cite: Lekka, C., Detsikas, S. E., Petropoulos, G. P., Katsafados, P., Triantakonstantis, D., and Srivastava, P. K.: Utilizing hyperspectral imagery for burnt area mapping in a Greek setting, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-10163, https://doi.org/10.5194/egusphere-egu22-10163, 2022.

EGU22-7726 | Presentations | ESSI1.4

Investigating the links between primary metabolites of medicinal species with leaf hyperspectral reflectance

Ayushi Gupta, Prashant K Srivastava, and Karuna Shanker

Recent studies have shown that the turnover in tree species composition across edaphic and elevational gradients is strongly correlated with functional traits. However, our understanding of functional traits has been limited by the lack of detailed studies of foliar chemistry across habitats and the logistical and economic challenges associated with the analysis of plant functional traits at large geographical scales. Advances in remote sensing and spectroscopic approaches that measure spectrally detailed light reflectance and transmittance of plant foliage provide accurate predictions of several functional chemical traits. In this study, Pyracantha crenulata (D. Don) M. Roemer has been used, an evergreen thorny shrub species found on open slopes between 1,000 and 2,400 m above mean sea level. P. crenulata is used in the treatment of hepatic, cardiac, stomach, and skin diseases. Spectra of P. crenulata leaf samples were recorded using an ASD spectroradiometer, and primary metabolites such as chlorophyll, anthocyanin, phenolics, and sterols were analyzed. The spectroradiometer data were preprocessed using filtering and then reduced to a few sensitive bands by applying feature selection to the hyperspectral data. The band values were directly correlated with the measured values. The analysis indicates a significant correlation between P. crenulata primary metabolites and reflectance in the visible and infrared region (VISIR). This result suggests that molecules with important functional attributes could be identified by VISIR spectroscopy, which would save considerable time and expense compared to wet laboratory analysis.

How to cite: Gupta, A., Srivastava, P. K., and Shanker, K.: Investigating the links between primary metabolites of medicinal species with leaf hyperspectral reflectance, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-7726, https://doi.org/10.5194/egusphere-egu22-7726, 2022.

EGU22-7859 | Presentations | ESSI1.4

Predictive performance of deep-learning-enhanced remote-sensing data for ecological variables of tidal flats over time

Logambal Madhuanand, Katja Phillippart, Wiebe Nijland, Jiong Wang, Steven M. De Jong, Allert I. Bijleveld, and Elisabeth A. Addink

Tidal flat systems with a diverse benthic community (e.g., bivalves, polychaetes and crustaceans) are important in the food chain for migratory birds and fish. The geographical distribution of macrozoobenthos depends on physical factors, among which sediment characteristics are key aspects. Although high-resolution and high-frequency mapping of benthic indices (i.e., sediment composition and benthic fauna) of these coastal systems is essential to coastal management plans, it is challenging to gather such information on tidal flats through in-situ measurements. The Synoptic Intertidal Benthic Survey (SIBES) database provides this field information on a 500 m grid annually for the Dutch Wadden Sea, but continuous coverage and seasonal dynamics are still lacking. Remote sensing may be the only feasible monitoring method to fill this gap, but it is hampered by the lack of spectral contrast and variation in this environment. In this study, we used a deep-learning model to enhance the information extraction from remote-sensing images for the prediction of environmental and ecological variables of the tidal flats of the Dutch Wadden Sea. A Variational Auto Encoder (VAE) deep-learning model was trained with Sentinel-2 satellite images with four bands (blue, green, red and near-infrared) over three years (2018, 2019 and 2020) of the tidal flats of the Dutch Wadden Sea. The model was trained to derive important characteristics of the tidal flats as image features by reproducing the input image. These features contain representative information from the four input bands, such as spatial texture and band ratios, to complement the low-contrast spectral signatures. The VAE features, the spectral bands and the field-collected samples together were used to train a random forest model to predict the sediment characteristics (median grain size and silt content) and macrozoobenthic biomass and species richness. The prediction was done on the tidal flats of Pinkegat and Zoutkamperlaag of the Dutch Wadden Sea. The encoded features consistently increased the accuracy of the predictive model. Compared to a model trained with just the spectral bands, the use of encoded features improved the prediction (coefficient of determination, R2) by 10-15 percentage points for 2018, 2019 and 2020. Our approach improves the available techniques for mapping and monitoring of sediment and macrozoobenthic properties of tidal flat systems and thereby contributes towards their sustainable management.
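
As a rough sketch of the prediction step described above (not the authors' code), the snippet below trains a random forest once on the spectral bands alone and once on bands plus encoder-derived features, and compares R2. The VAE itself is omitted; the 'vae_features' array is a placeholder for its encoded output, and all values are synthetic.

# Hypothetical sketch: compare band-only and band+feature random forest models.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(2)
n = 500
bands = rng.random((n, 4))           # blue, green, red, NIR per sample location
vae_features = rng.random((n, 16))   # placeholder for VAE-encoded image features
silt_content = rng.random(n)         # field-measured target variable (placeholder)

def evaluate(X, y):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    model = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_tr, y_tr)
    return r2_score(y_te, model.predict(X_te))

r2_bands = evaluate(bands, silt_content)
r2_combined = evaluate(np.hstack([bands, vae_features]), silt_content)
print(f"R2 bands only: {r2_bands:.2f}; R2 bands + encoded features: {r2_combined:.2f}")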

How to cite: Madhuanand, L., Phillippart, K., Nijland, W., Wang, J., De Jong, S. M., Bijleveld, A. I., and Addink, E. A.: Predictive performance of deep-learning-enhanced remote-sensing data for ecological variables of tidal flats over time, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-7859, https://doi.org/10.5194/egusphere-egu22-7859, 2022.

Increasing observation frequency is a current trend in optical remote sensing. However, there are still challenges on the night side, when sunlight is not available. Due to their powerful low-light sensing capabilities, nightlight satellite sensors have been deployed to capture nightscapes of the Earth from space, observing anthropogenic and natural activities at night. At present, nightlight remote sensing applications have mostly focused on artificial lights, particularly within cities, or on self-luminous entities such as fisheries, oil and shale gas facilities, and offshore rigs. Little attention has been paid to the potential of nightlight remote sensing for mapping land surfaces in low-light suburban areas. Observations taken under moonlight are often discarded or corrected to reduce the lunar effects. Some researchers have discussed the possibility of moonlight as a useful illuminating source at night for the detection of nocturnal features on Earth, but no quantitative analysis has been reported so far. This study systematically evaluates the potential of moonlight remote sensing using a whole month of mono-spectral Visible Infrared Imaging Radiometer Suite/Day-Night-Band (VIIRS/DNB) and multi-spectral Unmanned Aerial Vehicle (UAV) nighttime images. Specifically, the study aims: 1) to assess the potential of moonlight remote sensing for mapping land surfaces in low-light suburban areas; 2) to investigate the Earth observation capability of moonlight data under different lunar phases; 3) to produce two daily uniform nightlight datasets (with moonlight included and removed) for various night-scene research topics, such as diurnal weather forecasting and circadian rhythms in plants; and 4) to discuss the requirements for next-generation nightlight remote sensing satellite sensors.

How to cite: Liu, D. and Zhang, Q.: The Potential of Moonlight Remote Sensing: A Systematic Assessment with Multi-Source and Multi-Moon phase Nightlight Data, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-3380, https://doi.org/10.5194/egusphere-egu22-3380, 2022.

The geoid gravity potential inversion to dense anomalies and their comparison with the seismic tomography models

Greku R.Kh. and Greku D.R.

Institute of Geological Sciences, Ukraine; SATMAR Laboratory, DDS Capital Investments, Australia

The gravitational tomography method is based on algorithms for inverting the values of the gravitational potential (geoid) to calculate the Earth's density anomalies over the entire range of depths down to 5300 km [H. Moritz, The Figure of the Earth's Interior, Wichmann, Karlsruhe, 1990]. The initial data are the geoid height anomalies of the EGM2008 model, expanded in spherical harmonics up to degree and order n, m = 2190. The spatial resolution of the data at the surface is 10 km. The depths of the disturbing masses are determined taking into account the harmonic degree. The results are maps of density distribution at specified depths, vertical sections and 3D models.
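
For context, a standard way of obtaining geoid height anomalies from a spherical harmonic model such as EGM2008 (given here only as background, not necessarily the exact formulation used by the authors) is to synthesize the disturbing potential $T$ from the fully normalized coefficients and convert it to geoid height $N$ via Bruns' formula:

$$
T(r,\varphi,\lambda) = \frac{GM}{r}\sum_{n=2}^{n_{\max}}\left(\frac{a}{r}\right)^{n}\sum_{m=0}^{n}\left(\Delta\bar{C}_{nm}\cos m\lambda + \Delta\bar{S}_{nm}\sin m\lambda\right)\bar{P}_{nm}(\sin\varphi), \qquad N \approx \frac{T}{\gamma},
$$

with $n_{\max} = 2190$ for EGM2008, $a$ the reference semi-major axis, $\Delta\bar{C}_{nm}, \Delta\bar{S}_{nm}$ the coefficient differences relative to the reference ellipsoid, $\bar{P}_{nm}$ the fully normalized associated Legendre functions, and $\gamma$ the normal gravity.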

Examples of the distribution of density anomalies are given for selected regions of Ukraine, Europe and Antarctica. Discrepancies with known seismic tomography studies are mainly due to the different physical properties of the studied medium: density versus the acoustic properties of rocks.

Density anomaly results are reported as the percent deviation from the Earth's PREM density model for a given location and depth. The entire range of density anomalies, expressed as deviations from the PREM model, does not exceed 12%. Full agreement of the results is observed, for example, at great depths of 2800 km throughout the Earth. The section through the continent of Antarctica, with its complex relief and structure, down to a depth of 400 km also shows similar images from seismic and gravity tomography. The gravity tomography model of the tectonically active Vrancea region confirms the delamination origin of the disturbing mass and its connection with the occurrence of earthquakes in Europe.

The original call for the GD7.5 session topic (Prof. Saskia Goes) rightly notes the important role of rheological variability in the mantle layers in the deformation of the Earth's crust and surface, which can cause catastrophic destruction of large-block structures. In this sense, characterizing the state of the inner layers from data on structural inhomogeneities becomes increasingly urgent.

How to cite: Greku, R. and Greku, D.: The geoid gravity potential inversion to dense anomalies and their comparison with the seismic tomography models, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-3316, https://doi.org/10.5194/egusphere-egu22-3316, 2022.

EGU22-12092 | Presentations | ESSI1.4

Assessment of 10m Spectral and Broadband Surface Albedo Products from Sentinel-2 and MODIS data

Jan-Peter Muller, Rui Song, Alistair Francis, Nadine Gobron, Jian Peng, and Nathan Torbick

In Song et al. (2021) [1] a framework for the retrieval of 10 m and 20 m spectral and 20 m broadband surface albedo products was described. This framework consists of four modules: 1) a machine-learning-based cloud detection method, the Spectral ENcoder for SEnsor Independence (SEnSeI) [2]; 2) an advanced atmospheric correction model, the Sensor Invariant Atmospheric Correction (SIAC) [3]; 3) an endmember-based class extraction method, which enables the retrieval of 10 m/20 m albedos based on a regression between the MODIS Bidirectional Reflectance Distribution Function (BRDF) derived surface albedo and Sentinel-2 surface reflectance resampled to MODIS resolution; and 4) a novel method of using the MODIS BRDF prior developed within the QA4ECV programme (http://www.qa4ecv.eu/) to fill in the gaps in a time series caused by cloud obscuration. We describe how ~1100 scenes were processed over 22 Sentinel-2 tiles at the STFC JASMIN facility. These tiles spanned different 4-month time periods for different users, with a maximum of 22 dates per tile, and cover Italy, Germany, South Africa, South Sudan, Ukraine and the UK for 6 different users. For the Italian site, a detailed analysis was performed of the impact of this hr-albedo on the fAPAR and LAI derived using TIP [5], whilst a second user employed a method described in [6] to compare MODIS and Sentinel-2, and a third user looked at the impact on agricultural yield forecasting. Lessons learnt from these different applications will be described, including both the opportunities and the areas where further work is required to improve the data quality.
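
Module 3 is described only at a high level; as a hedged illustration of the regression idea (relating coarse-resolution MODIS BRDF-derived albedo to Sentinel-2 reflectance resampled to the MODIS grid, then applying the fitted relation at full resolution), a minimal Python sketch with synthetic values might look like this. The simple linear model and single band are assumptions, not the published method.

# Hypothetical sketch: fit an albedo-reflectance relation at coarse scale, apply at 10/20 m.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
s2_resampled = rng.random((2000, 1))   # S2 reflectance aggregated to ~500 m MODIS pixels
modis_albedo = 0.8 * s2_resampled[:, 0] + 0.05 + 0.01 * rng.standard_normal(2000)

reg = LinearRegression().fit(s2_resampled, modis_albedo)

# Apply the coarse-scale relation to full-resolution Sentinel-2 reflectance
s2_highres = rng.random((10000, 1))    # 10/20 m reflectance values (placeholder)
albedo_highres = reg.predict(s2_highres)
print("fitted slope and intercept:", reg.coef_[0], reg.intercept_)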

 

We thank ESA for their support through ESA-HR-AlbedoMap: Contract CO 4000130413 and the STFC JASMIN facility and in particular Victoria Bennett for their assistance.

[1] Song, R., Muller, J.-P., Francis, A., "A Method of Retrieving 10-m Spectral Surface Albedo Products from Sentinel-2 and MODIS data," 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, 2021, pp. 2381-2384, doi: 10.1109/IGARSS47720.2021.9554356

[2] Francis, A., Mrziglod, J., Sidiropoulos, P.  and J.-P. Muller, "SEnSeI: A Deep Learning Module for Creating Sensor Independent Cloud Masks," in IEEE Transactions on Geoscience and Remote Sensing, doi: 10.1109/TGRS.2021.3128280.

[3] Feng et al. (2019) A Sensor Invariant Atmospheric Correction: Sentinel-2/MSI AND Landsat 8/OLI https://doi.org/10.31223/osf.io/ps957.

[4] Song, R.; Muller, J.-P.; Kharbouche, S.; Yin, F.; Woodgate, W.; Kitchen, M.; Roland, M.; Arriga, N.; Meyer, W.; Koerber, G.; Bonal, D.; Burban, B.; Knohl, A.; Siebicke, L.; Buysse, P.; Loubet, B.; Leonardo, M.; Lerebourg, C.; Gobron, N. Validation of Space-Based Albedo Products from Upscaled Tower-Based Measurements Over Heterogeneous and Homogeneous Landscapes. Remote Sensing 2020, 12, 1–23. doi: 10.3390/rs12050833

[5] Gobron, N.; Marioni, M.; Muller, J.-P.; Song, R.; Francis, A. M.; Feng, Y.; Lewis, P. ESA Sentinel-2 Albedo Case Study: FAPAR and LAI downstream products.; 2021; pp. 1–30. JRC TR (in press)

[6] Peng, J.; Kharbouche, S.; Muller, J.-P.; Danne, O.; Blessing, S.; Giering, R.; Gobron, N.; Ludwig, R.; Mueller, B.; Leng, G.; Lees, T.; Dadson, S. Influences of leaf area index and albedo on estimating energy fluxes with HOLAPS framework. J Hydrol 2020, 580, 124245.

How to cite: Muller, J.-P., Song, R., Francis, A., Gobron, N., Peng, J., and Torbick, N.: Assessment of 10m Spectral and Broadband Surface Albedo Products from Sentinel-2 and MODIS data, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-12092, https://doi.org/10.5194/egusphere-egu22-12092, 2022.

EGU22-12524 | Presentations | ESSI1.4

Error-reducing Structure-from-Motion derived Digital Elevation Models in data-scarce environments

Dirk Bakker, Phuoc Phùng, Marc van den Homberg, Sander Veraverbeke, and Anaïs Couasnon

High-accuracy Digital Elevation Models (DEMs) improve the quality of flood risk assessments and many other environmental applications, yet these products are often unavailable in developing countries due to high survey costs. Structure-from-Motion (SfM) photogrammetry combined with Unmanned Aerial Vehicles (UAVs) has proven to be an effective and low-cost technique that enables a wide audience to construct local-scale DEMs. However, deviation from strict survey designs and guidelines regarding the number and distribution of Ground Control Points (GCPs) can result in linear and doming errors. Two surveys that suffer from these errors were supplied for error reduction, but neither area had an available high-accuracy DEM, nor could an additional differential Global Navigation Satellite System (dGNSS) ground survey be afforded from which to extract control points for a relative georeferencing approach. Little attention has been given to error reduction using global open-access elevation data, such as the TerraSAR-X add-on for Digital Elevation Measurements (TanDEM-X) 90; the Ice, Cloud and land Elevation Satellite-2 (ICESat-2); and Hydroweb.

The aim of this study was to improve and validate the two DEMs using control points extracted from the above data, and to analyse the validation results to determine the impact on error reduction using regression analyses between the vertical error and the distance to the nearest control point. The outcomes show that ICESat-2 and Hydroweb can support surveys in the absence of dGNSS GCPs with similar impact, but cannot replace the need for dGNSS measurements in georeferencing and validation. These findings suggest that survey guidelines can be maintained with global open-access elevation data, but the effectiveness depends on the number, distribution and estimated accuracy of the control points. Doming errors can be prevented by correct camera lens calibration, which depends on stable lens conditions or a stratified distribution of high-accuracy reference data. The validation of the SfM DEM in data-scarce areas proves difficult due to the lack of an independent validation dataset, but the Copernicus GLO-30 can quantify and show the spatial variability of the error. This study highlights the increasing accuracy of global open-access elevation data and shows that these databases allow the user to easily acquire more, and independent, data for georeferencing and validation, although the RMSE could not be reliably reduced to the sub-metre level.
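
To make the regression analysis concrete, the sketch below shows one plausible way (with synthetic numbers, not the study's data) to relate DEM vertical error at validation points to the distance from the nearest control point and to compute the RMSE.

# Hypothetical sketch: vertical error vs. distance to the nearest control point.
import numpy as np
from scipy.spatial import cKDTree
from scipy.stats import linregress

rng = np.random.default_rng(4)
control_xy = rng.random((20, 2)) * 1000     # control point locations (m), placeholder
check_xy = rng.random((200, 2)) * 1000      # validation point locations (m), placeholder
vertical_error = rng.standard_normal(200)   # DEM minus reference elevation (m), placeholder

# Distance from each validation point to its nearest control point
dist, _ = cKDTree(control_xy).query(check_xy)

fit = linregress(dist, np.abs(vertical_error))
print(f"slope = {fit.slope:.4f} m per m, r = {fit.rvalue:.2f}, p = {fit.pvalue:.3f}")
print("RMSE (m):", np.sqrt(np.mean(vertical_error ** 2)))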

How to cite: Bakker, D., Phùng, P., van den Homberg, M., Veraverbeke, S., and Couasnon, A.: Error-reducing Structure-from-Motion derived Digital Elevation Models in data-scarce environments, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-12524, https://doi.org/10.5194/egusphere-egu22-12524, 2022.

EGU22-13002 | Presentations | ESSI1.4

ORBiDANSe: Orbital Big Datacube Analytics Service

Peter Baumann and Dimitar Misev

Datacubes form an accepted cornerstone for analysis- (and visualization-) ready spatio-temporal data offerings. The increase in user friendliness is achieved by abstracting away from the zillions of files in provider-specific organization. Datacube query languages additionally establish actionable datacubes, enabling users to ask "any query, any time" with zero coding.

However, datacube deployments typically aim at large-scale data centre environments accommodating Big Data and massively parallel processing capabilities for achieving decent performance. In this contribution, we conversely report on a downscaling experiment. In the ORBiDANSE project a datacube engine, rasdaman, has been ported to a cubesat, ESA OPS-SAT, and is operational in space. Effectively, the satellite thereby becomes a datacube service offering the standards-based query capabilities of the OGC Web Coverage Processing Service (WCPS) geo datacube analytics language.
We believe this will pave the way for on-board ad-hoc processing and filtering of Big EO Data, thereby unleashing them to a larger audience and in substantially shorter time.
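
To give a flavour of what "any query, any time" means in practice, the sketch below sends a WCPS query of the kind such a service could answer. The endpoint URL, coverage name ("S2_L2A"), band names and axis name are hypothetical placeholders, not the actual OPS-SAT datacube layout; the query requests a single aggregated NDVI value so that only the result, not the imagery, is returned.

# Illustrative sketch: submit a WCPS query via the WCS ProcessCoverages request.
import requests

WCPS_ENDPOINT = "https://example.org/rasdaman/ows"  # placeholder service URL

query = """
for $c in (S2_L2A)
return encode(
  avg( ($c.nir[ansi("2022-05-15")] - $c.red[ansi("2022-05-15")]) /
       ($c.nir[ansi("2022-05-15")] + $c.red[ansi("2022-05-15")]) ),
  "text/csv")
"""

params = {"service": "WCS", "version": "2.0.1",
          "request": "ProcessCoverages", "query": query}
response = requests.post(WCPS_ENDPOINT, data=params, timeout=60)
print(response.text)  # a single aggregated value computed server-side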

In our talk, we report on the concept, technology, and experimental results of ad-hoc on-board datacube query processing.

 

How to cite: Baumann, P. and Misev, D.: ORBiDANSe: Orbital Big Datacube Analytics Service, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-13002, https://doi.org/10.5194/egusphere-egu22-13002, 2022.

ESSI2.3 – Established and Establishing Disciplinary International Frameworks that will Ultimately Enable Real-Time Interdisciplinary Sharing of Data.

EGU22-8474 | Presentations | ESSI2.3

The construction of the eLTER Pan-European research infrastructure to support multidisciplinary environmental data integration and analysis

John Watkins, Johannes Peterseil, Alessandro Oggioni, and Vladan Minic

One of the major goals of the upcoming European integrated Long-Term Ecosystem, critical zone and socio-ecological Research Infrastructure (eLTER RI) is to provide reliable and quality-controlled long-term environmental data from various disciplines for scientific analysis as well as for the assessment of environmental policy impacts. For this purpose, eLTER has been designing and piloting a federated data infrastructure for the integration and dissemination of a broad range of in situ observations and related data.
Implementing such a pan-European environmental data infrastructure is a lengthy and complex process driven by user needs, stakeholder requirements and general service and technology best practices. The European LTER community has laid the foundations of this eLTER Information System. For further improvements, user needs have recently been collected by (a) targeted interviews with selected stakeholders to identify requirements, (b) workshops mapping requirements to potential RI services, and (c) analysis work for designing the RI service portfolio. The requirements collections are used to derive functional (i.e. the behaviour of essential features of the system) and non-functional (i.e. the general characteristics of the system) requirements for the IT infrastructure and services. These collected requirements revolve around the development of workflows for the ingestion, curation and publication of data objects, including the creation, harvesting, discovery and visualisation of metadata, as well as providing means to support the analysis of these datasets and communicating study results.
Considering that downstream analyses of data from both eLTER and other RIs are a key part of the RI's scope, the design includes virtual collaborative environments where different data and analyses can be brought together and results shared, with FAIR principles as the default for research practice. The eLTER RI will take advantage of data stored in existing partner data systems, harmonised by a central discovery portal and federated data access components, providing a common information management infrastructure for bridging across environmental RIs.
This presentation will provide an overview of the current stage of the eLTER RI developments as well as its major components, provide an outlook for future developments and discuss the technical and scientific challenges of building the eLTER RI for interdisciplinary data sharing.

How to cite: Watkins, J., Peterseil, J., Oggioni, A., and Minic, V.: The construction of the eLTER Pan-European research infrastructure to support multidisciplinary environmental data integration and analysis, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-8474, https://doi.org/10.5194/egusphere-egu22-8474, 2022.

EGU22-2145 | Presentations | ESSI2.3

25 years of the IPCC Data Distribution Centre at the German Climate Computing Center (DKRZ)

Martina Stockhause and Michael Lautenschlager

The Data Distribution Centre (DDC) of the Intergovernmental Panel on Climate Change (IPCC) celebrates its 25th anniversary in 2022. DKRZ is the last remaining founding member among the DDC Partners. This contribution looks back on the past 25 years of the DDC at DKRZ, from its establishment to the present. It shows which milestones have been introduced in the areas of data management and data standardization, e.g. 

  • the NetCDF/CF data standard,
  • the DataCite data DOI assignment enabling data citation,  
  • the data preservation and stewardship standards of the World Data System (WDS), 
  • the Earth System Grid Federation (ESGF) as data infrastructure standard, or 
  • the IPCC FAIR Guidelines for the current 6th Assessment Report (AR6). 

In addition to the continuous effort to adopt new standards and curate the data holdings, current challenges - technical and organizational - and possible future directions are discussed. The most difficult of the challenges remains the long-term strategy for sustainable DDC services as part of an increasingly interoperable data service environment, which is technically described by the FAIR digital object framework and which, in terms of content, is guided by the UN Sustainable Development Goal 13 on climate action. 

(http://ipcc-data.org; http://ipcc.wdc-climate.de)

How to cite: Stockhause, M. and Lautenschlager, M.: 25 years of the IPCC Data Distribution Centre at the German Climate Computing Center (DKRZ), EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-2145, https://doi.org/10.5194/egusphere-egu22-2145, 2022.

EGU22-10897 | Presentations | ESSI2.3

Progressing the global samples community through the new partnership between IGSN and DataCite

Sarah Ramdeen, Kerstin Lehnert, Jens Klump, Matt Buys, Sarala Wimalaratne, and Lesley Wyborn

In October 2021, DataCite and the IGSN e.V. signed an agreement to form a partnership to support the global adoption, implementation, and use of physical sample identifiers. Both DataCite and IGSN currently offer the ability to provide Globally Unique Persistent, Resolvable Identifiers (GUPRIs) within the overall research ecosystem, and the proposed collaboration will bring together the strengths of each organization.

DataCite is a community-led organisation that has been providing the means to create, find, cite, connect, and use research across 47 countries globally since 2009. DataCite provides persistent identifiers (DOIs) for research data and other research outputs, and supports the efforts of several identifier communities. DataCite also develops services that make it easier for researchers to connect and share their DOIs with the broader research ecosystem. 
IGSN e.V. is an international, non-profit organization with more than 20 members and has a narrower focus than DataCite. The core purpose of IGSN is to enable transparent and traceable connections between samples, instruments, grants, data, publications, people and organizations. Since 2011, IGSN has provided a central registration system that enables researchers to apply a globally unique and persistent identifier for physical samples.

The proposed partnership will enable IGSN to leverage DataCite DOI registration while allowing IGSN to focus on community efforts such as promoting and expanding the global samples ecosystem and supporting new research and best practice in methods of identifying, citing, and locating physical samples. DataCite will provide the IGSN ID registration services and support to ensure the ongoing sustainability of the IGSN PID infrastructure and its integration with the global PID ecosystem.

This partnership is an opportunity for IGSN to reenvision its governance and community engagement and to reassess how the IGSN can best serve the community in today’s open science ecosystem. This talk will focus on the developing changes to the IGSN governance and community efforts.
Different research communities have a wide range of requirements for metadata and the identification of samples. The IGSN plans to develop an international ‘Community of Communities’ which will include members from the global samples community across multiple disciplines. The community will support varying levels of skill with PIDs and metadata. It will enable cohesion around the use of IGSN, thus enabling greater research discovery, innovation and advancement for samples.

The IGSN Samples Community (IGSN SC) aspires to be a collaborative space for community development that promotes the use of samples and their connections to any derived observations, images, and analytical data.

How to cite: Ramdeen, S., Lehnert, K., Klump, J., Buys, M., Wimalaratne, S., and Wyborn, L.: Progressing the global samples community through the new partnership between IGSN and DataCite, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-10897, https://doi.org/10.5194/egusphere-egu22-10897, 2022.

EGU22-5931 | Presentations | ESSI2.3

C-SCALE: A new Data and Compute Federation for Earth Observation

Christian Briese, Charis Chatzikyriakou, Diego Scardaci, Zdeněk Šustr, Enol Fernández, Björn Backeberg, and Elonora Testa

Through the provision of massive streams of high-resolution Earth Observation (EO) data, the EU Copernicus programme has established itself globally as the predominant spatial data provider. These data are widely used by research communities to monitor and address global challenges, such as environmental monitoring and climate change, supporting European policy initiatives such as the Green Deal. To date, there is no single European data sharing and processing infrastructure that serves all datasets of interest, and Europe is falling behind international developments in Big Data analytics and computing.

The C-SCALE (Copernicus - eoSC AnaLytics Engine, https://c-scale.eu) project federates European EO infrastructure services, such as ESA’s Sentinel Collaborative Ground Segment, the Copernicus DIASes (Data and Information Access Services under the EC), independent nationally-funded EO service providers, and European Open Science Cloud (EOSC) e-infrastructure providers. It capitalises on EOSC's capacity and capabilities to support Copernicus research and operations with large and easily accessible European computing environments. The project will implement and publish the C-SCALE Federation in the EOSC Portal as a suite of complementary services that can be easily exploited. It will consist of a Data Federation, a service providing access to a large EO data archive, a Compute Federation, and analytics tools.

The C-SCALE Data Federation aims at making EO data providers under EOSC findable, their metadata databases searchable, and their product storage accessible. While a centralised, monolithic, complete Copernicus data archive may not be feasible, some organisations maintain various archives for limited areas of their interest. C-SCALE, therefore, integrates these heterogeneous resources into a “system of systems” that will offer the users an interface that, in most cases, provides similar functionality and quality of service as a centralised, monolithic data archive would. The federation is built on existing technologies, avoiding redundancy and replication of functions and not disrupting existing usage patterns at participating sites, instead only adding a simple layer for improved discovery and seamless access.

At the same time, the C-SCALE Compute Federation provides access to a wide range of computing providers (IaaS VMs, container orchestration platforms, HPC and HTC systems) to enable the analysis of Copernicus and EO data under EOSC. The design of the federation allows users to deploy their applications using federated authentication mechanisms, find their software under a common catalogue, and have access to data using C-SCALE Data Federation tools. The federation relies on existing tools and services already compliant with EOSC, thus facilitating the integration into the larger EOSC ecosystem.

By making such scalable, federated Big Copernicus Data Analytics services available through EOSC and its Portal, and by linking the problems and results with experience from other research disciplines, C-SCALE helps to support the EO sector in its development. By abstracting the set-up of computing and storage resources from the end-users, it enables the deployment of custom workflows to generate meaningful results quickly and easily. Furthermore, the project will deliver a blueprint setting up an interaction model between service providers to facilitate interoperability between commercial and public cloud infrastructures.

How to cite: Briese, C., Chatzikyriakou, C., Scardaci, D., Šustr, Z., Fernández, E., Backeberg, B., and Testa, E.: C-SCALE: A new Data and Compute Federation for Earth Observation, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-5931, https://doi.org/10.5194/egusphere-egu22-5931, 2022.

EGU22-11968 | Presentations | ESSI2.3

Proposed metadata standards for FAIR access to GNSS data

Anna Miglio, Andras Fabian, Carine Bruyninx, Stefanie De Bodt, Juliette Legrand, Paula Oset Garcia, and Inge Van Nieuwerburgh

Accurate positioning for activities such as navigation, mapping, and surveying rely on permanent stations located all over the world and continuously tracking Global Navigation Satellite Systems (GNSS, such as Galileo, GPS, GLONASS). 
The Royal Observatory of Belgium maintains repositories containing decades of observation data from hundreds of GNSS stations belonging to Belgian and European networks (e.g., the EUREF public repository). 
However, current procedures for accessing GNSS data do not adequately serve user needs. For example, in the case of the EUREF repository, despite the fact that its GNSS data originate from a significant number of data providers and could be handled in different ways, provenance information is lacking and data licenses are not always available.
In order to respond to user demands, GNSS data and the associated metadata need to be standardised, discoverable and interoperable i.e., made FAIR (Findable, Accessible, Interoperable, and Re-usable). Indeed, FAIR data principles serve as guidelines for making scientific data suitable for reuse, by both people and machines, under clearly defined conditions. 
We propose to identify existing metadata standards that cover the needs of the GNSS community to the maximum extent and to extend them and/or to develop an application profile, considering also best practices at other GNSS data repositories. 

Here we present two proposals for metadata to be provided to the users when querying and/or downloading GNSS data from GNSS data repositories. 
We first consider metadata containing station-specific information (e.g., station owner, GNSS equipment) and propose an extension of GeodesyML, an XML implementation of the eGeodesy model aligned with international standards such as ISO19115-1:2014 and OGC's GML. The proposed extension contains additional classes and properties from domain specific vocabularies when necessary, and includes extra metadata such as data license, file provenance information, etc. to comply with FAIR data principles. All proposed changes to GeodesyML are optional and therefore guarantee full backwards compatibility. 

Secondly, we consider metadata related to GNSS observation data, i.e. RINEX data files. We propose an application profile based on the specifications of the Data Catalog Vocabulary (DCAT), an RDF vocabulary that, by design, facilitates interoperability between data portals (supporting DCAT-based RDF documents) and enables publishing metadata directly on the web using different formats.
In particular, our proposal (GNSS-DCAT-AP) includes new recommended metadata classes to describe the specific characteristics of GNSS observation data: the type of RINEX file (e.g., compression format, frequency); the RINEX file header and information regarding the GNSS station, including the GNSS antenna and receiver; and the software used to generate the RINEX file. Additional optional classes allow the inclusion of information regarding the GNSS antenna, receiver and monument associated with the GNSS station, extracted from the IGS site log or GeodesyML files.
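
As a purely illustrative sketch of the idea (using only generic DCAT and Dublin Core terms, not the actual GNSS-specific classes of the proposed application profile), a RINEX observation file could be described along the following lines; the namespace, file name and property choices are assumptions made for this example.

# Hypothetical sketch: DCAT-style description of a RINEX observation file with rdflib.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCAT, DCTERMS, RDF

g = Graph()
EX = Namespace("https://example.org/gnss/")  # placeholder namespace

dataset = URIRef(EX["BRUX00BEL_R_20221380000_01D_30S_MO"])  # placeholder RINEX identifier
g.add((dataset, RDF.type, DCAT.Dataset))
g.add((dataset, DCTERMS.title, Literal("Daily 30 s RINEX observation file, station BRUX")))
g.add((dataset, DCTERMS.license, URIRef("https://creativecommons.org/licenses/by/4.0/")))
g.add((dataset, DCTERMS.provenance, Literal("Provided by the station operator")))

distribution = URIRef(EX["BRUX00BEL_R_20221380000_01D_30S_MO.crx.gz"])
g.add((distribution, RDF.type, DCAT.Distribution))
g.add((distribution, DCAT.accessURL, URIRef("https://example.org/data/BRUX00BEL_R_20221380000_01D_30S_MO.crx.gz")))
g.add((dataset, DCAT.distribution, distribution))

print(g.serialize(format="turtle"))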

How to cite: Miglio, A., Fabian, A., Bruyninx, C., De Bodt, S., Legrand, J., Oset Garcia, P., and Van Nieuwerburgh, I.: Proposed metadata standards for FAIR access to GNSS data, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-11968, https://doi.org/10.5194/egusphere-egu22-11968, 2022.

EGU22-6628 | Presentations | ESSI2.3

EPOS-GNSS – Current status of service implementation for European GNSS data and products

Rui Fernandes, Carine Bruyninx, Paul Crocker, Anne Socquet, and Mathilde Vergnolle and the EPOS-GNSS Members

EPOS-GNSS is the Thematic Core Service being implemented in the framework of the European Plate Observing System (EPOS), focused on the management and dissemination of GNSS (Global Navigation Satellite Systems) data and products. The European Research Infrastructure Consortium (ERIC) status has provided EPOS with a legal personality and capacity that is recognised in all EU Member States and permits it to provide open access to a large pool of integrated Solid Earth science data, data products and facilities for researchers.

The GNSS community in Europe is benefiting from EPOS ERIC to create mechanisms and procedures to harmonize, in collaboration with other pan-European infrastructures (particularly EUREF), the access to GNSS data, metadata and derived products (time-series, velocities, and strain rate maps) that are primarily of interest to the Solid Earth community but ultimately benefit many other stakeholders, particularly data providers and other scientific and technical applications.

In this presentation we focus on the three main components that entered the pre-operational phase last year: (a) Governance – with the aim that the entire community, from data providers to end-users, will be represented and their efforts recognized; (b) GLASS – the in-house dedicated software package developed for the dissemination of GNSS data and products with rigorous quality control procedures; (c) Products – internally consistent GNSS solutions for dedicated products (time-series, velocities and strain-rates) created from the available data set using state-of-the-art methodologies, to be used to improve the understanding of the different Solid Earth mechanisms taking place in the European region.

How to cite: Fernandes, R., Bruyninx, C., Crocker, P., Socquet, A., and Vergnolle, M. and the EPOS-GNSS Members: EPOS-GNSS – Current status of service implementation for European GNSS data and products, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-6628, https://doi.org/10.5194/egusphere-egu22-6628, 2022.

EGU22-10071 | Presentations | ESSI2.3

Facilitating Multi-Disciplinary Research via Integrated Access to the Seismological Data & Product Services of EPOS Seismology

Florian Haslinger, Lars Ottemöller, Carlo Cauzzi, Susana Custodio, Rémy Bossu, Alberto Michelini, Fabrice Cotton, Helen Crowley, Laurentiu Danciu, Irene Molinari, and Stefano Parolai

The European Plate Observing System EPOS is the single coordinated framework for solid Earth science data, products and services on a European level. As one of the science domain structures within EPOS, EPOS Seismology brings together the three large European infrastructures in seismology: ORFEUS for seismic waveform data & related products, EMSC for parametric earthquake information, and EFEHR for seismic hazard and risk information. Across these three pillars, EPOS Seismology provides services to store, discover and access seismological data and products from raw waveforms to elaborated hazard and risk assessment.

ORFEUS, EMSC and EFEHR are community initiatives / infrastructures that each have their own history, structure, membership, governance and established mode of work (including data sharing and distribution practices), developed in parts over decades. While many institutions and individuals are engaged in more than one of these initiatives, overall the active membership is quite distinct. Also, each of the initiatives has different connections to and interactions with other international organisations. Common to all is the adoption and promotion of recognized international standards for data, products and services originating from wider community organisations (e.g. FDSN, IASPEI, GEM), and the active participation in developing those further or creating new ones together with the community.     

In this presentation we will briefly review the history and development of the three initiatives and discuss how we set up EPOS Seismology as a joint coordination framework within EPOS. We will highlight issues encountered on the way and those that we are still trying to solve in our attempt to create and operate a coordinated research infrastructure that appropriately serves the needs of today’s scientific community. Among those issues is also the ‘timeliness’ of data and products: while a number of services offer almost-real-time access to newly available information at least in theory, this comes with various downstream implications that are currently actively discussed. We also cover the envisaged role of EPOS Seismology in supporting international multi-disciplinary activities that require and benefit from harmonized, open, and interoperable data, products, services and facilities from the waveform, catalogue and hazard / risk domains of seismology.

How to cite: Haslinger, F., Ottemöller, L., Cauzzi, C., Custodio, S., Bossu, R., Michelini, A., Cotton, F., Crowley, H., Danciu, L., Molinari, I., and Parolai, S.: Facilitating Multi-Disciplinary Research via Integrated Access to the Seismological Data & Product Services of EPOS Seismology, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-10071, https://doi.org/10.5194/egusphere-egu22-10071, 2022.

EGU22-5478 | Presentations | ESSI2.3

EPOS-Norway Portal

Jan Michalek, Kuvvet Atakan, Christian Rønnevik, Sara Kverme, Lars Ottemøller, Øyvind Natvik, Tor Langeland, Ove Daae Lampe, Gro Fonnes, Jeremy Cook, Jon Magnus Christensen, Ulf Baadshaug, Halfdan Pascal Kierulf, Bjørn-Ove Grøtan, Odleiv Olesen, John Dehls, and Valerie Maupin

The European Plate Observing System (EPOS) is a European initiative to build a pan-European infrastructure for accessing solid Earth science data, now governed by EPOS ERIC (European Research Infrastructure Consortium). The EPOS-Norway project (EPOS-N; RCN Infrastructure Programme - Project no. 245763) is a Norwegian project funded by the Research Council of Norway. The aim of the Norwegian EPOS e‑infrastructure is to integrate data from the seismological and geodetic networks, as well as data from the geological and geophysical data repositories. Among the six EPOS-N project partners, four institutions provide data – the University of Bergen (UIB), the Norwegian Mapping Authority (NMA), the Geological Survey of Norway (NGU) and NORSAR.

In this contribution, we present the EPOS-Norway Portal as an online, open-access, interactive tool allowing visual analysis of multidimensional data. It supports maps and 2D plots with linked visualizations. Currently, access is provided to more than 300 datasets (18 web services, 288 map layers and 14 static datasets) from four subdomains of Earth science in Norway. New datasets are planned to be integrated in the future. The EPOS-N Portal can access remote datasets via web services such as FDSNWS for seismological data and OGC services for geological and geophysical data (e.g. WMS). Standalone datasets are available through preloaded data files. Users can also simply add another WMS server or upload their own dataset for visualization and comparison with other datasets. This portal provides a unique way (the first of its kind in Norway) to explore various geoscientific datasets in one common interface. One of the key aspects is quick, simultaneous visual inspection of data from various disciplines and testing of scientific or geohazard-related hypotheses. One such example is the spatio-temporal correlation of earthquakes (1980 to present) with existing critical infrastructure (e.g. pipelines), geological structures, submarine landslides or unstable slopes. 
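
To illustrate the kind of remote access involved (not a component of the portal itself), the snippet below queries a generic FDSNWS-compliant service with ObsPy for station metadata and waveforms. The base URL, network and station codes are placeholders; any FDSN web service endpoint could be addressed in the same way.

# Illustrative sketch: station metadata and waveform retrieval via FDSN web services.
from obspy import UTCDateTime
from obspy.clients.fdsn import Client

client = Client("http://example.org/fdsnws")  # placeholder FDSNWS endpoint

# Station metadata for a (placeholder) network over one year
inventory = client.get_stations(network="XX", level="station",
                                starttime=UTCDateTime(2020, 1, 1),
                                endtime=UTCDateTime(2021, 1, 1))
print(inventory)

# Ten minutes of vertical-component waveforms around a chosen time
t0 = UTCDateTime(2020, 6, 1, 12, 0, 0)
stream = client.get_waveforms(network="XX", station="STA1", location="*",
                              channel="BHZ", starttime=t0, endtime=t0 + 600)
stream.plot()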

The EPOS-N Portal is implemented by adapting Enlighten-web, a server-client program developed by NORCE. Enlighten-web facilitates interactive visual analysis of large multidimensional data sets, and supports interactive mapping of millions of points. The Enlighten-web client runs inside a web browser. An important element in the Enlighten-web functionality is brushing and linking, which is useful for exploring complex data sets to discover correlations and interesting properties hidden in the data. The views are linked to each other, so that highlighting a subset in one view automatically leads to the corresponding subsets being highlighted in all other linked views.

How to cite: Michalek, J., Atakan, K., Rønnevik, C., Kverme, S., Ottemøller, L., Natvik, Ø., Langeland, T., Lampe, O. D., Fonnes, G., Cook, J., Christensen, J. M., Baadshaug, U., Kierulf, H. P., Grøtan, B.-O., Olesen, O., Dehls, J., and Maupin, V.: EPOS-Norway Portal, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-5478, https://doi.org/10.5194/egusphere-egu22-5478, 2022.

EGU22-4265 | Presentations | ESSI2.3

EPOS Data portal for cross-disciplinary data access in the Solid Earth Domain

Daniele Bailo, Jan Michalek, Keith G Jeffery, Kuvvet Atakan, and Rossana Paciello and the EPOS IT Team

The European Plate Observing System (EPOS) addresses the problem of homogeneous access to heterogeneous digital assets in the geoscience of the European tectonic plate. Such access opens new research opportunities. Previous attempts have been limited in scope and required much human intervention. EPOS adopts an advanced Information and Communication Technologies (ICT) architecture driven by a catalogue of rich metadata. The architecture of the EPOS system, together with the challenges and solutions adopted, is presented. The EPOS Data Portal introduces a new way of doing cross-disciplinary research. Multidisciplinary research raises new requirements for both students and teachers. The EPOS portal can be used either to explore the available datasets or to facilitate the research itself. It can also be very instructive in teaching by demonstrating scientific use cases. 

EPOS ERIC was established in 2018 as a European Research Infrastructure Consortium for building a pan-European infrastructure and accessing solid Earth science data. The sustainability phase of EPOS (EPOS-SP – EU Horizon 2020 – InfraDev Programme – Project no. 871121; 2020-2022) is focusing on finding solutions for the long-term sustainability of EPOS developments. The ambitious plan of geoscientific data integration started already in 2002 with a Conception Phase and continued with EPOS-PP (Preparatory Phase, 2010-2014), in which about 20 partners joined the project. The completed EPOS-IP project (EPOS-IP – EU Horizon 2020 – InfraDev Programme – Project no. 676564; 2015-2019) included 47 partners plus 6 associate partners from 25 countries across Europe and several international organizations.

The EPOS Data Portal provides access to data and data products from ten different geoscientific areas: Seismology, Near Fault Observatories, GNSS Data and Products, Volcano Observations, Satellite Data, Geomagnetic Observations, Anthropogenic Hazards, Geological Information and Modelling, Multi-scale Laboratories and Tsunami Research. The Data Portal Graphical User Interface (GUI) provides search functionalities that enable users to filter data using several criteria (e.g. spatio-temporal extents, keywords, data/service providers, free text); it also enables users to pre-visualize data in map, tabular or graph formats; finally, the GUI provides details about the selected data (e.g., name, description, license, DOI), as well as options to further refine the search in order to reach a finer level of data granularity.

The presentation shows achievements of the EPOS community, with a focus on the EPOS Data Portal, which provides information about, and access to, the datasets available from the TCS. We demonstrate not only the features of the graphical user interface but also the underlying architecture of the whole system.

How to cite: Bailo, D., Michalek, J., Jeffery, K. G., Atakan, K., and Paciello, R. and the EPOS IT Team: EPOS Data portal for cross-disciplinary data access in the Solid Earth Domain, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-4265, https://doi.org/10.5194/egusphere-egu22-4265, 2022.

EGU22-8905 | Presentations | ESSI2.3

Developing a Next Generation Platform for Geodetic, Seismological and Other Geophysical Data Sets and Services

Chad Trabant, Henry Berglund, Jerry Carter, and David Mencin

The Data Services of IRIS and the Geodetic Data Services of UNAVCO have been supporting the seismological and geodetic research communities for many decades. Historically, these two facilities have independently managed data repositories on self-managed systems. As part of the merger activities between IRIS and UNAVCO, we have established a project to design, develop and implement a common, cloud-based platform. Goals of this project include operational improvements such as cost-effectiveness, robustness, on-demand scalability, significant growth potential and increased adaptability for new data types. While we expect a number of operational improvements, we also anticipate additional benefits for the research communities we serve.

The new platform will provide services for data queries across the internal repositories. This will give researchers an easier path to discovery and access to integratable sets of related geophysical data.

Researchers will be able to conduct their data processing in the same, or a data-proximate, cloud as the platform, taking advantage of the copious and affordable computation offered by such environments. Following the paradigm of moving the computation to the data, this will avoid the time- and resource-consuming need to transfer the data over the internet. Furthermore, the adoption of cloud-optimized data containers and direct access by researchers will support efficient processing. In cases where transferring large volumes of data is still necessary, the large capacity of cloud storage systems will allow enhanced transfer mechanisms such as Globus, which we will be exploring.

For many users a transition of the data repositories to a new environment will be nearly seamless.  This will be made possible by implementing many of the same services already supported by the current facilities, such as the suite of FDSN web services.  The project is currently in a prototyping stage, and we anticipate having a complete design by the end of 2022.  We will report on the status of the project, anticipated directions and challenges identified so far.

How to cite: Trabant, C., Berglund, H., Carter, J., and Mencin, D.: Developing a Next Generation Platform for Geodetic, Seismological and Other Geophysical Data Sets and Services, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-8905, https://doi.org/10.5194/egusphere-egu22-8905, 2022.

EGU22-9421 | Presentations | ESSI2.3

Breaking the barriers to interdisciplinarity: Contributions from the Environmental Research Infrastructures

Angeliki K. Adamaki, Ana Rita Gomes, Alex Vermeulen, Ari Asmi, and Andreas Petzold

As science and technology evolve, interdisciplinary targets are anything but static, introducing additional levels of complexity and further challenging initiatives to break the barriers to interdisciplinary research. For over a decade the community of the Environmental Research Infrastructures, forming the ENVRI cluster, has been building strong foundations to overcome these challenges and benefit the environmental sciences. One of the overarching goals of the ENVRI cluster is to provide more FAIR (Findable, Accessible, Interoperable and Reusable) data and services, open to everyone who wishes to access environmental observations, from scientists and research communities of scientifically diverse clusters to curious citizens, data scientists and policy makers.

Starting with domain-specific use cases, we further explore potential cross-domain cases, e.g. in the form of environmental science stories crossing disciplinary boundaries. A set of Jupyter Notebooks developed by the contributing Research Infrastructures (and accessible from a hub of services called the ENVRI-Hub) are promising tools to demonstrate and validate the capabilities of service provision among ENVRIs and across Science Clusters, and act as examples of what a user can achieve through the ENVRI-Hub. In one of the examples we investigate, a user-friendly, well-structured Jupyter Notebook that makes use of research infrastructures’ application programming interfaces (APIs) jointly plots on a map the geographical locations of several Marine and Atmospheric stations (where the stations in this example are defined as measurement points actively collecting data). The FAIR principles provide a firm foundation defining the layer that supports the ENVRI-Hub structure, and the preliminary results are promising. Considering that the APIs can become discoverable via a common ENVRI catalogue, the ENVRI-Hub aims to make full use of the machine-actionability of such a catalogue in the future to facilitate this kind of use case execution in the Hub itself.
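
A stripped-down version of such a notebook cell might look like the sketch below; the API endpoints and the JSON field names are placeholders invented for illustration, whereas the real notebooks use the research infrastructures' actual APIs.

# Illustrative sketch: fetch station locations from two RI APIs and plot them on one map.
import requests
import matplotlib.pyplot as plt

ATMO_API = "https://example.org/atmospheric-ri/api/stations"  # placeholder endpoint
MARINE_API = "https://example.org/marine-ri/api/stations"     # placeholder endpoint

def fetch_stations(url):
    # Assumes a simple JSON list of stations with 'longitude'/'latitude' fields
    records = requests.get(url, timeout=30).json()
    return [(s["longitude"], s["latitude"]) for s in records]

fig, ax = plt.subplots(figsize=(8, 5))
for url, label, marker in [(ATMO_API, "Atmospheric", "^"), (MARINE_API, "Marine", "o")]:
    lons, lats = zip(*fetch_stations(url))
    ax.scatter(lons, lats, label=label, marker=marker)

ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
ax.legend()
plt.show()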

Acknowledgement: ENVRI-FAIR has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 824068. This work is only possible with the collaboration of the ENVRI-FAIR partners and thanks to the joint efforts of the whole ENVRI team.

How to cite: Adamaki, A. K., Gomes, A. R., Vermeulen, A., Asmi, A., and Petzold, A.: Breaking the barriers to interdisciplinarity: Contributions from the Environmental Research Infrastructures, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-9421, https://doi.org/10.5194/egusphere-egu22-9421, 2022.

EGU22-8862 | Presentations | ESSI2.3

ENVRI-Hub, the open-access platform of the environmental sciences community in Europe: a closer look into the architecture

Ana Rita Gomes, Angeliki Adamaki, Alex Vermeulen, Ulrich Bundke, and Andreas Petzold

The ENVRI-FAIR project brings together the ESFRI environmental research infrastructures (ENVRI) that provide environmental data and services, with the aim of making their resources compliant with the FAIR principles. To achieve this goal, the required work is mostly technical, with the ENVRIs working towards not only improving the FAIRness of their own data and services, but also reflecting their efforts at a higher level by becoming FAIR as a cluster. The approach to this task cannot be linear, as it requires harmonization of efforts along different dimensions. To build on common ground, the most crucial technical gaps have been prioritized, and the ENVRIs identify common requirements and design patterns and collaborate on making good use of existing technical solutions that improve their FAIRness.

 

One of the highest-ranked priorities, and obviously among the biggest challenges, is the design of a machine-actionable ENVRI Catalogue of Services that also supports the integration into the EOSC. Through this catalogue the service providers will be able to make their assets findable and accessible by mapping their resources into common and rich metadata standards, while human interaction with the FAIR services can be accomplished by means of a web application. The design of this application, named the ENVRI-Hub, is discussed here. Other aspects related to the ENVRI services, e.g. the use of PIDs, the use of relevant vocabularies, and the tracking of license information and provenance, are also investigated.

 

Considering the ENVRI-Hub as a web application, it can act as an integrator by bringing together already existing ENVRI services and interoperable services across research infrastructure boundaries. Exploring the potential of the ENVRI-Hub already from the design phase, the ingestion of metadata from ENVRI assets such as the ENVRI Knowledge Base, the ENVRI Catalogue of Services and the ENVRI Training Catalogue is investigated, aiming to provide users with functionalities relevant to, e.g., the discovery of environmental observations, services, tutorials and other available resources. The chosen architectural pattern for the development of the ENVRI-Hub can be compared to a classical n-tier architecture, comprising 1) a data tier, 2) a logic tier and 3) a presentation tier. To integrate the different ENVRI platforms while preserving the application’s independence, the ENVRI-Hub demonstrator aims to replicate an instance of the Knowledge Base and Catalogue of Services. Following a centralised architectural approach, the ENVRI-Hub serves as a harvester entity, collecting data and metadata from the ENVRI Knowledge Base and the ENVRI Catalogue of Services, thereby bringing these ENVRI platforms together into one single portal.

 

Acknowledgement: ENVRI-FAIR has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 824068.

This work is only possible with the collaboration of the ENVRI-FAIR partners and thanks to the joint efforts of the whole ENVRI-Hub team.

How to cite: Gomes, A. R., Adamaki, A., Vermeulen, A., Bundke, U., and Petzold, A.: ENVRI-Hub, the open-access platform of the environmental sciences community in Europe: a closer look into the architecture, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-8862, https://doi.org/10.5194/egusphere-egu22-8862, 2022.

EGU22-3261 | Presentations | ESSI2.3

Reimagining the AuScope Virtual Research Environment Through Human-Centred Design

Jens Klump, Ulrich Engelke, Vincent Fazio, Pavel Golodoniuc, Lesley Wyborn, and Tim Rawling

AuScope, founded in 2006, is the provider of research infrastructure to Australia’s Earth and geospatial science community. Its unifying strategic goals include building the Downward Looking Telescope (DLT) (a metaphor for an integrated system of Earth and geospatial instruments, services, data and analytics to enable scientists to understand Earth’s evolution through time) and exploring how Earth resources may support growing human demands. The AuScope Virtual Research Environment (AVRE) program is responsible for enabling the DLT through providing persistent access to required data and tools from a diverse range of Australian research organisations, government geological surveys and the international community.

In 2009 AuScope released a portal to provide online access to evolved data products to specific groups of users. Subsequently, this portal was combined with online tools to create the AVRE platform of specialised Virtual Laboratories that enabled the execution of explicit workflows. By 2021 it was recognised that AVRE should modernise and take advantage of new technologies that could empower researchers to access higher storage capacities and wider varieties of computational processing options. AVRE also needed to leverage notebooks, containerisation and mobile solutions and facilitate a greater emphasis on ML and AI techniques. Increased storage meant researchers could access less processed, rawer forms of data, which they could then prepare for their own specific requirements, whilst the growth in Open Source software meant easy access to tools that could meet or efficiently be adapted to their needs. 

Recognising that AuScope researchers now required new mechanisms to help them find and reuse multiple resources from globally distributed sites and be able to integrate these with their own data types and tools, the AVRE informatics and technology experts began assessing the requirements for modernising the AVRE platform. The technologists reviewed other virtual research environments, research data portals, and e-commerce platforms for examples of well-designed interfaces and services that help users get the best use out of a platform. 

We then undertook a series of interactive consultations across a broad range of AuScope researchers (geophysics, geochemistry, geospatial, geology, etc). We accepted there were multiple requirements, from simple data processing on small volume data sets through to complex data modelling and assimilation at petascale, and openly acknowledged that there were numerous ways of processing: one size would not fit all.

In the consultations, we focussed on the context that AVRE was about enabling researchers to use a diversity of resources to realise the AuScope strategic goal of the DLT. We recognised that this would require an ability to meet the specialised requirements of a broad range of the current individual AuScope geoscience programs, but at the same time, there was a need to allow for future integration with global transdisciplinary challenges that explore how Earth resources may support growing human demands.

In this presentation, we will discuss the outcomes from our consultations with various AuScope Programs and will present initial plans for a co-designed, re-engineered AVRE platform to meet the expressed needs of a diverse range of DLT developers and users.

How to cite: Klump, J., Engelke, U., Fazio, V., Golodoniuc, P., Wyborn, L., and Rawling, T.: Reimagining the AuScope Virtual Research Environment Through Human-Centred Design, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-3261, https://doi.org/10.5194/egusphere-egu22-3261, 2022.

EGU22-5407 | Presentations | ESSI2.3

Real-time Delivery of Sensor Data Streams using IoT and OGC Standards

Simon Jirka, Christian Autermann, and Sebastian Drost

In the past, many projects have evaluated and demonstrated the use of the Sensor Web Enablement (SWE) standards of the Open Geospatial Consortium (OGC) in order to publish sensor data. Advantages of these standards included the provision of a domain-independent approach for ensuring interoperability of interfaces, data, and metadata. However, in most cases, the developed infrastructures were limited to pull-based data retrieval patterns. This means that data consumers regularly query servers for data updates, which may result in high server loads due to a high frequency of update requests, or in increased latencies until a consumer receives new sensor data.


Although there were relevant specifications such as the OGC Publish/Subscribe standard as well as discussion papers, the OGC SWE framework never included a widely accepted solution to handle an active, push-based delivery of observation data. With the adoption of the OGC SensorThings API standard in conjunction with mainstream Internet of Things protocols such as the Message Queuing Telemetry Transport (MQTT) protocol, this has changed in recent years.


In 2020, we presented at EGU an approach for using these technologies to enable the efficient collection of sensor observation data in hydrological applications by bridging between sensors and data management servers (Drost et al., 2020).


As part of this contribution, we will discuss the applicability of these technologies, OGC SensorThings API as well as MQTT, to also cover the delivery of data to consumers in addition to the previously described data transmission from sensor devices to a data sink. We will put special emphasis on experiences gathered from the deployment in marine environments (e.g., live underway data and event metadata streams of research vessels), as part of the EMODnet Ingestion II project. Special consideration will be given to a discussion of potential advantages of push-based communication patterns as well as identified challenges for future work (e.g., metadata about push-based data streams, standardization of payloads, access control, best practices on how to structure provided data streams).
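To make the consumer-side, push-based pattern concrete, a minimal subscription client could look like the sketch below. It uses the paho-mqtt Python package (1.x callback API); the broker host and Datastream id are hypothetical, and the topic follows the MQTT extension of the OGC SensorThings API, whose exact version prefix depends on the server deployment.

```python
import json

import paho.mqtt.client as mqtt  # paho-mqtt 1.x callback API

# Hypothetical broker host and Datastream id
BROKER = "sensorthings.example.org"
TOPIC = "v1.1/Datastreams(42)/Observations"

def on_message(client, userdata, message):
    # Each push message carries one Observation entity as JSON
    obs = json.loads(message.payload)
    print(obs["phenomenonTime"], obs["result"])

client = mqtt.Client()
client.on_message = on_message
client.connect(BROKER, 1883)
client.subscribe(TOPIC)
client.loop_forever()  # new observations arrive as they are published, without polling
```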


Furthermore, we will address the development of data visualization tools for such interoperable real-time data streams and will discuss the opportunities to transfer these technologies to further application domains such as hydrology.


References

Drost, S., Speckamp, J., Hollmann, C., Malewski, C., Rieke, J., & Jirka, S. (2020). Internet of Things Technologies for the Efficient Collection of Hydrological Measurement Data. EGU General Assembly 2020, Online. https://doi.org/10.5194/egusphere-egu2020-10452

How to cite: Jirka, S., Autermann, C., and Drost, S.: Real-time Delivery of Sensor Data Streams using IoT and OGC Standards, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-5407, https://doi.org/10.5194/egusphere-egu22-5407, 2022.

EGU22-8537 | Presentations | ESSI2.3

INSITUDE: web-based collaborative platform for centralized oceanographic data reception, management, exploration, and analysis

Dmitry Khvorostyanov, Victor Champonnois, Alain Laupin-Vinatier, Jacqueline Boutin, Gilles Reverdin, Nathalie Lefèvre, Antonio Lourenco, Alban Lazar, Jean-Benoit Charrassin, and Frédéric Vivier

The LOCEAN laboratory of the Pierre Simon Laplace Institute (IPSL) is in charge of a number of scientific projects and measurement campaigns that result in a large flow of heterogeneous oceanographic data managed at LOCEAN. The data are of various origins and include in situ data from buoys, ships, moorings, marine mammals and satellite missions for salinity, altimetry, ocean color, and temperature. LOCEAN also has an instrumental development team that designs and deploys buoys in various parts of the global ocean, with a need to receive and track the data in near-real time. The data PIs can be involved in different research groups and projects, and while focusing on providing their data, they might need to collaborate with other teams providing complementary datasets.

To address these needs, the INSITUDE platform is being developed at LOCEAN with three goals in mind: (1) receive, manage, track in near-real time, and explore diverse data; (2) assist scientific experts in data quality control; (3) facilitate cross-use of in situ and satellite data available at LOCEAN.

The software consists of four components: (1) Django application for the meta-data management; (2) Data processing software (Python); (3) Flask application for server-side interactions with the database; (4) Interactive data exploration/validation front-end.

The basic workflow involves the following steps:

(1) The user specifies the relevant meta-data using the web interface of the Django application; the meta-data database is thus updated;

(2) The processing core is launched automatically at regular intervals during the day: it reads the meta-data from the database, queries the mailboxes and/or external web services for the data requested, receives, decodes and processes the data, and fills the measurements database. It also generates ASCII data files for selected datasets, which can be downloaded via dedicated web pages or used for processing with external user programs (e.g. MATLAB or Python scripts);

(3) The data stored in the measurements database can be interactively explored using DataViewer applications, allowing zoomable views of time series, vertical profiles, and trajectories shown on the virtual globe. Data from different campaigns and for different variables can be viewed together. The quality control assistant allows experts to seamlessly validate the data by assigning quality flags to selected data points or regions, optionally after computing relevant statistics. The validated data can then be visualized and saved based on desired quality flag values.

The INSITUDE platform facilitates data sharing across multiple teams and collaborations between data providers and data experts, researchers and engineers, enabling research projects focused on cross-exploration of various datasets, studies of processes involving both in situ and satellite data, and interpretation of in situ data in a larger-scale context owing to the satellite data. The system offers centralized, intuitive acquisition control and access to the data received, along with the related meta-data (projects, campaigns, buoys, people, etc.), and facilitates data quality control and validation.

The INSITUDE platform is currently used at LOCEAN and can be deployed in data centers of national data infrastructures, such as the French ODATIS/DATA TERRA.

How to cite: Khvorostyanov, D., Champonnois, V., Laupin-Vinatier, A., Boutin, J., Reverdin, G., Lefèvre, N., Lourenco, A., Lazar, A., Charrassin, J.-B., and Vivier, F.: INSITUDE: web-based collaborative platform for centralized oceanographic data reception, management, exploration, and analysis, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-8537, https://doi.org/10.5194/egusphere-egu22-8537, 2022.

ESSI2.7 – Meeting Exascale Computing Challenges with Compression and Pangeo

EGU22-5709 | Presentations | ESSI2.7

Pangeo for everyone with Galaxy

Anne Fouilloux, Yvan Le Bras, and Adele Zaini

Pangeo has been deployed on a number of diverse infrastructures, and learning resources are available, for instance the Pangeo Tutorial Gallery (http://gallery.pangeo.io/repos/pangeo-data/pangeo-tutorial-gallery/index.html). However, knowledge of Python is necessary to develop or reuse applications with the Pangeo ecosystem, which hinders its wider adoption and reduces potential interdisciplinary collaborations.

Our main objective is to reduce the barriers to using the Pangeo ecosystem, to allow everyone to understand the fundamental concepts behind Pangeo, and to offer a Pangeo deployment for teaching and for developing reproducible, reusable and fully automated workflows.

Most Pangeo tutorials and examples use Jupyter notebooks but the gap between these “toy examples” and real complex applications is still huge: adopting best software practices for Jupyter notebooks and big applications is essential for reuse and automation of workflows.

The Galaxy project is a worldwide community dedicated to making tools, workflows and infrastructures open and accessible to everyone. Each tool in Galaxy has a wrapper describing the tool itself along with its input and output parameters, citations, and possible annotations based on the EDAM ontology. Galaxy workflows are also annotated and can contain any kind of Galaxy tool, including interactive tools such as Pangeo notebooks.

Galaxy is also accessible via a web-based interface. The platform is designed to be community and technology agnostic and has gained adoption in various communities, ranging from Climate Science and Biodiversity to Biology and Medicine. 

By combining Pangeo and Galaxy, we are providing access to the Pangeo ecosystem to everyone, including those who are not familiar with Python and we offer fully automated and annotated Pangeo “tools”. 

Two main sets of tools are currently available in Galaxy:

  • Pangeo notebook (kept in sync with the upstream Pangeo notebook Docker images, https://github.com/pangeo-data/pangeo-docker-images);
  • Xarray tools to manipulate and visualise netCDF data from the Galaxy graphical user interface (a minimal sketch of the underlying xarray calls is shown below).
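The sketch below is illustrative only: it shows roughly the kind of xarray calls that such Galaxy tools wrap behind the graphical interface; the file and variable names here are hypothetical.

```python
import xarray as xr

# A netCDF file uploaded to a Galaxy history (hypothetical name)
ds = xr.open_dataset("tas_day_model_historical.nc")

# Select a time range and average it, as a Galaxy tool form would let the user do
clim = ds["tas"].sel(time=slice("2000-01-01", "2000-12-31")).mean("time")

clim.plot()  # quick-look map that a Galaxy tool would return as an output image
```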

Training material is being developed and  included in the Galaxy Training Network (https://training.galaxyproject.org/):

  • “Pangeo ecosystem 101 for everyone - Introduction to Xarray Galaxy Tools”, where anyone can learn about Pangeo and its main concepts and try it out without using any command line;
  • “Pangeo Notebook in Galaxy - Introduction to Xarray”: it is very similar to the “Xarray Tutorial” from Pangeo (http://gallery.pangeo.io/repos/pangeo-data/pangeo-tutorial-gallery/xarray.htm) but makes use of Galaxy Pangeo notebooks and offers a different entry point to Pangeo.

Galaxy Training Infrastructure as a Service (https://galaxyproject.eu/tiaas.html), with infrastructure provided at no cost by Galaxy Europe for teachers and instructors, was used for the FORCeS eScience course “Tools in Climate Science: Linking Observations with Modeling” (https://galaxyproject.eu/posts/2021/11/13/tiaas-anne/), where about 30 students learned about Pangeo (see https://nordicesmhub.github.io/forces-2021/intro.html).

Galaxy Pangeo also contributes to the worldwide online training “GTN Smörgåsbord” (last event 14-18 March 2022, https://gallantries.github.io/posts/2021/12/14/smorgasbord2-tapas/) where everyone is welcome as a trainee, trainer or just observer! This will contribute to democratising Pangeo.

How to cite: Fouilloux, A., Le Bras, Y., and Zaini, A.: Pangeo for everyone with Galaxy, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-5709, https://doi.org/10.5194/egusphere-egu22-5709, 2022.

EGU22-3739 | Presentations | ESSI2.7

Environmental Data Science Book: a community-driven resource showcasing open-source Environmental science

Alejandro Coca-Castro, Scott Hosking, and The Environmental Data Science Community

With the plethora of open data and computational resources available, environmental data science research and applications have accelerated rapidly. Therefore, there is an opportunity for community-driven initiatives compiling and classifying open-source research and applications across environmental systems (polar, oceans, forests, agriculture, etc.). Building upon the Pangeo Gallery, we propose The Environmental Data Science Book (https://the-environmental-ds-book.netlify.app), a community-driven online resource showcasing and supporting the publication of data, research and open-source developments in environmental sciences. The target audience and early adopters are i) anyone interested in open-source tools for environmental science; and ii) anyone interested in reproducible, inclusive, shareable and collaborative AI and data science for environmental applications. Following FAIR principles, the resource provides multiple features such as guidelines, templates, persistent URLs and Binder to facilitate fully documented, shareable and reproducible notebooks. The quality of the published content is ensured by a transparent reviewing process supported by GitHub-related technologies. To date, the community has successfully published five Python-based notebooks: two forest-, two wildfire/savanna- and one polar-related. The notebooks use the common Pangeo stack (e.g. intake, iris, xarray, hvplot) for interactive visualisation and modelling of environmental sensor data. In addition to continual feature enhancements of the GitHub repository (https://github.com/alan-turing-institute/environmental-ds-book), we expect to increase inclusivity (multiple languages), diversity (multiple backgrounds) and activity (collaboration and co-working sessions) towards improving scientific software practices in the environmental science community.

How to cite: Coca-Castro, A., Hosking, S., and Community, T. E. D. S.: Environmental Data Science Book: a community-driven resource showcasing open-source Environmental science, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-3739, https://doi.org/10.5194/egusphere-egu22-3739, 2022.

EGU22-13028 | Presentations | ESSI2.7

Using a Pangeo platform on Azure to tackle global environmental challenges 

Timothy Lam, Alberto Arribas, Gavin Shaddick, Theo McCaie, and Jennifer Catto

The Pangeo project enables interactive, reproducible and scalable environmental research to be carried out using an integrated data-computational platform. Here we demonstrate a few examples that utilise a Pangeo platform on Microsoft Azure supported by the Met Office where global environmental challenges are explored and tackled collaboratively. They include: (1) Analysing and quantifying drivers of low rainfall anomalies during boreal summer in Indonesian Borneo using causal inference and causal network to identify key teleconnections, and their possible changes under a warming climate, which will contribute to seasonal forecasting efforts to strengthen prevention and control of drought and fire multihazards over peatlands in the study region; (2) Quantifying and communicating uncertainty in volcanic ash forecasts; and (3) Exploring the cascading effects that follow the degradation and recovery of Caribbean coral reefs.

How to cite: Lam, T., Arribas, A., Shaddick, G., McCaie, T., and Catto, J.: Using a Pangeo platform on Azure to tackle global environmental challenges , EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-13028, https://doi.org/10.5194/egusphere-egu22-13028, 2022.

EGU22-11729 | Presentations | ESSI2.7

Intercomparison of basin-to-global scale submesoscale-permitting ocean models at SWOT cross-overs 

Julien Le Sommer and Takaya Uchida and the SWOT Adopt-A-Crossover Ocean Model Intercomparison Project Team

With an increase in computational power, ocean models with kilometer-scale resolution have emerged over the last decade. Using these realistic simulations, we have been able to quantify the energetic exchanges between spatial scales and inform the design of eddy parametrizations. The increase in resolution, however, has drastically increased the volume of model outputs, making it difficult to transfer and analyze the data. The realism of individual models in representing the energetics down to numerical dissipation has also come into question. Here, we showcase a cloud-based analysis framework proposed by the Pangeo Project that aims to tackle such distribution and analysis challenges. We analyze seven submesoscale-permitting simulations, all on the cloud, at a crossover region of the upcoming SWOT altimeter mission near the Gulf Stream separation. The models used in this study are based on the NEMO, CROCO, MITgcm, HYCOM, FESOM and FIO-COM code bases. The cloud-based analysis framework: i) minimizes the cost of duplicating and storing ghost copies of data, and ii) allows for seamless sharing of analysis results amongst collaborators. In this poster, we will describe the framework and provide preliminary results (e.g. spectra, vertical buoyancy flux, and how they compare to predictions from the mixed-layer instability parametrization). Basin-to-global scale, submesoscale-permitting models are still at an early stage of development; their cost and carbon footprints are also rather large. It would, therefore, benefit the community to compile the different model configurations for future best practices. We also believe that an emphasis on data analysis strategies would be crucial for improving the models themselves.



How to cite: Le Sommer, J. and Uchida, T. and the SWOT Adopt-A-Crossover Ocean Model Intercomparison Project Team: Intercomparison of basin-to-global scale submesoscale-permitting ocean models at SWOT cross-overs , EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-11729, https://doi.org/10.5194/egusphere-egu22-11729, 2022.

EGU22-7593 | Presentations | ESSI2.7

How to turn satellite data to insights at scale

Basile Goussard

NetCarbon, a brand new French startup company, offers farmers a free solution for measuring and monetizing their sequestered carbon to contribute towards carbon neutrality. The solution relies on satellite data (Sentinel-2, Landsat 8 and PlanetScope) and open-source ecosystems such as the Pangeo software stack.

 

The challenge in NetCarbon’s solution is deploying Earth observation insights at scale, while remaining able to shift between cloud providers or an on-premise architecture if needed. The best tool for this, up to now, is Pangeo.

 

An example of our Pangeo usage will be shown in the following three steps:

1°) Connection to satellite data / Extract 

2°) Processing satellite data at scale / Transform

3°) Saving the data within a data warehouse / Load

 

First, some of the building blocks to search for satellite data based on STAC will be shown. Moreover, the stackstac package will be tested to convert STAC into xarray, allowing researchers and companies to create their datacubes with all the metadata inside. 

 

The second part of the presentation covers the computation layer. Computation algorithms such as filtering by cloud cover, applying cloud masks, computing the land surface temperature, and interpolating will be run. Land surface temperature is one of the inputs needed by the NetCarbon algorithm. These steps yield a Dask computation graph, which will be run at scale in the cloud using Dask and Coiled.
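As a rough sketch of the extract and transform steps, the snippet below searches a public STAC endpoint and stacks the results into a Dask-backed datacube. The endpoint, collection, bounding box and asset names are assumptions for illustration, not NetCarbon’s actual configuration.

```python
import pystac_client
import stackstac
from dask.distributed import Client

# Extract: search a public STAC catalog (hypothetical area of interest in southern France)
catalog = pystac_client.Client.open("https://earth-search.aws.element84.com/v1")
items = catalog.search(
    collections=["sentinel-2-l2a"],
    bbox=[1.3, 44.0, 1.6, 44.2],
    datetime="2021-06-01/2021-08-31",
    query={"eo:cloud_cover": {"lt": 20}},
).item_collection()

# Lazily stack the STAC items into a dask-backed xarray DataArray (the datacube)
cube = stackstac.stack(items, assets=["red", "nir"], resolution=20)

# Transform: a toy reduction standing in for the real pipeline (cloud masking, LST, interpolation);
# nothing runs until .compute() traverses the Dask graph, locally here or on a Coiled cluster.
client = Client()
monthly_mean = cube.resample(time="1M").mean().compute()
```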

 

To conclude, the output of the processing part (the spatial and temporal mean of the land surface temperature) will be displayed within a notebook and, finally, the data will be loaded into a data warehouse (Google BigQuery).

 

All the steps will be demonstrated in a reproducible notebook

How to cite: Goussard, B.: How to turn satellite data to insights at scale, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-7593, https://doi.org/10.5194/egusphere-egu22-7593, 2022.

EGU22-4566 | Presentations | ESSI2.7

Is there a correlation between the cloud phase and surface snowfall rate in GCMs?

Franziska Hellmuth, Anne Claire Mireille Fouilloux, Trude Storelvmo, and Anne Sophie Daloz

Cloud feedbacks are a major contributor to the spread of climate sensitivity in global climate models (GCMs) [1]. Among the most poorly understood cloud feedbacks is the one associated with the cloud phase, which is expected to be modified with climate change [2]. Cloud phase bias, in addition, has significant implications for the simulation of radiative properties and glacier and ice sheet mass balances in climate models.  

In this context, this work aims to expand our knowledge on how the representation of the cloud phase affects snow formation in GCMs. Better understanding this aspect is necessary to develop climate models further and improve future climate predictions. 

This study will compare surface snowfall, ice, and liquid water content from the Coupled Model Intercomparison Project Phase 6 (CMIP6) climate models (accessed through Pangeo) to the European Centre for Medium-Range Weather Forecasts Reanalysis 5 (ERA5) data from 1985 to 2014. We conduct statistical analysis at the annual and seasonal timescales to determine the biases in cloud phase and precipitation (liquid and solid) in the CMIP6 models and the potential connection between them.

For the analysis, we use a Jupyter notebook on the CMIP6 analysis (https://github.com/franzihe/eosc-nordic-climate-demonstrator/blob/master/work/), which guides the user step by step. The Pangeo intake package makes it possible to browse the CMIP6 online catalog for the required variables, models, and experiments, and to store the results as xarray/Dask datasets. Variables on sigma-pressure levels had to be interpolated to the standard pressure levels provided in ERA5. We also interpolated the horizontal and vertical variables to the same horizontal grid resolution before calculating the climatology.
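For illustration, a minimal sketch of such a catalog query is shown below. The catalog URL is the commonly published Pangeo CMIP6 address and the search terms are illustrative assumptions; the exact query used in the study may differ.

```python
import intake

# Open the Pangeo CMIP6 catalog (intake-esm datastore)
col = intake.open_esm_datastore("https://storage.googleapis.com/cmip6/pangeo-cmip6.json")

# Illustrative search: monthly snowfall flux and ice water path from the historical experiment
subset = col.search(
    experiment_id="historical",
    table_id="Amon",
    variable_id=["prsn", "clivi"],
    member_id="r1i1p1f1",
)

# One xarray (Dask-backed) dataset per model/grid combination
dsets = subset.to_dataset_dict(zarr_kwargs={"consolidated": True})
```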

A global comparison between the reanalysis (ERA5) and the CMIP6 models shows that models tend to underestimate the ice water path compared to the reanalysis, even if most of them can reproduce some of the characteristics of liquid water content and snowfall. To better understand the link between biases in cloud phase and surface snowfall rate, we seek a relationship between ice water path and surface snowfall in GCMs. Linear regressions within extratropical areas show a positive relationship between ice water content and surface snowfall in the reanalysis data, while the CMIP6 models do not show this relationship.

  

[1] Zelinka, M. D., Myers, T. A., McCoy, D. T., Po-Chedley, S., Caldwell, P. M., Ceppi, P., et al. (2020). Causes of higher climate sensitivity in CMIP6 models. Geophysical Research Letters, 47, e2019GL085782. https://doi.org/10.1029/2019GL085782

[2] Bjordal, J., Storelvmo, T., Alterskjær, K., et al. (2020). Equilibrium climate sensitivity above 5 °C plausible due to state-dependent cloud feedback. Nat. Geosci. 13, 718–721. https://doi.org/10.1038/s41561-020-00649-1

 

Github: https://github.com/franzihe 

How to cite: Hellmuth, F., Fouilloux, A. C. M., Storelvmo, T., and Daloz, A. S.: Is there a correlation between the cloud phase and surface snowfall rate in GCMs?, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-4566, https://doi.org/10.5194/egusphere-egu22-4566, 2022.

EGU22-2746 | Presentations | ESSI2.7

Scalable Offshore Wind Analysis With Pangeo

Derek O'Callaghan and Sheila McBreen

The expansion of renewable energy portfolios to utilise offshore wind resources is a key objective of energy policies focused on the generation of low-carbon electricity. Wind atlases have been developed to provide energy resource maps, containing information on wind speeds and related variables at multiple heights above sea level for offshore regions of interest (ROIs). However, these atlases are often associated with legacy projects, where access to the corresponding data products may be restricted, preventing further development by third parties. Reliable, long-term observations are crucial inputs to the offshore wind farm area assessment process, with observations typically measured close to the ocean surface using in situ meteorological masts. Remote sensing techniques have been proposed to address resolution and coverage issues associated with in situ measurements, in particular, the use of space-borne Earth Observation (EO) instruments for ocean and sea surface wind estimations. In recent years, a variety of initiatives have emerged that provide public access to wind speed data products, which have potential for application in wind atlas development and offshore wind farm assessment. Combining products from multiple data providers is challenging due to differences in spatial and temporal resolution, product access, and product formats. In particular, the associated large dataset sizes are significant obstacles to data retrieval, storage, and subsequent computation. The traditional process of retrieval and local analysis of a relatively small number of ROI products is not readily scalable to accommodate longitudinal studies of multiple ROIs.

This work presents a case study that demonstrates the utility of the Pangeo software ecosystem to address these issues in the development of offshore wind speed and power density estimations, increasing wind measurement coverage of offshore renewable energy assessment areas in the Irish Continental Shelf region. The Intake library is used to manage a new data catalog created for this region, consisting of a collection of analysis-ready, cloud-optimized (ARCO) datasets generated using the Zarr format. This ARCO catalog features up to 21 years of available in situ, reanalysis, and satellite observation data products. The xarray and Dask libraries enable scalable catalog processing, including analysis of provided data variables and derivation of new variables as required for candidate wind farm ROIs, avoiding redundant storage and processing requirements for regions not under assessment. Individual catalog datasets have been regridded to relevant spatial grids, or appropriately chunked in time and space, by means of the xESMF and Rechunker libraries respectively. A set of Jupyter notebooks has been created to demonstrate catalog visualization and processing, following the conventions of notebooks in the current Pangeo Gallery. These notebooks provide detailed descriptions of each ARCO dataset, along with an evaluation of wind speed extrapolation and power density estimation methods. The employment of new approaches such as Pangeo Forge for future catalog and dataset creation is also explored. This case study has determined that the Pangeo ecosystem approach is extremely beneficial in the development of open architectures operating on large volumes of disparate data, while also contributing to the objectives of scientific code sharing and reproducibility.
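As an illustration of how such an ARCO catalog can be consumed, the sketch below assumes a hypothetical Intake catalog entry and variable names; the power-law shear exponent and air density are typical textbook values, not project results.

```python
import intake

# Hypothetical catalog file and entry name for an ARCO Zarr dataset
cat = intake.open_catalog("wind_arco_catalog.yaml")
era5 = cat["era5_single_levels"].to_dask()  # opened lazily as an xarray Dataset

# 10 m wind speed from the u/v components
ws10 = (era5["u10"] ** 2 + era5["v10"] ** 2) ** 0.5

# Power-law extrapolation from 10 m to a 120 m hub height (alpha: typical offshore shear exponent)
alpha = 0.11
ws_hub = ws10 * (120.0 / 10.0) ** alpha

# Wind power density (W m-2) assuming a constant air density
rho = 1.225
power_density = 0.5 * rho * ws_hub ** 3
mean_pd = power_density.mean("time").compute()
```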

How to cite: O'Callaghan, D. and McBreen, S.: Scalable Offshore Wind Analysis With Pangeo, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-2746, https://doi.org/10.5194/egusphere-egu22-2746, 2022.

EGU22-13556 | Presentations | ESSI2.7

Atmospheric Retrievals in a Modern Python Framework

Mario Echeverri Bautista, Maximilian Maahn, Anton Verhoef, and Ad Stoffelen

Modern machine learning (ML) techniques applied in atmospheric modeling rely heavily on two aspects: good-quality and good-coverage observations. Among others, satellite radiometer (SR) measurements (radiances or brightness temperatures) offer an excellent trade-off between these aspects; moreover, SR observations have been providing quite stable Fundamental Climate Data Records (FCDRs) for years and are expected to continue to do so in the following decades. This work presents a framework for SR retrievals that uses standard packages from the SciPy and Pangeo ecosystems; our retrieval scheme also leverages the powerful capabilities provided by the NWP SAF's RTTOV and its Python wrapper. In terms of retrievals, we stand on the shoulders of Bayesian estimation by using Optimal Estimation (OE), popularized by Rodgers for 1D atmospheric retrievals; we use pyOptimalEstimation (pyOpEst), an open-source package developed by Maahn. pyOptimalEstimation follows an object-oriented design, which makes it portable and highly maintainable.
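For readers unfamiliar with the scheme, the sketch below shows a plain-NumPy Gauss-Newton optimal estimation iteration of the Rodgers form; it illustrates the retrieval mathematics only and is not the pyOptimalEstimation API. The toy forward model is an assumption for illustration.

```python
import numpy as np

def oe_retrieval(y, x_a, S_a, S_y, forward, jacobian, n_iter=10):
    """Gauss-Newton OE iteration:
    x_{i+1} = x_a + (K^T S_y^-1 K + S_a^-1)^-1 K^T S_y^-1 (y - F(x_i) + K (x_i - x_a))."""
    S_a_inv = np.linalg.inv(S_a)
    S_y_inv = np.linalg.inv(S_y)
    x = x_a.copy()
    for _ in range(n_iter):
        K = jacobian(x)  # Jacobian of the forward model (e.g. a radiative transfer model) at x
        gain = np.linalg.inv(K.T @ S_y_inv @ K + S_a_inv) @ K.T @ S_y_inv
        x = x_a + gain @ (y - forward(x) + K @ (x - x_a))
    return x

# Toy linear forward model F(x) = A x, so the retrieval converges immediately
A = np.array([[1.0, 0.5], [0.2, 1.0], [0.3, 0.1]])
y = A @ np.array([2.0, -1.0])
x_hat = oe_retrieval(y, x_a=np.zeros(2), S_a=10.0 * np.eye(2), S_y=0.01 * np.eye(3),
                     forward=lambda x: A @ x, jacobian=lambda x: A)
```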

The contribution presented here covers scientific software design aspects, algorithmic choices, open-source contributions, processing speed and scalability. Furthermore, simple but efficient techniques such as cross-validation were used to evaluate different metrics; for initial testing we used NWP SAF model data and observation error covariances from the SR literature.

Open source and community development are two pillars of this work. Open source allows transparent, concurrent and continuous development, while community development brings together domain experts, software developers and scientists in general; these two ideas allow us both to profit from already developed and well-supported tools (e.g. SciPy and Pangeo) and to contribute to others whose applications might benefit. This methodology has been used successfully across the data science and ML universe, and we believe that the Earth Observation (EO) community would benefit greatly in terms of streamlining the development and benchmarking of new solutions. Practical examples of success can be found in the Pytroll community.

Our work in progress is directly linked to present and near-future Earth Observation requirements; in particular, the incoming SR data streams (for operational purposes) are growing fast, by orders of magnitude. Missions like the EUMETSAT Polar System-Second Generation (EPS-SG, 2023) or the Copernicus Imaging Microwave Radiometer (CIMR, 2026) will require scalability and flexibility from the tools that digest such flows of data. We will discuss and show how operational tools can take advantage of the enormous community-based developments and standards and become game changers for EO.

How to cite: Echeverri Bautista, M., Maahn, M., Verhoef, A., and Stoffelen, A.: Atmospheric Retrievals in a Modern Python Framework, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-13556, https://doi.org/10.5194/egusphere-egu22-13556, 2022.

EGU22-13193 | Presentations | ESSI2.7

CliMetLab and Pangeo use case: Machine learning data pipeline for sub-seasonal To seasonal prediction (S2S)

Florian Pinault, Aaron Spring, Frederic Vitart, and Baudouin Raoult

As machine learning algorithms are being used more and more prominently in the meteorology and climate domains, the need for reference datasets has been identified as a priority. Moreover, boilerplate code for data handling is ubiquitous in scientific experiments. In order to focus on science, climate/meteorology/data scientists need generic and reusable domain-specific tools. To achieve these goals, we used the plugin-based CliMetLab Python package along with many packages listed by Pangeo.


Our use case consists of providing data for machine learning algorithms in the context of the sub-seasonal to seasonal (S2S) prediction challenge 2021. The data volume is about 2 terabytes of model predictions from three different models. We experimented with providing data in multiple formats: GRIB, NetCDF, and Zarr. A Pangeo recipe (using the Python package pangeo_forge_recipes) was used to generate the Zarr data (relying heavily on xarray and dask for parallelisation). All three versions of the S2S data have been stored in an S3 bucket located on the ECMWF European Weather Cloud (ECMWF-EWC).


CliMetLab aims at providing a simple interface to access climate and meteorological datasets, seamlessly downloading and caching data, converting it to xarray datasets or pandas dataframes, plotting it, and feeding it into machine learning frameworks such as TensorFlow or PyTorch. CliMetLab is open source and still in beta (https://climetlab.readthedocs.io). The main target platform of CliMetLab is Jupyter notebooks. Additionally, a CliMetLab plugin allows shipping dataset-specific code along with a well-defined published dataset. Taking advantage of the CliMetLab tools to minimize boilerplate code, a plugin has been developed for the S2S data as a companion Python package of the dataset.
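A minimal sketch of the intended user experience is shown below. The dataset name and keyword arguments follow the S2S AI challenge plugin and are given for illustration; the published options may differ slightly.

```python
import climetlab as cml

# Load one slice of the S2S training input via the dataset plugin (illustrative arguments)
ds = cml.load_dataset(
    "s2s-ai-challenge-training-input",
    origin="ecmwf",
    date=20200102,
    parameter="t2m",
)

# Download and caching are handled transparently; conversion to xarray is one call
da = ds.to_xarray()
```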

How to cite: Pinault, F., Spring, A., Vitart, F., and Raoult, B.: CliMetLab and Pangeo use case: Machine learning data pipeline for sub-seasonal To seasonal prediction (S2S), EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-13193, https://doi.org/10.5194/egusphere-egu22-13193, 2022.

EGU22-9152 | Presentations | ESSI2.7

WOAST : an Xarray package applying Wavelet Scattering Transform to geophysical data

Edouard Gauvrit, Jean-Marc Delouis, Marie-Noëlle Bouin, and François Boulanger

The ocean plays a key role in regulating climate through the dynamical coupling between the sea surface and the atmosphere. Understanding this coupling is a key issue in climate change modeling, but an adapted statistical representation is still lacking. A strong limitation comes from the non-Gaussianities existing inside a wind-over-waves surface layer, where wind flows are constrained by the sea state and the swell. We seek an approach to describe statistically the couplings across scales, which are poorly measured by the power spectrum. Recent developments in data science provide new tools such as the Wavelet Scattering Transform (WST), which gives a low-variance statistical description of non-Gaussian processes and goes beyond the power spectrum representation, the latter being blind to position consistency between scales. To develop the methodology, we applied the WST to 1D anemometer time series and 2D atmospheric simulations (LES) and compared the results with well-known statistical information. These analyses were made possible by the development of the WOAST (Wavelet Ocean-Atmosphere Scattering Transform) software. Computation of the WST is mathematically embarrassingly parallel, and the time consumption is mainly dominated by data access and memory management. Our preliminary geophysical analysis using WOAST, and its efficiency in extracting unknown properties of intermittent processes, will be shown through a Jupyter notebook example. This work is part of the Astrocean project supported by 80Prime grants (CNRS).

How to cite: Gauvrit, E., Delouis, J.-M., Bouin, M.-N., and Boulanger, F.: WOAST : an Xarray package applying Wavelet Scattering Transform to geophysical data, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-9152, https://doi.org/10.5194/egusphere-egu22-9152, 2022.

EGU22-6350 | Presentations | ESSI2.7

OSDYN: a new python tool for the analysis of high-volume ocean outputs.

Valerie Garnier, Jean-Francois Le Roux, Justus Magin, Tina Odaka, Pierre Garreau, Martial Boutet, Stephane Raynaud, Claude Estournel, and Jonathan Beuvier

OSDYN (Observations and Simulations of the DYNamics) is a Python library that provides diagnostics to explore the dynamics of the ocean and its interactions with the atmosphere and waves. Its main strengths are its generic handling of different types of netCDF files and its ability to handle large volumes of data.

Dedicated to large data sets such as in situ and satellite observations and numerical model outputs, OSDYN is particularly powerful for managing different types of Arakawa-C grids and vertical coordinates (NEMO, CROCO, MARS, Symphonie, WW3, Meso-NH). Based on the common Pangeo stack (xarray, dask, xgcm), OSDYN provides data readers that standardize the dimensions, coordinates, and variable names and properties of the datasets. Thus, all Python diagnostics can be shared regardless of the model outputs.

Thanks to progress made using kerchunk and to efforts on transforming the metadata at Ifremer’s HPC centre (auto-kerchunk), reading a large number of netCDF files is fast and the selection of sub-domains or specific variables is almost immediate.
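A minimal sketch of the kerchunk pattern referred to above, which exposes many netCDF files as a single virtual Zarr store, might look like the following; the file paths and the concatenation dimension are hypothetical.

```python
import fsspec
import xarray as xr
from kerchunk.hdf import SingleHdf5ToZarr
from kerchunk.combine import MultiZarrToZarr

# Build byte-range references for each netCDF/HDF5 file (hypothetical paths)
paths = ["model_output_0001.nc", "model_output_0002.nc"]
refs = []
for p in paths:
    with fsspec.open(p, "rb") as f:
        refs.append(SingleHdf5ToZarr(f, p).translate())

# Combine the per-file references along the time dimension into one virtual dataset
combined = MultiZarrToZarr(refs, concat_dims=["time"]).translate()

# Open the virtual store lazily with xarray; sub-domain selection touches only the needed bytes
fs = fsspec.filesystem("reference", fo=combined)
ds = xr.open_dataset(fs.get_mapper(""), engine="zarr", consolidated=False, chunks={})
```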

Jupyter notebooks will detail the implementation of three kinds of analyses. The first one focuses on climatological issues. In order to compare modeled and satellite sea surface temperatures, the second one addresses spatial interpolation and the comparison of data when some may be missing. Lastly, the third analysis provides an overview of how diagnostics describing the formation of deep water masses can be used with different data sets.

How to cite: Garnier, V., Le Roux, J.-F., Magin, J., Odaka, T., Garreau, P., Boutet, M., Raynaud, S., Estournel, C., and Beuvier, J.: OSDYN: a new python tool for the analysis of high-volume ocean outputs., EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-6350, https://doi.org/10.5194/egusphere-egu22-6350, 2022.

EGU22-11595 | Presentations | ESSI2.7 | Highlight

Pangeo for geolocating fish using biologging data

Justus Magin, Mathieu Woillez, Antoine Queric, and Tina Odaka

In biologging, a small device attached to an animal is used to track its behaviour and environment. This data enables biologists to gain a better understanding of its movement, its preferred habitats, and the environmental conditions it needs to thrive, all of which is essential for the future protection of natural resources. For that, it is crucial to have georeferenced data of biological processes, such as fish migration, over a spatial and temporal range.

Since it is challenging to track fish directly in the water, models have been developed to geolocate fish from the high resolution temperature and pressure time series obtained from the data storage tag. In particular, reconstructing the trajectories of seabass using the temporal temperature changes obtained from biologging devices has been studied since 2010 (https://doi.org/10.1016/j.ecolmodel.2015.10.024). These fish tracks are computed based on the likelihood of the temperature data obtained from the fish tag and reference geoscience data such as satellite observations and ocean physics model output. A high temporal and spatial resolution of the reference data plays a key role in the quality of the fish trajectories. However, the size and accessibility of these data sets as well as the computing power required to process high resolution data remain technical barriers.

As the Pangeo ecosystem has been developed to solve such challenges in geoscience, we can take advantage of it in biologging. We use libraries such as intake, kerchunk, and fsspec to load the data quickly; xarray, pint, and dask to compute; and hvplot and Jupyter to display the results. The Pangeo software stack enables us to easily access the data and compute high-resolution fish tracks in a scalable and interactive manner.

How to cite: Magin, J., Woillez, M., Queric, A., and Odaka, T.: Pangeo for geolocating fish using biologging data, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-11595, https://doi.org/10.5194/egusphere-egu22-11595, 2022.

EGU22-7610 | Presentations | ESSI2.7

Distributing your GPU array computation in Python

Jacob Tomlinson

There are many powerful libraries in the Python ecosystem for accelerating the computation of large arrays with GPUs. We have CuPy for GPU array computation, Dask for distributed computation, cuML for machine learning, PyTorch for deep learning, and more. We will dig into how these libraries can be used together to accelerate geoscience workflows and how we are working with projects like Xarray to integrate these libraries with domain-specific tooling. Sgkit is already providing this for the field of genetics, and we are excited to be working with community groups like Pangeo to bring this kind of tooling to the geosciences.
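As a minimal sketch of how two of these libraries compose (assuming a CUDA-capable GPU is available), a Dask array can be backed by CuPy chunks so that the reduction below executes entirely on the GPU; dask-cuda's LocalCUDACluster could distribute the same graph over several GPUs.

```python
import cupy as cp
import dask.array as da

# A Dask array whose chunks live on the GPU as CuPy arrays
x = da.random.random((8000, 8000), chunks=(2000, 2000)).map_blocks(cp.asarray)

# The whole reduction runs on the GPU; the result comes back as a CuPy array
anomaly_std = (x - x.mean(axis=0)).std(axis=0).compute()
```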

How to cite: Tomlinson, J.: Distributing your GPU array computation in Python, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-7610, https://doi.org/10.5194/egusphere-egu22-7610, 2022.

EGU22-13542 | Presentations | ESSI2.7 | Highlight

Climate change adaptation digital twin to support decision making

Jenni Kontkanen, Pekka Manninen, Francisco Doblas-Reyes, Sami Niemelä, and Bjorn Stevens

Climate change will have far reaching impacts on human and natural systems during the 21st century. To increase the understanding of the present and future climate impacts and build resilience, improved Earth system modelling is required. The European Commission Destination Earth (DestinE) initiative aims to contribute to this by developing high precision digital twins (DTs) of the Earth. We present our solution to a climate-change adaptation DT, which is one of the two DTs developed during the first phase of DestinE. The objective of the climate change adaptation DT is to improve the assessment of the impacts of climate change and different adaptation actions at regional and national levels over multi-decadal timescales. This will be achieved by using two storm- and eddy-resolving global climate models, ICON (Icosahedral Nonhydrostatic Weather and Climate Model) and IFS (Integrated Forecasting System). The models will be run at a resolution of a few km on pre-exascale LUMI and MareNostrum5 supercomputers, which are flagship systems of the European High Performance Computing Joint Undertaking (EuroHPC JU) network. Following a radically different approach, climate simulations will be combined with a set of impact models, which enables assessing impacts on different sectors and topics, such as forestry, hydrology, cryosphere, energy, and urban areas. The end goal is to create a new type of climate simulations, in which user requirements are an integral part of the workflow, and thus adaptation solutions can be effectively deployed.

How to cite: Kontkanen, J., Manninen, P., Doblas-Reyes, F., Niemelä, S., and Stevens, B.: Climate change adaptation digital twin to support decision making, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-13542, https://doi.org/10.5194/egusphere-egu22-13542, 2022.

EGU22-3285 | Presentations | ESSI2.7

Accelerating the Lagrangian particle tracking in hydrologic modeling at continental-scale

Chen Yang, Carl Ponder, Bei Wang, Hoang Tran, Jun Zhang, Jackson Swilley, Laura Condon, and Reed Maxwell

Unprecedented climate change and anthropogenic activities have induced increasing ecohydrological issues. Large-scale hydrologic modeling of water quantity is developing rapidly to seek solutions for these issues. Water-parcel transport (e.g., water age, water quality) is as important as water quantity for understanding the changing water cycle. However, current scientific progress in water-parcel transport at large scale is far behind that in water quantity. The known cause is the lack of powerful tools to handle observations and/or modeling of water-parcel transport at large scale with high spatiotemporal resolution. Lagrangian particle tracking based on integrated hydrologic modeling stands out among other methods because it accurately captures water-parcel movements. Nonetheless, the Lagrangian approach is computationally expensive, hindering its broad application in hydrologic modeling, particularly at large scale. EcoSLIM, a grid-based particle tracking code, calculates water ages (e.g., evapotranspiration, outflow, and groundwater) and identifies source-water composition (e.g., rainfall, snowmelt, and initial subsurface water), working seamlessly with the integrated hydrologic model ParFlow-CLM. EcoSLIM is written in Fortran and was originally parallelized with OpenMP (Open Multi-Processing) using shared CPU memory. Here, we accelerate EcoSLIM by implementing it on a distributed, multi-GPU platform using CUDA (Compute Unified Device Architecture) Fortran.

We decompose the modeling domain into subdomains, with each GPU responsible for one subdomain. Particles moving out of a subdomain continue moving temporarily in halo grid cells around the subdomain and are then transferred to the neighboring subdomains. Different transfer schemes are built to balance simulation accuracy and computing speed. Particle transfer leverages CUDA-aware MPI (Message Passing Interface) to improve parallel efficiency. Load imbalance among GPUs, induced by irregular domain boundaries and the heterogeneity of flow paths, is observed. A load-balancing scheme, borrowed from Particle-In-Cell methods and modified based on the characteristics of EcoSLIM, is established. The simulation starts on fewer GPUs than the total number scheduled. The manager MPI process activates an idle GPU for a subdomain once the particle number on its current GPU(s) exceeds a specified threshold. Finally, all scheduled GPUs are enabled. Tests of the new code from catchment scale (the Little Washita watershed), to regional scale (the North China Plain), and to continental scale (the continental US), using millions to billions of particles, show significant speedup and good parallel performance. The parallelized EcoSLIM is a promising tool for the hydrologic community to accelerate our understanding of the terrestrial water cycle beyond the water balance in a changing world.

How to cite: Yang, C., Ponder, C., Wang, B., Tran, H., Zhang, J., Swilley, J., Condon, L., and Maxwell, R.: Accelerating the Lagrangian particle tracking in hydrologic modeling at continental-scale, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-3285, https://doi.org/10.5194/egusphere-egu22-3285, 2022.

EGU22-7869 | Presentations | ESSI2.7

CPMIP: Computational evaluation of the new era of complex Earth System Models. Multi-model results from CMIP6 and challenges for the exascale computing

Acosta, M., Balaji, V., Palomas, S., and Paronuzzi, S.

The increase in Earth system model (ESM) capabilities is strongly linked to the amount of computing power and data storage capacity available. The scientific community requires increased model resolution, large numbers of experiments and ensembles to quantify uncertainty, increased complexity of ESMs (including additional components), and longer simulation periods compared to the current state of climate models. HPC is currently undergoing a major change with the arrival of the next generation of computing systems (‘exascale systems’). These challenges cannot be met by mere extrapolation but require radical innovation in several computing technologies and numerical algorithms. Most applications targeting exascale machines require some degree of rewriting to expose more parallelism, and many face severe strong-scaling challenges if they are to effectively progress to exascale, as is demanded by their science goals.

 

However, the performance evaluation of the new models along the path to exascale will also become more complex. We need new approaches to ensure that the computational evaluation of this new generation of models is done correctly. Moreover, this evaluation will help in the computational analysis during model development and ensure the maximum possible throughput when operational configurations such as CMIP are run.

 

CPMIP metrics are a universal set of easy-to-collect metrics that provide a new way to study ESMs from a computational point of view. Thanks to the H2020 project IS-ENES3, we had a unique opportunity to exploit this new set of metrics to create a novel database based on CMIP6 experiments, using the different models and platforms available across Europe.

 

The results and analysis are presented here, where both differences and similarities among the models can be observed on a variety of different hardware. Moreover, the current database enables different kinds of studies, such as the comparison of different models running similar configurations, or of the same model and configuration executed on different platforms. All these possibilities create a unique context that should be exploited by the community to improve the evaluation of the computational performance of ESMs, using this information for future optimizations and for preparing our models for the new exascale platforms. Finally, general prescriptions on how to disseminate the work done are given, and the need for the community to adopt CPMIP metrics on both current and next-generation platforms is presented.

How to cite: Acosta, M., Balaji, V., Palomas, S., and Paronuzzi, S.: CPMIP: Computational evaluation of the new era of complex Earth System Models. Multi-model results from CMIP6 and challenges for the exascale computing., EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-7869, https://doi.org/10.5194/egusphere-egu22-7869, 2022.

EGU22-3095 | Presentations | ESSI2.7

Fluid simulations accelerated with 16 bits: Approaching 4x speedup on A64FX by squeezing ShallowWaters.jl into Float16

Milan Klöwer, Samuel Hatfield, Matteo Croci, Peter D. Düben, and Tim Palmer

Most Earth-system simulations run on conventional CPUs in 64-bit double-precision floating-point numbers (Float64), although the need for high-precision calculations in the presence of large uncertainties has been questioned. Fugaku, currently the world’s fastest supercomputer, is based on A64FX microprocessors, which also support the 16-bit low-precision format Float16. We investigate the Float16 performance on A64FX with ShallowWaters.jl, the first fluid circulation model that runs entirely with 16-bit arithmetic. The model implements techniques that address precision and dynamic-range issues in 16 bits. The precision-critical time integration is augmented to include compensated summation to minimise rounding errors. Such a compensated time integration is as precise as, but faster than, mixed precision with 16 and 32-bit floats. As subnormals are inefficiently supported on A64FX, the very limited range available in Float16 is 6·10⁻⁵ to 65,504. We develop the analysis number format Sherlogs.jl to log the arithmetic results during the simulation. The equations in ShallowWaters.jl are then systematically rescaled to fit into Float16, using 97% of the available representable numbers. Consequently, we benchmark speedups of up to 3.8x on A64FX with Float16. Adding a compensated time integration, speedups reach up to 3.6x. Although ShallowWaters.jl is simplified compared to large Earth-system models, it shares essential algorithms and therefore shows that 16-bit calculations are indeed a competitive way to accelerate Earth-system simulations on available hardware.
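For illustration, compensated (Kahan) summation in half precision can be sketched in a few lines of NumPy; this is a generic illustration of the technique, not the ShallowWaters.jl implementation.

```python
import numpy as np

def kahan_sum_f16(values):
    """Compensated (Kahan) summation carried out entirely in Float16."""
    s = np.float16(0.0)
    c = np.float16(0.0)  # running compensation for the low-order bits lost at each addition
    for v in values:
        y = np.float16(v) - c
        t = np.float16(s + y)
        c = np.float16((t - s) - y)
        s = t
    return s

x = np.full(10_000, 0.01, dtype=np.float16)

# A naive running Float16 sum stalls once the increments fall below the spacing of the
# accumulator (around 32 here), whereas the compensated sum stays close to the true value of 100.
naive = np.float16(0.0)
for v in x:
    naive = np.float16(naive + v)
print(naive, kahan_sum_f16(x))
```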

How to cite: Klöwer, M., Hatfield, S., Croci, M., Düben, P. D., and Palmer, T.: Fluid simulations accelerated with 16 bits: Approaching 4x speedup on A64FX by squeezing ShallowWaters.jl into Float16, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-3095, https://doi.org/10.5194/egusphere-egu22-3095, 2022.

EGU22-8094 | Presentations | ESSI2.7

Approach to make an I/O server performance-portable across different platforms: OpenIFS-XIOS integration as a case study

Jan Streffing, Xavier Yepes-Arbós, Mario C. Acosta, and Kim Serradell

Current Earth System Models (ESMs) produce a large amount of data due to the increase in the simulated complexity of the models and their increasing spatial resolution. With the exascale era approaching rapidly, efficient I/O will be critical to sustain model throughput. The most commonly adopted approach in ESMs is the use of scalable parallel I/O solutions that are intended to minimize the overhead of writing data into the storage system. However, I/O servers with inline diagnostics introduce more complexity and many parameters that need to be tuned. This means that it is necessary to achieve an optimal trade-off between throughput and resource usage.

ESMs are usually run on different platforms which might have different architectural specifications: latency, bandwidth, number of cores and memory per node, file system, etc. In addition, a single ESM can run different configurations which require different amounts of resources, resolution, output frequency, number of fields, etc. Since each individual case is particular, the I/O server should be tuned accordingly to each platform and model configuration.

We present an approach to identify and tune a series of important parameters that should be considered in an I/O server. In particular, we focus on the XML Input/Output Server (XIOS), and we use it integrated with OpenIFS, an atmospheric general circulation model, as a case study. We not only tune basic parameters such as the number of XIOS servers, the number of servers per node, and the type and frequency of post-processing operations, but also specific ones such as the XIOS buffer size, the splitting of NetCDF files across I/O servers, Lustre striping, and the 2-level server mode of XIOS.

The evaluation of different configurations on different machines proves that it is possible and necessary to find a proper setup for XIOS to achieve a good throughput using an adequate consumption of computational resources. In addition, the results show that the OpenIFS-XIOS integration is performant on the platforms evaluated. This suggests that the integration is portable, though it was initially developed for a specific platform.

How to cite: Streffing, J., Yepes-Arbós, X., C. Acosta, M., and Serradell, K.: Approach to make an I/O server performance-portable across different platforms: OpenIFS-XIOS integration as a case study, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-8094, https://doi.org/10.5194/egusphere-egu22-8094, 2022.

EGU22-2431 | Presentations | ESSI2.7

Exploiting GPU capability in the fully spectral magnetohydrodynamics code QuICC

Dmitrii Tolmachev, Andrew Jackson, Philippe Marti, and Giacomo Castiglioni

QuICC is a code designed to solve the equations of magnetohydrodynamics in a full sphere and other geometries. The aim is to provide understanding of the dynamo process that sustains planetary magnetic fields for billions of years by thermally driven convective motion of an electrically conducting fluid. It also aims to provide the first clues as to how and why the magnetic fields can undergo reversals. The code must solve the coupled equations of conservation of momentum (the Navier-Stokes equation), Maxwell's equations of electrodynamics, and the equation of heat transfer. For accuracy and to facilitate the imposition of boundary conditions, a fully spectral method is used in which angular variables in a spherical polar coordinate system are expanded in spherical harmonics, and radial variables are expanded in a special polynomial expansion in Jones-Worland polynomials. As a result, the coordinate singularities at the north and south poles and at the origin disappear. The code is designed to run on upward of 10^4 processors using MPI and shows excellent scaling. At the heart of the method is the ability to move between physical and spectral space by a variety of exact transforms: these involve the well-known Fast Fourier Transform (FFT) as well as the Legendre transform and the Jones-Worland transform.

In this talk we will focus on the latest advancements in the field of fast GPU algorithms for these types of discrete transforms. We present an extension to the publicly released VkFFT library (a GPU Fast Fourier Transform library for Vulkan, CUDA, HIP and OpenCL) that allows the calculation of the Discrete Cosine Transforms of types I-IV. This is a very exciting addition to what VkFFT can do, as DCTs are often used in image processing, data compression and numerous other scientific tasks. So far, this is the first publicly available optimized GPU implementation of the DCTs. We also present our progress in creating efficient Spherical Harmonic Transforms (SHTs) and radial transforms using GPU implementations. This talk will present Jones-Worland and associated Legendre polynomial transforms for modern GPU architectures, implemented based on the VkFFT runtime kernel optimization model. Combined, they can be used to create a new era of full-sphere models for planetary simulations in geophysics.
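For readers unfamiliar with the DCT variants, the CPU-side reference below (using SciPy, not the VkFFT GPU implementation presented here) illustrates the four types and the fact that each has an exact inverse.

```python
import numpy as np
from scipy.fft import dct, idct

x = np.random.rand(256)
for t in (1, 2, 3, 4):                      # the four DCT variants (types I-IV)
    X = dct(x, type=t, norm="ortho")        # forward transform
    assert np.allclose(idct(X, type=t, norm="ortho"), x)  # exact round trip for every type
```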

How to cite: Tolmachev, D., Jackson, A., Marti, P., and Castiglioni, G.: Exploiting GPU capability in the fully spectral magnetohydrodynamics code QuICC, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-2431, https://doi.org/10.5194/egusphere-egu22-2431, 2022.

EGU22-10006 | Presentations | ESSI2.7

Scaling and performance assessment of TSMP under CPU-only and CPU-GPU configurations

Daniel Caviedes-Voullième, Jörg Benke, Ghazal Tashakor, Stefan Poll, and Ilya Zhukov

Multiphysics Earth system models are potentially good candidates for progressive porting of modules to run on accelerator hardware. Typically, these models have an inherently modular design to cope with the variety of numerical formulations and computational implementations required for the range of physical processes they represent. Progressively porting modules or submodels to accelerators such as GPUs implies that the models must run on heterogeneous hardware. Foreseeably, exascale systems will make use of heterogeneous hardware, and therefore exploring such heterogeneous configurations early on is both important and challenging.

The Terrestrial Systems Modelling Platform (TSMP) is a scale-consistent, highly modular, massively parallel, fully integrated soil-vegetation-atmosphere modelling system. Currently, TSMP is based on the COSMO atmospheric model, the CLM land surface model, and the ParFlow hydrological model, linked together by means of the OASIS3-MCT library.

Recently, ParFlow was ported to GPU, enabling the possibility of running TSMP under a heterogeneous configuration, that is, COSMO and CLM running on CPUs and ParFlow running on GPUs. The different computational demands of each submodel inherently result in non-trivial load balancing across the submodels. This has been addressed by studying the performance and scaling properties of the system for specific problems of interest. The new heterogeneous configuration prompts a re-assessment of load balancing, performance and scaling, in order to identify optimal computational resource configurations and re-evaluate the bottlenecks and inefficiencies that the heterogeneous model system can have.

In this contribution, we present first results on performance and scaling assessment of the heterogeneous TSMP, compared to its performance under homogeneous (CPU-only) configurations. We study strong and weak scaling, for different problem sizes, and evaluate parallel efficiency and power consumption, for homogeneous and heterogeneous jobs on the JUWELS supercomputer, and on the experimental DEEP-Cluster, both at the Jülich Supercomputing Centre. Additionally, we explore profiles and traces of selected cases, both on homogeneous and heterogeneous runs, to identify MPI communication bottlenecks and root causes of the load balancing issue.  
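
For reference, the scaling metrics reported in such assessments are typically derived from measured wall-clock times as in the following sketch (the core counts and timings below are purely illustrative, not TSMP results).

# Strong-scaling speedup and parallel efficiency relative to the smallest run.
cores = [96, 192, 384, 768]
runtime_s = [1000.0, 520.0, 280.0, 165.0]   # hypothetical wall-clock times

t_ref, p_ref = runtime_s[0], cores[0]
for p, t in zip(cores, runtime_s):
    speedup = t_ref / t
    efficiency = speedup / (p / p_ref)      # ideal speedup equals p / p_ref
    print(f"{p:5d} cores: speedup {speedup:5.2f}, efficiency {efficiency:6.1%}")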

How to cite: Caviedes-Voullième, D., Benke, J., Tashakor, G., Poll, S., and Zhukov, I.: Scaling and performance assessment of TSMP under CPU-only and CPU-GPU configurations, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-10006, https://doi.org/10.5194/egusphere-egu22-10006, 2022.

EGU22-11212 | Presentations | ESSI2.7

Lightweight embedded DSLs for geoscientific models.

Zbigniew Piotrowski, Daniel Caviedes Voullieme, Jaro Hokkanen, Stefan Kollet, and Olaf Stein

On the map of ESM research and operational software efforts, a notable area is occupied by mid-size codes that benefit from an established code design and user base and are developed by domain scientists. Contrary to the major operational frameworks and newly established software projects, however, developers of such codes cannot easily benefit from novel solutions providing performance portability, nor do they have access to software engineering teams capable of performing a full code rewrite aimed at novel hardware architectures. While evolving accelerator programming paradigms like CUDA or OpenACC enable reasonably fast progress towards execution on heterogeneous architectures, they do not offer universal portability and immediately impair code readability and maintainability. In this contribution we report on a lightweight embedded Domain Specific Language (eDSL) approach that enables legacy CPU codes to execute on GPU. In addition, it is minimally invasive and maximizes code readability and developer productivity. In the implementation, the eDSL serves as a front end for hardware-dependent programming models, such as CUDA. In addition, performance portability can be achieved efficiently by implementing parallel execution and memory abstraction programming models, such as Kokkos, as a backend. We evaluate the adaptation process and computational performance of two established geophysical codes: the ParFlow hydrologic model written in C, and the Fortran-based dwarf encapsulating the MPDATA transport algorithm. Performance portability is demonstrated in the case of ParFlow. We present scalability results on state-of-the-art AMD CPUs and NVIDIA GPUs of the JUWELS Booster supercomputer. We discuss the advantages and limitations of the proposed approach in the context of other direct and DSL-based strategies allowing for exploitation of modern accelerator-based computing platforms.
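
The kernel-once, backend-selected idea behind such an eDSL can be sketched in a few lines; the Python analogue below is purely conceptual and is not the actual ParFlow/MPDATA implementation described above.

import numpy as np

backends = {"cpu": np}
try:
    import cupy as cp            # optional GPU backend, used only if available
    backends["gpu"] = cp
except ImportError:
    pass

def run_kernel(kernel, backend, **fields):
    """Dispatch a kernel written once against an abstract array module."""
    xp = backends[backend]
    return kernel(xp, **{name: xp.asarray(a) for name, a in fields.items()})

# Kernel written once: a simple 1D explicit diffusion step.
def diffuse(xp, u, alpha=0.1):
    return u + alpha * (xp.roll(u, 1) - 2.0 * u + xp.roll(u, -1))

u = np.random.rand(1_000_000)
u_cpu = run_kernel(diffuse, "cpu", u=u)
if "gpu" in backends:
    u_gpu = run_kernel(diffuse, "gpu", u=u)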

How to cite: Piotrowski, Z., Caviedes Voullieme, D., Hokkanen, J., Kollet, S., and Stein, O.: Lightweight embedded DSLs for geoscientific models., EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-11212, https://doi.org/10.5194/egusphere-egu22-11212, 2022.

EGU22-10919 | Presentations | ESSI2.7

The Pilot Lab Exascale Earth System Modelling

Catrin I. Meyer and the PilotLab ExaESM Team

The Pilot Lab Exascale Earth System Modelling (PL-ExaESM) is a “Helmholtz-Incubator Information & Data Science” project and explores specific concepts to enable exascale readiness of Earth System models and associated work flows in Earth System science. PL-ExaESM provides a new platform for scientists of the Helmholtz Association to develop scientific and technological concepts for future generation Earth System models and data analysis systems. Even though extreme events can lead to disruptive changes in society and the environment, current generation models have limited skill, particularly with respect to the simulation of these events. Reliable quantification of extreme events requires models with unprecedentedly high resolution and timely analysis of huge volumes of observational and simulation data, which drastically increase the demand on computing power as well as data storage and analysis capacities. At the same time, the unprecedented complexity and heterogeneity of exascale systems will require new software paradigms for next generation Earth System models as well as fundamentally new concepts for the integration of models and data. Specifically, novel solutions for the parallelisation and scheduling of model components, the handling and staging of huge data volumes and a seamless integration of information management strategies throughout the entire process-value chain from global Earth System simulations to local scale impact models are being developed in PL-ExaESM. The potential of machine learning to optimize these tasks is investigated. At the end of the project, several program libraries and workflows will be available, which provide the basis for the development of next generation Earth System models.

In the PL-ExaESM, scientists from 9 Helmholtz institutions work together to address 5 specific problems of exascale Earth system modelling:

  • Scalability: models are being ported to next-generation GPU processor technology and the codes are modularized so that computer scientists can better help to optimize the models on new hardware.
  • Load balancing: asynchronous workflows are being developed to allow for more efficient orchestration of the increasing model output while preserving the necessary flexibility to control the simulation output according to the scientific needs.
  • Data staging: new emerging dense memory technologies allow new ways of optimizing I/O operations of data-intensive applications running on HPC clusters and future Exascale systems.
  • System design: the results of dedicated performance tests of Earth system models and Earth system data workflows are analysed in light of potential improvements of the future exascale supercomputer system design.
  • Machine learning: modern machine learning approaches are tested for their suitability to replace computationally expensive model calculations and speed up the model simulations or make better use of available observation data.

How to cite: Meyer, C. I. and the PilotLab ExaESM Team: The Pilot Lab Exascale Earth System Modelling, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-10919, https://doi.org/10.5194/egusphere-egu22-10919, 2022.

EGU22-12099 | Presentations | ESSI2.7

A vision and strategy to revamp ESM workflows at DKRZ

Karsten Peters-von Gehlen, Ivonne Anders, Daniel Heydebreck, Christopher Kadow, Florian Ziemen, and Hannes Thiemann

The German Climate Computing Center (DKRZ) is an established topical IT service provider serving the needs of the German climate science community and their associated partners. At DKRZ, climate researchers have the means available to cover every aspect of the research life cycle, ranging from planning, model development and testing, through model execution on the in-house HPC cluster (16 PFlops, mainly CPU-based, 130 PB disk storage), data analysis (batch jobs, Jupyter, Freva), and data publication and dissemination via the Earth System Grid Federation (ESGF), to long-term data preservation either at the project level (little curation) or in the CoreTrustSeal-certified World Data Center for Climate (WDCC) (extensive curation along the FAIR data principles). A plethora of user support services offered by domain-expert staff complement DKRZ’s portfolio.

With the new HPC system coming online in early 2022 and a number of funded and to-be funded projects exploiting the available computational resources for conducting e.g. global storm-resolving (grid spacing O(1-3km)) simulations on climatic timescales, the current interplay of DKRZ’s services needs to be revisited to devise a unified workflow that will be able to handle the upcoming challenges.

This is why the above-mentioned projects will supply a significant amount of funds to conceive a framework to efficiently orchestrate the entire model development, model execution and data handling workflow at DKRZ in close collaboration with the climate science community.

In this contribution, we will detail our vision of a revamped and versatile ESM orchestration framework at DKRZ. Currently, this vision is based on having the orchestration performed by the Freva System (http://doi.org/10.5334/jors.253), in which users will be able to kick off model compilation, compute and analysis jobs. Furthermore, Freva enables seamless provenance tracking of the entire workflow. Together with the implementation of data publication, long-term archiving and data dissemination workflows, the envisioned system provides a complete package of FAIR Digital Objects (FDOs) to researchers and allows for reproducibility, transparency and reduction of data redundancy.

How to cite: Peters-von Gehlen, K., Anders, I., Heydebreck, D., Kadow, C., Ziemen, F., and Thiemann, H.: A vision and strategy to revamp ESM workflows at DKRZ, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-12099, https://doi.org/10.5194/egusphere-egu22-12099, 2022.

EGU22-3109 | Presentations | ESSI2.7

Compressing atmospheric data into its real information content

Milan Klöwer, Miha Razinger, Juan J. Dominguez, Peter D. Düben, and Tim Palmer

Hundreds of petabytes are produced annually at weather and climate forecast centres worldwide. Compression is essential to reduce storage and to facilitate data sharing. Current techniques do not distinguish the real from the false information in data, leaving the level of meaningful precision unassessed or often subjectively chosen. Many of the trailing mantissa bits in floating-point numbers occur independently with high information entropy, reducing the efficiency of compression algorithms. Here we define the bitwise real information content from information theory as the mutual information of bits in adjacent grid points. The analysis automatically determines a precision from the data itself, based on the separation of real and false information bits. Applied to data from the Copernicus Atmospheric Monitoring Service (CAMS), most variables contain fewer than 7 bits of real information per value and are highly compressible due to spatio-temporal correlation. Rounding bits without real information to zero facilitates lossless compression algorithms and encodes the uncertainty within the data itself. The removal of bits with high entropy but low real information allows us to minimize information loss but maximize the efficiency of the compression algorithms. Compressed in the longitudinal dimension, all CAMS data achieve a compression factor of 17x relative to 64-bit floats, while preserving 99% of the real information. Combined with four-dimensional compression using the floating-point compressor Zfp, factors beyond 60x are achieved, with no significant increase of the forecast error. For multidimensional compression it is generally advantageous to include as many highly correlated dimensions as possible. A data compression Turing test is proposed to optimize compressibility while minimizing information loss for the end use of weather and climate forecast data. 
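
The central quantity, the mutual information that one bit position shares between adjacent grid points, can be sketched as follows (a toy NumPy illustration of the idea, not the authors' implementation):

import numpy as np

def bit_mutual_information(field, bit):
    """Mutual information (in bits) of bit position 'bit' between adjacent values."""
    raw = field.astype(np.float32).view(np.uint32).ravel()
    a = (raw[:-1] >> bit) & 1          # bit at grid point i
    b = (raw[1:] >> bit) & 1           # same bit at grid point i+1
    mi = 0.0
    for va in (0, 1):
        for vb in (0, 1):
            p_ab = np.mean((a == va) & (b == vb))
            p_a, p_b = np.mean(a == va), np.mean(b == vb)
            if p_ab > 0:
                mi += p_ab * np.log2(p_ab / (p_a * p_b))
    return mi

field = np.cumsum(np.random.randn(100_000)).astype(np.float32)  # correlated toy data
for bit in (30, 20, 5):    # an exponent bit, a leading and a trailing mantissa bit
    print(bit, bit_mutual_information(field, bit))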

How to cite: Klöwer, M., Razinger, M., Dominguez, J. J., Düben, P. D., and Palmer, T.: Compressing atmospheric data into its real information content, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-3109, https://doi.org/10.5194/egusphere-egu22-3109, 2022.

EGU22-8762 | Presentations | ESSI2.7

Lossy Data Compression and the Community Earth System Model

Allison H. Baker, Dorit M. Hammerling, Alex Pinard, and Haiying Xu

Climate models such as the Community Earth System Model (CESM) typically produce enormous amounts of output data, and storage capacities have not increased as rapidly as processor speeds over the years. As a result, the cost of storing huge data volumes has become increasingly problematic and has forced climate scientists to make hard choices about which variables to save, data output frequency, simulation lengths, or ensemble sizes, all of which can negatively impact science objectives.  Therefore, we have been investigating lossy data compression techniques as a means of reducing data storage for CESM.  Lossy compression, by definition, does not exactly preserve the original data, but it achieves higher compression rates and subsequently smaller storage requirements. However, as with any data reduction approach, we must exercise extreme care when applying lossy compression to climate output data to avoid introducing artifacts in the data that could affect scientific conclusions.  Our focus has been on better understanding the effects of lossy compression on spatio-temporal climate data and on gaining user acceptance via careful analysis and testing. In this talk, we will describe the challenges and concerns that we have encountered when compressing climate data from CESM and will discuss developing appropriate climate-specific metrics and tools to enable scientists to evaluate the effects of lossy compression on their own data and facilitate optimizing compression for each variable.  In particular, we will present our Large Data Comparison for Python (LDCPy) package for visualizing and computing statistics on differences between multiple datasets, which enables climate scientists to discover potentially relevant compression-induced artifacts in their data.  Additionally, we will demonstrate the usefulness of an alternative to the popular SSIM that we developed, called the Data SSIM (DSSIM), that can be applied directly to the floating-point data in the context of evaluating differences due to lossy compression on large volumes of simulation data.

How to cite: Baker, A. H., Hammerling, D. M., Pinard, A., and Xu, H.: Lossy Data Compression and the Community Earth System Model, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-8762, https://doi.org/10.5194/egusphere-egu22-8762, 2022.

EGU22-10774 | Presentations | ESSI2.7

Understanding the effects of Modern Lossless and Lossy Compressors on the Community Earth Science Model

Robert Underwood, Sheng Di, and Franck Cappello

Large-scale climate simulations such as those of the Community Earth System Model (CESM) produce enormous volumes of data per run. Transferring and storing this volume of data can be challenging, leading researchers to consider data compression in order to mitigate the performance, monetary and environmental costs. In this work, we survey 8 methods, including higher-order SVD, multigrid, transform, and prediction-based lossy compressors, as well as specialized floating-point lossless and lossy compressors and general lossless compressors, to determine which methods are most effective at reducing the storage footprint.  We consider four components (atmosphere, ice, land, and ocean) within CESM, taking into account the stringent quality thresholds required to preserve the integrity of climate research data. Our work goes beyond existing studies of compressor performance by considering these newer compression techniques, and by accounting for the candidate quality thresholds identified in prior work by Hammerling et al.  This provides a more realistic picture of the performance of lossy compression methods relative to lossless compression methods subject to each of these constraints, with up to a 5.2x improvement over the leading lossless compressor and 21x over no compression. Our work features a method to automatically identify a configuration that satisfies the quality requirements for the lossy compressors and that is agnostic to compressor implementation. 

How to cite: Underwood, R., Di, S., and Cappello, F.: Understanding the effects of Modern Lossless and Lossy Compressors on the Community Earth Science Model, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-10774, https://doi.org/10.5194/egusphere-egu22-10774, 2022.

EGU22-9741 | Presentations | ESSI2.7

Improving lossy compression for climate datasets with SZ3

Franck Cappello, Sheng Di, and Robert Underwood

The projection into 2030 of the climate data volume increase brings an important challenge to the climate science community. This is particularly true for CMIP7, which is projected to need about an exabyte of storage capacity. Error-bounded lossy compression is explored as a potential solution to the above problem by different climate research teams. Several lossy compression schemes have been proposed leveraging different forms of decorrelation (transforms, prediction, HoSVD, DNN), quantization (linear, non-linear, vector), and encoding (dictionary-based, variable length, etc.) algorithms. Our experience with different applications shows that the compression methods often need to be customized and optimized to fit the specificities of the datasets to compress and the user requirements on the compression quality, ratio, and throughput. However, none of the existing lossy compression software for scientific data has been designed to be customizable. To address this issue, we developed SZ3, an innovative customizable, modular compression framework. SZ3 is a full C++ refactoring of SZ2, enabling the specialization, addition, or removal of each stage of the lossy compression pipeline to fit the specific characteristics of the datasets to compress and the use-case requirements. This extreme flexibility allows adapting SZ3 to many different use-cases, from ultra-high compression for visualization to ultra-high-speed compression between the CPU (or GPU) and the memory. Thanks to its unique set of features (customization, high compression ratio, high compression throughput, and excellent accuracy preservation), SZ3 won a 2021 R&D100 award. In this presentation, we present SZ3 and a new data prediction-based decorrelation method that significantly improves the compression ratios for climate datasets over the state-of-the-art lossy compressors, while preserving the same data accuracy. Experiments based on CESM datasets show that SZ3 can lead to up to 300% higher compression ratios than SZ2 with the same compression error bound and similar compression throughput.
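
To make the notion of error-bounded, prediction-based compression concrete, the toy sketch below (illustrative only, not SZ3's algorithm) shows the prediction plus linear-quantization stage that SZ-family compressors build on: each value is predicted from the previously reconstructed value and the residual is quantized so the pointwise error never exceeds the bound.

import numpy as np

def error_bounded_encode_decode(data, abs_err):
    recon = np.empty_like(data)
    quanta = np.empty(data.size, dtype=np.int64)   # would be entropy-coded
    prev = 0.0
    for i, x in enumerate(data):
        pred = prev                                # simplest 1D predictor
        q = int(np.round((x - pred) / (2 * abs_err)))
        quanta[i] = q
        prev = pred + q * 2 * abs_err              # decoder-visible reconstruction
        recon[i] = prev
    return quanta, recon

data = np.cumsum(np.random.randn(10_000)) * 0.1
quanta, recon = error_bounded_encode_decode(data, abs_err=1e-3)
print(np.max(np.abs(data - recon)) <= 1e-3)        # True: error bound respected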

How to cite: cappello, F., Di, S., and Underwood, R.: Improving lossy compression for climate datasets with SZ3, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-9741, https://doi.org/10.5194/egusphere-egu22-9741, 2022.

EGU22-9948 | Presentations | ESSI2.7

Exploring Lossy Compressibility through Statistical Correlations of Geophysical Datasets

Julie Bessac, David Krasowska, Robert Underwood, Sheng Di, Jon Calhoun, and Franck Cappello

Lossy compression plays a growing role in geophysical and other computer-based simulations, where the cost of storing output data on large-scale systems can span terabytes and even petabytes in some cases. Using error-bounded lossy compression reduces the amount of storage for each simulation; however, there is no known bound for the upper limit on lossy compressibility for a given dataset. Correlation structures in the data, the choice of compressor and the error bound are factors allowing larger compression ratios and improved quality metrics. Analyzing these three factors provides one direction towards quantifying the limits of lossy compressibility. As a first step, we explore statistical methods to characterize correlation structures present in several climate simulations and their relationships, through functional regression models, to compression ratios. In particular, we show results for climate simulations from the Community Earth System Model (CESM) as well as for hurricane simulations from the Hurricane Isabel IEEE Visualization 2004 contest: the compression ratios of SZ, ZFP and MGARD, widely used lossy compressors for scientific data, exhibit a logarithmic dependence on the global and local correlation ranges when combined with information on the variability of the considered fields through the variance or gradient magnitude. Further work will focus on providing a unified characterization of these relationships across compressors and error bounds. This constitutes a first step towards evaluating the theoretical limits of lossy compressibility, to be used eventually to predict compression performance and adapt compressors to the correlation structures present in the data. 

How to cite: Bessac, J., Krasowska, D., Underwood, R., Di, S., Calhoun, J., and Cappello, F.: Exploring Lossy Compressibility through Statistical Correlations of Geophysical Datasets, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-9948, https://doi.org/10.5194/egusphere-egu22-9948, 2022.

EGU22-9153 | Presentations | ESSI2.7

Exploring the SZ lossy compressor use for the XIOS I/O server

Xavier Yepes-Arbós, Sheng Di, Kim Serradell, Franck Cappello, and Mario C. Acosta

Earth system models (ESMs) have increased their spatial resolution to achieve more accurate solutions. As a consequence, the number of grid points increases dramatically, and an enormous amount of data is produced as simulation results. In addition, if ESMs manage to take advantage of the upcoming exascale computing power, their current data management systems will become a bottleneck as data production grows exponentially.

The XML Input/Output Server (XIOS) is an MPI parallel I/O server designed for ESMs to efficiently post-process data inline as well as read and write data in NetCDF4 format. Although it offers good performance in terms of computational efficiency at current resolutions, this could change for larger resolutions since XIOS performance is very dependent on the output size. To address this problem we test HDF5 compression in order to reduce the size of the data so that both I/O time and storage footprint can be improved. However, the default lossless compression filter of HDF5 does not provide a good trade-off between size reduction and computational cost. 

Alternatively, we consider using lossy compression filters that may allow reaching high compression ratios and sufficient compression speed to considerably reduce the I/O time while keeping high accuracy. In particular, we are exploring the feasibility of using the SZ lossy compressor developed by Argonne National Laboratory (ANL) to write highly compressed NetCDF files through XIOS. As a case study, we use the Open Integrated Forecast System (OpenIFS), an atmospheric general circulation model that can use XIOS to output data.
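
For orientation, the sketch below shows how a compression filter is attached to a NetCDF-4 variable with the netCDF4-python bindings, using the default lossless zlib/deflate path discussed above; writing with SZ instead requires the SZ HDF5 filter plugin to be registered, which is not shown here (file, variable and chunk settings are illustrative).

import numpy as np
from netCDF4 import Dataset

with Dataset("example_output.nc", "w", format="NETCDF4") as nc:
    nc.createDimension("lat", 180)
    nc.createDimension("lon", 360)
    var = nc.createVariable("t2m", "f4", ("lat", "lon"),
                            zlib=True, complevel=1,     # lossless deflate filter
                            chunksizes=(90, 180))
    var[:] = np.random.rand(180, 360).astype("f4")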

How to cite: Yepes-Arbós, X., Di, S., Serradell, K., Cappello, F., and C. Acosta, M.: Exploring the SZ lossy compressor use for the XIOS I/O server, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-9153, https://doi.org/10.5194/egusphere-egu22-9153, 2022.

EGU22-946 | Presentations | ESSI2.7

Lossy Scientific Data Compression With SPERR

Samuel Li and John Clyne

Much of the research in lossy data compression has focused on minimizing the average error for a given storage budget. For scientific applications, the maximum point-wise error is often of greater interest than the average error. This paper introduces an algorithm that encodes outliers—data points exceeding a specified point-wise error tolerance—produced by a lossy compression algorithm optimized for minimizing average error. These outliers can then be corrected to be within the error tolerance when decoding. We pair this outlier coding algorithm with an in-house implementation of SPECK, a lossy compression algorithm based on wavelets that exhibits excellent rate-distortion performance (where distortion is measured by the average error), and introduce a new lossy compression product that we call SPERR. Compared to two leading scientific data compressors, SPERR uses less storage to guarantee an error bound and produces better overall rate-distortion curves at a moderate cost of added computation. Finally, SPERR facilitates interactive data exploration by exploiting the multiresolution properties of wavelets and their ability to reconstruct coarsened data volumes on the fly.
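
The outlier-coding idea can be illustrated in a few lines (a toy sketch, not SPERR's actual encoder, with simple rounding standing in for the wavelet-based coder): the indices and residuals of points whose reconstruction error exceeds the tolerance are stored and patched in at decode time.

import numpy as np

tol = 0.01
data = np.random.rand(1000)
lossy = np.round(data, 1)                      # stand-in for a lossy coder output

err = data - lossy
outlier_idx = np.where(np.abs(err) > tol)[0]   # points violating the tolerance
outlier_residual = err[outlier_idx]            # stored alongside the compressed stream

decoded = lossy.copy()
decoded[outlier_idx] += outlier_residual       # correction applied at decode time
print(np.max(np.abs(data - decoded)) <= tol)   # True: bound holds everywhere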

How to cite: Li, S. and Clyne, J.: Lossy Scientific Data Compression With SPERR, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-946, https://doi.org/10.5194/egusphere-egu22-946, 2022.

EGU22-3230 | Presentations | ESSI2.7

Lossy compression in violent thunderstorm simulations: Lessons learned and future goals

Leigh Orf and Kelton Halbert

Here we discuss our experiences with ZFP lossy floating-point compression in eddy-resolving cloud modeling simulations of violent thunderstorms executed on the Blue Waters and Frontera supercomputers. Lossy compression has reduced our simulation data load by a factor of 20-100 relative to uncompressed output. This savings enables us to save data at extremely high temporal resolution, up to the model's time step, the smallest possible temporal discretization. Further data savings are realized by saving only a subdomain of the entire simulation, and this has opened the door to new approaches to analysis. We will discuss the Lack Of a File System (LOFS) compressed format that model data is saved in, as well as conversion routines to create individual ZFP-compressed NetCDF4 files for sharing with collaborators and for archiving. Further, we will discuss the effect of lossy compression on offline Lagrangian parcel analysis from LOFS data. Preliminary results suggest that high compression does not alter parcel paths considerably in cloud model simulation data over several minutes of integration as compared to uncompressed data.

How to cite: Orf, L. and Halbert, K.: Lossy compression in violent thunderstorm simulations: Lessons learned and future goals, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-3230, https://doi.org/10.5194/egusphere-egu22-3230, 2022.

EGU22-7151 | Presentations | ESSI2.7

Storage growth mitigation through data analysis ready climate datasets using HDF5 Virtual Datasets

Ezequiel Cimadevilla and Antonio S. Cofiño

Climate datasets are usually provided in separate files that facilitate dataset management in climate data distribution systems. In ESGF [1] (Earth System Grid Federation) the time series of a variable is split into smaller pieces of data in order to reduce file size. Although this enhances usability for data management in the ESGF distribution system (i.e. file publishing, download, …), it reduces usability for data analysis. Usually, workflows need to pre-process and rearrange multiple files into a single data source in order to obtain a data analysis dataset, involving data rewriting and duplication with the corresponding storage growth.

The mitigation of storage growth can be achieved by creating virtual views, allowing a number of actual datasets to be multidimensionally mapped together into a single multidimensional dataset that requires neither rewriting data nor consuming additional storage. Due to the increasing interest in offering climate researchers appropriate single data analysis datasets, some mechanisms have been or are being developed to tackle this issue, such as NcML (netCDF Markup Language), xarray/netCDF-4 multiple-file datasets and H5VDS. HDF5 Virtual Datasets [3] (H5VDS) provide researchers with different views of interest of a compound dataset, without the cost of duplicating information, facilitating data analysis in an easy and transparent way.

In the climate community and in ESGF, netCDF is the standard data model and format for climate data exchange. The default storage format of netCDF-4 is HDF5, which introduces HDF5 features into the netCDF library, including chunking [2], compression [2], virtual datasets and many other capabilities. H5VDS introduces a new dataset storage type that allows multiple HDF5 (and netCDF-4) datasets to be mapped together into a single sliceable dataset via an interface layer. The datasets can be mixed in arbitrary combinations, based on range selections mapping to range selections on sources. This mapping also allows conversion between different data types and the addition, removal or modification of existing metadata (i.e. dataset attributes), which is commonly an issue when accessing the data. 

In this work, H5VDS features are applied to CMIP6 climate simulation datasets from ESGF in order to provide analysis-ready virtual datasets. Examples with common tools/libraries (e.g. netcdf-c, xarray, nco, cdo, …) illustrate the convenience of the proposed approach. Using H5VDS facilitates data analysis workflows by enabling climate researchers to focus on data analysis rather than data engineering tasks. Also, since the H5VDS is created at the storage layer, these datasets are transparent to the netCDF-4 library and existing applications can benefit from this feature.
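
As an illustration of the mechanism, the following h5py sketch (file and variable names are hypothetical) maps several yearly HDF5/netCDF-4 files into one virtual dataset along the time axis without rewriting or duplicating any data.

import h5py

years = [2010, 2011, 2012]
nt_per_file, nlat, nlon = 12, 180, 360

layout = h5py.VirtualLayout(shape=(len(years) * nt_per_file, nlat, nlon),
                            dtype="f4")
for i, year in enumerate(years):
    source = h5py.VirtualSource(f"tas_{year}.nc", "tas",
                                shape=(nt_per_file, nlat, nlon))
    layout[i * nt_per_file:(i + 1) * nt_per_file] = source

with h5py.File("tas_virtual.h5", "w") as f:
    f.create_virtual_dataset("tas", layout, fillvalue=-9999.0)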

References

[1] L. Cinquini et al., “The Earth System Grid Federation: An open infrastructure for access to distributed geospatial data,” Future Generation Computer Systems, vol. 36, pp. 400–417, 2014, doi: 10.1016/j.future.2013.07.002. [Online]. Available: https://linkinghub.elsevier.com/retrieve/pii/S0167739X13001477. [Accessed: 16-Jan-2020]
[2] The HDF Group, “Chunking in HDF5”, 11-Feb.-2019. [Online]. Available: https://portal.hdfgroup.org/display/HDF5/Chunking+in+HDF5. [Accessed: 12-Jan.-2022]

[3] The HDF Group, “Virtual Dataset VDS”, 06-Apr.-2018. [Online]. Available: https://portal.hdfgroup.org/display/HDF5/Virtual+Dataset++-+VDS. [Accessed: 12-Jan.-2022]

Acknowledgements

This work has been developed with support from IS-ENES3, which is funded by the European Union’s Horizon 2020 research and innovation programme under grant agreement No 824084.

How to cite: Cimadevilla, E. and Cofiño, A. S.: Storage growth mitigation through data analysis ready climate datasets using HDF5 Virtual Datasets, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-7151, https://doi.org/10.5194/egusphere-egu22-7151, 2022.

EGU22-1149 | Presentations | ESSI2.7

Practical notes on lossy compression of scientific data

Rostislav Kouznetsov

Lossy compression methods are extremely efficient in terms of space and performance and allow for reduction of network bandwidth and disk space needed to store data arrays without sacrificing the number of stored values.  Lossy compression involves an irreversible transformation of data that reduces the information content of the data.  The transformation introduces a distortion that is normally measured in terms of absolute or relative error. The error is higher for higher compression ratios.  A good choice of lossy compression parameters maximizes the compression ratio while keeping the introduced error within acceptable margins.  Negligence or failure to choose the right compression method or its parameters leads to a poor compression ratio, or to loss of the data.

A good strategy for lossy compression involves specification of the acceptable error margin and the choice of compression parameters and storage format. We will discuss specific techniques of lossy compression, and illustrate pitfalls in the choice of error margins and tools for lossy/lossless compression. The following specific topics will be covered:

1. Packing of floating-point data to integers in NetCDF is sub-optimal in most cases, and for some quantities leads to severe errors (a minimal illustration follows this list).
2. Keeping relative vs absolute precision: false alternative.
3. Acceptable error margin depends on both the origin and the intended application of data.
4. Smart algorithms to decide on compression parameters have a limited area of applicability, which has to be considered in each individual case.
5. Choice of a format for compressed data (NetCDF, GRIB2, Zarr): tradeoff between size, speed and precision.
6. What "number_of_significant_digits" and "least_significant_digit" mean in terms of relative/absolute error.
7. Bit-Shuffle is not always beneficial.
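
As a minimal illustration of point 1 (a generic example, not tied to any particular dataset), classic int16 packing with scale_factor/add_offset spreads a uniform absolute error over the whole value range, so small values lose most of their relative precision:

import numpy as np

data = np.logspace(-3, 3, 7)                       # spans six orders of magnitude

scale = (data.max() - data.min()) / (2**16 - 2)    # scale_factor
offset = (data.max() + data.min()) / 2.0           # add_offset
packed = np.round((data - offset) / scale).astype(np.int16)
unpacked = packed * scale + offset

for x, u in zip(data, unpacked):
    print(f"{x:10.4g}  abs_err={abs(u - x):.2e}  rel_err={abs(u - x) / x:.2e}")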

How to cite: Kouznetsov, R.: Practical notes on lossy compression of scientific data, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-1149, https://doi.org/10.5194/egusphere-egu22-1149, 2022.

EGU22-13259 | Presentations | ESSI2.7

Adding Quantization to the NetCDF C and Fortran Libraries to Enable Lossy Compression

Edward Hartnett and Charles Zender

The increasing volume of Earth science data sets continues to present challenges for large data producers. In order to support lossy compression in the netCDF C and Fortran libraries, we have added a quantize feature for netCDF floating-point variables. When the quantize feature is enabled, the data creator specifies the number of significant digits. As data are written, the netCDF libraries apply a quantization algorithm which guarantees that the number of significant digits (for the BitGroom and Granular BitRound algorithms) or bits (for the BitRound algorithm) will be preserved, while setting unneeded bits to a constant value. This allows zlib lossless compression (or any other lossless compression) to achieve better and faster compression.
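
The effect of such quantization can be sketched with plain NumPy (an illustration of the bit-rounding idea, not the netCDF C implementation): keeping a fixed number of float32 mantissa bits and zeroing the rest makes the byte stream far more compressible for a lossless codec.

import numpy as np
import zlib

def bitround(a, keepbits):
    """Round float32 values to 'keepbits' mantissa bits, zeroing the rest."""
    drop = 23 - keepbits                              # float32 has 23 explicit mantissa bits
    b = a.astype(np.float32).view(np.uint32).copy()
    b += 1 << (drop - 1)                              # round to nearest (ties carry upward)
    b &= np.uint32(0xFFFFFFFF ^ ((1 << drop) - 1))    # zero the discarded bits
    return b.view(np.float32)

data = np.random.rand(100_000).astype(np.float32)
rounded = bitround(data, keepbits=7)

# The rounded field compresses much better with a lossless codec such as zlib.
print(len(zlib.compress(data.tobytes())), len(zlib.compress(rounded.tobytes())))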

How to cite: Hartnett, E. and Zender, C.: Adding Quantization to the NetCDF C and Fortran Libraries to Enable Lossy Compression, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-13259, https://doi.org/10.5194/egusphere-egu22-13259, 2022.

ESSI2.9 – Software tools and semantics for geospatial research

EGU22-1237 | Presentations | ESSI2.9

rassta: Raster-based Spatial Stratification Algorithms

Bryan Fuentes, Minerva Dorantes, and John Tipton

Spatial stratification of landscapes allows for the development of efficient sampling surveys, the inclusion of domain knowledge in data-driven modeling frameworks, and the production of information relating the spatial variability of response phenomena to that of landscape processes. This
work presents the rassta R package as a collection of algorithms dedicated to the spatial stratification of landscapes, the calculation of landscape correspondence metrics across geographic space, and the application of these metrics for spatial sampling and modeling of environmental phenomena.
The theoretical background of rassta is presented through references to several studies which have benefited from landscape stratification routines. The functionality of rassta is presented through code examples which are complemented with the geographic visualization of their outputs.

How to cite: Fuentes, B., Dorantes, M., and Tipton, J.: rassta: Raster-based Spatial Stratification Algorithms, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-1237, https://doi.org/10.5194/egusphere-egu22-1237, 2022.

EGU22-3509 | Presentations | ESSI2.9

Simulation of slow geomorphic flows with r.avaflow

Martin Mergili, Andreas Kellerer-Pirklbauer-Eulenstein, Christian Bauer, and Jan-Thomas Fischer

GIS-based open-source simulation tools for extremely rapid mass flow processes such as snow avalanches, rock avalanches, or debris flows are readily available, covering a broad range of complexity levels – e.g., from single-phase to multi-phase. However, these tools are not suitable for slower types of mass flows characterized by high viscosities. The conventionally used momentum balance equations for rapid flows often appear numerically unstable for high viscosities, leading to the immediate reversion of flow direction or stopping, without appropriate numerical treatment. GIS-based simulation efforts of slow geomorphic flows are reported in the literature, and open source tools are available for specific phenomena such as glaciers, but no comprehensive and readily usable simulation tools have been proposed yet.

We present a simple depth-averaged model implementation for the simulation of slow geomorphic flows, including glaciers, rock glaciers, highly viscous lava flows, and those flow-type landslides not classified as extremely or very rapid. Thereby, we use an equilibrium-of-motion concept. For each time step, flow momentum and velocity are computed as the equilibrium between accelerating gravitational forces and decelerating viscous forces, also including a simple law for basal sliding. Momentum balances are not carried over from one time step to the next, meaning that inertial forces, which are not important for slow-moving mass flows, are neglected. Whereas these basic principles are applied to all relevant processes, there is flexibility with regard to the details of model formulation and parameterization: e.g., the well-established shallow-ice approximation can be used to simulate glacier flow.

The model is implemented with the GRASS GIS-based open-source mass flow simulation framework r.avaflow and demonstrated on four case studies: an earth flow, the growth of a lava dome, a rock glacier, and a glacier (considering accumulation and ablation). All four processes were reproduced in a plausible way. However, parameterization remains a challenge due to spatio-temporal changes and temperature dependency of viscosity and basal sliding. Our model and its implementation open up new possibilities for climate change impact studies, natural hazard analysis, and environmental education.

How to cite: Mergili, M., Kellerer-Pirklbauer-Eulenstein, A., Bauer, C., and Fischer, J.-T.: Simulation of slow geomorphic flows with r.avaflow, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-3509, https://doi.org/10.5194/egusphere-egu22-3509, 2022.

EGU22-11486 | Presentations | ESSI2.9

Geopropy: An open source tool to generate 3D geological cross sections 

Ashkan Hassanzadeh, Enric Vázquez-Suñé, Mercè Corbella, and Rotman Criollo

Cross sections play a significant role in environmental and geological studies. In general, underground models can be generated by experienced geologists or derived from mathematical interpolation of geological data. In this work we present Geopropy, a hybrid knowledge- and data-driven tool that mimics the straightforward stages a geologist follows to generate a geological cross section, taking into account the available data but without using complex mathematical interpolation algorithms. Geopropy separates the areas with one possible geological outcome from those with multiple possible geological scenarios based on the given hard data. The algorithm creates the cross section in the simple areas; to reach a unique outcome in the remaining areas, the user is asked for decisions or additional hard data in semi-automatic and manual stages, depending on the complexity of the cross section. The outputs are 3D shapefiles that are checked again against the introduced hard data to avoid inconsistencies or possible personal biases. Geopropy is therefore an open-source Python library that supports geologists in explicit modelling, aiming at simple, consistent and fast results.

How to cite: Hassanzadeh, A., Vázquez-Suñé, E., Corbella, M., and Criollo, R.: Geopropy: An open source tool to generate 3D geological cross sections , EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-11486, https://doi.org/10.5194/egusphere-egu22-11486, 2022.

EGU22-5774 | Presentations | ESSI2.9

SciQLop: an open source project for in situ data analysis

Alexis Jeandet, Nicolas Aunai, Vincent Génot, Alexandre Schulz, Benjamin Renard, Michotte de Welle Bayane, and Gautier Nguyen

The SCIentific Qt application for Learning from Observations of Plasmas (SciQLop) project makes it easy to discover, retrieve, plot and label in situ space physics measurements from remote servers such as Coordinated Data Analysis Web (CDAWeb) or Automated Multi-Dataset Analysis (AMDA). Analyzing data from a single instrument on a given mission can raise technical difficulties such as finding where to get the data, how to get them and sometimes how to read them. Thus, building for example a machine-learning pipeline involving multiple instruments and even multiple spacecraft missions can be very challenging. Our goal here is to remove all these technical difficulties without sacrificing performance, to allow scientists to focus on data analysis.
SciQLop development started in 2015 as a C++ graphical application funded by the Paris-Saclay Center for Data Science (CDS), then by Paris-Saclay SPACEOBS, and it finally joined the Plasma Physics Data Center (CDPP) in 2019. It has evolved from a monolithic C++ graphical application to a collection of simple and reusable Python or C++ packages solving one problem at a time, increasing our chances to reach users and contributors.

The SciQLop project is composed of the following tools:

  • Speasy: An easy to use Python package to retrieve data from remote servers with multi-layer cache support.
  • Speasy_proxy: A self-hostable, chainable remote cache for Speasy written as a simple Python package.
  • Broni: A Python package which finds intersections between spacecraft trajectories and simple shapes or physical models such as magnetosheath.
  • Orbit-viewer: A Python graphical user interface (GUI) for Broni.
  • TSCat: A Python package used as backend for catalogs of events storage.
  • TSCat-GUI: A Python graphical user interface (GUI).
  • SciQLop-GUI: An extensible and efficient user interface to visualize and label time series, with an embedded IPython terminal.

While some components are production ready and already used for science, SciQLop is still in development and the landscape is moving quite fast.
In this presentation we will give an overview of SciQLop, demonstrate its benefits using some specific cases studies and finally discuss the planned features development.
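
As an example of how low the entry barrier becomes, the snippet below retrieves an interval of data through Speasy with its local cache; the product identifier follows the AMDA naming used in Speasy's documentation and is shown here purely as an illustration.

import speasy as spz

# Download (and locally cache) three days of interplanetary magnetic field data
# served by AMDA; later calls for the same interval are answered from the cache.
data = spz.get_data("amda/imf", "2016-06-02", "2016-06-05")
print(data.values.shape)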

How to cite: Jeandet, A., Aunai, N., Génot, V., Schulz, A., Renard, B., Bayane, M. D. W., and Nguyen, G.: SciQLop: an open source project for in situ data analysis, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-5774, https://doi.org/10.5194/egusphere-egu22-5774, 2022.

EGU22-9454 | Presentations | ESSI2.9

GEE and Machine Learning for mapping burnt areas from ESA’s Sentinel-2 demonstrated in a Greek setting

Ioanna Tselka, Spyridon E. Detsikas, Isidora Isis Demertzi, George P. Petropoulos, Dimitris Triantakonstantis, and Efthimios Karymbalis

Climate change has resulted in an increase in the occurrence and frequency of natural disasters worldwide. A growing concern today is wildfire incidents, which constitute one of the greatest problems due to their ecological, economic and social impacts. Thus, it is very important to obtain accurate and robust information on burnt area cartography. Recent advances in the field of geoinformation have allowed the development of cloud-based platforms for EO data processing such as Google Earth Engine (GEE). The latter allows rapid processing of large amounts of data in an efficient way, saving costs and time, since there is no need to download and process the EO datasets locally in specialized software packages using one's own computing resources. In the present study, a GEE-based approach that exploits machine learning (ML) techniques is developed with the purpose of automating the mapping of burnt areas from ESA’s Sentinel-2 imagery. To demonstrate the developed scheme, one of the largest wildfire events of the summer of 2021, which occurred in the outskirts of Athens, Greece, is used as a case study. A Sentinel-2 image, obtained from GEE immediately after the fire event, was combined with ML classifiers for the purpose of mapping the burnt area at the fire-affected site. Accuracy assessment was conducted on the basis of both the error matrix approach and the Copernicus Rapid Mapping operational product specific to this fire event. All the geospatial analysis was conducted in a GIS environment. Our results evidenced the ability of the synergistic use of Sentinel-2 imagery with ML to map the burnt area in the studied region accurately and robustly. This information can provide valuable help towards the prioritization of activities relevant to the rehabilitation of the fire-affected areas and post-fire management activities. Last but not least, this study provides further evidence of the unique advantages of GEE towards a potential automation of burnt area delineation over large scales.
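
For orientation, a heavily simplified Earth Engine sketch is given below; it uses a dNBR threshold rather than the ML classifiers applied in the study, it assumes an authenticated Earth Engine session, and the dates and region geometry are placeholders for the August 2021 event.

import ee
ee.Initialize()

region = ee.Geometry.Rectangle([23.6, 38.0, 23.9, 38.3])   # placeholder extent

def median_composite(start, end):
    return (ee.ImageCollection("COPERNICUS/S2_SR")
            .filterBounds(region)
            .filterDate(start, end)
            .filter(ee.Filter.lt("CLOUDY_PIXEL_PERCENTAGE", 20))
            .median())

# Normalized Burn Ratio before and after the event (B8 = NIR, B12 = SWIR2).
nbr_pre = median_composite("2021-07-01", "2021-07-31").normalizedDifference(["B8", "B12"])
nbr_post = median_composite("2021-08-10", "2021-08-31").normalizedDifference(["B8", "B12"])

dnbr = nbr_pre.subtract(nbr_post)
burnt = dnbr.gt(0.27)                      # common dNBR threshold for burnt area
print(burnt.getInfo()["bands"][0]["id"])   # quick sanity check of the result image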

KEYWORDS: GEE, Machine Learning, Sentinel-2, Burnt area mapping, Copernicus

How to cite: Tselka, I., Detsikas, S. E., Demertzi, I. I., Petropoulos, G. P., Triantakonstantis, D., and Karymbalis, E.: GEE and Machine Learning for mapping burnt areas from ESA’s Sentinel-2 demonstrated in a Greek setting, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-9454, https://doi.org/10.5194/egusphere-egu22-9454, 2022.

EGU22-11722 | Presentations | ESSI2.9

Monitoring morphological changes in river deltas exploiting GEE and the full Landsat archive

Isidora Isis Demertzi, Spyridon E. Detsikas, Ioanna Tselka, Ioanna Tzanavari, Dimitris Triantakonstantis, Efthimios Karymbalis, and George P. Petropoulos

River deltas are considered among the most diverse ecosystems, with significant environmental and agricultural importance. These landscapes are vulnerable to any human activity or natural process that can disturb the fragile balance between water and land, thus causing morphological changes in the delta fronts. Earth observation (EO), with its capability to provide systematic, inter-temporal and cost-effective data, offers promising potential for monitoring the dynamic changes of river deltas. Recent advances in geoinformation technologies have allowed the development of cloud-based platforms for EO processing such as Google Earth Engine (GEE). It offers unique advantages such as rapid processing of a large amount of data in a cost- and time-efficient manner. This study aims to assess the added value of GEE in monitoring the coastal surface area of river deltas based on the full Landsat archive (TM, ETM+, OLI, L9) and a machine learning (ML) technique. As a case study, the deltas of the Axios and Aliakmonas rivers, located in northern Greece, were selected. These are two of the largest rivers of the country, with Axios also being the second largest in the Balkans. Their joint river deltas create a fertile valley of great environmental and agricultural importance, which has also exhibited very strong dynamics in terms of its morphological characteristics over the last decades. In order to gain better insight into the coastal dynamics of the studied region, Landsat multi-spectral data covering the period from 1984 to the present were integrated into GEE, and a machine learning (ML) classification approach was developed in the cloud-based environment. The dynamics of the two river deltas were also mapped independently using photo interpretation, serving as our reference dataset, in accordance with other studies. All the geospatial analysis of the extracted morphological features of the river deltas was conducted in a geographical information system (GIS) environment. Our results evidenced the unique advantages of cloud platforms such as GEE towards the operationalization of the investigated approaches for coastal morphological changes such as those found in the studied river deltas. Unique characteristics of the methodology proposed herein include the exploitation of the cloud-based platform GEE together with an advanced ML image processing algorithm and the full utilization of the Landsat imagery available today. The proposed approach can also be fully automated, is transferable to other similar areas, and can prove valuable in understanding spatiotemporal changes in coastal surface area over large regions.

KEYWORDS: Google Earth Engine, Landsat, Machine Learning, Earth Observation, river delta

How to cite: Demertzi, I. I., Detsikas, S. E., Tselka, I., Tzanavari, I., Triantakonstantis, D., Karymbalis, E., and Petropoulos, G. P.: Monitoring morphological changes in river deltas exploiting GEE and the full Landsat archive, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-11722, https://doi.org/10.5194/egusphere-egu22-11722, 2022.

EGU22-3492 | Presentations | ESSI2.9

MeteoInfo: An open-source GIS, scientific computation and visualization platform

Yaqiang Wang

Earth science data usually have distinct three-dimensional spatial characteristics; the state of the atmosphere changes rapidly, so the time dimension is also very important, and the variables that describe various physical and chemical states together constitute the Earth science data cube. GIS, scientific computation and visualization tools are important to find and extract patterns and scientific views behind the data. The MeteoInfo open-source software was developed as an integrated framework for both GIS applications and a scientific computation environment, with two applications for end users: MeteoInfoMap and MeteoInfoLab. MeteoInfoMap makes it quick and easy to explore many kinds of geoscience data in the form of GIS layers, and includes spatial data editing, projection and analysis functions. MeteoInfoLab includes multidimensional array calculations, scientific calculations such as linear algebra, and 2D/3D plotting functions, which are suitable for the tasks of geoscience data analysis and visualization. The software was developed using Java and Jython, which gives it good cross-platform capabilities, allowing it to run on operating systems such as Windows, Linux/Unix, and Mac OS with Java support.

The functionality can be conveniently extended through the development of plugins for MeteoInfoMap and toolboxes for MeteoInfoLab. For example, the TrajStat plugin was developed for air trajectory analysis and air pollution source identification, and has been widely used in studies of air pollution transport pathways and spatial sources. Several MeteoInfoLab toolboxes were also developed, for model evaluation (IMEP), air pollution emission data processing (EMIPS) and machine learning (MIML). MeteoInfoLab has functions similar to Python scientific packages such as numpy, pandas and matplotlib, and Jython is essentially Python implemented in Java, so users with Python experience can learn MeteoInfoLab easily, and vice versa. The 3D visualization functions are more powerful in MeteoInfoLab thanks to OpenGL acceleration, and a 3D Earth coordinate system is supported for plotting geoscience data on a virtual globe.

References:

Wang, Y.Q., 2014. MeteoInfo: GIS software for meteorological data visualization and analysis. Meteorological Applications, 21: 360-368.

Wang, Y.Q., 2019. An Open Source Software Suite for Multi-Dimensional Meteorological Data Computation and Visualisation. Journal of Open Research Software, 7(1), p.21. DOI: http://doi.org/10.5334/jors.267

Wang, Y.Q., Zhang, X.Y. and Draxler, R., 2009. TrajStat: GIS-based software that uses various trajectory statistical analysis methods to identify potential sources from long-term air pollution measurement data. Environmental Modelling & Software, 24: 938-939

How to cite: Wang, Y.: MeteoInfo: An open-source GIS, scientific computation and visualization platform, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-3492, https://doi.org/10.5194/egusphere-egu22-3492, 2022.

EGU22-12167 | Presentations | ESSI2.9

Brokering approach based implementation for the national hydrological and meteorological information system in Italy

Enrico Boldrini, Roberto Roncella, Fabrizio Papeschi, Paolo Mazzetti, Marco Casaioli, Stefano Mariani, Martina Bussettini, Barbara Lastoria, and Silvano Pecora

The Italian national hydrological and meteorological information system is being operationally implemented by the Regional Hydrological Services (SIR), under the coordination of the Italian Institute for Environmental Protection and Research (ISPRA), with the collaboration of the National Research Council of Italy, Institute of Atmospheric Pollution Research (CNR-IIA) for the architectural design and implementation of the central components, and of the National Institute for Nuclear Physics (INFN) as cloud computing service developer and infrastructure provider. This work is funded by the Italian Ministry of Ecological Transition under the national initiative “Piano Operativo Ambiente FSC 2014-2020” with the aim of providing standardised and uniform access to the hydro-meteorological data of Italy, allowing, among others, the calculation of statistics, trends and indicators related to the hydrological cycle, weather and climate, and water resources at national and sub-national (e.g., river basin districts, catchments, climatic areas) scales. A prototype of the system has been developed in the framework of the Italian National Board for Hydrological Operational Services, coordinated by ISPRA, which federates the Italian SIRs that are responsible for hydro-meteorological monitoring at the local level. 

A hydrometeorological web portal will be the entry point for end users such as Institutional bodies, research institutions and universities to discover, access and download hydrological and meteorological data of Italy made available by the SIR services.  

Each SIR is already publishing online hydrological and meteorological regional data by means of Internet services. As a requirement, no obligation can be imposed on the specific communication protocols (and related data models) that will be implemented by such services, as the entry barrier for SIR to participate in the system of systems should be minimal.

CNR-IIA is responsible for the design of the architecture, which will be based on a brokering approach to enable interoperability amongst the heterogeneous systems. CNR-IIA is also responsible for the implementation of both the brokering component, based on the Discovery and Access Broker (DAB) technology and the hydrometeorological web portal.

The DAB is a software framework able to implement discovery and access functionalities across a distributed set of heterogeneous data publication systems, in a transparent way for the end user, by acting as a mediator of communication protocols and related data models.

Other service interfaces published by the brokering component will be used by different end-user tools and applications, also enabling the sharing of hydrological and meteorological data of Italy with different national and international initiatives, in particular the WMO Hydrological Observing System (WHOS).

The brokering component will be deployed and managed operatively on a cloud infrastructure to optimize overall system performance and resource usage.

The system will be initially hosted on the CNR-IIA cloud infrastructure backed by Amazon Web Services (AWS), while the target hosting infrastructure and the related cloud computing services, to be ready for operative use by the end of 2025, will be provided by INFN.

How to cite: Boldrini, E., Roncella, R., Papeschi, F., Mazzetti, P., Casaioli, M., Mariani, S., Bussettini, M., Lastoria, B., and Pecora, S.: Brokering approach based implementation for the national hydrological and meteorological information system in Italy, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-12167, https://doi.org/10.5194/egusphere-egu22-12167, 2022.