Earth System Data Cubes: Avenues for advancing Earth system research

David Montero, Guido Kraemer, Anca Anghelea, César Aybar, Gunnar Brandt, Gustau Camps-Valls, Felix Cremer, Ida Flik, Fabian Gans, Sarah Habershon, Chaonan Ji, Teja Kattenborn, Laura Martínez-Ferrer, Francesco Martinuzzi, Martin Reinhardt, Maximilian Söchting, Khalil Teber, Miguel D. Mahecha

TL;DR

Earth System Data Cubes (ESDCs) provide a unified spatio-temporal grid framework for transforming heterogeneous, high-resolution Earth data into an analysis-ready data structure. The paper outlines the ESDC life cycle (collection, curation, cubing, harmonisation, transformation, reuse, metadata generation) and addresses key challenges such as geometric distortions, spatio-temporal representativeness, interoperability, and scalability. It then discusses how ESDCs enable AI-driven Earth system research through physics-informed machine learning, strategic sampling, and uncertainty quantification, alongside technical considerations for computing resources, software ecosystems, and visualisation. The authors argue for standardised, transparent provenance and FAIR, cloud-enabled practices to fully exploit ESDCs for data-driven Earth system science and for broader stakeholder engagement.

Abstract

Recent advancements in Earth system science have been marked by the exponential increase in the availability of diverse, multivariate datasets characterised by moderate to high spatio-temporal resolutions. Earth System Data Cubes (ESDCs) have emerged as one suitable solution for transforming this flood of data into a simple yet robust data structure. ESDCs achieve this by organising data into an analysis-ready format aligned with a spatio-temporal grid, facilitating user-friendly analysis and diminishing the need for extensive technical data processing knowledge. Despite these significant benefits, the completion of the entire ESDC life cycle remains a challenging task. Obstacles are not only of a technical nature but also relate to domain-specific problems in Earth system research. There exist barriers to realising the full potential of data collections in light of novel cloud-based technologies, particularly in curating data tailored for specific application domains. These include transforming data to conform to a spatio-temporal grid with minimum distortions and managing complexities such as spatio-temporal autocorrelation issues. Addressing these challenges is pivotal for the effective application of Artificial Intelligence (AI) approaches. Furthermore, adhering to open science principles for data dissemination, reproducibility, visualisation, and reuse is crucial for fostering sustainable research. Overcoming these challenges offers a substantial opportunity to advance data-driven Earth system research, unlocking the full potential of an integrated, multidimensional view of Earth system processes. This is particularly true when such research is coupled with innovative research paradigms and technological progress.

Paper Structure (25 sections, 7 figures)

Figures (7)

  • Figure 1: Representations of different storage systems for gridded data in Earth system research: Image collections (left), information-preserving data cubes (centre), and Earth system data cubes (ESDCs, right). Differences in these abstract representations have deep implications for data storage systems, accessibility, interoperability and metadata definitions
  • Figure 2: ESDC life cycle. The inner circle represents data processing tasks, and the outer circles represent ancillary tasks that run parallel to the processing steps, involving activities such as data exploration, visualisation, dissemination, and metadata generation. The outermost circle of the diagram illustrates the readiness level of the processed ESDCs at specific points within the cycle
  • Figure 3: Abstract representation illustrating the connection between three Earth system variables in a hARDC+ (from top to bottom: anomalies in air temperature, soil moisture, and gross primary production). The arrows illustrate the interactions that can be modelled, e.g., predictive modelling (top to bottom) or interpretation (bottom to top), depending on the use case of interest
  • Figure 4: Abstract representation illustrating the process of sampling high-resolution mini cubes for further analysis by considering vegetation land covers and extreme events detected via a global ESDC. Note that sample mini cubes are specified in the spatial and temporal ranges of the detected extreme events (also considering their occurrence)
  • Figure 5: Comparison of air temperature at 2 m from ERA5 with and without weighting on the global mean time series computation. This rather trivial example shows how radically wrong any computation can be if the spherical nature of planet Earth is ignored
  • ...and 2 more figures
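
The point made in the Figure 5 caption can be illustrated with a minimal sketch. On a regular latitude-longitude grid, cell area shrinks towards the poles roughly in proportion to the cosine of latitude, so an unweighted average over-represents polar cells. The example below is not the authors' code and does not use ERA5; the function name `global_mean`, the 1-degree grid, and the synthetic temperature field are all assumptions made for illustration.

```python
import numpy as np


def global_mean(field, lat_deg, weighted=True):
    """Mean of a (lat, lon) field on a regular latitude-longitude grid.

    Hypothetical helper for illustration. With weighted=True each latitude
    row is weighted by cos(latitude), which approximates the relative cell
    area on a sphere. Missing values (NaN) are not handled here.
    """
    field = np.asarray(field, dtype=float)
    lat_deg = np.asarray(lat_deg, dtype=float)
    if weighted:
        weights = np.cos(np.deg2rad(lat_deg))
    else:
        weights = np.ones_like(lat_deg)
    # Average along longitude first (all cells in a row have equal area),
    # then combine the rows using the latitude weights.
    zonal_mean = field.mean(axis=1)
    return float(np.sum(weights * zonal_mean) / np.sum(weights))


# Synthetic 1-degree grid of cell centres (an assumption, not ERA5's grid).
lat = np.arange(-89.5, 90.0, 1.0)
lon = np.arange(0.5, 360.0, 1.0)

# Toy "temperature" in degrees Celsius: warm at the equator, cold at the poles.
toy_temp = np.repeat((-20.0 + 50.0 * np.cos(np.deg2rad(lat)))[:, None], lon.size, axis=1)

mean_weighted = global_mean(toy_temp, lat, weighted=True)
mean_unweighted = global_mean(toy_temp, lat, weighted=False)
print(f"weighted: {mean_weighted:.2f}  unweighted: {mean_unweighted:.2f}")
```

For this toy field the unweighted mean comes out colder than the weighted one, because the many small polar cells count as much as the large tropical ones. With real data, exact cell areas or the weighted-reduction facilities of a data cube library would be a more careful choice than the cosine approximation.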