Table of Contents
Fetching ...

Hephaestus Minicubes: A Global, Multi-Modal Dataset for Volcanic Unrest Monitoring

Nikolas Papadopoulos, Nikolaos Ioannis Bountos, Maria Sdraka, Andreas Karavias, Ioannis Papoutsis

TL;DR

Hephaestus Minicubes addresses the scarcity of machine-learning-ready, multi-modal InSAR data for volcanic unrest monitoring by introducing a global collection of 38 high-resolution datacubes (approx. 100 m GSD) spanning 44 active volcanoes from 2014 to 2021. Each datacube integrates InSAR phase difference and coherence with LiCSAR-derived topography and ERA5 atmospheric variables, plus expert spatiotemporal deformation annotations and descriptive captions, stored efficiently as compressed Zarr arrays totaling 1.7 TB. The work also provides a benchmark for two core tasks—binary deformation classification and semantic segmentation—under single-timestep and time-series formats, using state-of-the-art architectures and evaluating input contributions from atmospheric data and temporal context. Results show strong baseline performance but reveal challenges in time-series and atmospheric-information integration, underscoring the need for context-aware models and improved handling of atmospheric artifacts. Overall, this dataset unlocks new avenues for data-driven volcanic monitoring and fosters broader adoption of multi-modal deep learning in Earth science applications.

Abstract

Ground deformation is regarded in volcanology as a key precursor signal preceding volcanic eruptions. Satellite-based Interferometric Synthetic Aperture Radar (InSAR) enables consistent, global-scale deformation tracking; however, deep learning methods remain largely unexplored in this domain, mainly due to the lack of a curated machine learning dataset. In this work, we build on the existing Hephaestus dataset, and introduce Hephaestus Minicubes, a global collection of 38 spatiotemporal datacubes offering high resolution, multi-source and multi-temporal information, covering 44 of the world's most active volcanoes over a 7-year period. Each spatiotemporal datacube integrates InSAR products, topographic data, as well as atmospheric variables which are known to introduce signal delays that can mimic ground deformation in InSAR imagery. Furthermore, we provide expert annotations detailing the type, intensity and spatial extent of deformation events, along with rich text descriptions of the observed scenes. Finally, we present a comprehensive benchmark, demonstrating Hephaestus Minicubes' ability to support volcanic unrest monitoring as a multi-modal, multi-temporal classification and semantic segmentation task, establishing strong baselines with state-of-the-art architectures. This work aims to advance machine learning research in volcanic monitoring, contributing to the growing integration of data-driven methods within Earth science applications.

Hephaestus Minicubes: A Global, Multi-Modal Dataset for Volcanic Unrest Monitoring

TL;DR

Hephaestus Minicubes addresses the scarcity of machine-learning-ready, multi-modal InSAR data for volcanic unrest monitoring by introducing a global collection of 38 high-resolution datacubes (approx. 100 m GSD) spanning 44 active volcanoes from 2014 to 2021. Each datacube integrates InSAR phase difference and coherence with LiCSAR-derived topography and ERA5 atmospheric variables, plus expert spatiotemporal deformation annotations and descriptive captions, stored efficiently as compressed Zarr arrays totaling 1.7 TB. The work also provides a benchmark for two core tasks—binary deformation classification and semantic segmentation—under single-timestep and time-series formats, using state-of-the-art architectures and evaluating input contributions from atmospheric data and temporal context. Results show strong baseline performance but reveal challenges in time-series and atmospheric-information integration, underscoring the need for context-aware models and improved handling of atmospheric artifacts. Overall, this dataset unlocks new avenues for data-driven volcanic monitoring and fosters broader adoption of multi-modal deep learning in Earth science applications.

Abstract

Ground deformation is regarded in volcanology as a key precursor signal preceding volcanic eruptions. Satellite-based Interferometric Synthetic Aperture Radar (InSAR) enables consistent, global-scale deformation tracking; however, deep learning methods remain largely unexplored in this domain, mainly due to the lack of a curated machine learning dataset. In this work, we build on the existing Hephaestus dataset, and introduce Hephaestus Minicubes, a global collection of 38 spatiotemporal datacubes offering high resolution, multi-source and multi-temporal information, covering 44 of the world's most active volcanoes over a 7-year period. Each spatiotemporal datacube integrates InSAR products, topographic data, as well as atmospheric variables which are known to introduce signal delays that can mimic ground deformation in InSAR imagery. Furthermore, we provide expert annotations detailing the type, intensity and spatial extent of deformation events, along with rich text descriptions of the observed scenes. Finally, we present a comprehensive benchmark, demonstrating Hephaestus Minicubes' ability to support volcanic unrest monitoring as a multi-modal, multi-temporal classification and semantic segmentation task, establishing strong baselines with state-of-the-art architectures. This work aims to advance machine learning research in volcanic monitoring, contributing to the growing integration of data-driven methods within Earth science applications.

Paper Structure

This paper contains 31 sections, 9 figures, 7 tables.

Figures (9)

  • Figure 1: Hephaestus Minicubes data sources (left) vis-à-vis the spatial distribution of the minicubes (right). Box sizes on the map are proportional to frame dimensions and color intensity reflects the number of available products per region.
  • Figure 2: Examples of the different ground deformation types available in Hephaestus Minicubes.
  • Figure 3: Schematic representation of the time-series construction method. A single primary acquisition date is associated with multiple secondary dates, forming a sequence of InSAR products. Each image displays the phase difference with an overlaid mask to highlight areas with apparent deformation.
  • Figure 4: Compact view of DeepLabV3 predictions (top) with and without atmospheric input and the associated mean lateral gradient distributions of TCWV (bottom) for two samples: (a) Sierra Negra volcano, Galápagos islands. (15/3/2020 - 15/04/2020), (b) Valle de Piedras Encimadas region in Puebla, Mexico (05-08-2020 - 17/08/2020). We examine, representative examples where the inclusion of atmospheric variables leads to improved segmentation performance by mitigating false positives linked to atmospheric artifacts. In both cases, this improvement coincides with high lateral variation in TCWV, hinting at the potential value of atmospheric variables.
  • Figure 5: Textual annotations highlighting volcanic and atmospheric phenomena in InSAR imagery from the Hephaestus Minicubes dataset.
  • ...and 4 more figures