Hephaestus Minicubes: A Global, Multi-Modal Dataset for Volcanic Unrest Monitoring
Nikolas Papadopoulos, Nikolaos Ioannis Bountos, Maria Sdraka, Andreas Karavias, Ioannis Papoutsis
TL;DR
Hephaestus Minicubes addresses the scarcity of machine-learning-ready, multi-modal InSAR data for volcanic unrest monitoring by introducing a global collection of 38 high-resolution datacubes (approx. 100 m GSD) spanning 44 active volcanoes from 2014 to 2021. Each datacube integrates InSAR phase difference and coherence with LiCSAR-derived topography and ERA5 atmospheric variables, plus expert spatiotemporal deformation annotations and descriptive captions, stored efficiently as compressed Zarr arrays totaling 1.7 TB. The work also provides a benchmark for two core tasks—binary deformation classification and semantic segmentation—under single-timestep and time-series formats, using state-of-the-art architectures and evaluating input contributions from atmospheric data and temporal context. Results show strong baseline performance but reveal challenges in time-series and atmospheric-information integration, underscoring the need for context-aware models and improved handling of atmospheric artifacts. Overall, this dataset unlocks new avenues for data-driven volcanic monitoring and fosters broader adoption of multi-modal deep learning in Earth science applications.
Abstract
Ground deformation is regarded in volcanology as a key precursor signal preceding volcanic eruptions. Satellite-based Interferometric Synthetic Aperture Radar (InSAR) enables consistent, global-scale deformation tracking; however, deep learning methods remain largely unexplored in this domain, mainly due to the lack of a curated machine learning dataset. In this work, we build on the existing Hephaestus dataset, and introduce Hephaestus Minicubes, a global collection of 38 spatiotemporal datacubes offering high resolution, multi-source and multi-temporal information, covering 44 of the world's most active volcanoes over a 7-year period. Each spatiotemporal datacube integrates InSAR products, topographic data, as well as atmospheric variables which are known to introduce signal delays that can mimic ground deformation in InSAR imagery. Furthermore, we provide expert annotations detailing the type, intensity and spatial extent of deformation events, along with rich text descriptions of the observed scenes. Finally, we present a comprehensive benchmark, demonstrating Hephaestus Minicubes' ability to support volcanic unrest monitoring as a multi-modal, multi-temporal classification and semantic segmentation task, establishing strong baselines with state-of-the-art architectures. This work aims to advance machine learning research in volcanic monitoring, contributing to the growing integration of data-driven methods within Earth science applications.
