HyperspectralViTs: General Hyperspectral Models for On-board Remote Sensing
Vít Růžička, Andrew Markham
TL;DR
The paper tackles on-board hyperspectral data processing by introducing HyperspectralViTs, transformer-based architectures adapted to data of high spectral dimensionality in low-compute environments. The authors propose three modular enhancements (spectral 1×1 convolution layers, an Upscale decoder, and an adjusted initial stride), implemented in HyperSegFormer and HyperEfficientViT, to enable end-to-end semantic segmentation without relying on hand-crafted spectral products. Across synthetic and real methane datasets and a mineral-identification dataset, the methods yield significant performance gains (e.g., F1 improvements of up to 27% on synthetic methane and 13% on the STARCOP benchmark) and substantial inference speedups (up to 85% faster on constrained hardware). The authors also demonstrate that pre-training on synthetic data improves performance after fine-tuning on real events, and they release three OxHyper datasets and code to support future research on hyperspectral foundation models. In practice, the work enables fast, autonomous, in-space detection and decision-making for methane leaks and mineral mapping, and it paves the way for broader foundation-model development in hyperspectral sensing.
Abstract
On-board processing of hyperspectral data with machine learning models would enable an unprecedented degree of autonomy for a wide range of tasks, for example methane detection or mineral identification. This could enable early warning systems and new capabilities such as automated scheduling across constellations of satellites. Classical methods suffer from high false positive rates, and previous deep learning models have prohibitive computational requirements. We propose fast and accurate machine learning architectures that support end-to-end training on data of high spectral dimension without relying on hand-crafted products or spectral band compression preprocessing. We evaluate our models on two tasks related to hyperspectral data processing. With our proposed general architectures, we improve the F1 score of the previous state-of-the-art methane detection models by 27% on a newly created synthetic dataset and by 13% on the previously released large benchmark dataset. We also demonstrate that training models on the synthetic dataset improves the performance of models finetuned on the dataset of real events by 6.9% in F1 score compared with training from scratch. On a newly created dataset for mineral identification, our models provide a 3.5% improvement in F1 score over the default versions of the models. By removing the dependency on classically computed features, our proposed models improve inference speed by 85% compared with previous classical and deep learning approaches. With our architecture, one capture from the EMIT sensor can be processed within 30 seconds on a realistic proxy of the ION-SCV 004 satellite.
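As an illustration of the spectral 1×1 convolution layers named in the TL;DR, the sketch below shows what such a layer computes in general: every pixel's band vector is mixed by the same small weight matrix, so the spectral dimension shrinks while the spatial layout is untouched. This is a generic pure-Python sketch, not the authors' implementation. The function name `spectral_pointwise_conv`, the toy band count, and the weight values are all assumptions chosen for readability.

```python
def spectral_pointwise_conv(cube, weights, bias):
    """Apply a 1x1 convolution along the spectral axis (illustrative only).

    cube:    H rows, each a list of W pixels, each a list of C_in band values
    weights: C_out rows, each a list of C_in mixing coefficients
    bias:    C_out offsets
    Returns nested lists of shape H x W x C_out.
    """
    out = []
    for row in cube:
        out_row = []
        for pixel in row:
            # The same weights are reused at every pixel; no spatial
            # neighbours are involved, only the bands of this pixel.
            out_row.append([
                sum(w * x for w, x in zip(w_row, pixel)) + b
                for w_row, b in zip(weights, bias)
            ])
        out.append(out_row)
    return out


# Hypothetical toy input: a 1x2 image with 4 bands, mixed down to 2 channels.
cube = [[[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0]]]
weights = [
    [0.25, 0.25, 0.25, 0.25],  # channel 0: mean over the four bands
    [1.0, 0.0, 0.0, -1.0],     # channel 1: difference of first and last band
]
bias = [0.0, 0.5]

mixed = spectral_pointwise_conv(cube, weights, bias)
print(mixed)
```

In a trained network the mixing weights are learned end-to-end rather than fixed by hand, which is how such a layer can take the place of hand-crafted spectral products. Where these layers sit in HyperSegFormer and HyperEfficientViT, and how many output channels they use, is not specified in this summary.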
