HyperspectralViTs: General Hyperspectral Models for On-board Remote Sensing

Vít Růžička, Andrew Markham

TL;DR

The paper tackles the challenge of on-board hyperspectral data processing by introducing HyperspectralViTs, transformer-based architectures adapted for high spectral dimensionality in low-compute environments. The authors propose three modular enhancements (spectral 1×1 conv layers, an Upscale decoder, and an adjusted initial stride), implemented in HyperSegFormer and HyperEfficientViT, that enable end-to-end semantic segmentation without relying on hand-crafted spectral products. Across synthetic and real methane datasets and a mineral-identification dataset, the methods yield significant performance gains (e.g., F1 improvements of up to 27% on synthetic methane data and 13% on the STARCOP benchmark) and substantial inference speed-ups (up to 85% faster on constrained hardware). They also demonstrate that pre-training on synthetic data improves performance when fine-tuning on real events, and they release three OxHyper datasets and code to support future hyperspectral foundation-model research. The work has practical impact by enabling autonomous, fast, in-space detection and decision-making for methane leaks and mineral mapping, and it paves the way for broader foundation-model development in hyperspectral sensing.
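A 1×1 convolution mixes information only along the spectral (channel) axis: every pixel's spectrum is multiplied by the same learned matrix, independently of its spatial neighbours, which is why it suits data with many bands. The NumPy sketch below illustrates that equivalence between a pixel-by-pixel loop and a single channel contraction. The band count, embedding size and function names are hypothetical choices for illustration; this is not the authors' implementation, where these layers sit inside the Transformer blocks.

```python
import numpy as np

def conv1x1_naive(x, weight, bias):
    """Apply a 1x1 convolution to a (C_in, H, W) cube, one pixel at a time.

    weight: (C_out, C_in), bias: (C_out,)
    """
    c_in, h, w = x.shape
    c_out = weight.shape[0]
    out = np.empty((c_out, h, w), dtype=x.dtype)
    for i in range(h):
        for j in range(w):
            # Each pixel's spectrum is mixed on its own; no spatial context is used.
            out[:, i, j] = weight @ x[:, i, j] + bias
    return out

def conv1x1_einsum(x, weight, bias):
    # The same operation written as one contraction over the channel (band) axis.
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]

rng = np.random.default_rng(0)
bands, embed_dim = 50, 32  # hypothetical sizes, not taken from the paper
cube = rng.standard_normal((bands, 8, 8))
W = rng.standard_normal((embed_dim, bands))
b = rng.standard_normal(embed_dim)

a = conv1x1_naive(cube, W, b)
c = conv1x1_einsum(cube, W, b)
print(a.shape, np.allclose(a, c))  # (32, 8, 8) True
```

Because the weight matrix is shared across all pixels, the layer's cost grows with the number of bands times the output channels, not with any spatial kernel size.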

Abstract

On-board processing of hyperspectral data with machine learning models would enable an unprecedented amount of autonomy for a wide range of tasks, for example methane detection or mineral identification. This can enable early warning systems and could allow new capabilities such as automated scheduling across constellations of satellites. Classical methods suffer from high false positive rates, and previous deep learning models have prohibitive computational requirements. We propose fast and accurate machine learning architectures that support end-to-end training with data of high spectral dimension without relying on hand-crafted products or spectral band compression preprocessing. We evaluate our models on two tasks related to hyperspectral data processing. With our proposed general architectures, we improve the F1 score of the previous state-of-the-art methane detection models by 27% on a newly created synthetic dataset and by 13% on the previously released large benchmark dataset. We also demonstrate that pre-training models on the synthetic dataset improves the performance of models fine-tuned on the dataset of real events by 6.9% in F1 score compared with training from scratch. On a newly created dataset for mineral identification, our models provide a 3.5% improvement in F1 score compared with the default versions of the models. By removing the dependency on classically computed features, our proposed models improve inference speed by 85% compared with previous classical and deep learning approaches. With our architecture, one capture from the EMIT sensor can be processed within 30 seconds on a realistic proxy of the ION-SCV 004 satellite.
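For context on the "hand-crafted products" and "classically computed features" that the proposed models avoid: a classical matched filter scores each pixel by how strongly its background-whitened spectrum aligns with a target signature. The sketch below uses the textbook formulation alpha(x) = (x - mu)^T S^-1 t / (t^T S^-1 t), where mu and S are the scene mean and covariance and t is the target signature, on synthetic Gaussian data with a random stand-in signature. It illustrates the general technique only; it is not the specific product used in STARCOP or by the authors.

```python
import numpy as np

def matched_filter(pixels, target):
    """Classical matched-filter score for each pixel.

    pixels: (N, B) array of N pixel spectra with B bands.
    target: (B,) target signature (a random stand-in here).
    Returns an (N,) array of (x - mu)^T S^-1 t / (t^T S^-1 t).
    """
    mu = pixels.mean(axis=0)
    centered = pixels - mu
    cov = np.cov(centered, rowvar=False)
    # Solve S z = t rather than forming S^-1 explicitly, for numerical stability.
    cov_inv_t = np.linalg.solve(cov, target)
    return centered @ cov_inv_t / (target @ cov_inv_t)

rng = np.random.default_rng(0)
n_pixels, n_bands = 5000, 20  # hypothetical scene size
background = rng.standard_normal((n_pixels, n_bands))
target = rng.standard_normal(n_bands)  # stand-in for a gas absorption signature

# Add the target signature (strength 1.0) to the first 50 pixels.
scene = background.copy()
scene[:50] += 1.0 * target

scores = matched_filter(scene, target)
print(scores[:50].mean() > scores[50:].mean())  # True
```

Estimating the scene covariance and solving against it for every capture is the kind of classical pre-computation that an end-to-end model operating directly on the bands can skip.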

Paper Structure

This paper contains 19 sections, 1 equation, 9 figures, 5 tables.

Figures (9)

  • Figure 1: Illustration of the limitations of previous approaches. First, dependence on products classically computed from hyperspectral data (such as the matched filters in STARCOP) may lead to a loss of accuracy due to imperfect capture of the events of interest. Second, off-the-shelf versions of machine learning models aren’t adapted to hyperspectral data and may lose valuable information in their early layers. In contrast, our adapted model leverages information from all relevant hyperspectral bands, which improves both the accuracy and the inference speed of our method.
  • Figure 2: Methane gas signature (shown through the methane transmittance) in comparison with the band ranges of typically used multispectral and hyperspectral satellites.
  • Figure 3: Locations of tiles used to create the OxHyper datasets of EMIT data. We note that the same source tiles (including the same dataset splits) are used for the minerals (where they contain more spectral bands) and the synthetic methane datasets (where the methane leak events are added into the clean datacubes).
  • Figure 4: Aggregation of mineral components into mineral classes. From left to right we show an RGB visualisation of the scene, a binary mask of three common components for Hematite, the aggregated Hematite product (built from 14 components in total) and visualisations of 3-mineral composites. We show the "Goethite", "Hematite" and "Kaolinite" composite and the binary maps used for training (in the last column).
  • Figure 5: Illustration of the HyperSegFormer model for semantic segmentation of methane plumes in hyperspectral data. We highlight in blue the three proposed modular adjustments: 1.) spectral layers (denoted as "1x1 Conv") in the Transformer blocks, 2.) an upscaling layer in the decoder network (denoted as "Upscale layer") and 3.) adjusting the stride of the first Transformer block to 2 (denoted as "Stride"). In red we show an example of the progressive decimation of the resolution throughout the model (using all three adjustments); note that the typical off-the-shelf variants of the models reduce the output resolution by a factor of 4.
  • ...and 4 more figures
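The Figure 5 point about output resolution can be checked with simple stride arithmetic. This sketch assumes the standard SegFormer layout (a stride-4 patch embedding followed by three stride-2 stages, with the decoder predicting at the first stage's resolution), a hypothetical 128-pixel tile and a hypothetical ×2 upscale layer; the exact factors inside HyperSegFormer may differ.

```python
def stage_resolutions(input_size, first_stride, later_strides=(2, 2, 2)):
    """Spatial size after each encoder stage, assuming sizes divide evenly."""
    sizes = []
    size = input_size
    for stride in (first_stride, *later_strides):
        size //= stride
        sizes.append(size)
    return sizes

def output_resolution(input_size, first_stride, upscale_factor=1):
    # Assumption: the decoder fuses all stages at the first stage's resolution,
    # then an optional upscale layer multiplies that size.
    first_stage = stage_resolutions(input_size, first_stride)[0]
    return first_stage * upscale_factor

tile = 128  # hypothetical tile size in pixels

default = stage_resolutions(tile, first_stride=4)
adjusted = stage_resolutions(tile, first_stride=2)
print("off-the-shelf stages: ", default)   # [32, 16, 8, 4]
print("stride-2 stages:      ", adjusted)  # [64, 32, 16, 8]
print("off-the-shelf output: ", output_resolution(tile, 4))  # 32, i.e. 1/4 of 128
print("stride 2 + x2 upscale:", output_resolution(tile, 2, upscale_factor=2))  # 128
```

Under these assumptions, halving the first stride alone recovers half of the lost resolution, and the upscale layer recovers the rest, so thin plume structures are not averaged away by a factor-of-4 output grid.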