Spectral Gaps and Spatial Priors: Studying Hyperspectral Downstream Adaptation Using TerraMind

Julia Anna Leonardi, Johannes Jakubik, Paolo Fraccaro, Maria Antonia Brovelli

TL;DR

The findings of this research establish a critical baseline for HSI integration, motivating the need for native spectral tokenization in future multimodal model architectures.

Abstract

Geospatial Foundation Models (GFMs) typically lack native support for Hyperspectral Imaging (HSI) due to the complexity and sheer size of high-dimensional spectral data. This study investigates how well TerraMind, a multimodal GFM, adapts to HSI downstream tasks without HSI-specific pretraining. To this end, we implement and compare two channel adaptation strategies: Naive Band Selection and physics-aware Spectral Response Function (SRF) grouping. Overall, our results indicate that deep learning models with native HSI support generally perform better. Our experiments also demonstrate that TerraMind can adapt to HSI downstream tasks through band selection with a moderate performance decline. These findings establish a critical baseline for HSI integration and motivate the need for native spectral tokenization in future multimodal model architectures.
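
The abstract names the two channel adaptation strategies without spelling out how they operate. The sketch below shows one plausible reading and is not the authors' implementation. It assumes that Naive Band Selection keeps, for each band the backbone expects, the single HSI band with the nearest center wavelength, and that SRF grouping averages HSI bands weighted by the target sensor's spectral response, approximated here as a Gaussian. The function names, the band centers, and the FWHM values are illustrative assumptions.

```python
import numpy as np


def naive_band_selection(cube, hsi_centers, target_centers):
    """Assumed reading: keep the HSI band closest to each target band center.

    cube: array of shape (B, H, W); hsi_centers: (B,); target_centers: (T,).
    Returns the selected bands with shape (T, H, W) and their indices.
    """
    distance = np.abs(hsi_centers[None, :] - target_centers[:, None])  # (T, B)
    idx = distance.argmin(axis=1)
    return cube[idx], idx


def srf_grouping(cube, hsi_centers, target_centers, target_fwhm):
    """Assumed reading: SRF-weighted average of HSI bands per target band.

    The true sensor response curves are replaced by Gaussians (an assumption).
    """
    sigma = target_fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))  # FWHM -> std dev
    z = (hsi_centers[None, :] - target_centers[:, None]) / sigma[:, None]
    weights = np.exp(-0.5 * z**2)                   # (T, B)
    weights /= weights.sum(axis=1, keepdims=True)   # each row sums to 1
    # Contract the band axis: (T, B) x (B, H, W) -> (T, H, W)
    return np.tensordot(weights, cube, axes=([1], [0])), weights


# Illustrative setup: 224 evenly spaced HSI bands and four hypothetical
# target bands (values in nm, chosen only for demonstration).
rng = np.random.default_rng(0)
hsi_centers = np.linspace(420.0, 2450.0, 224)
target_centers = np.array([490.0, 560.0, 665.0, 842.0])
target_fwhm = np.array([65.0, 35.0, 30.0, 115.0])
cube = rng.random((224, 8, 8))

selected, idx = naive_band_selection(cube, hsi_centers, target_centers)
grouped, weights = srf_grouping(cube, hsi_centers, target_centers, target_fwhm)
print(selected.shape, grouped.shape)
```

Under these assumptions, selection preserves the raw values of individual bands, while grouping blends neighbouring bands into a smoother, sensor-like response.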

Paper Structure

This paper contains 8 sections, 3 equations, 4 figures, and 1 table.

Figures (4)

  • Figure 1: Experimental framework for benchmarking channel adaptation strategies (Naive Selection vs. SRF Grouping) using the TerraMind backbone on four HSI-specific applications.
  • Figure 2: Prediction examples of TerraMind on the EnMAP-BNETD dataset. Compared to SRF Grouping (right), Naive Band Selection (center) better preserves fine spatial structures.
  • Figure 3: Prediction examples of TerraMind on the EnMAP-CDL dataset using both band sampling techniques. Note that the background class was set as the ignore index during training.
  • Figure 4: Prediction examples on EnMAP-BDFORET. The spectral similarity of tree species leads to visible class confusion in the SRF predictions (right), whereas Naive Band Selection (center) achieves better discrimination between pine subtypes. Note the extensive background regions (grey), which were set as the ignore index during training.