SpecAware: A Spectral-Content Aware Foundation Model for Unifying Multi-Sensor Learning in Hyperspectral Remote Sensing Mapping

Renjie Ji, Xue Wang, Chao Niu, Wen Zhang, Yong Mei, Kun Tan

TL;DR

SpecAware tackles the cross-sensor heterogeneity of hyperspectral imagery by introducing a meta-content aware encoder and a HyperEmbedding hypernetwork that dynamically generates channel-wise, low-rank embeddings conditioned on sensor metadata and image content. Coupled with a hypernetwork-based decoder and a robust reconstruction loss, the framework enables unified multi-sensor pretraining on a new Hyper-400K AVIRIS-based dataset, using progressive, multi-view training to optimize efficiency. Across land-cover segmentation, change detection, and scene classification, SpecAware delivers state-of-the-art or near-state-of-the-art results and demonstrates strong cross-sensor transfer with lightweight, scalable components. The work provides a practical foundation and dataset for advancing HSI foundation models in multi-sensor airborne applications, with significant implications for scalable, high-resolution LULC interpretation.

Abstract

Hyperspectral imaging (HSI) is a vital tool for fine-grained land-use and land-cover (LULC) mapping. However, the inherent heterogeneity of HSI data has long been a major barrier to developing generalized models via joint training. Although HSI foundation models have shown promise on a range of downstream tasks, existing approaches typically overlook the guiding role of sensor meta-attributes and struggle with multi-sensor training, which limits their transferability. To address these challenges, we propose SpecAware, a novel spectral-content aware hyperspectral foundation model that unifies multi-sensor learning for HSI mapping. To facilitate this research, we also constructed Hyper-400K, a new large-scale, high-quality benchmark dataset with over 400k image patches from diverse airborne AVIRIS sensors. The core of SpecAware is a two-step hypernetwork-driven encoding process for HSI data. First, a meta-content aware module generates a unique conditional input for each spectral band of every HSI patch by fusing the sensor meta-attributes with the patch's own image content. Second, in the HyperEmbedding module, a sample-conditioned hypernetwork dynamically generates a pair of matrix factors for channel-wise encoding, which consists of adaptive spatial pattern extraction and latent semantic feature re-projection. SpecAware thus gains the ability to perceive and interpret spatial-spectral features across diverse scenes and sensors. This, in turn, allows it to adaptively process a variable number of spectral channels, establishing a unified framework for joint pre-training. Extensive experiments on six datasets demonstrate that SpecAware learns superior feature representations, excelling in land-cover semantic segmentation, change detection, and scene classification.
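The two-step encoding described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the choice of meta-attributes (center wavelength and FWHM), the content statistics (per-band mean and standard deviation), the single linear layers standing in for the hypernetworks, and all sizes and names (`PATCH`, `RANK`, `EMBED`, `COND`, `band_condition`, `generate_factors`, `hyper_embed`) are assumptions made for illustration only. The sketch shows how a per-band conditioning vector can drive the generation of two matrix factors, so that each band is encoded through a low-rank product regardless of how many bands the sensor provides.

```python
import numpy as np

# Hypothetical sizes chosen only for illustration.
PATCH, RANK, EMBED, COND = 4, 8, 32, 16


def init_params(rng):
    """Random weights standing in for trained parameters (assumption)."""
    s = 0.02
    return {
        "w_meta": rng.normal(0, s, (2, COND)),
        "w_content": rng.normal(0, s, (2, COND)),
        # Linear "hypernetwork" heads that emit the two matrix factors.
        "h_a": rng.normal(0, s, (COND, PATCH * PATCH * RANK)),
        "h_b": rng.normal(0, s, (COND, RANK * EMBED)),
    }


def band_condition(wavelength_nm, fwhm_nm, band, params):
    """Step 1: fuse sensor meta-attributes with content statistics of one band."""
    meta = np.array([wavelength_nm / 1000.0, fwhm_nm / 100.0])
    content = np.array([band.mean(), band.std()])
    return np.tanh(meta @ params["w_meta"] + content @ params["w_content"])


def generate_factors(cond, params):
    """Step 2: generate a pair of matrix factors from the conditioning vector."""
    a = (cond @ params["h_a"]).reshape(PATCH * PATCH, RANK)  # spatial extraction
    b = (cond @ params["h_b"]).reshape(RANK, EMBED)          # re-projection
    return a, b


def hyper_embed(patch, wavelengths, fwhms, params):
    """Encode a (bands, PATCH, PATCH) patch into one token per band."""
    tokens = []
    for band, wl, fw in zip(patch, wavelengths, fwhms):
        cond = band_condition(wl, fw, band, params)
        a, b = generate_factors(cond, params)
        tokens.append(band.reshape(-1) @ a @ b)
    return np.stack(tokens)


rng = np.random.default_rng(0)
params = init_params(rng)
# The same parameters handle sensors with different band counts.
for n_bands in (5, 224):
    patch = rng.random((n_bands, PATCH, PATCH))
    wavelengths = np.linspace(400.0, 2500.0, n_bands)
    fwhms = np.full(n_bands, 10.0)
    print(n_bands, hyper_embed(patch, wavelengths, fwhms, params).shape)
```

Because the factors are generated per band rather than stored per band, the parameter count in this sketch does not depend on the number of spectral channels, which is the property the abstract relies on for joint multi-sensor pre-training.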

Paper Structure

This paper contains 29 sections, 26 equations, 15 figures, and 9 tables.

Figures (15)

  • Figure 1: The diversity of hyperspectral sensors and data scenes.
  • Figure 2: Architectural comparison between a conventional neural network (a) and a hypernetwork (b).
  • Figure 3: Overview of the SpecAware pre-training framework. The framework starts with a large, diverse multi-sensor and multi-level aerial HSI dataset. The SpecAware model then uses a meta-content aware encoder and hypernetworks to perform channel-wise encoding of the spatial-spectral features, yielding a final HSI token. The model is trained with a progressive, multi-view scheme and evaluated on three downstream tasks.
  • Figure 4: The sensor meta-attribute encoding module (a) and the image content feature-aware encoding module (b).
  • Figure 5: Architecture of the HyperEmbedding module. The module employs hypernetworks that take fused features as input to dynamically generate matrix parameters for adaptive spatial pattern extraction and latent semantic feature re-projection.
  • ...and 10 more figures