Bridging Sensor Gaps via Attention Gated Tuning for Hyperspectral Image Classification

Xizhe Xue, Haokui Zhang, Haizhao Jing, Lijie Tao, Zongwen Bai, Ying Li

TL;DR

The paper tackles the challenge of data-scarce hyperspectral image classification and cross-sensor domain gaps by introducing Attention-Gated Tuning (AGT) and a triplet-structured transformer, Tri-Former. AGT leverages a lightweight auxiliary branch and a cross-attention gate to selectively fuse source-domain knowledge with target-domain adaptation, while using asynchronous cold-hot gradient updates to balance retention and adaptation. Tri-Former combines a spectral-spatial parallel design with a 3D convolutional stage to improve efficiency and learning from limited labeled data. Empirical results across multiple sensors and even cross-modal RGB-to-HSI settings show that Tri-Former with AGT outperforms state-of-the-art methods in accuracy and inference speed, validating the method’s effectiveness for scalable, cross-domain HSI classification.

Abstract

Data-hungry HSI classification methods require high-quality labeled HSIs, which are often costly to obtain. This requirement limits the performance of data-driven methods when annotated samples are scarce. Bridging the domain gap between data acquired by different sensors makes it possible to use the abundant labeled data available across sensors and break this bottleneck. In this paper, we propose a novel Attention-Gated Tuning (AGT) strategy and a triplet-structured transformer model, Tri-Former, to address this issue. The AGT strategy serves as a bridge, allowing us to leverage existing labeled HSI datasets, and even RGB datasets, to enhance performance on new HSI datasets with limited samples. Instead of inserting additional parameters inside the basic model, we train a lightweight auxiliary branch that takes intermediate features from the basic model as input and makes predictions. The proposed AGT resolves conflicts between heterogeneous and even cross-modal data by using a soft gate to suppress disturbing information and enhance useful information. Additionally, we introduce Tri-Former, a triplet-structured transformer with a spectral-spatial separation design that improves parameter utilization and computational efficiency, enabling easier and more flexible fine-tuning. Comparison experiments on three representative HSI datasets captured by different sensors demonstrate that the proposed Tri-Former achieves better performance than several state-of-the-art methods. Homologous, heterologous and cross-modal tuning experiments verify the effectiveness of the proposed AGT. Code has been released at: https://github.com/Cecilia-xue/AGT
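
To make the soft-gate idea in the abstract concrete, the following is a minimal NumPy sketch of one way a cross-attention gate could fuse frozen basic-model features into an auxiliary branch. It is an illustration under stated assumptions, not the authors' implementation: the function name `attention_gate`, the weight matrices `w_q`, `w_k`, `w_v`, `w_g`, the sigmoid gating form, and all shapes are hypothetical choices made for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_gate(aux_tokens, base_tokens, w_q, w_k, w_v, w_g):
    """Hypothetical soft gate (assumed form, not taken from the paper).

    aux_tokens:  (n_aux, d)       features of the lightweight auxiliary branch
    base_tokens: (n_base, d_base) intermediate features of the basic model
    """
    q = aux_tokens @ w_q                              # (n_aux, d)
    k = base_tokens @ w_k                             # (n_base, d)
    v = base_tokens @ w_v                             # (n_base, d)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))    # (n_aux, n_base)
    attended = attn @ v                               # (n_aux, d)
    # Element-wise gate in [0, 1]: values near 0 suppress a basic-model
    # feature, values near 1 let it pass into the auxiliary branch.
    gate = 1.0 / (1.0 + np.exp(-(attended @ w_g)))    # (n_aux, d)
    return aux_tokens + gate * attended, gate

# Toy usage with random features and small random weights.
rng = np.random.default_rng(0)
n_aux, n_base, d, d_base = 4, 9, 8, 16
aux = rng.normal(size=(n_aux, d))
base = rng.normal(size=(n_base, d_base))
w_q = 0.1 * rng.normal(size=(d, d))
w_k = 0.1 * rng.normal(size=(d_base, d))
w_v = 0.1 * rng.normal(size=(d_base, d))
w_g = 0.1 * rng.normal(size=(d, d))
fused, gate = attention_gate(aux, base, w_q, w_k, w_v, w_g)
print(fused.shape, gate.shape)
```

In this sketch the basic model's features are only read, never modified, which mirrors the abstract's point that no additional parameters are inserted inside the basic model; only the auxiliary branch and the gate weights would be trained.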

Paper Structure (26 sections, 18 equations, 17 figures, 12 tables, 1 algorithm)

Figures (17)

  • Figure 1: Comparison of different fine tuning architectures. (a) FPFT; (b) Adapter tuning; (c) LoRA; (d) Prompt tuning; (e) Ladder side tuning.
  • Figure 2: Architecture of the proposed Tri-Former. The SSP block has three main parts: a spectral module, a spatial module and a 3D convolution module. The spectral module collects information from different spectral bands. The spatial module collects information from different spatial locations. A 3D convolution layer is added before the two linear layers to enforce 3D structural information and stabilize training.
  • Figure 3: Comparison between proposed SSP block and vanilla ViT block. (a) SSP block; (b) Vanilla ViT block.
  • Figure 4: Architecture of the proposed AGT. The blue part is the basic model, which uses large patches and a heavy Tri-Former. The red part is the auxiliary branch, which uses a small patch and a tiny Tri-Former.
  • Figure 5: Internal structure of proposed attention gate.
  • ...and 12 more figures
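
The spectral-spatial separation described in the Figure 2 caption can be illustrated with a small NumPy sketch. It assumes a token grid of shape `(bands, positions, dim)` and uses parameter-free single-head attention. The function names `self_attention` and `spectral_spatial_mix` are hypothetical, and the 3D convolution, the linear layers and all learned projections of the actual SSP block are deliberately omitted, so this shows only the axis-wise separation rather than the authors' block.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens):
    """Parameter-free attention over the second-to-last axis.

    tokens: (..., n, dim); each leading index is processed independently.
    """
    d = tokens.shape[-1]
    scores = tokens @ np.swapaxes(tokens, -1, -2) / np.sqrt(d)  # (..., n, n)
    return softmax(scores, axis=-1) @ tokens                     # (..., n, dim)

def spectral_spatial_mix(x):
    """x: (bands, positions, dim). Returns an array of the same shape."""
    # Spectral branch: for every spatial position, attend across bands.
    spectral = np.swapaxes(self_attention(np.swapaxes(x, 0, 1)), 0, 1)
    # Spatial branch: for every band, attend across spatial positions.
    spatial = self_attention(x)
    # Parallel combination of the two branches with a residual connection.
    return x + spectral + spatial

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 9, 4))   # 5 bands, 9 spatial positions, 4 channels
y = spectral_spatial_mix(x)
print(y.shape)
```

Attending along one axis at a time keeps each attention map small (bands × bands or positions × positions) instead of one map over all band-position pairs, which is one plausible reading of the efficiency argument for a separated design.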