HS-Mamba: Full-Field Interaction Multi-Groups Mamba for Hyperspectral Image Classification

Hongxing Peng, Kang Lin, Huanai Liu

TL;DR

Hyperspectral image classification faces a trade-off between capturing fine-grained local details and maintaining global context. The authors introduce HS-Mamba, a full-field interaction framework: a dual-channel spatial-spectral encoder (DCSS-encoder) processes non-overlapping patches with multi-groups Mamba, while a lightweight global inline attention (LGI-Att) branch draws on full-image context. Key contributions include the DCSS-encoder with cosine positional encoding, the multi-groups Mamba fusion mechanism, the Spe-compressed and Spa-extended attention modules, and ablation and efficiency analyses reporting state-of-the-art performance on four benchmarks at reduced computational cost. The approach unites local inline-feature modeling with global feature enhancement for HSI classification.

Abstract

Hyperspectral image (HSI) classification has been one of the hot topics in remote sensing. Recently, the Mamba architecture, based on selective state-space models (S6), has demonstrated great advantages in long-sequence modeling. However, the unique properties of hyperspectral data, such as high dimensionality and feature inlining, pose challenges to applying Mamba to HSI classification. To address these challenges, we propose a full-field interaction multi-groups Mamba framework (HS-Mamba), which adopts a strategy that differs from both pixel-patch-based and whole-image-based approaches while combining the advantages of each. Patches cut from the whole image are sent to multi-groups Mamba, combined with positional information, to perceive local inline features in the spatial and spectral domains, and the whole image is sent to a lightweight attention module to enhance the global feature representation. Specifically, HS-Mamba consists of a dual-channel spatial-spectral encoder (DCSS-encoder) module and a lightweight global inline attention (LGI-Att) branch. The DCSS-encoder module uses multiple groups of Mamba to decouple and model the local features of dual-channel sequences built from non-overlapping patches. The LGI-Att branch uses a lightweight compressed and extended attention module to perceive the global features of the spatial and spectral domains of the unsegmented whole image. Fusing the local and global features yields high-precision classification of hyperspectral images. Extensive experiments demonstrate the superiority of the proposed HS-Mamba, which outperforms state-of-the-art methods on four benchmark HSI datasets.
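
The local branch described above starts from non-overlapping patches that carry positional information. The following is a minimal sketch of that preprocessing step in plain Python, not the authors' implementation. The function names, the cosine-only formula `cos(index / base**(k/dim))`, and the choice to add the code to each pixel's spectral vector are all assumptions made for illustration; the abstract does not specify them.

```python
import math


def split_into_patches(image, patch):
    """Cut an H x W grid (a list of rows) into non-overlapping patch x patch tiles.

    Each cell holds a per-pixel spectral vector. Tiles are returned in
    row-major order, and each tile is a list of its own rows.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("this sketch assumes H and W are multiples of the patch size")
    tiles = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tiles.append([row[left:left + patch] for row in image[top:top + patch]])
    return tiles


def cosine_position_code(index, dim, base=10000.0):
    """Hypothetical cosine-only code for position `index` (an assumed formula)."""
    return [math.cos(index / base ** (k / dim)) for k in range(dim)]


def tokens_with_position(tile):
    """Flatten one tile into a pixel sequence and add a position code to each pixel."""
    pixels = [pixel for row in tile for pixel in row]
    dim = len(pixels[0])
    return [
        [value + code for value, code in zip(pixel, cosine_position_code(i, dim))]
        for i, pixel in enumerate(pixels)
    ]


if __name__ == "__main__":
    # Toy cube: 4 x 4 pixels with 3 spectral bands each.
    image = [[[float(r), float(c), 0.0] for c in range(4)] for r in range(4)]
    tiles = split_into_patches(image, patch=2)
    print(len(tiles), "tiles;", len(tokens_with_position(tiles[0])), "tokens per tile")
```

Because the tiles do not overlap, every pixel appears in exactly one sequence, which keeps the total sequence length equal to the number of pixels.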

Paper Structure

This paper contains 47 sections, 11 equations, 9 figures, 10 tables.

Figures (9)

  • Figure 1: Innovation of strategy. The traditional pixel-patch based strategy and the whole-image based strategy each have their own defects: the pixel-patch based strategy lacks an understanding of global semantics and suffers from pixel noise sensitivity, while the whole-image based strategy loses high-frequency local details. The proposed full-field interaction strategy strikes a balance between the two and achieves efficient full-field representation.
  • Figure 2: Overview of the proposed full-field interaction multi-groups Mamba (HS-Mamba) for HSI classification. (a) The overall architecture of the proposed HS-Mamba: a three-stage architecture that employs HS-Mamba blocks for hierarchical representation and achieves refined classification through two down-sampling layers and an up-sampling operation; (b) The computational procedure of the proposed dual-domain scanning for spatial and spectral features; (c) The HS-Mamba Block processes HSI input through parallel pathways: the LGI Attention module extracts global features while the DCSS-Encoder processes local position-enhanced non-overlapping patches, followed by gated fusion to yield the integrated representation.
  • Figure 3: The proposed multi-groups Mamba module employs parallel S6Mamba as sub-modules to independently process grouped input features, learn their long-range dependencies, and fuse outputs through element-wise multiplication with learnable weights, followed by channel-wise concatenation to reconstruct original dimensions.
  • Figure 4: An illustrative diagram demonstrating feature map evolution and adaptive patch size variation in the DCSS-Encoder, revealing its hierarchical multi-scale feature learning process.
  • Figure 5: Visualization of the classification results for the Indian Pines dataset. (a) Ground-truth map. (b) SVM. (c) 3D-CNN. (d) FullyContNet. (e) SSFTT. (f) MorpyFormer. (g) GSC-ViT. (h) 3DSS-Mamba. (i) MambaHSI. (j) Proposed HS-Mamba. The colors follow the class legend in the paper's dataset table.
  • ...and 4 more figures
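
The captions of Figure 3 and Figure 2(c) describe two fusion steps: grouped channels are processed independently, scaled by learnable weights, and concatenated, and the local and global pathways are then merged by a gate. The sketch below illustrates both steps in plain Python under stated assumptions. `toy_scan` is a fixed linear recurrence standing in for an S6 block, not a selective state-space layer; the per-group scalar weights and the sigmoid gate `g * local + (1 - g) * global` are hypothetical choices, since the captions do not give the exact formulas.

```python
import math


def toy_scan(sequence, decay=0.5):
    """Stand-in for an S6 block: the recurrence h_t = decay * h_{t-1} + x_t.

    A real selective state-space layer makes its parameters depend on the
    input; this placeholder only keeps the sequential, causal behaviour.
    """
    state = [0.0] * len(sequence[0])
    outputs = []
    for token in sequence:
        state = [decay * h + x for h, x in zip(state, token)]
        outputs.append(state)
    return outputs


def multi_groups(sequence, num_groups, weights):
    """Split channels into groups, scan each group alone, scale, and concatenate."""
    channels = len(sequence[0])
    if channels % num_groups:
        raise ValueError("channels must divide evenly into groups")
    size = channels // num_groups
    merged = [[] for _ in sequence]
    for g in range(num_groups):
        group = [token[g * size:(g + 1) * size] for token in sequence]
        for t, token in enumerate(toy_scan(group)):
            # Element-wise scaling by this group's weight (learned in practice),
            # then channel-wise concatenation back to the original width.
            merged[t].extend(weights[g] * value for value in token)
    return merged


def gated_fusion(local, global_, gate_logits):
    """Blend two feature sequences per channel with an assumed sigmoid gate."""
    gates = [1.0 / (1.0 + math.exp(-z)) for z in gate_logits]
    return [
        [g * a + (1.0 - g) * b for a, b, g in zip(loc, glo, gates)]
        for loc, glo in zip(local, global_)
    ]


if __name__ == "__main__":
    sequence = [[1, 2, 3, 4], [1, 1, 1, 1]]  # 2 tokens, 4 channels
    local = multi_groups(sequence, num_groups=2, weights=[1.0, 2.0])
    global_ = [[0.0] * 4 for _ in sequence]  # placeholder for the attention branch
    print(gated_fusion(local, global_, gate_logits=[0.0] * 4))
```

Processing each group separately keeps every scan narrow, and concatenation restores the original channel count so the result can be fused with the global branch channel by channel.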