HS-Mamba: Full-Field Interaction Multi-Groups Mamba for Hyperspectral Image Classification
Hongxing Peng, Kang Lin, Huanai Liu
TL;DR
Hyperspectral image classification faces a trade-off between capturing fine-grained local detail and maintaining global context. The authors introduce HS-Mamba, a full-field interaction framework that processes non-overlapping patches with a dual-channel spatial-spectral encoder (DCSS-encoder) built on multi-groups Mamba, while a lightweight global inline attention (LGI-Att) branch draws on full-image context. Key contributions include the DCSS-encoder with cosine positional encoding, the multi-groups Mamba fusion mechanism, and the Spe-compressed and Spa-extended attention modules. Ablations and efficiency analyses accompany results reported as state-of-the-art on four benchmarks at reduced computational cost. The approach aims to unite local inline-feature modeling with global feature enhancement for scalable, accurate HSI classification.
Abstract
Hyperspectral image (HSI) classification has been one of the hot topics in remote sensing. Recently, the Mamba architecture, based on selective state-space models (S6), has demonstrated great advantages in long-sequence modeling. However, the unique properties of hyperspectral data, such as high dimensionality and feature inlining, pose challenges to applying Mamba to HSI classification. To address these challenges, we propose a full-field interaction multi-groups Mamba framework (HS-Mamba), which departs from both pixel-patch-based and whole-image-based strategies while combining the advantages of the two. The patches cut from the whole image are sent to multi-groups Mamba, combined with positional information, to perceive local inline features in the spatial and spectral domains, and the whole image is sent to a lightweight attention module to enhance the global feature representation. Specifically, HS-Mamba consists of a dual-channel spatial-spectral encoder (DCSS-encoder) module and a lightweight global inline attention (LGI-Att) branch. The DCSS-encoder module uses multiple groups of Mamba to decouple and model the local features of dual-channel sequences built from non-overlapping patches. The LGI-Att branch uses a lightweight compressed and extended attention module to perceive the global features of the spatial and spectral domains of the unsegmented whole image. By fusing local and global features, high-precision classification of hyperspectral images is achieved. Extensive experiments demonstrate the superiority of the proposed HS-Mamba, which outperforms state-of-the-art methods on four benchmark HSI datasets.
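To make the patch-level input concrete, the following is a minimal sketch of how a whole image could be cut into non-overlapping patches and each patch turned into a spatial and a spectral sequence. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the tokenization, so the function names (`split_into_patches`, `spatial_sequence`, `spectral_sequence`), the row-major flattening, and the requirement that the image size be a multiple of the patch size are all hypothetical choices. The Mamba blocks, positional encoding, and LGI-Att branch are omitted.

```python
def split_into_patches(cube, patch):
    """Cut an H x W x C cube (nested lists) into non-overlapping patch x patch tiles.

    Assumption: H and W are multiples of `patch`; a real pipeline would likely pad.
    """
    height, width = len(cube), len(cube[0])
    if height % patch or width % patch:
        raise ValueError("height and width must be multiples of patch")
    tiles = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            # Flatten the tile row-major into patch*patch pixel spectra.
            tile = [cube[top + i][left + j]
                    for i in range(patch) for j in range(patch)]
            tiles.append(tile)
    return tiles


def spatial_sequence(tile):
    """One token per pixel; each token is that pixel's C-band spectrum."""
    return [list(pixel) for pixel in tile]


def spectral_sequence(tile):
    """One token per band; each token holds that band's value at every pixel."""
    return [list(band) for band in zip(*tile)]


# Toy 4 x 4 cube with 3 bands; each value encodes (row, col, band) for readability.
cube = [[[100 * r + 10 * c + b for b in range(3)] for c in range(4)]
        for r in range(4)]
tiles = split_into_patches(cube, patch=2)
spa = spatial_sequence(tiles[0])   # 4 tokens, each of length 3
spe = spectral_sequence(tiles[0])  # 3 tokens, each of length 4
print(len(tiles), len(spa), len(spa[0]), len(spe), len(spe[0]))
```

In this sketch the two sequences are transposes of one another, so the same patch yields one view ordered by position and one ordered by band; a dual-channel encoder would then model each view separately before fusion.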
