SAMamba: Adaptive State Space Modeling with Hierarchical Vision for Infrared Small Target Detection

Wenhao Xu; Shuchen Zheng; Changwei Wang; Zherui Zhang; Chuan Ren; Rongtao Xu; Shibiao Xu

TL;DR

This work tackles infrared small target detection (ISTD), where targets occupy minuscule image areas and blend into cluttered backgrounds. It introduces SAMamba, which combines SAM2's hierarchical features with Vision Mamba-inspired selective sequence modeling. Three components are added on top: the FS-Adapter for domain-aware feature selection, the CSI module for efficient long-range context modeling, and the DPCF fusion strategy for preserving fine details during multi-scale fusion. Experiments on NUAA-SIRST, IRSTD-1k, and NUDT-SIRST show SAMamba achieving state-of-the-art IoU, nIoU, and F1 scores, with strong performance on highly challenging scenes, including synthetic ones. The approach offers a robust, computation-aware solution for ISTD with practical relevance to long-range surveillance and autonomous systems, and it points to temporal, hardware-optimized, and multi-modal extensions as future directions.
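ISTD papers commonly report IoU over the whole test set (total intersection divided by total union), nIoU as the mean of per-image IoU, and F1 from pixel-level precision and recall. The exact evaluation protocol behind SAMamba's numbers (thresholds, pixel-level vs target-level F1) is not stated here, so the following is a minimal pure-Python sketch under those assumptions. `seg_metrics` is a hypothetical helper name, and masks are assumed to be already binarized.

```python
def seg_metrics(preds, gts):
    """Dataset-level IoU, normalized IoU (nIoU) and pixel-level F1.

    Assumption: preds and gts are equal-length lists of binary masks,
    each mask a list of rows of 0/1 ints, one mask per image.
    """
    if not preds:
        raise ValueError("need at least one image")
    total_inter = total_union = 0
    per_image_iou = []
    tp = fp = fn = 0
    for pred, gt in zip(preds, gts):
        inter = union = 0
        for prow, grow in zip(pred, gt):
            for p, g in zip(prow, grow):
                inter += p & g          # predicted target pixel that is truly target
                union += p | g          # pixel in prediction or ground truth
                fp += p & (1 - g)       # false alarm pixel
                fn += (1 - p) & g       # missed target pixel
        tp += inter
        total_inter += inter
        total_union += union
        # Assumption: an image with empty prediction and empty target scores 1.0.
        per_image_iou.append(inter / union if union else 1.0)
    iou = total_inter / total_union if total_union else 1.0   # pooled over dataset
    niou = sum(per_image_iou) / len(per_image_iou)            # averaged per image
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 1.0
    return {"iou": iou, "niou": niou, "f1": f1}
```

The split between IoU and nIoU matters for small targets: pooled IoU is dominated by images with larger targets, while nIoU weights every image equally, so a method that misses tiny targets is penalized more by nIoU.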

Abstract

Infrared small target detection (ISTD) is vital for long-range surveillance in military, maritime, and early warning applications. ISTD is difficult because targets occupy less than 0.15% of the image and are hard to distinguish from complex backgrounds. Existing deep learning methods often suffer from information loss during downsampling and from inefficient global context modeling. This paper presents SAMamba, a novel framework integrating SAM2's hierarchical feature learning with Mamba's selective sequence modeling. Key innovations include: (1) a Feature Selection Adapter (FS-Adapter) for efficient natural-to-infrared domain adaptation via dual-stage selection (token-level selection with a learnable task embedding, and channel-wise adaptive transformations); (2) a Cross-Channel State-Space Interaction (CSI) module that uses selective state space modeling to capture global context with linear complexity; and (3) a Detail-Preserving Contextual Fusion (DPCF) module that adaptively combines multi-scale features through a gating mechanism balancing high-resolution and low-resolution feature contributions. SAMamba addresses core ISTD challenges by bridging the domain gap, maintaining fine-grained details, and efficiently modeling long-range dependencies. Experiments on the NUAA-SIRST, IRSTD-1k, and NUDT-SIRST datasets show that SAMamba significantly outperforms state-of-the-art methods, especially in challenging scenarios with heterogeneous backgrounds and varying target scales. Code: https://github.com/zhengshuchen/SAMamba.
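The abstract attributes the CSI module's linear complexity to selective state space modeling but does not give its equations. As a generic illustration of that mechanism (not the CSI module itself), the sketch below implements a single-channel, Mamba-style selective scan: the step size Δ_t, the input projection B_t and the output projection C_t all depend on the current token, A is discretized with a zero-order hold (exp(Δ_t·A)), and B is discretized with the simplified Δ_t·B_t form. Because the hidden state is updated once per token, the cost is O(L·N) for sequence length L and state size N. The function name `selective_scan` and the scalar weights `w_delta`, `b_delta`, `w_B`, `w_C`, `D` are hypothetical stand-ins for learned projections.

```python
import math


def softplus(z):
    # Numerically safe softplus; for large z, log(1 + e^z) is essentially z.
    return math.log1p(math.exp(z)) if z < 20 else z


def selective_scan(x, A, w_delta, b_delta, w_B, w_C, D=1.0):
    """Single-channel selective SSM scan (illustrative sketch).

    x: list of L floats, one channel of a flattened token sequence.
    A: list of N negative floats, a diagonal state matrix.
    w_B, w_C: lists of N floats; hypothetically B_t = w_B * x_t, C_t = w_C * x_t.
    Returns a list of L outputs; runtime is linear in L.
    """
    N = len(A)
    h = [0.0] * N
    ys = []
    for xt in x:
        delta = softplus(w_delta * xt + b_delta)   # input-dependent step size
        y = 0.0
        for n in range(N):
            a_bar = math.exp(delta * A[n])         # zero-order-hold discretization of A
            b_bar = delta * (w_B[n] * xt)          # simplified discretization of B_t
            h[n] = a_bar * h[n] + b_bar * xt       # recurrent state update
            y += (w_C[n] * xt) * h[n]              # input-dependent readout C_t . h_t
        ys.append(y + D * xt)                      # skip connection
    return ys
```

A 2D feature map would need to be flattened into a token sequence (for example in raster order) before such a scan; how SAMamba orders tokens and mixes information across channels is not specified in the abstract.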
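The DPCF module is described only as combining multi-scale features through a gate that balances high-resolution and low-resolution contributions. One common way to build such a gate, assumed here for illustration, is to upsample the coarse map to the fine resolution, compute a per-pixel weight g = sigmoid(w_h·high + w_l·low + b), and output g·high + (1 - g)·low, so that g near 1 keeps fine detail and g near 0 falls back on coarse context. The names `gated_fuse` and `upsample2x`, the scalar weights `w_h`, `w_l`, `b`, and the nearest-neighbour upsampling are hypothetical stand-ins for learned layers, and the sketch handles a single channel.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def upsample2x(feat):
    """Nearest-neighbour 2x upsampling of a 2D list of floats."""
    out = []
    for row in feat:
        wide = [v for v in row for _ in range(2)]   # repeat each column twice
        out.append(wide)
        out.append(list(wide))                      # repeat each row twice
    return out


def gated_fuse(high, low, w_h=1.0, w_l=-1.0, b=0.0):
    """Fuse a high-res map with a half-resolution map via a per-pixel gate."""
    low_up = upsample2x(low)
    fused = []
    for hrow, lrow in zip(high, low_up):
        frow = []
        for h, l in zip(hrow, lrow):
            g = sigmoid(w_h * h + w_l * l + b)      # gate in (0, 1)
            frow.append(g * h + (1.0 - g) * l)      # convex mix of the two scales
        fused.append(frow)
    return fused
```

With the default weights, a bright pixel in the high-resolution map pushes its gate toward 1, so a small, isolated response survives fusion instead of being averaged away by the coarse map, which is the kind of detail preservation the abstract describes.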
