IRSAM: Advancing Segment Anything Model for Infrared Small Target Detection

Mingjin Zhang; Yuchun Wang; Jie Guo; Yunsong Li; Xinbo Gao; Jing Zhang

TL;DR

This work tackles infrared small target detection (IRSTD) by addressing the domain gap between infrared and natural images, which limits direct application of the Segment Anything Model (SAM). It introduces IRSAM, which adds a Wavelet-based Perona-Malik Diffusion (WPMD) block to the encoder and a Granularity-Aware Decoder (GAD) that fuses multi-granularity features, built on a lightweight ViT-Tiny backbone (Mobile-SAM). On the public IRSTD datasets NUAA-SIRST, IRSTD-1K, and NUDT-SIRST, the approach outperforms state-of-the-art methods and SAM baselines, and ablations confirm the contribution of both WPMD and GAD. The results indicate that diffusion-based edge preservation combined with multi-scale feature fusion can substantially improve infrared small-target segmentation. Code is provided for reproducibility.

Abstract

The recent Segment Anything Model (SAM) is a significant advancement in natural image segmentation, exhibiting potent zero-shot performance suitable for various downstream image segmentation tasks. However, directly applying the pretrained SAM to the Infrared Small Target Detection (IRSTD) task fails to achieve satisfactory performance because of a notable domain gap between natural and infrared images. Unlike a visible light camera, a thermal imager reveals an object's temperature distribution by capturing infrared radiation. Small targets often show a subtle temperature transition at the object's boundaries. To address this issue, we propose the IRSAM model for IRSTD, which improves SAM's encoder-decoder architecture to learn better feature representations of infrared small objects. Specifically, we design a Perona-Malik diffusion (PMD)-based block and incorporate it into multiple levels of SAM's encoder to help it capture essential structural features while suppressing noise. Additionally, we devise a Granularity-Aware Decoder (GAD) to fuse the multi-granularity features from the encoder and capture structural information that may be lost in long-distance modeling. Extensive experiments on the public datasets, including NUAA-SIRST, NUDT-SIRST, and IRSTD-1K, validate the design choices of IRSAM and its significant superiority over representative state-of-the-art methods. The source code is available at: github.com/IPIC-Lab/IRSAM.
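To make the idea behind the PMD-based block more concrete, the sketch below implements classical Perona-Malik anisotropic diffusion on a 2-D image with NumPy. It is an illustration of the underlying diffusion equation only, not the authors' WPMD block, which operates on encoder features and uses a wavelet transform. The function name `perona_malik`, the parameters `n_iter`, `kappa`, and `step`, and the choice of the exponential conduction function are assumptions made for this example.

```python
import numpy as np


def perona_malik(image, n_iter=10, kappa=0.1, step=0.2):
    """Classical Perona-Malik diffusion (illustrative, hypothetical helper).

    kappa sets the gradient magnitude treated as an edge; step should stay
    at or below 0.25 for the explicit 4-neighbour scheme to remain stable.
    """
    u = np.asarray(image, dtype=np.float64).copy()
    for _ in range(n_iter):
        # Replicate the border so differences across the boundary are zero.
        p = np.pad(u, 1, mode="edge")
        # Differences to the north, south, west, and east neighbours.
        diffs = (
            p[:-2, 1:-1] - u,
            p[2:, 1:-1] - u,
            p[1:-1, :-2] - u,
            p[1:-1, 2:] - u,
        )
        # Conduction g(d) = exp(-(d / kappa)^2): close to 1 in flat regions
        # (smoothing), close to 0 across strong edges (preservation).
        flux = sum(np.exp(-((d / kappa) ** 2)) * d for d in diffs)
        u += step * flux
    return u
```

Applied to a noisy step edge, for example `perona_malik(noisy, n_iter=20)`, this scheme smooths small fluctuations in flat regions while leaving the large intensity jump nearly untouched, which is the behaviour the abstract attributes to the PMD-based block: keeping structural features while suppressing noise.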

Paper Structure (16 sections, 12 equations, 8 figures, 5 tables)

Introduction
Related Work
Infrared Small Target Detection
Segment Anything Model
Diffusion Equation for Image Processing
Methodology
Overall Architecture
Wavelet transform-based PMD Block
Granularity-Aware Decoder
Loss Functions
Experiments
Experiment Details
Quantitative Results
Visual Results
Ablation Study
...and 1 more section

Figures (8)

Figure 1: Visual Comparison. Segmentation results of different methods on complex structured targets.
Figure 2: Overall Architecture of IRSAM. Utilizing an encoder-decoder structure rooted in SAM, IRSAM incorporates two novel modules: WPMD and GAD, crafted specifically for the IRSTD task.
Figure 3: Structure of the WPMD.
Figure 4: Visualization results using different IRSTD methods. Close-up views are shown at the border. In each prediction result, red, blue, and yellow boxes represent correct detections, missed detections, and false detections, respectively.
Figure 5: 3D views of the detection results obtained by different methods.
...and 3 more figures
