IrisNet: Infrared Image Status Awareness Meta Decoder for Infrared Small Targets Detection

Xuelin Qian, Jiaming Lu, Zixuan Wang, Wenxuan Wang, Zhongling Huang, Dingwen Zhang, Junwei Han

TL;DR

Infrared small target detection (IRSTD) suffers from pattern drift and low signal-to-noise ratio (SNR) across diverse environments. The authors introduce IrisNet, a meta-learned framework that generates the entire decoder conditioned on the infrared image status, using an image-to-decoder transformer and a structured 2D decoder representation. Key contributions include (1) a dynamic image-to-decoder mapping, (2) a structured decoder representation that preserves inter-layer parameter relationships, and (3) high-frequency augmentation in the encoder to strengthen edge and target cues. Experiments on NUAA-SIRST, NUDT-SIRST, and IRSTD-1K report state-of-the-art performance and improved robustness for infrared small target detection.
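As a rough illustration of item (3), the sketch below splits an image into low- and high-frequency parts. The summary does not state how IrisNet extracts its high-frequency components, so this assumes a simple spatial filter: a 3×3 box blur serves as the low-pass estimate and the residual (image minus blur) as the high-frequency map. The function names and the blending weight `alpha` are hypothetical, not taken from the paper.

```python
from typing import List

Image = List[List[float]]


def box_blur3(img: Image) -> Image:
    """3x3 mean filter with border replication (a simple low-pass filter)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    # Clamp indices so border pixels are replicated.
                    ii = min(max(i + di, 0), h - 1)
                    jj = min(max(j + dj, 0), w - 1)
                    acc += img[ii][jj]
            out[i][j] = acc / 9.0
    return out


def high_frequency(img: Image) -> Image:
    """High-frequency residual: the image minus its low-pass version."""
    low = box_blur3(img)
    return [[p - q for p, q in zip(r1, r2)] for r1, r2 in zip(img, low)]


def hf_augment(img: Image, alpha: float = 1.0) -> Image:
    """Hypothetical augmentation: add the weighted high-frequency residual back."""
    hf = high_frequency(img)
    return [[p + alpha * q for p, q in zip(r1, r2)] for r1, r2 in zip(img, hf)]


target = [[0.0] * 5 for _ in range(5)]
target[2][2] = 9.0  # a single bright "small target" on a flat background
residual = high_frequency(target)
# residual[2][2] == 8.0 at the target; residual[0][0] == 0.0 in the flat corner
```

A point-like target survives the high-pass step while flat background cancels to zero, which is the intuition behind using high-frequency cues for target position and scene edges. A wavelet- or FFT-based high-pass filter would be a drop-in alternative under the same assumptions.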

Abstract

Infrared Small Target Detection (IRSTD) faces significant challenges due to low signal-to-noise ratios, complex backgrounds, and the absence of discernible target features. While deep learning-based encoder-decoder frameworks have advanced the field, their static pattern learning suffers from pattern drift across diverse scenarios (e.g., day/night variations, sky/maritime/ground domains), limiting robustness. To address this, we propose IrisNet, a novel meta-learned framework that dynamically adapts detection strategies to the input infrared image status. Our approach establishes a dynamic mapping between infrared image features and entire decoder parameters via an image-to-decoder transformer. More concretely, we represent the parameterized decoder as a structured 2D tensor preserving hierarchical layer correlations and enable the transformer to model inter-layer dependencies through self-attention while generating adaptive decoding patterns via cross-attention. To further enhance the perception ability of infrared images, we integrate high-frequency components to supplement target-position and scene-edge information. Experiments on NUDT-SIRST, NUAA-SIRST, and IRSTD-1K datasets demonstrate the superiority of our IrisNet, achieving state-of-the-art performance.
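The image-to-decoder mapping described in the abstract can be sketched in a few lines. This is a minimal, assumption-laden illustration rather than the authors' implementation: the structured 2D decoder is modeled as an L×K grid of learnable D-dimensional tokens (row l holds the K tokens of decoder layer l), attention is single-head with identity query/key/value projections, and LayerNorm, MLP blocks, and multiple heads are omitted. Self-attention runs over the flattened grid so tokens from different layers can exchange information, cross-attention lets every token read the image feature tokens, and a hypothetical linear `head` turns each token into a chunk of P parameters that are regrouped per layer.

```python
import math
import random
from typing import List

Vec = List[float]


def softmax(xs: Vec) -> Vec:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries: List[Vec], keys: List[Vec], values: List[Vec]) -> List[Vec]:
    """Single-head scaled dot-product attention with identity Q/K/V projections."""
    scale = math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def add(xs: List[Vec], ys: List[Vec]) -> List[Vec]:
    """Element-wise residual connection."""
    return [[a + b for a, b in zip(x, y)] for x, y in zip(xs, ys)]


def image_to_decoder(tokens_2d: List[List[Vec]], image_feats: List[Vec],
                     head: List[Vec]) -> List[Vec]:
    """Map image feature tokens to per-layer decoder parameter vectors (sketch).

    tokens_2d:   L x K grid of learnable D-dim tokens (assumed layout).
    image_feats: N image feature tokens, each D-dim.
    head:        hypothetical D x P linear map from a token to P parameters.
    Returns L vectors, each holding K * P generated parameters.
    """
    num_layers, per_layer = len(tokens_2d), len(tokens_2d[0])
    flat = [t for row in tokens_2d for t in row]              # row-major flatten
    flat = add(flat, attend(flat, flat, flat))                # self-attention across layers
    flat = add(flat, attend(flat, image_feats, image_feats))  # cross-attention to the image
    chunks = [[sum(t[i] * head[i][p] for i in range(len(t))) for p in range(len(head[0]))]
              for t in flat]                                  # linear head per token
    # Regroup chunks by layer to recover the structured (L, K*P) layout.
    return [[x for c in chunks[layer * per_layer:(layer + 1) * per_layer] for x in c]
            for layer in range(num_layers)]


random.seed(0)
L, K, D, N, P = 3, 2, 4, 5, 9  # layers, tokens/layer, token dim, image tokens, params/token
tokens = [[[random.gauss(0, 1) for _ in range(D)] for _ in range(K)] for _ in range(L)]
feats = [[random.gauss(0, 1) for _ in range(D)] for _ in range(N)]
head = [[random.gauss(0, 0.1) for _ in range(P)] for _ in range(D)]
params = image_to_decoder(tokens, feats, head)
print(len(params), [len(p) for p in params])  # 3 [18, 18, 18]
```

Because the output depends on `image_feats`, two images with different status yield different decoder parameters, which is the property the abstract attributes to the image-to-decoder transformer.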

Paper Structure

This paper contains 17 sections, 6 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Pattern drift analysis. (a) Infrared targets can exhibit different characteristics across scenarios. Red arrows mark targets, and yellow arrows mark false alarms. (b) Our pilot study shows that adopting a static decoding paradigm across different scenarios may yield suboptimal IRSTD performance due to pattern drift. (c) Visualizations of feature distributions showing the discrepancy between infrared images with different targets and scenarios; consequently, distinct decoding patterns are needed to localize them.
  • Figure 2: Overview of the proposed IrisNet architecture. (1) Image Encoder extracts high-frequency-enhanced hierarchical features; (2) Image-to-Decoder transformer maps features to decoder parameters through learnable tokens; (3) Meta Decoder constructs the decoder to output binary localization masks.
  • Figure 3: Details of Multi-kernel Aggregation Block (MKAB).
  • Figure 4: Details of three types of Meta Decoder. All parameters are dynamically produced by the Image-to-Decoder transformer. (a) Stacks $3\times3$ conv + BatchNorm + ReLU layers for efficient feature extraction. (b) Adds parallel $3\times3$ and $5\times5$ depthwise separable convolutions to capture multi-scale features. (c) Further augments the multi-scale design with a spatial attention branch to reinforce targets and suppress background.
  • Figure 5: Visual results of different IRSTD methods. The boxes in green, yellow, and red represent correct, missed, and false detections, respectively. The close-up views are shown in the corners with dashed lines.
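Figure 4 notes that every Meta Decoder parameter is produced by the Image-to-Decoder transformer. The sketch below shows, under simplifying assumptions, how a generated flat parameter vector can be unpacked into a 3×3 convolution and applied to a feature map at inference time. It covers only a single-channel, stripped-down version of variant (a): BatchNorm is omitted, ReLU is applied after every layer except the last, and the output passes through a sigmoid and a 0.5 threshold to form a binary mask. All names and parameter values are hypothetical.

```python
import math
from typing import List, Tuple

Image = List[List[float]]


def unpack_conv3x3(params: List[float]) -> Tuple[List[List[float]], float]:
    """Split a generated 10-vector into a 3x3 kernel and a scalar bias."""
    if len(params) != 10:
        raise ValueError("expected 9 kernel weights + 1 bias")
    return [params[0:3], params[3:6], params[6:9]], params[9]


def conv3x3(x: Image, kernel: List[List[float]], bias: float, relu: bool) -> Image:
    """Zero-padded 3x3 cross-correlation, optionally followed by ReLU."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = bias
            for di in range(3):
                for dj in range(3):
                    ii, jj = i + di - 1, j + dj - 1
                    if 0 <= ii < h and 0 <= jj < w:  # zero padding outside the map
                        acc += kernel[di][dj] * x[ii][jj]
            out[i][j] = max(acc, 0.0) if relu else acc
    return out


def meta_decode(x: Image, layer_params: List[List[float]],
                threshold: float = 0.5) -> List[List[int]]:
    """Apply generated conv layers (BatchNorm omitted), then sigmoid + threshold."""
    n = len(layer_params)
    for idx, params in enumerate(layer_params):
        kernel, bias = unpack_conv3x3(params)
        x = conv3x3(x, kernel, bias, relu=idx < n - 1)  # no ReLU on the last layer
    return [[1 if 1.0 / (1.0 + math.exp(-v)) > threshold else 0 for v in row]
            for row in x]


frame = [[0.0] * 5 for _ in range(5)]
frame[2][2] = 1.0                          # one bright target pixel
identity = [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]  # centre weight 1, bias 0
sharp = [0, 0, 0, 0, 4, 0, 0, 0, 0, -2]    # centre weight 4, bias -2
box = [1] * 9 + [-0.5]                     # all-ones kernel, bias -0.5
mask_a = meta_decode(frame, [identity, sharp])  # 1 foreground pixel
mask_b = meta_decode(frame, [identity, box])    # 9 foreground pixels
```

The two calls use the same input but different generated parameters and produce a 1-pixel and a 9-pixel mask, illustrating how swapping the generated weights changes the decoding pattern without changing the decoder's code.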