Background Semantics Matter: Cross-Task Feature Exchange Network for Clustered Infrared Small Target Detection

Mengxuan Xiao, Yinfei Zhu, Yiming Zhu, Boyang Li, Feifei Zhang, Huan Wang, Meng Cai, Yimian Dai

TL;DR

This work tackles the challenge of detecting densely clustered infrared small targets by leveraging background semantics. It introduces DenseSIRST, a dataset with per-pixel background annotations, and BAFE-Net, a multi-task architecture that jointly performs target detection and background semantic segmentation via a dynamic cross-task feature exchange mechanism. The method employs BAG-CP to synthesize realistic densely clustered scenes and demonstrates improved detection accuracy with reduced false alarms across DenseSIRST and existing IRSTD benchmarks. Collectively, the approach highlights the importance of contextual information and explicit background modeling for robust infrared small target detection in complex environments.
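
BAG-CP is only named in this summary; the Figure 2 caption lists its components as semantic-aware background selection, Gaussian-based target fusion, and realistic sample generation. The sketch below shows one way those first two steps could fit together, assuming a grayscale image scaled to [0, 1] and a boolean sky mask. The function names, the additive blending rule, and every default parameter are hypothetical and are not taken from the authors' code.

```python
import numpy as np


def gaussian_target(size: int, peak: float, sigma: float) -> np.ndarray:
    """Return a size x size Gaussian blob whose centre pixel equals `peak`."""
    if size % 2 == 0:
        raise ValueError("size must be odd so the blob has a centre pixel")
    ax = np.arange(size) - size // 2
    yy, xx = np.meshgrid(ax, ax, indexing="ij")
    return peak * np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))


def paste_targets(image, sky_mask, n_targets, rng, size=5, peak=0.6, sigma=1.0):
    """Add Gaussian blobs only where the whole patch lies inside the sky mask.

    Assumption: additive fusion clipped to [0, 1]; the paper's actual
    blending rule is not described in this summary.
    """
    out = image.astype(np.float64)  # astype returns a copy, input is untouched
    h, w = out.shape
    half = size // 2
    blob = gaussian_target(size, peak, sigma)

    # Background selection: keep centres whose full patch is labelled sky.
    candidates = [
        (y, x)
        for y in range(half, h - half)
        for x in range(half, w - half)
        if sky_mask[y - half:y + half + 1, x - half:x + half + 1].all()
    ]

    centres = []
    if candidates:
        count = min(n_targets, len(candidates))
        for i in rng.choice(len(candidates), size=count, replace=False):
            y, x = candidates[i]
            patch = out[y - half:y + half + 1, x - half:x + half + 1]
            # Target fusion: write through the view so `out` is updated.
            patch[...] = np.clip(patch + blob, 0.0, 1.0)
            centres.append((y, x))
    return out, centres
```

Calling `paste_targets(image, sky_mask, n_targets=20, rng=np.random.default_rng(0))` returns the augmented image together with the pasted centre coordinates, which could then be turned into detection labels. Restricting candidates to sky pixels is the part that makes the paste background-aware; a plain copy-paste would sample centres over the whole image.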

Abstract

Infrared small target detection presents significant challenges due to the limited intrinsic features of the target and the overwhelming presence of visually similar background distractors. We contend that background semantics are critical for distinguishing between objects that appear visually similar in this context. To address this challenge, we propose a task, clustered infrared small target detection, and introduce DenseSIRST, a benchmark dataset that provides per-pixel semantic annotations for background regions. This dataset facilitates the shift from sparse to dense target detection. Building on this resource, we propose the Background-Aware Feature Exchange Network (BAFE-Net), a multi-task architecture that jointly tackles target detection and background semantic segmentation. BAFE-Net incorporates a dynamic cross-task feature hard-exchange mechanism, enabling the effective exchange of target and background semantics between the two tasks. Comprehensive experiments demonstrate that BAFE-Net significantly enhances target detection accuracy while mitigating false alarms. The DenseSIRST dataset, along with the code and trained models, is publicly available at https://github.com/GrokCV/BAFE-Net.

Paper Structure (30 sections, 5 equations, 5 figures, 10 tables)

Figures (5)

  • Figure 1: Illustration of the crucial role of background semantics in infrared small target detection. When only local regions are considered (as in the top-left image), clustered small targets and false alarms are hard to tell apart because they look alike. Incorporating global contextual information (as in the surrounding images) makes the distinction between real targets and false alarms more apparent. In the global images, the sky background is highlighted in red. The small target in the blue box is a genuine target of interest against the sky background, while the small target in the red box is a false alarm against the building background.
  • Figure 2: Synthetic generation pipeline and representative samples of the DenseSIRST dataset. (a) The BAG-CP pipeline for synthesizing the DenseSIRST dataset incorporates three key components: semantic-aware background selection, Gaussian-based target fusion, and realistic sample generation. (b) Illustration of selected images from our proposed DenseSIRST dataset. Each image pair consists of a simulated dense small target image (left) and its corresponding sky-segmented version (right).
  • Figure 3: Statistical characteristics of our DenseSIRST dataset, underscoring the detection challenges it presents. (a) The distribution of target sizes, with the circle sizes in the scatter plot indicating the prevalence of each size category. (b) The distribution of local contrasts for small targets in the dataset, demonstrating that the dataset encompasses a broad range of contrasts, down to a minimum of 0.15. (c) The brightness distribution of small targets in the dataset, highlighting the wide spectrum from dark to light targets this dataset offers for detection tasks.
  • Figure 4: The framework of BAFE-Net. BAFE-Net contains three modules: ResNet, FPN and BAFE-Head. The Head module integrates a background segmentation branch to work in parallel with the classification branch. Both branches utilize the DCS Module to select the most discriminative top-k channel features, followed by channel-wise feature interaction. This enhancement transforms the original single-task object detection framework into a multi-task learning architecture, enabling simultaneous and efficient execution of both object detection and background semantic segmentation.
  • Figure 5: Feature visualization comparison of the detection results of different methods. The proposed BAFE-Net detects small objects with fewer false alarms.
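
The Figure 4 caption describes top-k channel selection in both head branches followed by channel-wise feature interaction. The NumPy sketch below shows one way such a hard exchange could work on two feature maps of shape (C, H, W). The scoring rule (mean absolute activation), the rank-wise pairing of channels, and the names `channel_scores` and `hard_exchange` are assumptions made for illustration; they are not the authors' DCS Module, whose selection is presumably learned.

```python
import numpy as np


def channel_scores(feat: np.ndarray) -> np.ndarray:
    """Score each channel of a (C, H, W) map by its mean absolute activation.

    Assumption: a simple stand-in for whatever scoring the DCS Module uses.
    """
    return np.abs(feat).mean(axis=(1, 2))


def hard_exchange(det_feat: np.ndarray, seg_feat: np.ndarray, k: int):
    """Swap the k highest-scoring channels of each branch with the other's.

    Hypothetical pairing rule: the i-th ranked detection channel trades
    places with the i-th ranked segmentation channel.
    """
    if det_feat.shape != seg_feat.shape:
        raise ValueError("both branches must share one (C, H, W) shape")
    if not 0 <= k <= det_feat.shape[0]:
        raise ValueError("k must lie in [0, C]")

    # Indices of the k largest scores, highest first.
    det_top = np.argsort(-channel_scores(det_feat))[:k]
    seg_top = np.argsort(-channel_scores(seg_feat))[:k]

    # Work on copies so the inputs stay unchanged.
    det_out, seg_out = det_feat.copy(), seg_feat.copy()
    det_out[det_top] = seg_feat[seg_top]
    seg_out[seg_top] = det_feat[det_top]
    return det_out, seg_out
```

The exchange is "hard" in the sense that whole channels are overwritten rather than blended with a soft weight. Channels outside the top-k pass through untouched, so each branch keeps most of its task-specific features while receiving the other task's strongest responses.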