CCDNet: Learning to Detect Camouflage against Distractors in Infrared Small Target Detection

Zikai Liao, Zhaozheng Yin

Abstract

Infrared small target detection (IRSTD) has critical applications in areas like wilderness rescue and maritime search. However, detecting infrared targets is challenging due to their low contrast and tendency to blend into complex backgrounds, effectively camouflaging themselves. Additionally, other objects with similar features (distractors) can cause false alarms, further degrading detection performance. To address these issues, we propose a novel Camouflage-aware Counter-Distraction Network (CCDNet) in this paper. We design a backbone with Weighted Multi-branch Perceptrons (WMPs), which aggregates self-conditioned multi-level features to accurately represent the target and background. Based on these rich features, we then propose a novel Aggregation-and-Refinement Fusion Neck (ARFN) to refine structures/semantics from shallow/deep feature maps and bidirectionally reconstruct the relations between the targets and the backgrounds, highlighting the targets while suppressing the complex backgrounds to improve detection accuracy. Furthermore, we present a new Contrastive-aided Distractor Discriminator (CaDD), which enforces adaptive similarity computation both locally and globally between the real targets and the backgrounds to more precisely discriminate distractors and thereby reduce the false alarm rate. Extensive experiments on infrared image datasets confirm that CCDNet outperforms other state-of-the-art methods.
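To make the weighted multi-branch idea concrete, here is a minimal numpy sketch of how several parallel branches could transform one feature map and be aggregated with softmax weights. The branch design (1x1 channel mixing) and the scalar weighting scheme are our assumptions for illustration, not the paper's exact WMP definition.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array of branch logits."""
    e = np.exp(z - z.max())
    return e / e.sum()

def wmp_sketch(x, branch_mats, logits):
    """Hypothetical weighted multi-branch aggregation.

    x           : (C, H, W) input feature map
    branch_mats : list of (C, C) matrices, one per branch
                  (each branch acts as a 1x1 "perceptron", i.e. channel mixing)
    logits      : per-branch scalars; softmax of these weights the sum
    """
    alphas = softmax(logits)
    out = np.zeros_like(x)
    for a, W in zip(alphas, branch_mats):
        # apply this branch's channel mixing, then add its weighted output
        out += a * np.einsum('oc,chw->ohw', W, x)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))           # toy 4-channel feature map
branches = [rng.standard_normal((4, 4)) for _ in range(3)]
y = wmp_sketch(x, branches, np.array([0.5, 1.0, -0.2]))
print(y.shape)  # (4, 8, 8): same spatial size, branch-weighted channels
```

In a real network the branch weights would be learned end to end; the point of the sketch is only that the output is a convex combination of branch outputs, so the network can adaptively emphasize the branch whose receptive characteristics best separate a small target from its background.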

Paper Structure

This paper contains 33 sections, 14 equations, 21 figures, 10 tables.

Figures (21)

  • Figure 1: Illustrations of two key challenges in IRSTD tasks: target camouflage and distractor interference.
  • Figure 2: Overview of our proposed CCDNet. It has a backbone with proposed WMPs to extract rich contextual features, and ARFN with proposed TBSG and BOSE to highlight target features for camouflage detection via adaptive guidance from key semantics and structures. The proposed CaDD, with LCM and GCM, enables the network to better differentiate between real targets and their distractors.
  • Figure 3: Illustration of our proposed LCM and GCM in CaDD. $\mathbf{P}^{\text{in}}$ and $\mathbf{P}^{\text{out}}$ denote the nine-region areas from the input feature map and the LCM-processed feature map. Pos. and Neg. denote positive and negative samples. Both LCM and GCM apply to all four output feature maps from the backbone; for visualization purposes, we display the mechanism on only one feature map.
  • Figure 4: Qualitative results of our proposed method and other comparison methods. For better visualization, we pick at least one method from each category to showcase its detection performance. The red, green, and yellow boxes are ground truths, detection results, and false alarms, respectively. An image without any red box indicates a missed detection. We include close-ups of each detection result for better visual comparison. Qualitative comparisons of the other methods can be found in the supplement. Our CCDNet yields superior results with fewer false alarms, fewer missed detections, and better-overlapping bounding boxes.
  • Figure 5: Heatmaps of the outputs of the last WMP in stage 1 of the backbone. The results verify that our WMP can accurately perceive the targets.
  • ...and 16 more figures