Weakly-supervised Contrastive Learning with Quantity Prompts for Moving Infrared Small Target Detection

Weiwei Duan, Luping Ji, Shengjia Chen, Sicheng Zhu, Jianghong Huang, Mao Ye

TL;DR

This work tackles moving infrared small target detection under limited annotations by introducing WeCoL, a weakly-supervised framework that uses target quantity prompts and SAM-based mining to generate pseudo-labels. It enhances pseudo-label quality through pseudo-label contrastive learning and exploits both short-term and long-term motion cues via a long-short term motion-aware module, all trained with a joint loss that balances detection and pseudo-label refinement. Experiments on DAUB and ITSDT-15K show WeCoL outperforms existing weakly-supervised methods and approaches state-of-the-art fully-supervised performance, highlighting its potential to reduce annotation costs while maintaining high detection accuracy. The approach leverages foundation-model prompts to generate initial targets and uses temporal information to stabilize detections in challenging infrared scenes, offering practical benefits for real-world ISTD tasks, albeit with higher model complexity and a dependency on pseudo-label quality.

Abstract

Unlike general object detection, moving infrared small target detection is highly challenging because targets are tiny and contrast with the background is weak. Most existing methods are fully supervised and rely heavily on a large number of manual target-wise annotations. However, manually annotating video sequences is often expensive and time-consuming, especially for low-quality infrared frames. Inspired by general object detection, non-fully supervised strategies (e.g., weakly supervised) are believed to have potential for reducing annotation requirements. To break away from traditional fully-supervised frameworks, this paper, as the first exploratory work, proposes a new weakly-supervised contrastive learning (WeCoL) scheme that requires only simple target quantity prompts during model training. Specifically, based on the pretrained segment anything model (SAM), a potential target mining strategy is designed to integrate target activation maps and multi-frame energy accumulation. In addition, contrastive learning is adopted to further improve the reliability of pseudo-labels by calculating the similarity between positive and negative samples in a feature subspace. Moreover, we propose a long-short term motion-aware learning scheme to simultaneously model the local motion patterns and the global motion trajectory of small targets. Extensive experiments on two public datasets (DAUB and ITSDT-15K) verify that our weakly-supervised scheme can often outperform early fully-supervised methods. Its performance can even reach over 90% of that of state-of-the-art (SOTA) fully-supervised methods.
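
The abstract does not give the contrastive loss itself, so the following is only a minimal sketch of how similarity between positive and negative samples is commonly turned into a training signal. It assumes an InfoNCE-style loss over cosine similarities; the function names, the `temperature` value, and the toy feature vectors are all hypothetical and are not taken from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("feature vectors must be non-zero")
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (norm_u * norm_v)


def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss (an assumed form, not the paper's definition).

    The loss is small when the anchor is more similar to the positive
    sample than to every negative sample, and large otherwise.
    """
    pos_logit = cosine_similarity(anchor, positive) / temperature
    logits = [pos_logit] + [
        cosine_similarity(anchor, neg) / temperature for neg in negatives
    ]
    # Numerically stable log-sum-exp over all logits.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(
        sum(math.exp(x - max_logit) for x in logits)
    )
    # Negative log of the softmax probability assigned to the positive.
    return log_sum_exp - pos_logit


# Toy features: a candidate region (anchor), a similar candidate treated
# as a positive, and background-like features treated as negatives.
anchor = [1.0, 0.0]
loss_consistent = info_nce_loss(anchor, [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.2]])
loss_inconsistent = info_nce_loss(anchor, [0.0, 1.0], [[0.9, 0.1]])
```

In this toy setting `loss_consistent` is lower than `loss_inconsistent`, which is the property a pseudo-label refinement step could exploit: candidates whose features agree with other likely targets and disagree with background are treated as more reliable.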

Paper Structure

This paper contains 28 sections, 15 equations, 10 figures, 6 tables.

Figures (10)

  • Figure 1: Comparison between a typical fully-supervised scheme and our weakly-supervised scheme. The green arrows in our scheme denote the actual inference pipeline.
  • Figure 2: Our WeCoL framework, with training and inference pipelines. In training, the final pseudo-labels $\boldsymbol{\mathcal{G}_n}$, used to supervise multi-frame detector training, are generated and refined by the collaboration of both "Potential Target Mining" and "Pseudo-label Contrastive Learning". In inference, $T$ frames are utilized to detect moving infrared small targets.
  • Figure 3: Comparison of PR curves for different methods on DAUB.
  • Figure 4: Comparison of PR curves for different methods on ITSDT-15K.
  • Figure 5: The visual results of different methods on DAUB. Blue boxes and yellow circles represent amplified target regions and false alarms, respectively.
  • ...and 5 more figures