Weakly-supervised Contrastive Learning with Quantity Prompts for Moving Infrared Small Target Detection
Weiwei Duan, Luping Ji, Shengjia Chen, Sicheng Zhu, Jianghong Huang, Mao Ye
TL;DR
This work tackles moving infrared small target detection under limited annotations by introducing WeCoL, a weakly-supervised framework that uses target quantity prompts and SAM-based mining to generate pseudo-labels. It improves pseudo-label quality through pseudo-label contrastive learning and exploits both short-term and long-term motion cues via a long-short term motion-aware module, all trained with a joint loss that balances detection and pseudo-label refinement. Experiments on DAUB and ITSDT-15K show that WeCoL outperforms existing weakly-supervised methods and reaches over 90% of state-of-the-art fully-supervised performance, highlighting its potential to reduce annotation costs while maintaining high detection accuracy. The approach uses foundation-model prompts to generate initial targets and temporal information to stabilize detections in challenging infrared scenes, offering practical benefits for real-world infrared small target detection tasks, albeit with higher model complexity and a dependency on pseudo-label quality.
Abstract
Unlike general object detection, moving infrared small target detection faces major challenges due to tiny target size and weak background contrast. Currently, most existing methods are fully-supervised and rely heavily on a large number of manual target-wise annotations. However, manually annotating video sequences is often expensive and time-consuming, especially for low-quality infrared frames. Inspired by general object detection, non-fully supervised strategies (e.g., weakly supervised ones) are believed to have potential for reducing annotation requirements. To move beyond traditional fully-supervised frameworks, this paper, as the first exploration of its kind, proposes a new weakly-supervised contrastive learning (WeCoL) scheme that requires only simple target quantity prompts during model training. Specifically, based on the pretrained segment anything model (SAM), a potential target mining strategy is designed to integrate target activation maps and multi-frame energy accumulation. In addition, contrastive learning is adopted to further improve the reliability of pseudo-labels by calculating the similarity between positive and negative samples in a feature subspace. Moreover, we propose a long-short term motion-aware learning scheme to simultaneously model the local motion patterns and global motion trajectory of small targets. Extensive experiments on two public datasets (DAUB and ITSDT-15K) verify that our weakly-supervised scheme can often outperform early fully-supervised methods. Its performance can even reach over 90% of that of state-of-the-art (SOTA) fully-supervised methods.
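To make the contrastive step more concrete, the following is a minimal sketch of a generic InfoNCE-style loss over pseudo-label features. It is an illustration, not the loss defined in the paper: the function names `cosine` and `info_nce`, the use of cosine similarity, and the `temperature` value are all assumptions made here, since the abstract states only that similarity between positive and negative samples is computed in a feature subspace.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)

def info_nce(anchor, positives, negatives, temperature=0.1):
    """Generic InfoNCE-style loss (assumed form, not the paper's exact loss).

    anchor:    feature vector of a reference pseudo-label
    positives: features assumed to belong to real targets
    negatives: features assumed to belong to background clutter
    """
    pos = [math.exp(cosine(anchor, p) / temperature) for p in positives]
    neg = [math.exp(cosine(anchor, n) / temperature) for n in negatives]
    denom = sum(pos) + sum(neg)
    # Average negative log-probability of picking each positive over all samples.
    return -sum(math.log(p / denom) for p in pos) / len(pos)

# Hypothetical 2-D features: the loss is low when the positive aligns with the anchor.
anchor = [1.0, 0.0]
loss_good = info_nce(anchor, positives=[[0.9, 0.1]], negatives=[[-1.0, 0.0]])
loss_bad = info_nce(anchor, positives=[[-1.0, 0.0]], negatives=[[0.9, 0.1]])
print(loss_good < loss_bad)
```

Under these assumptions, minimizing the loss pulls features of likely targets together and pushes them away from background features, which is one plausible way unreliable pseudo-labels could be separated from reliable ones.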
