CLDA-YOLO: Visual Contrastive Learning Based Domain Adaptive YOLO Detector
Tianheng Qiu, Ka Lung Law, Guanghua Pan, Jufei Wang, Xin Gao, Xuan Huang, Hu Wei
TL;DR
This paper addresses unsupervised domain adaptation for single-stage object detectors, focusing on YOLO under domain shift. It introduces CLDA-YOLO, a teacher-student framework that combines uncertainty-aware pseudo-labeling, dynamic data augmentation, and a multi-stage visual contrastive learning strategy that aligns backbone and head features across domains. The approach achieves state-of-the-art or competitive results on multiple domain-shift benchmarks, notably outperforming prior domain adaptive object detection (DAOD) methods on Cityscapes→Foggy Cityscapes, and ablations show that each component contributes to the gain. The method adds no inference-time cost, which makes it practical for deploying YOLO in varied real-world environments.
Abstract
Unsupervised domain adaptation (UDA) can markedly improve the performance of object detectors under domain shift, reducing the need for extensive labeling and retraining. Existing domain adaptive object detection algorithms are designed primarily for two-stage detectors and tend to yield only minimal improvements when applied directly to single-stage detectors such as YOLO. To bring the benefits of UDA to YOLO, we build a comprehensive domain adaptive architecture around a teacher-student cooperative system. Within this system, we propose uncertainty learning to cope with the highly uncertain pseudo-labels generated by the teacher model, and we leverage dynamic data augmentation to gradually adapt the teacher-student system to the environment. To address the inability of single-stage object detectors to align features at multiple stages, we adopt a unified visual contrastive learning paradigm that aligns instances at the backbone and at the head, which steadily improves the robustness of the detector in cross-domain tasks. In summary, we present CLDA-YOLO, an unsupervised domain adaptive YOLO detector based on visual contrastive learning, which achieves highly competitive results across multiple domain adaptive datasets without any reduction in inference speed.
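To make the idea of contrastive instance alignment concrete, the following is a minimal sketch of an InfoNCE-style loss, a common choice for contrastive learning. The abstract does not state which contrastive objective CLDA-YOLO uses, so the loss form, the function name `info_nce`, the temperature value, and the toy feature vectors are all assumptions for illustration, not the authors' implementation.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss where positives[i] is the match for anchors[i].

    Every other row of `positives` serves as a negative for anchor i.
    The temperature of 0.1 is an assumed value, not taken from the paper.
    """
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        logits = [dot(ai, pj) / temperature for pj in p]
        m = max(logits)  # subtract the max for numerical stability
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denom - logits[i]  # -log softmax of the matching pair
    return total / len(a)

# Hypothetical pooled instance features (one row per object instance).
source_feats = [[1.0, 0.0], [0.0, 1.0]]
aligned_target = [[0.9, 0.1], [0.1, 0.9]]   # same instances, slightly shifted
swapped_target = [[0.1, 0.9], [0.9, 0.1]]   # instances paired incorrectly

loss_aligned = info_nce(source_feats, aligned_target)
loss_swapped = info_nce(source_feats, swapped_target)
print(loss_aligned < loss_swapped)
```

The loss is small when matching instances from the two domains have similar features and large when they do not, which is the property a cross-domain alignment objective relies on. In a detector, the same kind of loss could be applied once to backbone features and once to head features, but how the paper pairs instances at each stage is not specified in the abstract.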
