Simplifying Source-Free Domain Adaptation for Object Detection: Effective Self-Training Strategies and Performance Insights
Yan Hao, Florent Forest, Olga Fink
TL;DR
This paper addresses source-free domain adaptation for object detection, where the source data cannot be accessed during adaptation. It demonstrates that adapting batch statistics via AdaBN, combined with simple self-training schemes, can outperform many complex SFOD methods. It proposes Source-Free Unbiased Teacher (SF-UT), which uses an exponential moving average teacher with weak-strong augmentation, and a lightweight AdaBN + Fixed SF-FixMatch strategy. It also shows that training on a fixed set of pseudo-labels with AdaBN achieves competitive results and avoids teacher-student collapse. Experiments on Cityscapes→Foggy-Cityscapes, KITTI→Cityscapes, and Sim10k→Cityscapes show notable gains (e.g., 4.7% AP50 on Cityscapes→Foggy-Cityscapes over the latest SFOD state of the art) and competitive performance elsewhere. Overall, the work argues for simpler, BN-centric SFOD pipelines that are robust and efficient.
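As an illustration of the exponential moving average teacher mentioned above, the following is a minimal sketch of one EMA update step. It is not the authors' implementation: the function name `ema_update`, the plain-dictionary representation of weights, and the decay value are assumptions made for illustration only.

```python
def ema_update(teacher, student, decay=0.999):
    """Return teacher weights moved toward the student by one EMA step.

    `teacher` and `student` are hypothetical dicts mapping parameter
    names to floats; a real detector would hold tensors instead.
    """
    if teacher.keys() != student.keys():
        raise ValueError("teacher and student must share parameter names")
    return {
        name: decay * teacher[name] + (1.0 - decay) * student[name]
        for name in teacher
    }


# Toy example with an assumed decay of 0.9 (chosen for readability only).
teacher = {"w": 1.0, "b": 0.0}
student = {"w": 0.0, "b": 1.0}
teacher = ema_update(teacher, student, decay=0.9)
# "w" moves from 1.0 toward 0.0 and "b" from 0.0 toward 1.0.
```

A decay close to 1 makes the teacher change slowly, which is the usual motivation for using it as a more stable source of pseudo-labels than the student itself.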
Abstract
This paper focuses on source-free domain adaptation for object detection in computer vision. This task is challenging and of great practical interest, due to the cost of obtaining annotated datasets for every new domain. Recent research has proposed various solutions for Source-Free Object Detection (SFOD), most being variations of teacher-student architectures with diverse feature alignment, regularization and pseudo-label selection strategies. Our work investigates simpler approaches and compares their performance with more complex SFOD methods in several adaptation scenarios. We highlight the importance of batch normalization layers in the detector backbone, and show that adapting only the batch statistics is a strong baseline for SFOD. We propose a simple extension of a Mean Teacher with weak-strong augmentation in the source-free setting, Source-Free Unbiased Teacher (SF-UT), and show that it outperforms most of the previous SFOD methods. Additionally, we show that an even simpler strategy, consisting of training on a fixed set of pseudo-labels, can achieve performance similar to the more complex teacher-student mutual learning, while being computationally efficient and mitigating the major issue of teacher-student collapse. We conduct experiments on several adaptation tasks using benchmark driving datasets including (Foggy) Cityscapes, Sim10k and KITTI, and achieve a notable improvement of 4.7\% AP50 on Cityscapes$\rightarrow$Foggy-Cityscapes compared with the latest state-of-the-art in SFOD. Source code is available at https://github.com/EPFL-IMOS/simple-SFOD.
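To make the idea of adapting only the batch statistics concrete, here is a minimal sketch in the spirit of AdaBN: the per-channel mean and variance used by a normalization layer are re-estimated from unlabeled target-domain activations, while the learned scale and shift are kept unchanged. This is an illustrative toy, not the released code: the function names, the nested-list data layout, and the use of the population variance are all assumptions.

```python
from statistics import fmean, pvariance


def adapt_bn_stats(target_batches):
    """Estimate per-channel mean and variance from target activations.

    `target_batches` is a hypothetical iterable of batches; each batch is
    a list of samples, and each sample is a list of per-channel values.
    """
    samples = [sample for batch in target_batches for sample in batch]
    if not samples:
        raise ValueError("need at least one target sample")
    channels = list(zip(*samples))  # regroup values channel by channel
    mean = [fmean(c) for c in channels]
    var = [pvariance(c) for c in channels]  # population variance (assumed)
    return mean, var


def batch_norm(sample, mean, var, gamma, beta, eps=1e-5):
    """Normalize one sample with the given statistics and affine terms."""
    return [
        g * (x - m) / (v + eps) ** 0.5 + b
        for x, m, v, g, b in zip(sample, mean, var, gamma, beta)
    ]


# Toy target-domain activations: two batches, two samples each, two channels.
target_batches = [
    [[2.0, 10.0], [4.0, 14.0]],
    [[6.0, 18.0], [8.0, 22.0]],
]
target_mean, target_var = adapt_bn_stats(target_batches)

# gamma and beta stand in for the source-trained affine parameters,
# which this sketch leaves untouched.
normalized = batch_norm([5.0, 16.0], target_mean, target_var,
                        gamma=[1.0, 1.0], beta=[0.0, 0.0])
```

No gradient step or label is involved here, which is why statistics adaptation of this kind is cheap compared with teacher-student training.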
