Simplifying Source-Free Domain Adaptation for Object Detection: Effective Self-Training Strategies and Performance Insights

Yan Hao, Florent Forest, Olga Fink

TL;DR

This paper addresses source-free domain adaptation for object detection (SFOD), where source data is not accessible during adaptation. It shows that adapting only the batch normalization statistics via AdaBN, combined with simple self-training schemes, can outperform many complex SFOD methods. It proposes Source-Free Unbiased Teacher (SF-UT), which uses an exponential moving average teacher with weak-strong augmentation, and a lightweight AdaBN + Fixed Source-Free FixMatch (Fixed SF-FM) strategy that trains on a fixed set of pseudo-labels, achieves competitive results, and mitigates teacher-student collapse. Experiments on Cityscapes→Foggy-Cityscapes, KITTI→Cityscapes, and Sim10k→Cityscapes show notable gains (e.g., 4.7% AP50 over the latest SFOD state of the art on Cityscapes→Foggy-Cityscapes). Overall, the work argues for simpler, BN-centric SFOD pipelines that are robust and computationally efficient.

Abstract

This paper focuses on source-free domain adaptation for object detection in computer vision. This task is challenging and of great practical interest, due to the cost of obtaining annotated datasets for every new domain. Recent research has proposed various solutions for Source-Free Object Detection (SFOD), most being variations of teacher-student architectures with diverse feature alignment, regularization and pseudo-label selection strategies. Our work investigates simpler approaches and compares their performance with more complex SFOD methods in several adaptation scenarios. We highlight the importance of batch normalization layers in the detector backbone, and show that adapting only the batch statistics is a strong baseline for SFOD. We propose a simple extension of a Mean Teacher with strong-weak augmentation in the source-free setting, Source-Free Unbiased Teacher (SF-UT), and show that it outperforms most previous SFOD methods. Additionally, we show that an even simpler strategy, consisting of training on a fixed set of pseudo-labels, can achieve performance similar to that of the more complex teacher-student mutual learning, while being computationally efficient and mitigating the major issue of teacher-student collapse. We conduct experiments on several adaptation tasks using benchmark driving datasets including Cityscapes, Foggy-Cityscapes, Sim10k and KITTI, and achieve a notable improvement of 4.7% AP50 on Cityscapes$\rightarrow$Foggy-Cityscapes compared with the latest state of the art in SFOD. Source code is available at https://github.com/EPFL-IMOS/simple-SFOD.
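
To make the idea of "adapting only the batch statistics" concrete, the following is a minimal, illustrative sketch in plain Python. It is not the authors' implementation (see the linked repository for that); the function names `adapt_bn_statistics` and `batch_norm` and the toy per-channel activation layout are assumptions made for this example. The sketch re-estimates the per-channel mean and variance of a normalization layer from unlabeled target-domain activations, leaving the learned scale and shift untouched.

```python
from statistics import fmean, pvariance


def adapt_bn_statistics(target_batches):
    """Re-estimate per-channel mean and variance from target activations.

    target_batches: iterable of batches; each batch is a list of samples,
    and each sample is a list holding one activation value per channel.
    (Hypothetical toy layout: real detectors also pool over spatial dims.)
    """
    per_channel = None
    for batch in target_batches:
        for sample in batch:
            if per_channel is None:
                per_channel = [[] for _ in sample]
            for channel, value in enumerate(sample):
                per_channel[channel].append(value)
    if per_channel is None:
        raise ValueError("no target activations provided")
    means = [fmean(values) for values in per_channel]
    variances = [pvariance(values) for values in per_channel]
    return means, variances


def batch_norm(sample, means, variances, gamma, beta, eps=1e-5):
    """Normalize one sample with the given statistics; gamma/beta are unchanged."""
    return [
        g * (x - m) / (v + eps) ** 0.5 + b
        for x, m, v, g, b in zip(sample, means, variances, gamma, beta)
    ]


# Toy target-domain activations: two batches, two channels.
target_batches = [[[1.0, 10.0], [3.0, 14.0]], [[5.0, 18.0]]]
means, variances = adapt_bn_statistics(target_batches)
normalized = batch_norm([3.0, 14.0], means, variances, gamma=[1.0, 1.0], beta=[0.0, 0.0])
```

No labels and no gradient updates are involved in this step: only the normalization statistics change, which is why it serves as a cheap baseline before any self-training.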

Paper Structure (21 sections, 4 equations, 5 figures, 8 tables)

Figures (5)

  • Figure 1: Overview of Source-Free Mean Teacher configurations for SFOD with different teacher update rates $\alpha$, with or without weak-strong augmentation. The extreme case $\alpha=0$ (i.e., teacher = student) corresponds to (source-free) Pseudo-Label (Lee, 2013) without weak-strong augmentation and to FixMatch (Sohn et al., 2020) with it. At the other extreme, $\alpha=1$ boils down to freezing the teacher and training on a fixed set of pseudo-labels. Surprisingly, training on fixed pseudo-labels after AdaBN (Li et al., 2018) yields performance similar to that of more complex teacher-student mutual learning and challenges state-of-the-art SFOD methods.
  • Figure 2: Proposed Source-Free Unbiased Teacher (SF-UT) architecture.
  • Figure 3: Proposed Fixed Source-Free FixMatch (Fixed SF-FM) strategy.
  • Figure 4: Training curves of the studied self-training strategies on Cityscapes$\rightarrow$Foggy-Cityscapes. The best-performing ones are the proposed Source-Free Unbiased Teacher (SF-UT) and AdaBN + Fixed Source-Free FixMatch (Fixed SF-FM), the latter being the only method not subject to collapse during training.
  • Figure 5: Qualitative results from different models on Foggy-Cityscapes images for the Cityscapes$\rightarrow$Foggy-Cityscapes adaptation task. From left to right: source model, AdaBN model, SF-UT model, and AdaBN + Fixed SF-FM.
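
The role of the teacher update rate $\alpha$ described in the Figure 1 caption can be illustrated with a short sketch. It assumes the usual exponential moving average form, teacher ← $\alpha$ · teacher + (1 − $\alpha$) · student, which is consistent with the two extreme cases stated in the caption; the function name `ema_update` and the dictionary-of-floats parameter layout are hypothetical and chosen only for illustration.

```python
def ema_update(teacher, student, alpha):
    """Return teacher weights after one EMA step.

    teacher, student: dicts mapping parameter names to float values
    (a toy stand-in for real model parameters).
    alpha: teacher update rate in [0, 1].
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return {
        name: alpha * teacher[name] + (1.0 - alpha) * student[name]
        for name in teacher
    }


teacher = {"w": 1.0}
student = {"w": 3.0}

copied = ema_update(teacher, student, alpha=0.0)   # teacher becomes the student
frozen = ema_update(teacher, student, alpha=1.0)   # teacher never changes
blended = ema_update(teacher, student, alpha=0.5)  # halfway between the two
```

With `alpha=0.0` the teacher is overwritten by the student at every step, matching the Pseudo-Label and FixMatch cases; with `alpha=1.0` the teacher stays frozen, so its pseudo-labels form a fixed set; intermediate values give the slowly moving teacher used in Mean Teacher style training.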