Improving Online Source-free Domain Adaptation for Object Detection by Unsupervised Data Acquisition

Xiangyu Shi, Yanyuan Qiao, Qi Wu, Lingqiao Liu, Feras Dayoub

TL;DR

This work tackles online source-free domain adaptation (O-SFDA) for object detection in autonomous driving by using unsupervised data acquisition to select informative frames and mitigate class imbalance. It builds on a Mean Teacher framework with a Faster R-CNN detector and formalizes frame selection as a two-stage AUF/ARC procedure guided by pseudo-labels. Training uses the joint loss $L = L_{FRCNN} + L_{S-T}$, where $L_{S-T} = \text{KL}(E_S^{CLS}, E_T^{CLS})$, and the teacher is updated as an exponential moving average (EMA) of the student. Across four datasets and three deployment scenarios, the approach reports state-of-the-art results (e.g., Cityscapes→Foggy Cityscapes, Sim10k→Cityscapes, and SHIFT→Cityscapes) while reducing adaptation time to ~12.7 ms/frame. The combination of informative-frame selection and rare-category emphasis demonstrates practical viability for real-world online adaptation in autonomous driving.
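The joint loss and the EMA teacher update can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that $E_S^{CLS}$ and $E_T^{CLS}$ can be treated as discrete probability vectors, that the KL divergence is taken in the argument order written in the formula (student first), and that the teacher weights are a flat list of floats. The names `kl_divergence`, `joint_loss`, `ema_update`, and the momentum `alpha` are hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists of probabilities.

    eps guards against log(0); terms with p_i == 0 contribute nothing.
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def joint_loss(l_frcnn, student_probs, teacher_probs):
    """L = L_FRCNN + L_{S-T}, with L_{S-T} = KL(student, teacher).

    Assumption: the student/teacher outputs are already normalized
    probability vectors.
    """
    return l_frcnn + kl_divergence(student_probs, teacher_probs)


def ema_update(teacher_weights, student_weights, alpha=0.99):
    """Mean Teacher update: teacher <- alpha * teacher + (1 - alpha) * student.

    alpha is a hypothetical momentum value, not one taken from the paper.
    """
    return [
        alpha * t + (1.0 - alpha) * s
        for t, s in zip(teacher_weights, student_weights)
    ]


if __name__ == "__main__":
    student = [0.7, 0.2, 0.1]
    teacher = [0.6, 0.3, 0.1]
    print("joint loss:", joint_loss(1.5, student, teacher))
    print("new teacher weights:", ema_update([1.0, 2.0], [0.0, 4.0], alpha=0.9))
```

When the student and teacher distributions agree, the consistency term vanishes and the loss reduces to the detection loss alone; a larger `alpha` makes the teacher change more slowly.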

Abstract

Effective object detection in autonomous vehicles is challenged by deployment in diverse and unfamiliar environments. Online Source-Free Domain Adaptation (O-SFDA) offers model adaptation using a stream of unlabeled data from a target domain in an online manner. However, not all captured frames contain information beneficial for adaptation, especially in the presence of redundant data and class imbalance issues. This paper introduces a novel approach to enhance O-SFDA for adaptive object detection through unsupervised data acquisition. Our methodology prioritizes the most informative unlabeled frames for inclusion in the online training process. Empirical evaluation on a real-world dataset reveals that our method outperforms existing state-of-the-art O-SFDA techniques, demonstrating the viability of unsupervised data acquisition for improving the adaptive object detector.

Paper Structure (19 sections, 7 equations, 2 figures, 4 tables)

Figures (2)

  • Figure 1: For streaming frames $f_t$, the data acquisition system continuously evaluates whether to retain the current frame using a similarity measure. Once $f_t$ is detected as a "key frame", it is used to update the model. The condition $x$ refers to the if statement that checks, after the warm-up stage, whether the current frame contains a rare category.
  • Figure 2: Four qualitative results comparing each baseline and our method on the Cityscapes validation set. The colours of the bounding boxes indicate different objects: red for Person, green for Car, blue for Mcycle, and yellow for Bike.
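The acquisition loop described in the Figure 1 caption can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's algorithm: it assumes each frame is summarized by a feature vector, that the similarity measure is cosine similarity against the last retained key frame, and that a frame is also retained when its pseudo-labels contain a rare category once a warm-up period has passed. The function names, `sim_threshold`, `warmup`, and `rare_classes` are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_key_frames(frames, sim_threshold=0.9, warmup=2, rare_classes=frozenset()):
    """Return the indices of frames that would be used to update the model.

    frames: iterable of (feature_vector, pseudo_label_set) pairs.
    A frame is kept if it is dissimilar to the last key frame, or if
    (after the warm-up stage) its pseudo-labels include a rare category.
    """
    selected = []
    last_key = None
    for t, (feat, labels) in enumerate(frames):
        is_unique = last_key is None or cosine_similarity(feat, last_key) < sim_threshold
        # Assumed form of "condition x": rare-category check only after warm-up.
        has_rare = t >= warmup and bool(rare_classes & set(labels))
        if is_unique or has_rare:
            selected.append(t)
            last_key = feat  # the retained frame becomes the new reference
    return selected


if __name__ == "__main__":
    stream = [
        ([1.0, 0.0], {"car"}),
        ([1.0, 0.01], {"car"}),         # near-duplicate of frame 0
        ([0.0, 1.0], {"car"}),          # new scene
        ([0.0, 1.0], {"car", "bike"}),  # redundant, but contains a rare class
    ]
    print(select_key_frames(stream, rare_classes=frozenset({"bike"})))
```

In this toy stream the near-duplicate frame is skipped, while the last frame is retained despite being redundant because it carries a rare pseudo-label.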