Leveraging Confident Image Regions for Source-Free Domain-Adaptive Object Detection

Mohamed Lamine Mekhalfi, Davide Boscaini, Fabio Poiesi

TL;DR

This work tackles source-free domain-adaptive object detection by introducing SF-DACA, a data-augmentation-driven framework that selects confident regions from target images, constructs composite challenging samples, and self-trains the detector while preserving source knowledge through a teacher-student setup. Central to the method is a four-step pipeline (detect, augment, compose, adapt) that leverages regional pseudo-labels and a consistency objective, with an exponential moving-average teacher guiding the adaptation to prevent collapse. Empirical results across Cityscapes, FoggyCityscapes, Sim10K, and KITTI demonstrate state-of-the-art performance on two of three traffic-domain benchmarks, with detailed ablations confirming the importance of region selection thresholds and the DACA augmentation. The approach provides a practical and scalable solution for SF-UDA in object detection, with potential extensions to zero-shot grounding to further curb false positives and domain misalignment.

Abstract

Source-free domain-adaptive object detection is an interesting but scarcely addressed topic. It aims to adapt a source-pretrained detector to a distinct target domain without resorting to source data during adaptation. So far, there is no data augmentation scheme tailored to source-free domain-adaptive object detection. To this end, this paper presents a novel data augmentation approach that cuts out target image regions where the detector is confident, augments them along with their respective pseudo-labels, and joins them into a challenging target image to adapt the detector. As the source data is out of reach during adaptation, we implement our approach within a teacher-student learning paradigm to ensure that the model does not collapse during the adaptation procedure. We evaluated our approach on three adaptation benchmarks of traffic scenes, achieving new state-of-the-art results on two of them.
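
The cut, augment, and join steps described in the abstract can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: regions are the four image quadrants, region confidence is the mean detection score of the boxes centred in the quadrant, the augmentation is a horizontal flip, and the composite is a 2x2 tiling. The function names `best_quadrant`, `hflip`, and `compose` are hypothetical.

```python
import numpy as np

def best_quadrant(image, boxes, scores):
    """Crop the quadrant with the highest mean detection score (assumed
    confidence measure) and return its boxes in crop coordinates."""
    h, w = image.shape[:2]
    hh, hw = h // 2, w // 2
    corners = [(0, 0), (hw, 0), (0, hh), (hw, hh)]  # quadrant top-left (x, y)
    cx = (boxes[:, 0] + boxes[:, 2]) / 2
    cy = (boxes[:, 1] + boxes[:, 3]) / 2
    best = None
    for x0, y0 in corners:
        inside = (cx >= x0) & (cx < x0 + hw) & (cy >= y0) & (cy < y0 + hh)
        conf = float(scores[inside].mean()) if inside.any() else 0.0
        if best is None or conf > best[0]:
            best = (conf, x0, y0, inside)
    _, x0, y0, inside = best
    crop = image[y0:y0 + hh, x0:x0 + hw]
    # Shift pseudo-label boxes into crop coordinates and clip to the crop.
    local = boxes[inside] - np.array([x0, y0, x0, y0])
    local[:, [0, 2]] = local[:, [0, 2]].clip(0, hw)
    local[:, [1, 3]] = local[:, [1, 3]].clip(0, hh)
    return crop, local

def hflip(crop, boxes):
    """Horizontally flip a crop together with its (x1, y1, x2, y2) boxes."""
    w = crop.shape[1]
    flipped = boxes.copy()
    flipped[:, 0] = w - boxes[:, 2]
    flipped[:, 2] = w - boxes[:, 0]
    return crop[:, ::-1], flipped

def compose(crops, boxes_list):
    """Tile four equally sized crops into a 2x2 composite and merge boxes."""
    ch, cw = crops[0].shape[:2]
    top = np.concatenate(crops[:2], axis=1)
    bottom = np.concatenate(crops[2:], axis=1)
    composite = np.concatenate([top, bottom], axis=0)
    offsets = [(0, 0), (cw, 0), (0, ch), (cw, ch)]
    shifted = [b + np.array([ox, oy, ox, oy])
               for b, (ox, oy) in zip(boxes_list, offsets)]
    return composite, np.concatenate(shifted, axis=0)
```

In such a setup, the composite image and its merged pseudo-label boxes would serve as one self-training sample for the student detector; the actual region selection and augmentation used by SF-DACA may differ from these choices.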

Paper Structure (15 sections, 6 equations, 3 figures, 6 tables, 1 algorithm)

This paper contains 15 sections, 6 equations, 3 figures, 6 tables, 1 algorithm.

Figures (3)

  • Figure 1: Proposed SF-DACA. The student and the teacher models possess the same source-pretrained architecture (bottom-left). The green flow indicates adaptation towards the target via self-training with augmented and composed confident target regions. The blue flow refers to the teacher supervision that keeps the student from collapsing. The yellow flow ensures that the teacher model progressively distills target knowledge from the student as adaptation proceeds.
  • Figure 2: Qualitative examples from the C$\to$F (top 2 rows), K$\to$C (middle 2 rows) and S$\to$C (bottom 2 rows) adaptation scenarios. Detections of the source-pretrained model (left) versus predictions of the model adapted to the target via SF-DACA (right). Green bounding boxes refer to the ground truth, and red ones refer to the detections. The detected object class and confidence are provided at the top left of each instance. Best viewed in color.
  • Figure 3: Effect of catastrophic forgetting when the teacher model is discarded. Best viewed in color.
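
The progressive student-to-teacher distillation shown as the yellow flow in Figure 1, which the TL;DR describes as an exponential moving-average teacher, can be sketched as follows. This is a minimal illustration under assumptions: model weights are represented as a plain dictionary of NumPy arrays, the function name `ema_update` is hypothetical, and the momentum value `alpha=0.99` is a placeholder rather than the paper's setting.

```python
import numpy as np

def ema_update(teacher, student, alpha=0.99):
    """Update teacher weights as alpha * teacher + (1 - alpha) * student.

    `teacher` and `student` map parameter names to arrays of equal shape.
    `alpha` is an assumed momentum; values close to 1 make the teacher
    change slowly, which is what guards against sudden collapse.
    """
    for name, w_student in student.items():
        teacher[name] = alpha * teacher[name] + (1.0 - alpha) * w_student
    return teacher
```

Calling `ema_update` once per training step moves the teacher slowly toward the student, so the teacher retains most of its source-pretrained weights early on and absorbs target knowledge only gradually.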