DATR: Unsupervised Domain Adaptive Detection Transformer with Dataset-Level Adaptation and Prototypical Alignment

Jianhong Han, Liang Chen, Yupei Wang

TL;DR

The paper tackles cross-domain unsupervised object detection by introducing DATR, a DETR-based detector that combines class-aware feature alignment and dataset-level representation learning with a mean-teacher self-training framework. It presents two core modules: Class-wise Prototypes Alignment (CPA) for fine-grained, category-aware alignment, and Dataset-level Alignment Scheme (DAS) for global, dataset-wide representation via memory-prototyping and contrastive learning. The approach is integrated with a mean-teacher framework to further reduce domain bias using pseudo-labels from the target domain, with a two-stage training process (Burn-In and Teacher-Student Mutual Learning). Experiments across Weather, Synthetic-to-Real, and Scene adaptation scenarios demonstrate state-of-the-art improvements in mAP and strong generalization, with code released for reproducibility.

Abstract

Object detectors frequently encounter significant performance degradation when confronted with domain gaps between collected data (source domain) and data from real-world applications (target domain). To address this problem, numerous unsupervised domain adaptive detectors have been proposed, leveraging carefully designed feature alignment techniques. However, these techniques primarily align instance-level features in a class-agnostic manner, overlooking the differences between features extracted from different categories, which results in only limited improvement. Furthermore, the scope of current alignment modules is often restricted to a limited batch of images, failing to capture dataset-level cues and thereby severely constraining the detector's ability to generalize to the target domain. To this end, we introduce a strong DETR-based detector named Domain Adaptive detection TRansformer (DATR) for unsupervised domain adaptation of object detection. First, we propose the Class-wise Prototypes Alignment (CPA) module, which effectively aligns cross-domain features in a class-aware manner by bridging the gap between the object detection task and the domain adaptation task. Then, the designed Dataset-level Alignment Scheme (DAS) explicitly guides the detector to achieve global representation and enhance inter-class distinguishability of instance-level features across the entire dataset, which spans both domains, by leveraging contrastive learning. Moreover, DATR incorporates a mean-teacher-based self-training framework, utilizing pseudo-labels generated by the teacher model to further mitigate domain bias. Extensive experimental results demonstrate the superior performance and generalization capabilities of our proposed DATR in multiple domain adaptation scenarios. Code is released at https://github.com/h751410234/DATR.
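
The mean-teacher self-training mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `ema_update` and `filter_pseudo_labels`, the decay of 0.999, and the confidence threshold of 0.5 are all assumptions chosen for illustration, and model weights are reduced to a plain dictionary of floats so the example runs without a deep learning framework.

```python
from typing import Dict, List, Tuple


def ema_update(teacher: Dict[str, float],
               student: Dict[str, float],
               decay: float = 0.999) -> None:
    """Update teacher weights in place as an exponential moving average
    of the student weights: teacher <- decay * teacher + (1 - decay) * student.
    The decay value is an assumed hyperparameter, not taken from the paper."""
    for name, student_value in student.items():
        teacher[name] = decay * teacher[name] + (1.0 - decay) * student_value


def filter_pseudo_labels(predictions: List[Tuple[str, float]],
                         threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Keep only teacher predictions (label, score) whose confidence reaches
    the threshold; these would serve as pseudo-labels for target images.
    The threshold is an assumed hyperparameter."""
    return [(label, score) for label, score in predictions if score >= threshold]


if __name__ == "__main__":
    teacher = {"w": 1.0}
    student = {"w": 0.0}
    ema_update(teacher, student, decay=0.9)
    print(teacher)
    print(filter_pseudo_labels([("car", 0.9), ("person", 0.3)]))
```

Because the teacher only changes through the moving average, its predictions vary more slowly than the student's, which is the usual motivation for generating pseudo-labels from the teacher rather than from the student itself.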

Paper Structure (17 sections, 8 equations, 6 figures, 6 tables)

Figures (6)

  • Figure 1: DATR employs a self-training framework that includes two models: a student model, serving as the core task model, and its temporally ensembled counterpart, known as the teacher model. The student model aligns cross-domain features in a class-aware manner using the proposed Class-wise Prototypes Alignment (CPA) module. The designed Dataset-level Alignment Scheme (DAS) then helps the detector enhance cross-domain feature alignment across the entire dataset through contrastive learning. The teacher model, updated as the exponential moving average (EMA) of the student model, generates pseudo-labels for images in the target domain. DATR uses these pseudo-labels to further mitigate domain bias within the detector. We divide the training process into two stages. In the Burn-In stage, we train only the student model, incorporating both supervised and unsupervised learning. In the Teacher-Student Mutual Learning stage, unlabeled data from the target domain are fed into the teacher model to generate pseudo-labels for supervised learning.
  • Figure 2: Details of (a) the proposed detection pipeline, which incorporates the Class-wise Prototypes Alignment (CPA) module for achieving cross-domain feature alignment, and (b) the efficient batch computation method for extracting class-wise prototypes through the use of class masks.
  • Figure 3: Our proposed Dataset-level Alignment Scheme (DAS). Dataset-level prototypes can be generated using a memory module. Contrastive learning is applied across two domains to enforce refined feature adaptation.
  • Figure 4: The t-SNE visualization of object features from images originating from different domains. Our method reduces the domain shift more effectively than the baseline method.
  • Figure 5: The t-SNE visualization of object features that belong to eight object classes within the Foggy Cityscapes images. Our method enhances both the global representation and inter-class discriminability in the resultant feature space.
  • ...and 1 more figure
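
The class-mask idea in Figure 2(b) and the memory module in Figure 3 can be sketched as follows. This is a simplified illustration under stated assumptions rather than the paper's method: features are plain Python lists instead of batched tensors, the helper names `class_prototypes` and `update_memory` are hypothetical, and the momentum-style memory update (with momentum 0.9) is an assumed rule, since the captions do not specify how the memory is maintained.

```python
from typing import Dict, List, Sequence


def class_prototypes(features: Sequence[Sequence[float]],
                     labels: Sequence[int],
                     num_classes: int) -> Dict[int, List[float]]:
    """Compute one prototype per class as the masked mean of the features.
    Classes with no samples in the batch are skipped."""
    dim = len(features[0]) if features else 0
    prototypes: Dict[int, List[float]] = {}
    for c in range(num_classes):
        # Binary class mask: 1.0 where the sample belongs to class c.
        mask = [1.0 if y == c else 0.0 for y in labels]
        count = sum(mask)
        if count == 0:
            continue
        prototypes[c] = [
            sum(m * f[d] for m, f in zip(mask, features)) / count
            for d in range(dim)
        ]
    return prototypes


def update_memory(memory: Dict[int, List[float]],
                  prototypes: Dict[int, List[float]],
                  momentum: float = 0.9) -> None:
    """Fold batch-level prototypes into dataset-level prototypes in place.
    The moving-average rule and momentum value are assumptions."""
    for c, proto in prototypes.items():
        if c not in memory:
            memory[c] = list(proto)
        else:
            memory[c] = [momentum * old + (1.0 - momentum) * new
                         for old, new in zip(memory[c], proto)]


if __name__ == "__main__":
    feats = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]]
    labels = [0, 0, 1]
    memory: Dict[int, List[float]] = {}
    update_memory(memory, class_prototypes(feats, labels, num_classes=3))
    print(memory)
```

Accumulating prototypes across batches is one way a detector could see cues beyond a single batch of images; the contrastive loss applied to these prototypes is omitted here because its exact form is not given in this section.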