Multi-Source Domain Adaptation for Object Detection with Prototype-based Mean-teacher

Atif Belal, Akhil Meethal, Francisco Perdigon Romero, Marco Pedersoli, Eric Granger

TL;DR

This work tackles multi-source domain adaptation (MSDA) for object detection by introducing Prototype-based Mean Teacher (PMT), a memory-efficient method that encodes domain-specific information with class prototypes rather than per-source subnets. It combines three components: a mean-teacher framework for semi-supervised adaptation, a multi-domain discriminator that encourages domain-invariant features, and a prototype network trained with a contrastive loss that aligns same-class prototypes across domains while separating different classes. PMT outperforms state-of-the-art MSDA methods across cross-time, cross-camera, and mixed-domain benchmarks, while showing negligible parameter growth as the number of source domains increases. These results suggest that PMT provides scalable, robust MSDA for object detection, with practical value for real-world deployments where multiple, diverse source domains are available.

Abstract

Adapting visual object detectors to operational target domains is a challenging task, commonly achieved using unsupervised domain adaptation (UDA) methods. Recent studies have shown that when the labeled dataset comes from multiple source domains, treating them as separate domains and performing multi-source domain adaptation (MSDA) improves accuracy and robustness over blending these source domains and performing UDA. For adaptation, existing MSDA methods learn domain-invariant parameters and domain-specific parameters (one set for each source domain). However, unlike in single-source UDA methods, learning domain-specific parameters makes the model size grow significantly, in proportion to the number of source domains. This paper proposes a novel MSDA method called Prototype-based Mean Teacher (PMT), which uses class prototypes instead of domain-specific subnets to encode domain-specific information. These prototypes are learned using a contrastive loss, aligning the same categories across domains and pushing different categories far apart. Given the use of prototypes, the number of parameters required for our PMT method does not increase significantly with the number of source domains, thus reducing memory issues and possible overfitting. Empirical studies indicate that PMT outperforms state-of-the-art MSDA methods on several challenging object detection datasets. Our code is available at https://github.com/imatif17/Prototype-Mean-Teacher.
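
To make the prototype alignment idea in the abstract concrete, here is a minimal sketch of a contrastive loss over per-domain class prototypes. It is a generic InfoNCE-style formulation written for illustration and is not taken from the paper or its repository: the function name `prototype_contrastive_loss`, the temperature `tau`, the use of cosine similarity, and the toy 2-D prototypes are all assumptions. It also assumes at least two domains and non-zero prototype vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of numbers."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def prototype_contrastive_loss(prototypes, tau=0.1):
    """InfoNCE-style loss over class prototypes (illustrative assumption).

    prototypes[d][c] is the prototype vector of class c in domain d.
    Positives for an anchor are same-class prototypes from other domains;
    every other prototype (any domain, different class) acts as a negative.
    """
    items = [
        (d, c, p)
        for d, per_class in enumerate(prototypes)
        for c, p in enumerate(per_class)
    ]
    total = 0.0
    for d, c, p in items:
        pos = 0.0
        denom = 0.0
        for d2, c2, q in items:
            if d2 == d and c2 == c:
                continue  # skip the anchor itself
            weight = math.exp(cosine(p, q) / tau)
            denom += weight
            if c2 == c:
                pos += weight  # same class, different domain
        total += -math.log(pos / denom)
    return total / len(items)


# Two domains, two classes. In `aligned`, each class has the same prototype
# in both domains; in `mixed`, the second domain has its classes swapped.
aligned = [[[1, 0], [0, 1]], [[1, 0], [0, 1]]]
mixed = [[[1, 0], [0, 1]], [[0, 1], [1, 0]]]

loss_aligned = prototype_contrastive_loss(aligned)
loss_mixed = prototype_contrastive_loss(mixed)
```

In this toy setup the loss is lower when same-class prototypes coincide across domains than when they are confused with another class, which is the behavior the abstract describes. Note that the stored state is one vector per class and domain rather than a full subnet per domain.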

Paper Structure (19 sections, 10 equations, 4 figures, 8 tables)

This paper contains 19 sections, 10 equations, 4 figures, 8 tables.

Figures (4)

  • Figure 1: A comparison of the MSDA architectures using the mean-teacher method in the case with two source domains. While state-of-the-art methods require domain-specific parameters (typically a detection head for each source domain) for preserving domain information, our method stores domain-specific information using prototypes for each class and domain.
  • Figure 2: Architectural diagram of the proposed Prototype-based Mean Teacher (PMT) for MSDA. Following the mean-teacher framework, the student is trained with backpropagation, while the teacher is an exponential moving average of the student. The student is trained with images from all domains, and feature alignment is performed at both the image and instance levels, using a discriminator and prototypes, respectively. During inference, only the teacher model is employed.
  • Figure 3: Prototype-based feature alignment with multiple source domains. There are three domains and three classes. Each domain has a prototype for each class. Initially, there is confusion between classes and the intra-class distance to global prototypes from multiple domains is also large. After alignment, class confusion and intra-class distance to global prototypes are reduced.
  • Figure 4: AP of PMT with a growing number of object categories.
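
The teacher update mentioned in the Figure 2 caption (the teacher as an exponential moving average of the student) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `ema_update`, the keep rate `alpha`, and the dictionary-of-lists weight layout are hypothetical.

```python
def ema_update(teacher, student, alpha=0.99):
    """Return teacher weights moved toward the student by an exponential moving average.

    teacher, student: dicts mapping a parameter name to a list of floats
                      (a stand-in for real model tensors).
    alpha: hypothetical keep rate; values close to 1 make the teacher change slowly.
    """
    return {
        name: [
            alpha * t + (1.0 - alpha) * s
            for t, s in zip(teacher[name], student[name])
        ]
        for name in teacher
    }


# One update step on a toy parameter vector. The student would be updated by
# backpropagation elsewhere; the teacher only ever receives this averaged copy.
teacher = {"w": [0.0, 1.0]}
student = {"w": [1.0, 1.0]}
teacher = ema_update(teacher, student, alpha=0.9)
```

Because the teacher receives no gradients and changes slowly, its predictions tend to be more stable than the student's, which is consistent with the caption's note that only the teacher is used at inference.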