Attention-based Class-Conditioned Alignment for Multi-Source Domain Adaptation of Object Detectors

Atif Belal, Akhil Meethal, Francisco Perdigon Romero, Marco Pedersoli, Eric Granger

TL;DR

Domain shifts hamper object detectors; multi-source domain adaptation (MSDA) mitigates them by leveraging multiple labeled source domains and unlabeled target data. The paper introduces ACIA, an attention-based class-conditioned alignment method that injects class information into ROI-pooled instance features through a transformer-style attention block within a Mean-Teacher framework. This is combined with an image-level multi-class discriminator and an instance-level discriminator, both trained through gradient reversal. On cross-time, cross-camera, and mixed-domain MSDA benchmarks, ACIA reports state-of-the-art performance and robustness to class imbalance, outperforming prototype-based class-conditioned methods while avoiding their pseudo-label error accumulation. Because it adds no domain-specific parameters, the approach is parameter-efficient and practical for robust multi-source object detection.
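In the standard Mean-Teacher formulation, the teacher is not trained by backpropagation: its weights are an exponential moving average (EMA) of the student's weights, which makes it a smoothed, more stable copy of the student. The minimal sketch below shows only that update rule, with plain Python lists standing in for parameter tensors. The `ema_update` helper, the parameter names, and the decay value `alpha` are illustrative assumptions, not settings taken from the paper.

```python
from typing import Dict, List

# Hypothetical stand-in for a model's parameters: name -> flat list of weights.
Params = Dict[str, List[float]]


def ema_update(teacher: Params, student: Params, alpha: float = 0.999) -> Params:
    """Return the new teacher weights: alpha * teacher + (1 - alpha) * student.

    The default alpha is an assumed decay value, not the paper's setting.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return {
        name: [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher[name], student[name])]
        for name in teacher
    }


# Usage: after each student optimization step, refresh the teacher.
teacher = {"w": [1.0, 0.0]}
student = {"w": [0.0, 1.0]}
teacher = ema_update(teacher, student, alpha=0.5)
print(teacher)  # {'w': [0.5, 0.5]}
```

A decay close to 1 makes the teacher change slowly, so it acts as a stable source of targets for the student rather than chasing every noisy update.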

Abstract

Domain adaptation methods for object detection (OD) strive to mitigate the impact of distribution shifts by promoting feature alignment across source and target domains. Multi-source domain adaptation (MSDA) allows leveraging multiple annotated source datasets and unlabeled target data to improve the accuracy and robustness of the detection model. Most state-of-the-art MSDA methods for OD perform feature alignment in a class-agnostic manner. This is challenging because objects carry unique modality information owing to variations in their appearance across domains. A recent prototype-based approach proposed a class-wise alignment, yet it suffers from error accumulation caused by noisy pseudo-labels, which can negatively affect adaptation with imbalanced data. To overcome these limitations, we propose an attention-based class-conditioned alignment method for MSDA, designed to align instances of each object category across domains. In particular, an attention module combined with an adversarial domain classifier allows learning domain-invariant and class-specific instance representations. Experimental results on multiple MSDA benchmark datasets indicate that our method outperforms state-of-the-art methods and exhibits robustness to class imbalance, achieved through a conceptually simple class-conditioning strategy. Our code is available at: https://github.com/imatif17/ACIA.
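The abstract pairs the attention module with an adversarial domain classifier, and the discriminators are trained through gradient reversal. A gradient reversal layer (GRL) is the identity in the forward pass, but in the backward pass it multiplies the gradient flowing into the feature extractor by a negative factor, so the classifier learns to tell domains apart while the features are pushed to confuse it. The minimal sketch below assumes a linear softmax domain classifier over a single feature vector, with one class per domain (for example each source plus the target), and computes the gradients by hand. The helper names, the weights, and the reversal coefficient `lam` are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def softmax(z: Vector) -> Vector:
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def domain_loss_and_feature_grad(W: Matrix, f: Vector, domain: int) -> Tuple[float, Vector]:
    """Cross-entropy of a linear domain classifier and its gradient w.r.t. the feature f.

    W has one row per domain (hypothetical toy classifier: logits z = W f).
    """
    z = [sum(w * x for w, x in zip(row, f)) for row in W]
    p = softmax(z)
    loss = -math.log(p[domain])
    # dL/dz_k = p_k - 1[k == domain], then dL/df_j = sum_k W[k][j] * dL/dz_k
    dz = [p_k - (1.0 if k == domain else 0.0) for k, p_k in enumerate(p)]
    df = [sum(W[k][j] * dz[k] for k in range(len(W))) for j in range(len(f))]
    return loss, df


def gradient_reversal_backward(grad: Vector, lam: float = 1.0) -> Vector:
    """GRL backward pass: the forward pass is the identity; here the gradient is scaled by -lam."""
    return [-lam * g for g in grad]


# Usage: 3 domains (two sources + target), a 2-D feature from an image of domain 0.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
f = [1.0, 0.5]
loss, df = domain_loss_and_feature_grad(W, f, domain=0)
grad_to_backbone = gradient_reversal_backward(df, lam=1.0)
# A descent step on the reversed gradient increases the domain loss,
# i.e. the feature moves toward being harder to classify by domain.
lr = 0.01
f_new = [x - lr * g for x, g in zip(f, grad_to_backbone)]
new_loss, _ = domain_loss_and_feature_grad(W, f_new, domain=0)
print(new_loss > loss)  # True
```

The classifier's own weights would still be updated with the ordinary (non-reversed) gradient; only the signal reaching the feature extractor is flipped, which is what turns a single backward pass into a min-max game.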

Paper Structure (19 sections, 6 equations, 7 figures, 14 tables)

Figures (7)

  • Figure 1: A comparison of the alignment strategies of different MSDA methods. (a) DMSN and TRKP perform pairwise alignment of each source-target pair without considering class-wise alignment. (b) PMT learns specific prototypes to represent each class and domain; prototypes of the same class from different domains are then merged into class-conditioned, domain-invariant prototypes. (c) In contrast, our ACIA learns domain-invariant instance-level features by conditioning an attention module to attend to a given class.
  • Figure 2: t-SNE projection of the class distributions of source data (BDD100K Daytime) and target data (BDD100K Dusk/Dawn). (a) Projection without class-specific alignment, as in DMSN and TRKP; the classes are not aligned between the two domains. (b) PMT relies on prototypes for class alignment; most classes are well aligned, except for bike, which is underrepresented. (c) Our ACIA uses attention-based adversarial class alignment and aligns all classes.
  • Figure 3: Overview of the training architecture of our ACIA. The overall architecture is a mean-teacher model in which the student learns from multiple sources of data. An image-level classifier is introduced to globally align features, and an instance classifier to align instances in a class-conditional way, through a transformer-based attention block (see white boxes).
  • Figure 4: Example object detections (ODs) in the BDD100K cross-time setting with different types of instance-level adaptation. (a) ODs with multi-source adaptation but without instance-level class-conditional adaptation, as in DMSN and TRKP. (b) ODs with the prototype-based class-conditional adaptation of PMT. (c) ACIA: ODs with our attention-based class-conditional adaptation.
  • Figure 5: Heatmap showing the activation of each class embedding with the instance features of several object categories. The x-axis lists the class embeddings for all object categories, and the y-axis gives the object category present in the ROI-pooled feature.
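Figure 3 describes the instance-level branch as a transformer-based attention block conditioned on the object class. The minimal sketch below illustrates the general idea with single-head scaled dot-product attention, assuming that a learnable embedding of the instance's class acts as the query and that the instance's ROI-pooled feature, split into tokens, supplies both keys and values (no learned projections, for brevity). The resulting vector would then go to the instance-level discriminator. The token layout, the function names, and the toy embeddings are hypothetical, and this is not the paper's exact block. How class labels are obtained for unlabeled target instances is not described in this summary, so the label is simply passed in.

```python
import math
from typing import Dict, List

Vector = List[float]


def scaled_dot_attention(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Single-head scaled dot-product attention for one query vector."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]  # attention weights sum to 1
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]


def class_conditioned_feature(
    roi_tokens: List[Vector], class_embeddings: Dict[int, Vector], label: int
) -> Vector:
    """Attend over one instance's ROI tokens, using its class embedding as the query."""
    query = class_embeddings[label]
    return scaled_dot_attention(query, roi_tokens, roi_tokens)


# Usage: an instance with two 2-D feature tokens and two hypothetical classes.
roi_tokens = [[1.0, 0.0], [0.0, 1.0]]
class_embeddings = {0: [5.0, 0.0], 1: [0.0, 5.0]}  # hypothetical learned embeddings
for label in (0, 1):
    print(label, [round(v, 3) for v in class_conditioned_feature(roi_tokens, class_embeddings, label)])
```

In this sketch the same ROI feature yields a different descriptor depending on which class embedding is used as the query, which is the intuition behind aligning instances class by class rather than in a class-agnostic way.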