Robust Domain Adaptive Object Detection with Unified Multi-Granularity Alignment

Libo Zhang, Wenzhang Zhou, Heng Fan, Tiejian Luo, Haibin Ling

TL;DR

This work tackles the challenge of domain shift in object detection by introducing Unified Multi-Granularity Alignment (MGA), which encodes dependencies across pixel-, instance-, and category-level features to learn domain-invariant representations. It fuses multi-scale information through an Omni-Scale Gated Fusion (OSGF) module, and enforces alignment via multi-granularity discriminators, including a novel category-level discriminator that leverages pseudo labels. A dynamic AEMA strategy further improves pseudo-label quality and mitigates local misalignment, boosting robustness across detectors (FCOS and Faster R-CNN) and diverse domain shifts. Experiments on Cityscapes/FoggyCityscapes, Sim10k/CITY, and other benchmarks show MGA consistently outperforms state-of-the-art UDA detectors, validating its effectiveness and generality. The approach offers practical impact by enabling more reliable cross-domain object detection without target-domain labels, with publicly released code.

Abstract

Domain adaptive detection aims to improve the generalization of detectors on the target domain. To reduce the discrepancy in feature distributions between the two domains, recent approaches achieve domain adaptation through feature alignment at different granularities via adversarial learning. However, they neglect the relationship between multiple granularities and different features during alignment, which degrades detection. To address this, we introduce a unified multi-granularity alignment (MGA)-based detection framework for domain-invariant feature learning. The key is to encode the dependencies across different granularities, including the pixel, instance, and category levels, simultaneously to align the two domains. Specifically, based on pixel-level features, we first develop an omni-scale gated fusion (OSGF) module to aggregate discriminative representations of instances with scale-aware convolutions, leading to robust multi-scale detection. In addition, we introduce multi-granularity discriminators to identify which domain, source or target, samples of different granularities come from. Notably, MGA not only leverages instance discriminability across categories but also exploits category consistency between the two domains for detection. Furthermore, we present an adaptive exponential moving average (AEMA) strategy that uses model assessments to guide model updates, improving pseudo labels and alleviating the local misalignment problem, which boosts detection robustness. Extensive experiments on multiple domain adaptation scenarios validate the superiority of MGA over other approaches on FCOS and Faster R-CNN detectors. Code will be released at https://github.com/tiankongzhang/MGA.
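
To make the exponential-moving-average idea behind AEMA concrete, here is a minimal sketch of a gated EMA weight update. The abstract does not give the AEMA formula, so everything specific here is an assumption for illustration: the function name `ema_update`, the momentum `alpha`, and the way the assessment-derived factor `delta` scales the step are hypothetical and are not taken from the paper or its code.

```python
from typing import Dict, List

# Toy parameter container: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def ema_update(teacher: Params, student: Params,
               alpha: float = 0.99, delta: float = 1.0) -> Params:
    """Return new teacher weights after one gated EMA step.

    alpha is the usual EMA momentum. delta in [0, 1] is a HYPOTHETICAL
    gate standing in for an assessment-based update factor:
    delta = 1 gives the plain EMA update, delta = 0 leaves the teacher as is.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    step = delta * (1.0 - alpha)
    # teacher + step * (student - teacher) equals
    # alpha * teacher + (1 - alpha) * student when delta == 1.
    return {
        name: [t + step * (s - t) for t, s in zip(teacher[name], student[name])]
        for name in teacher
    }


teacher = {"w": [0.0, 1.0]}
student = {"w": [1.0, 3.0]}
plain = ema_update(teacher, student, alpha=0.9, delta=1.0)   # ordinary EMA step
frozen = ema_update(teacher, student, alpha=0.9, delta=0.0)  # update suppressed
```

With `delta` fixed at 1 this reduces to the standard EMA teacher update; letting an assessment signal lower `delta` is one plausible way to keep a poorly performing student from degrading the model that generates pseudo labels.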

Paper Structure (28 sections, 16 equations, 13 figures, 14 tables)


Figures (13)

  • Figure 1: Illustration of the proposed Multi-Granularity Alignment (MGA) framework for domain adaptive object detection. Specifically, MGA encodes the dependencies across multiple granularities simultaneously, including the pixel, instance, and category levels. In addition, a dynamic update mechanism, guided by an update factor $\delta$ (detailed later) obtained through model assessment during training, is used to improve the quality of pseudo labels while mitigating the local misalignment problem, further enhancing detection robustness. Best viewed in color and by zooming in for all figures throughout this paper.
  • Figure 2: Framework of our MGA on top of the popular anchor-free FCOS [DBLP:conf/iccv/TianSCH19] (see left image (a)) and the anchor-based Faster R-CNN [DBLP:conf/cvpr/Ren0ZPC0018] (see right image (b)) for UDA detection with assessment-based AEMA. Note that for Faster R-CNN, the region proposal network (RPN) and the RoI head are used for coarse detection and final detection, respectively. $F$ in (a) and $F_1$ and $F_2$ in (b) represent the features from the feature pyramid network in FCOS and from the backbone in Faster R-CNN, respectively.
  • Figure 3: Illustration of the proposed omni-scale gated fusion (OSGF) module for anchor-free FCOS [DBLP:conf/iccv/TianSCH19] (see left image (a)) and anchor-based Faster R-CNN [DBLP:conf/cvpr/Ren0ZPC0018] (see right image (b)). The parameters of the modules with the same color are shared.
  • Figure 4: Illustration of different category-level discriminators $D$. $s_c$ and $t_c$ are the $c$-th category ($c=0,1,\cdots,C-1$) in the source and target domains, respectively. (a) Category-specific discriminators for each category [DBLP:conf/iccv/DuTYFXZYZ19, DBLP:conf/cvpr/HuKSC20, DBLP:conf/eccv/PaulTSRC20]. (b) Domain-consistent discriminator to distinguish different categories within one domain [DBLP:conf/eccv/WangSZD020]. (c) Our category- and domain-consistent discriminator, which considers both instance discriminability in different categories and category consistency between the two domains.
  • Figure 5: Comparison of pseudo label quality between EMA and AEMA. Image (a) displays the pseudo label generated by EMA, image (b) the pseudo label generated by our AEMA, and image (c) the ground-truth (GT) label. Image (d) shows the mAP scores of the pseudo labels generated by the different strategies; AEMA produces better pseudo labels.
  • ...and 8 more figures
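
One way to read the category- and domain-consistent discriminator in Figure 4(c) is as a single classifier over the $2C$ joint classes $s_0,\dots,s_{C-1},t_0,\dots,t_{C-1}$. The sketch below illustrates that reading only. The label layout (source classes first, then target classes), the helper names `joint_label` and `cross_entropy`, and the use of a plain softmax cross-entropy are assumptions, not details confirmed by the captions above.

```python
import math
from typing import List

SOURCE, TARGET = 0, 1  # hypothetical domain codes


def joint_label(domain: int, category: int, num_categories: int) -> int:
    """Map (domain, category) to one index in [0, 2C).

    Assumed layout: indices 0..C-1 are s_0..s_{C-1} and
    indices C..2C-1 are t_0..t_{C-1}.
    """
    if domain not in (SOURCE, TARGET):
        raise ValueError("domain must be SOURCE or TARGET")
    if not 0 <= category < num_categories:
        raise ValueError("category out of range")
    return domain * num_categories + category


def cross_entropy(logits: List[float], label: int) -> float:
    """Numerically stable softmax cross-entropy for a single sample."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


C = 8                                        # number of object categories (example value)
src_idx = joint_label(SOURCE, 2, C)          # source instance of category 2
tgt_idx = joint_label(TARGET, 2, C)          # target instance of category 2
loss = cross_entropy([0.0] * (2 * C), tgt_idx)  # uniform logits over 2C classes
```

On the target domain, the category index would have to come from pseudo labels, since no target annotations are available; this matches the summary's statement that the category-level discriminator leverages pseudo labels.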