Robust Domain Adaptive Object Detection with Unified Multi-Granularity Alignment
Libo Zhang, Wenzhang Zhou, Heng Fan, Tiejian Luo, Haibin Ling
TL;DR
This work tackles domain shift in object detection with Unified Multi-Granularity Alignment (MGA), which encodes dependencies across pixel-, instance-, and category-level features to learn domain-invariant representations. It fuses multi-scale information through an Omni-Scale Gated Fusion (OSGF) module and enforces alignment via multi-granularity discriminators, including a category-level discriminator that leverages pseudo labels. An adaptive exponential moving average (AEMA) strategy further improves pseudo-label quality and mitigates local misalignment, boosting robustness across detectors (FCOS and Faster R-CNN) and diverse domain shifts. Experiments on Cityscapes/FoggyCityscapes, Sim10k/CITY, and other benchmarks show that MGA outperforms other unsupervised domain adaptation (UDA) detectors, supporting its effectiveness and generality. The approach enables more reliable cross-domain object detection without target-domain labels, and the code is to be released publicly.
Abstract
Domain adaptive detection aims to improve the generalization of detectors on the target domain. To reduce the discrepancy in feature distributions between two domains, recent approaches achieve domain adaptation through adversarial feature alignment at different granularities. However, they neglect the relationship between multiple granularities and different features during alignment, which degrades detection. To address this, we introduce a unified multi-granularity alignment (MGA)-based detection framework for domain-invariant feature learning. The key is to encode the dependencies across different granularities, including the pixel, instance, and category levels, simultaneously to align the two domains. Specifically, based on pixel-level features, we first develop an omni-scale gated fusion (OSGF) module that aggregates discriminative representations of instances with scale-aware convolutions, leading to robust multi-scale detection. In addition, we introduce multi-granularity discriminators that identify whether samples at each granularity come from the source or the target domain. Note that MGA not only leverages instance discriminability across categories but also exploits category consistency between the two domains for detection. Furthermore, we present an adaptive exponential moving average (AEMA) strategy that uses model assessments to guide model updates, which improves pseudo labels and alleviates the local misalignment problem, boosting detection robustness. Extensive experiments on multiple domain adaptation scenarios validate the superiority of MGA over other approaches on the FCOS and Faster R-CNN detectors. Code will be released at https://github.com/tiankongzhang/MGA.
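
To make the AEMA idea more concrete, the following is a minimal sketch of an exponential moving average update whose decay depends on an assessment score. The abstract does not state how the assessment is computed or how it modulates the update, so the names `adaptive_decay` and `ema_update`, the `score` argument, and the linear rule that maps the score to a decay are all illustrative assumptions rather than the authors' method. Weights are plain floats here instead of tensors to keep the sketch self-contained.

```python
from typing import Dict


def adaptive_decay(score: float, base_decay: float = 0.99,
                   max_decay: float = 0.9999) -> float:
    """Map an assessment score in [0, 1] to an EMA decay (hypothetical rule).

    A high score (the student looks reliable) yields base_decay, so the
    averaged model follows the student more quickly. A low score pushes the
    decay toward max_decay, so the averaged model barely moves.
    """
    score = min(max(score, 0.0), 1.0)  # clamp to [0, 1]
    return max_decay - score * (max_decay - base_decay)


def ema_update(teacher: Dict[str, float], student: Dict[str, float],
               decay: float) -> Dict[str, float]:
    """Return new averaged weights: decay * teacher + (1 - decay) * student."""
    return {
        name: decay * teacher[name] + (1.0 - decay) * student[name]
        for name in teacher
    }


if __name__ == "__main__":
    teacher = {"w": 1.0}
    student = {"w": 0.0}
    # An assumed assessment score of 0.8 for the current student model.
    teacher = ema_update(teacher, student, adaptive_decay(0.8))
    print(teacher)
```

In this sketch the averaged model would be the one that produces pseudo labels on target images, so slowing its update when the assessment is poor is one plausible way to keep noisy student states from degrading those labels.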
