CAT: Exploiting Inter-Class Dynamics for Domain Adaptive Object Detection
Mikhail Kennerley, Jian-Gang Wang, Bharadwaj Veeravalli, Robby T. Tan
TL;DR
This work tackles domain adaptive object detection (DAOD) under severe class imbalance by introducing Class-Aware Teacher (CAT). CAT estimates inter-class bias with an Inter-Class Relation module (ICRm), augments minority-class representation via Class-Relation Augmentation (CRA), which applies MixUp to crops of related classes stored in a Cropbank, and applies an Inter-Class Loss (ICL) to emphasize hard minority cases. The approach reaches state-of-the-art results on Cityscapes to Foggy Cityscapes (52.5 mAP, up from 51.2) and reports strong gains on PASCAL VOC to Clipart1K, with ablations supporting the contribution of each component. By explicitly modeling inter-class dynamics and augmenting across domains, CAT offers a framework for mitigating minority-class bias in DAOD.
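The MixUp step that CRA relies on can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `mixup_crops` and `mixup_labels`, the fixed mixing coefficient `lam`, and the assumption that both crops have already been resized to the same shape are all hypothetical choices made for illustration.

```python
import numpy as np


def mixup_crops(minority_crop, related_crop, lam=0.5):
    """Blend two equally sized crops as lam * minority + (1 - lam) * related.

    Assumption: both crops were resized to a common shape beforehand.
    """
    if minority_crop.shape != related_crop.shape:
        raise ValueError("crops must share a shape; resize before mixing")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    a = minority_crop.astype(np.float32)
    b = related_crop.astype(np.float32)
    return lam * a + (1.0 - lam) * b


def mixup_labels(num_classes, minority_cls, related_cls, lam=0.5):
    """Soft label that mirrors the pixel blend: lam on one class, 1 - lam on the other."""
    target = np.zeros(num_classes, dtype=np.float32)
    target[minority_cls] += lam
    target[related_cls] += 1.0 - lam
    return target


# Hypothetical usage: mix a minority-class crop with a crop of a related class.
minority = np.zeros((4, 4, 3), dtype=np.uint8)
related = np.full((4, 4, 3), 100, dtype=np.uint8)
mixed = mixup_crops(minority, related, lam=0.25)
soft_label = mixup_labels(num_classes=8, minority_cls=3, related_cls=5, lam=0.25)
```

In a full pipeline the two crops would presumably be drawn from the Cropbank according to the class relations estimated by ICRm; that selection logic is not shown here because the summary above does not specify it.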
Abstract
Domain adaptive object detection aims to adapt detection models to domains where annotated data is unavailable. Existing methods address the domain gap using the semi-supervised student-teacher framework. However, a fundamental issue arises from class imbalance in the labelled training set, which can result in inaccurate pseudo-labels. The relationship between classes, especially where one class is a majority and the other a minority, has a large impact on class bias. We propose Class-Aware Teacher (CAT) to address the class bias issue in the domain adaptation setting. We approximate the class relationships with our Inter-Class Relation module (ICRm) and exploit them to reduce the bias within the model. This lets us apply augmentations to highly related classes, both inter- and intra-domain, to boost the performance of minority classes with minimal impact on majority classes. We further reduce the bias by applying a class-relation weight to our classification loss. Experiments on various datasets and ablation studies show that our method is able to address class bias in the domain adaptation setting. On the Cityscapes to Foggy Cityscapes benchmark, we attain 52.5 mAP, an improvement of 1.3 mAP over the 51.2 mAP achieved by the previous state-of-the-art method.
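One way to picture how class relationships might be approximated and then fed into a weighted classification loss is sketched below. It is an illustration under stated assumptions, not the formulation used in the paper: the relation matrix is taken to be a row-normalised confusion matrix, and the per-sample weight `1 + (1 - rel[y, y])` is a hypothetical choice that simply up-weights classes that are often misclassified. The names `relation_matrix` and `weighted_cross_entropy` are likewise invented for this example.

```python
import numpy as np


def relation_matrix(confusion):
    """Row-normalise confusion counts (rows: true class, columns: predicted class).

    Entry [i, j] is then the fraction of class-i samples predicted as class j.
    Rows without samples fall back to the identity row (no known confusion).
    """
    confusion = np.asarray(confusion, dtype=np.float64)
    row_sums = confusion.sum(axis=1, keepdims=True)
    safe_sums = np.where(row_sums > 0, row_sums, 1.0)
    rel = confusion / safe_sums
    empty = row_sums.squeeze(1) == 0
    rel[empty] = np.eye(confusion.shape[0])[empty]
    return rel


def weighted_cross_entropy(probs, labels, rel):
    """Cross-entropy with a per-sample weight derived from the relation matrix.

    Assumed weighting: 1 + (1 - rel[y, y]), i.e. larger for classes that are
    frequently confused with other classes.
    """
    probs = np.asarray(probs, dtype=np.float64)
    labels = np.asarray(labels)
    weights = 1.0 + (1.0 - rel[labels, labels])
    picked = probs[np.arange(len(labels)), labels]
    nll = -np.log(np.clip(picked, 1e-12, 1.0))
    return float((weights * nll).mean())


# Hypothetical two-class case: class 0 is the majority, class 1 the minority.
rel = relation_matrix([[90, 10], [30, 20]])
loss = weighted_cross_entropy([[0.5, 0.5], [0.5, 0.5]], [0, 1], rel)
```

In this toy case the minority class is misclassified 60% of the time, so its samples receive a weight of 1.6 against 1.1 for the majority class.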
