Cross Domain Object Detection via Multi-Granularity Confidence Alignment based Mean Teacher

Jiangming Chen, Li Liu, Wanxia Deng, Zhen Liu, Yu Liu, Yingmei Wei, Yongxiang Liu

TL;DR

This work tackles cross-domain object detection by addressing confidence misalignment in Mean Teacher-based pseudo labeling. It introduces MGCAMT, a framework that couples three modules within a Mean Teacher setup: CCA (EDL-based category uncertainty filtering), TCA (interactive cross-scale remapping for regression), and FCA (learning from the raw Mean Teacher outputs without label assignment). Training combines a source loss and a weighted target loss, $L_{total} = L_s + \lambda L_t$, with an EMA-based teacher update. The authors report state-of-the-art results across multiple domain-shift benchmarks and reduced miscalibration at the category, instance, and image levels. By aligning confidence across granularities, MGCAMT improves the quality of pseudo supervision and, in turn, cross-domain generalization and detection performance.
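
The EMA teacher update and the combined objective mentioned above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the dictionary representation of weights, the `decay` values, and the `target_weight` parameter (standing in for the coefficient on $L_t$) are all assumptions made for the example.

```python
def ema_update(teacher, student, decay=0.999):
    """Return teacher weights moved toward the student by exponential moving average.

    `teacher` and `student` map parameter names to floats (a stand-in for tensors).
    The default `decay` is a hypothetical value, not taken from the paper.
    """
    return {
        name: decay * teacher[name] + (1.0 - decay) * student[name]
        for name in teacher
    }


def total_loss(source_loss, target_loss, target_weight=1.0):
    """Combine the supervised source loss with a weighted target (pseudo-label) loss.

    `target_weight` is an assumed name for the coefficient on the target term.
    """
    return source_loss + target_weight * target_loss


# Toy weights: after one update the teacher has moved 10% of the way to the student.
teacher = {"w": 1.0, "b": 0.0}
student = {"w": 0.0, "b": 1.0}
teacher = ema_update(teacher, student, decay=0.9)
```

Because the teacher is only ever a running average of student weights, it changes slowly, which is what makes its predictions usable as pseudo labels for the student.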

Abstract

Cross domain object detection learns an object detector for an unlabeled target domain by transferring knowledge from an annotated source domain. Promising results have been achieved via Mean Teacher; however, pseudo labeling, which is the bottleneck of mutual learning, remains to be further explored. In this study, we find that confidence misalignment of the predictions, including category-level overconfidence, instance-level task confidence inconsistency, and image-level confidence misfocusing, injects noisy pseudo labels into the training process and leads to suboptimal performance on the target domain. To tackle this issue, we present a novel general framework termed Multi-Granularity Confidence Alignment Mean Teacher (MGCAMT) for cross domain object detection, which alleviates confidence misalignment at the category, instance, and image levels simultaneously to obtain high-quality pseudo supervision for better teacher-student learning. Specifically, to align confidence with accuracy at the category level, we propose Classification Confidence Alignment (CCA), which models category uncertainty based on Evidential Deep Learning (EDL) and filters out labels with incorrect categories via an uncertainty-aware selection strategy. Furthermore, to mitigate the instance-level misalignment between classification and localization, we design Task Confidence Alignment (TCA) to enhance the interaction between the two task branches and allow each classification feature to adaptively locate the optimal feature for regression. Finally, we develop image-level Focusing Confidence Alignment (FCA), which adopts another way of pseudo label learning: we use the original outputs from the Mean Teacher network for supervised learning without label assignment, so as to concentrate on holistic information in the target image. These three procedures benefit from each other from a cooperative learning perspective.
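
To make the uncertainty-aware selection in CCA more concrete, the following is a minimal sketch assuming the common binomial (Beta) formulation from subjective logic, where non-negative evidence for and against a category gives $\alpha = e^{+} + 1$, $\beta = e^{-} + 1$, an expected probability $\alpha / (\alpha + \beta)$, and an uncertainty $2 / (\alpha + \beta)$. The paper's exact formulation may differ; the function names, thresholds, and evidence values below are hypothetical and chosen only to illustrate the idea.

```python
def beta_opinion(pos_evidence, neg_evidence):
    """Return (expected probability, uncertainty) for one category.

    Assumes the standard subjective-logic mapping with a uniform Beta(1, 1) prior.
    """
    alpha = pos_evidence + 1.0
    beta = neg_evidence + 1.0
    strength = alpha + beta
    return alpha / strength, 2.0 / strength


def select_pseudo_labels(candidates, min_prob=0.8, max_uncertainty=0.2):
    """Keep labels that are both confident and backed by enough evidence.

    `candidates` is a list of (label, pos_evidence, neg_evidence) tuples.
    Both thresholds are illustrative values, not the paper's settings.
    """
    kept = []
    for label, pos_e, neg_e in candidates:
        prob, uncertainty = beta_opinion(pos_e, neg_e)
        if prob >= min_prob and uncertainty <= max_uncertainty:
            kept.append(label)
    return kept


# Made-up evidence values for illustration only.
candidates = [
    ("car", 18.0, 0.0),    # high probability, low uncertainty: kept
    ("person", 4.0, 0.0),  # high probability but little evidence: rejected
    ("bus", 10.0, 8.0),    # plenty of evidence but conflicting: rejected
]
selected = select_pseudo_labels(candidates)
```

The "person" case is the one a plain confidence threshold would let through: its probability exceeds 0.8, but it rests on little evidence, which is the kind of overconfident prediction the selection step is meant to remove.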

Paper Structure

This paper contains 20 sections, 17 equations, 9 figures, and 6 tables.

Figures (9)

  • Figure 1: Illustration of our multi-granularity confidence alignment method for cross domain object detection. The predictions on the target domain (Foggy Cityscapes) are usually confidence misaligned: (left) the category with high confidence may be incorrect; (middle) the vague and tiny objects usually obtain low confidences or even get omitted; (right) the bounding box with high classification confidence is located inaccurately. From the perspective of alleviating confidence misalignment, we obtain high quality pseudo supervision by category-level CCA, instance-level TCA and image-level FCA to promote teacher-student mutual learning. (Best viewed in color and zooming in.)
  • Figure 2: Overview of the proposed Multi-Granularity Confidence Alignment Mean Teacher (MGCAMT). The student detector is optimized on the labeled source data and on the unlabeled target data with pseudo labels generated by the teacher detector. The teacher is updated from the student via exponential moving average (EMA). At the category level, CCA adopts Beta-based evidential learning to estimate the category uncertainty of the pseudo labels, and reduces the effect of category overconfidence via an uncertainty-aware selection strategy. At the instance level, TCA enhances the interaction between the two task branches and, based on remapping, allows each classification feature to adaptively locate the optimal feature for regression. At the image level, FCA leverages the original outputs from the Mean Teacher network for supervised learning without label assignment, which focuses on holistic information of an image and eliminates the tedious process of pseudo label assignment. At inference, the only added cost is the convolution operation in TCA, which increases computational overhead by about $1\%$ while significantly improving performance.
  • Figure 3: Classification precision of the predictions on the target domain when the confidence threshold is set to high levels. Meanwhile, we report the recall at the same confidence level. Results on the Foggy Cityscapes are presented. (Best viewed in color.)
  • Figure 4: Confidence of the maximum category and its maximum IoU with the Ground Truth boxes in the corresponding class. Results on the Foggy Cityscapes are presented. (Best viewed in color and zooming in.)
  • Figure 5: Illustration of the negative feedback of pseudo labels with label assignment. We vary the confidence threshold used to select pseudo labels for Mean Teacher learning, and also report the performance of FCA and MGCAMT for comparison. Pseudo supervision signals begin to participate in training at iteration 10,000. Results on the Foggy Cityscapes are presented. (Best viewed in color.)
  • ...and 4 more figures
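
The two quantities plotted in Figure 3, precision and recall of predictions above a confidence threshold, can be computed as in the sketch below. It is a simplified illustration: the function name and the toy predictions are invented for the example and are not data from the paper.

```python
def precision_recall_at(predictions, num_ground_truth, threshold):
    """Precision and recall of the predictions whose confidence is >= threshold.

    `predictions` is a list of (confidence, is_correct) pairs;
    `num_ground_truth` is the number of ground-truth objects.
    """
    kept = [is_correct for confidence, is_correct in predictions
            if confidence >= threshold]
    true_positives = sum(kept)
    precision = true_positives / len(kept) if kept else 0.0
    recall = true_positives / num_ground_truth if num_ground_truth else 0.0
    return precision, recall


# Made-up predictions: one high-confidence prediction is wrong (overconfidence),
# and one correct prediction falls below the threshold (lost recall).
toy = [(0.95, True), (0.92, False), (0.90, True), (0.60, True), (0.40, False)]
precision, recall = precision_recall_at(toy, num_ground_truth=4, threshold=0.9)
```

Raising the threshold in this toy setup keeps precision below 1 while discarding correct low-confidence detections, which mirrors the trade-off the figure is meant to expose.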