Improving Single Domain-Generalized Object Detection: A Focus on Diversification and Alignment

Muhammad Sohail Danish, Muhammad Haris Khan, Muhammad Akhtar Munir, M. Saquib Sarfraz, Mohsen Ali

TL;DR

This work tackles the challenge of single-domain generalized object detection by introducing a twofold strategy: (1) diversify the single source domain through a curated set of augmentations, including ImageNet-C and Fourier-based perturbations, to reduce reliance on domain-specific cues; (2) align detections across original and augmented views by jointly optimizing classification and localization consistency, producing robust and well-calibrated detectors. The method is detector-agnostic and improves both two-stage and one-stage detectors, achieving substantial gains in unseen-domain mAP and better calibration measured by D-ECE, across diverse shifts such as real-to-artistic and multi-weather urban scenes. The authors validate their approach with extensive experiments and ablations, demonstrating that aligning both classification and localization across diversified views yields additive benefits beyond diversification alone. A public code release accompanies the work, signaling practical impact for real-world deployments where domain shifts are common and labeled target data are unavailable.

Abstract

In this work, we tackle the problem of domain generalization for object detection, specifically focusing on the scenario where only a single source domain is available. We propose an effective approach that involves two key steps: diversifying the source domain and aligning detections based on class prediction confidence and localization. Firstly, we demonstrate that by carefully selecting a set of augmentations, a base detector can outperform existing methods for single domain generalization by a good margin. This highlights the importance of domain diversification in improving the performance of object detectors. Secondly, we introduce a method to align detections from multiple views, considering both classification and localization outputs. This alignment procedure leads to better generalized and well-calibrated object detector models, which are crucial for accurate decision-making in safety-critical applications. Our approach is detector-agnostic and can be seamlessly applied to both single-stage and two-stage detectors. To validate the effectiveness of our proposed methods, we conduct extensive experiments and ablations on challenging domain-shift scenarios. The results consistently demonstrate the superiority of our approach compared to existing methods. Our code and models are available at: https://github.com/msohaildanish/DivAlign
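The alignment step described above can be illustrated with a minimal sketch. It assumes that the original and augmented views share the same set of proposals, so predictions can be compared one-to-one, that classification consistency is measured with a KL divergence between class distributions, and that localization consistency is measured with a squared L2 distance between box outputs. The function names, the weights `w_cls` and `w_reg`, and the choice of distances are illustrative assumptions, not the released DivAlign implementation.

```python
import math


def softmax(logits):
    """Convert raw class scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete class distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def alignment_loss(orig_logits, aug_logits, orig_boxes, aug_boxes,
                   w_cls=1.0, w_reg=1.0):
    """Hypothetical consistency loss between two views of the same image.

    Each argument is a list with one entry per proposal; entry i of the
    original view is assumed to correspond to entry i of the augmented view.
    """
    n = len(orig_logits)
    if n == 0:
        return 0.0
    cls_term = 0.0
    reg_term = 0.0
    for lo, la, bo, ba in zip(orig_logits, aug_logits, orig_boxes, aug_boxes):
        # Classification alignment: class distributions should agree.
        cls_term += kl_divergence(softmax(lo), softmax(la))
        # Localization alignment: box outputs should agree.
        reg_term += sum((a - b) ** 2 for a, b in zip(bo, ba))
    return (w_cls * cls_term + w_reg * reg_term) / n


# Example: one proposal whose class scores and box shift under augmentation.
loss = alignment_loss(
    orig_logits=[[2.0, 0.5, -1.0]],
    aug_logits=[[0.5, 2.0, -1.0]],
    orig_boxes=[[0.1, 0.2, 0.3, 0.4]],
    aug_boxes=[[0.1, 0.2, 0.3, 0.5]],
)
```

In a training loop, a term of this kind would be added to the detector's usual supervised loss, so the model is penalized whenever its predictions change between the original and the diversified view.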

Paper Structure (10 sections, 6 equations, 10 figures, 13 tables)

This paper contains 10 sections, 6 equations, 10 figures, 13 tables.

Figures (10)

  • Figure 1: Overall architecture of our proposed method. At the core is a baseline detector; here a two-stage detector, Faster-RCNN (Ren et al., 2015), is depicted, comprising a backbone, a region proposal network (RPN), and ROI alignment (RA). To improve the single domain generalization of the baseline detector, we propose to diversify the single source domain and also align the diversified views by minimizing losses at both the classification and regression outputs.
  • Figure 2: Examples of augmentations for domain diversification.
  • Figure 3: We augment the validation set from the source domain with one augmentation at a time and report the performance of the strong baseline model trained on all these augmentations. There is a noticeable gap between the performance on original and diversified images. Our alignment losses reduce this gap.
  • Figure 4: Our method (diversification and alignment) yields considerable improvements in both domain generalization and out-of-domain calibration. Diversification with Label Smoothing (LS) or Temperature Scaling (TS) improves calibration, but the overall lower mAP indicates weaker generalization. Note that our method does not have an explicit model calibration mechanism. (left) mAP: higher is better; (right) D-ECE: lower is better.
  • Figure 5: Reliability Diagram for different target domains.
  • ...and 5 more figures
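
The calibration error reported in Figure 4 and the reliability diagrams in Figure 5 rest on the same idea: group detections by confidence and compare average confidence with observed precision in each group. The sketch below is a simplified, confidence-only version of that computation. Detection-specific variants such as D-ECE may additionally bin over box properties, which is omitted here. The function name, the bin count, and the input format are assumptions made for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Confidence-only calibration error over a set of detections.

    confidences: predicted scores in [0, 1], one per detection.
    correct: booleans, True when the detection matched a ground-truth object.
    """
    total = len(confidences)
    if total == 0:
        return 0.0
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # A score of exactly 1.0 falls into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        precision = sum(1 for _, h in members if h) / len(members)
        # Weight each bin's confidence/precision gap by its share of detections.
        ece += (len(members) / total) * abs(precision - avg_conf)
    return ece


# Example: two fully confident detections, only one of which is correct.
overconfident = expected_calibration_error([1.0, 1.0], [True, False])
```

A well-calibrated detector has a per-bin gap close to zero, which corresponds to a reliability diagram that follows the diagonal.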

Theorems & Definitions (1)

  • Definition 3.1: Domain Invariance for Object Detection