Unbiased Faster R-CNN for Single-source Domain Generalized Object Detection

Yajing Liu, Shijun Zhou, Xiyao Liu, Chunhui Hao, Baojie Fan, Jiandong Tian

TL;DR

This work tackles single-source domain generalization for object detection by modeling data and feature biases with a Structural Causal Model and building a causal learning framework on top of it. The Unbiased Faster R-CNN (UFR) combines a Global-Local Transformation for data augmentation with a Causal Attention Learning module and a Causal Prototype Learning module to encourage image- and object-level causal representations. Empirical results across five scenes demonstrate improved generalization, notably a 3.9 percentage-point mAP gain on the Night-Clear scene, outperforming domain-invariant and augmentation-based baselines. The approach offers a causal, feature-level mechanism for robust object detection under distribution shift, with potential for more reliable real-world perception systems.

Abstract

Single-source domain generalization (SDG) for object detection is a challenging yet essential task as the distribution bias of the unseen domain degrades the algorithm performance significantly. However, existing methods attempt to extract domain-invariant features, neglecting that the biased data leads the network to learn biased features that are non-causal and poorly generalizable. To this end, we propose an Unbiased Faster R-CNN (UFR) for generalizable feature learning. Specifically, we formulate SDG in object detection from a causal perspective and construct a Structural Causal Model (SCM) to analyze the data bias and feature bias in the task, which are caused by scene confounders and object attribute confounders. Based on the SCM, we design a Global-Local Transformation module for data augmentation, which effectively simulates domain diversity and mitigates the data bias. Additionally, we introduce a Causal Attention Learning module that incorporates a designed attention invariance loss to learn image-level features that are robust to scene confounders. Moreover, we develop a Causal Prototype Learning module with an explicit instance constraint and an implicit prototype constraint, which further alleviates the negative impact of object attribute confounders. Experimental results on five scenes demonstrate the prominent generalization ability of our method, with an improvement of 3.9% mAP on the Night-Clear scene.

Paper Structure (16 sections, 14 equations, 9 figures, 6 tables)


Figures (9)

  • Figure 1: Comparison between vanilla Faster R-CNN (FR) [Ren2015Faster] (top) and our proposed Unbiased Faster R-CNN (bottom). For vanilla FR [Ren2015Faster], the biased distribution of the input data leads the network to learn biased features that favor the seen environment and generalize poorly to unseen test environments. The feature bias can be attributed to the image-level attention bias and the object-level prototype bias. Our method mitigates the data bias in the input space and further learns unbiased attention and prototypes in the representation space.
  • Figure 2: Illustration of highly changeable data distribution, diverse context and object attributes in unseen target domains.
  • Figure 3: The constructed Structural Causal Model (SCM) for the object detection task. The nodes denote variables, the solid arrows denote direct causal effects, and the dashed arrow indicates a data dependence.
  • Figure 4: The overall structure of the proposed Unbiased Faster R-CNN. The input source images are augmented through the Global-Local Transformation (GLT) module. Both the original images and augmented images are fed into the network for training. During training, the role of the Causal Attention Learning module is to constrain the network to learn scene-level causal attention and select causal features to feed into the RPN. The purpose of the Causal Prototype Learning module is to constrain the network to learn object-level causal features with the help of an explicit instance constraint (solid arrows) and an implicit prototype constraint (dashed arrows).
  • Figure 5: Overview of the Global-Local Transformation (GLT) module. The Global Transformation (GT) augments the whole image in the frequency domain. The Local Transformation (LT) augments local objects, obtained by SAM [kirillov2023segment], in the spatial domain. The final augmented image is obtained by fusing the GT image with the LT image.
  • ...and 4 more figures
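
The Figure 5 caption describes the GLT pipeline only at a high level, so the following is a minimal sketch of that flow under stated assumptions, not the authors' implementation. The function names (`global_transform`, `local_transform`, `global_local_transform`), the random amplitude scaling used as the frequency-domain augmentation, the brightness/contrast jitter used as the local augmentation, and the mask-based fusion rule are all hypothetical. Object masks are passed in as plain boolean arrays, standing in for what SAM would produce.

```python
import numpy as np


def global_transform(image, rng, strength=0.5):
    """Frequency-domain augmentation of the whole image.

    Assumption: the Fourier amplitude is rescaled at random while the phase
    is kept. The caption does not say which operation the paper uses.
    """
    spectrum = np.fft.fft2(image, axes=(0, 1))
    amplitude, phase = np.abs(spectrum), np.angle(spectrum)
    # One random gain per frequency bin and channel, centered on 1.
    gain = 1.0 + strength * rng.uniform(-1.0, 1.0, size=amplitude.shape)
    augmented = np.fft.ifft2(amplitude * gain * np.exp(1j * phase), axes=(0, 1))
    # Keep the real part and clamp back to the valid intensity range.
    return np.clip(augmented.real, 0.0, 1.0)


def local_transform(image, masks, rng, max_shift=0.2):
    """Spatial-domain augmentation applied separately inside each object mask.

    Assumption: a per-object brightness/contrast jitter. Each mask is a
    boolean array of shape (H, W).
    """
    out = image.copy()
    for mask in masks:
        scale = 1.0 + rng.uniform(-max_shift, max_shift)
        offset = rng.uniform(-max_shift, max_shift)
        out[mask] = np.clip(image[mask] * scale + offset, 0.0, 1.0)
    return out


def global_local_transform(image, masks, rng):
    """Fuse the GT and LT images.

    Assumption: LT pixels are used on object regions and GT pixels elsewhere.
    """
    gt_image = global_transform(image, rng)
    lt_image = local_transform(image, masks, rng)
    object_region = np.zeros(image.shape[:2], dtype=bool)
    for mask in masks:
        object_region |= mask
    return np.where(object_region[..., None], lt_image, gt_image)


# Usage on a synthetic RGB image in [0, 1] with one square "object".
rng = np.random.default_rng(0)
image = rng.uniform(size=(32, 32, 3))
mask = np.zeros((32, 32), dtype=bool)
mask[8:16, 8:16] = True
augmented = global_local_transform(image, [mask], rng)
```

Passing precomputed masks keeps the sketch independent of any particular segmentation model. In a training loop, both `image` and `augmented` would be fed to the detector, as the Figure 4 caption states.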