Mitigating Context Bias in Domain Adaptation for Object Detection using Mask Pooling

Hojun Son, Asma Almutairi, Arpan Kusari

TL;DR

This work addresses context bias in domain adaptation for object detection (DAOD) by examining pooling as a potential source of spurious foreground-background (FG-BG) associations. It introduces Mask Pooling, which uses foreground masks to split pooling into FG and BG regions, and frames it within a causal analysis that evaluates and mitigates the influence of context through interventions. The method is tested on Cityscapes, KITTI, Virtual KITTI, and a BG-20K benchmark with random backgrounds, showing robust improvements in hierarchical F1 and mAP50 under domain shift. Overall, Mask Pooling provides a principled, causally motivated approach to reducing context bias in DAOD, with practical benefits for generalization to unseen domains.

Abstract

Context bias refers to the association between foreground objects and the background that forms during object detection training. Various methods have been proposed to minimize context bias when applying a trained model to an unseen domain, a setting known as domain adaptation for object detection (DAOD). However, a principled approach to understanding why context bias occurs and how to remove it has been missing. In this work, we provide a causal view of context bias, pointing to the pooling operation in convolutional network architectures as a possible source of this bias. We present an alternative, Mask Pooling, which takes foreground masks as an additional input to separate the pooling process into the respective foreground and background regions, and show that this leads the trained model to detect objects more robustly across different domains. We also provide a benchmark designed as an ultimate test for DAOD, placing foregrounds on completely random backgrounds, to analyze the robustness of the trained models. Through these experiments, we hope to provide a principled approach for minimizing context bias under domain shift.
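
The idea of pooling separately over foreground and background can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `mask_max_pool`, the non-overlapping k×k windows, the choice to return two separate pooled maps, and the zero fill for windows that contain no pixels of a region are all assumptions made for illustration.

```python
import numpy as np

def mask_max_pool(feat, fg_mask, k=2):
    """Hypothetical sketch: max-pool a (H, W) map separately over FG and BG pixels.

    feat:    (H, W) float array, one activation channel.
    fg_mask: (H, W) boolean array, True where a pixel belongs to the foreground.
    k:       window size and stride (non-overlapping windows, an assumption).
    Returns (fg_pooled, bg_pooled), each of shape (H // k, W // k).
    """
    H, W = feat.shape
    assert H % k == 0 and W % k == 0, "sketch assumes H and W divisible by k"

    def to_windows(x):
        # Rearrange (H, W) into (H/k, W/k, k*k): one row of values per window.
        return (x.reshape(H // k, k, W // k, k)
                 .transpose(0, 2, 1, 3)
                 .reshape(H // k, W // k, k * k))

    win = to_windows(feat.astype(float))
    m = to_windows(fg_mask.astype(bool))

    # Max over FG pixels only, and over BG pixels only, within each window.
    fg = np.where(m, win, -np.inf).max(axis=-1)
    bg = np.where(~m, win, -np.inf).max(axis=-1)

    # Windows with no pixel of a region get 0 (an assumed fill value).
    fg = np.where(m.any(axis=-1), fg, 0.0)
    bg = np.where((~m).any(axis=-1), bg, 0.0)
    return fg, bg

# Toy example: in the top-left window a strong BG activation (9) sits next to FG pixels.
feat = np.array([[1, 9, 0, 0],
                 [2, 3, 0, 5],
                 [4, 4, 7, 8],
                 [4, 4, 6, 1]], dtype=float)
fg_mask = np.array([[1, 0, 0, 0],
                    [1, 0, 0, 0],
                    [1, 1, 0, 0],
                    [1, 1, 0, 0]], dtype=bool)

fg_pooled, bg_pooled = mask_max_pool(feat, fg_mask, k=2)
print(fg_pooled)  # [[2. 0.] [4. 0.]]
print(bg_pooled)  # [[9. 5.] [0. 8.]]
```

In the top-left window, ordinary max pooling would output 9, a background value, for a window that also contains foreground pixels. The masked variant keeps the foreground response (2) and the background response (9) in separate maps, which is the kind of FG-BG separation the abstract describes.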

Paper Structure

This paper contains 20 sections, 1 equation, 2 figures, 9 tables.

Figures (2)

  • Figure 1: Left panel: (1) shows the ideal case, where the label "Y" depends only on the FG ("F"); (2) shows the actual case, where the FG ("F") and BG ("B") become associated in "A" and influence the prediction; (3) shows our proposed model, where mask pooling removes the association by performing the pooling operation separately in the FG and BG. Right panel: an example activation map under max pooling and mask pooling, illustrating the difference between the two pooling techniques.
  • Figure 2: Radar chart of mAP50 on Cityscapes and Virtual KITTI. Values are normalized by 100.