Counterfactual Co-occurring Learning for Bias Mitigation in Weakly-supervised Object Localization

Feifei Shao, Yawei Luo, Lei Chen, Ping Liu, Wei Yang, Yi Yang, Jun Xiao

TL;DR

This work targets biased activation in weakly-supervised object localization (WSOL), where a model highlights co-occurring background along with the foreground. A structural causal analysis attributes the problem to background confounders, which the authors address with Counterfactual Co-occurring Learning (CCL). Their Counterfactual-CAM disentangles foreground $F$ from background $B$, synthesizes counterfactual representations by pairing $F$ with diverse backgrounds, and trains the model to align predictions across the original and counterfactual views. A decoupled loss and a test-time counterfactual adaptation scheme further reinforce the emphasis on foreground. Experiments on CUB-200-2011, ILSVRC 2016, and OpenImages30k report improved localization and segmentation (e.g., higher Top-1 Cls/Loc scores and PxAP) at a practical computational overhead.

Abstract

Contemporary weakly-supervised object localization (WSOL) methods have primarily focused on addressing the challenge of localizing the most discriminative region while largely overlooking the relatively less explored issue of biased activation -- incorrectly spotlighting co-occurring background with the foreground feature. In this paper, we conduct a thorough causal analysis to investigate the origins of biased activation. Based on our analysis, we attribute this phenomenon to the presence of co-occurring background confounders. Building upon this profound insight, we introduce a pioneering paradigm known as Counterfactual Co-occurring Learning (CCL), meticulously engendering counterfactual representations by adeptly disentangling the foreground from the co-occurring background elements. Furthermore, we propose an innovative network architecture known as Counterfactual-CAM. This architecture seamlessly incorporates a perturbation mechanism for counterfactual representations into the vanilla CAM-based model. By training the WSOL model with these perturbed representations, we guide the model to prioritize the consistent foreground content while concurrently reducing the influence of distracting co-occurring backgrounds. To the best of our knowledge, this study represents the initial exploration of this research direction. Our extensive experiments conducted across multiple benchmarks validate the effectiveness of the proposed Counterfactual-CAM in mitigating biased activation.
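
The counterfactual pairing described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `synthesize_counterfactuals`, the toy vectors, and the choice of element-wise addition to combine a foreground with a background are all assumptions made for illustration, since the summary does not specify the combination operator. The sketch only shows the idea that each foreground feature is paired with several backgrounds while keeping the foreground's label.

```python
from itertools import product


def synthesize_counterfactuals(foregrounds, backgrounds, labels):
    """Pair each foreground vector with every background vector.

    Assumption: features are combined by element-wise addition; the
    combination operator is hypothetical and not taken from the paper.
    Each counterfactual keeps the label of its foreground.
    """
    if len(foregrounds) != len(labels):
        raise ValueError("need exactly one label per foreground")
    counterfactuals = []
    for (f, y), b in product(zip(foregrounds, labels), backgrounds):
        if len(f) != len(b):
            raise ValueError("foreground and background must share length d")
        combined = [fv + bv for fv, bv in zip(f, b)]
        counterfactuals.append((combined, y))
    return counterfactuals


# Toy example: one foreground f_1 with label y_1 and three backgrounds b_1..b_3.
f1 = [1.0, 0.0]
bgs = [[0.5, 0.5], [0.0, 2.0], [3.0, 1.0]]
pairs = synthesize_counterfactuals([f1], bgs, ["y_1"])
for vec, label in pairs:
    print(vec, label)
```

Under these assumptions, a classifier trained on such pairs sees the same label attached to one foreground under several backgrounds, which is the intuition behind aligning predictions across original and counterfactual views.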

Paper Structure (22 sections, 8 equations, 6 figures, 8 tables)

Figures (6)

  • Figure 1: Given an input image, we visualize the foreground detected by the vanilla CAM and Counterfactual-CAM, respectively, as well as the complementary background decoupled from Counterfactual-CAM. The pink labels and yellow arrows indicate the incorrect prediction category and the regions suffering from "biased activation", respectively.
  • Figure 2: (a) Building the structural causal model (SCM) in WSOL. (b) Cutting off the confounding effect of $B \rightarrow Y$ in WSOL. (c) Synthesizing counterfactual representations to remove the confounding effect of $B \rightarrow Y$. $O$: original image feature. $F$: foreground feature, $f_1 \in F$. $B$: background feature, $b_1, b_2, b_3 \in B$. $FB$: synthesized counterfactual representation. $Y$: image label, $y_1 \in Y$.
  • Figure 3: Overview of the proposed Counterfactual-CAM. (a) The learning process of Counterfactual-CAM. $d$ denotes the length of the prototype feature. (b) Decoupling the original image feature into a foreground feature and a background feature. (c) Synthesizing counterfactual representations by pairing each foreground feature with various background features.
  • Figure 4: (a) Comparison of the predictions for the original image $O$, the foreground $F$, and after adaptation. (b) Overview of test-time adaptation, which fine-tunes the BN layers, foreground extractor, and classifier.
  • Figure 5: Qualitative object localization results compared with the baseline method on the CUB-200-2011 dataset. The predicted bounding boxes are in green, and the ground-truth boxes are in red.
  • ...and 1 more figure