Counterfactual Co-occurring Learning for Bias Mitigation in Weakly-supervised Object Localization
Feifei Shao, Yawei Luo, Lei Chen, Ping Liu, Wei Yang, Yi Yang, Jun Xiao
TL;DR
This work targets biased activation in weakly-supervised object localization (WSOL). Viewing the problem through a structural causal model, the authors identify co-occurring backgrounds as confounders and address them with Counterfactual Co-occurring Learning (CCL). They introduce Counterfactual-CAM, which disentangles foreground $F$ from background $B$, synthesizes counterfactual representations by pairing $F$ with diverse backgrounds, and trains the model to make consistent predictions on the original and counterfactual views. A decoupled loss and a test-time counterfactual adaptation scheme further strengthen the emphasis on the foreground and improve robustness. On CUB-200-2011, ILSVRC 2016, and OpenImages30k, the approach improves localization and segmentation (e.g., higher Top-1 Cls/Loc scores and PxAP) while keeping computational overhead practical.
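The pairing-and-alignment idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that a representation is the concatenation of a foreground vector and a background vector, that the classifier is linear, and that alignment is measured with a KL divergence. The names `counterfactual_pairs` and `alignment_loss` are hypothetical.

```python
import math
import random

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def linear(weights, feature):
    # weights holds one row per class
    return [sum(w * x for w, x in zip(row, feature)) for row in weights]

def counterfactual_pairs(foregrounds, backgrounds, rng):
    """Pair each foreground F_i with a background B_j taken from another sample (j != i).

    Assumption: a representation is the concatenation [F, B]. Needs >= 2 samples.
    """
    n = len(foregrounds)
    pairs = []
    for i in range(n):
        j = rng.choice([k for k in range(n) if k != i])
        pairs.append(foregrounds[i] + backgrounds[j])  # list concatenation
    return pairs

def alignment_loss(weights, originals, counterfactuals):
    """Mean KL(p_original || p_counterfactual) over the batch (assumed alignment measure)."""
    total = 0.0
    for x, x_cf in zip(originals, counterfactuals):
        p = softmax(linear(weights, x))
        q = softmax(linear(weights, x_cf))
        total += sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return total / len(originals)
```

Under these assumptions, a classifier whose weights on the background dimensions are zero incurs no alignment loss, whereas one that relies on background dimensions is penalized whenever the background is swapped. This is the behavior the counterfactual training signal is meant to encourage.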
Abstract
Contemporary weakly-supervised object localization (WSOL) methods have primarily focused on the problem of localizing only the most discriminative region, while largely overlooking biased activation: incorrectly highlighting co-occurring background together with the foreground. In this paper, we conduct a causal analysis to investigate the origins of biased activation and attribute it to co-occurring background confounders. Building on this analysis, we introduce a new paradigm, Counterfactual Co-occurring Learning (CCL), which generates counterfactual representations by disentangling the foreground from the co-occurring background. We further propose a network architecture, Counterfactual-CAM, which incorporates a perturbation mechanism for counterfactual representations into the vanilla CAM-based model. Training the WSOL model with these perturbed representations guides it to prioritize the consistent foreground content while reducing the influence of distracting co-occurring backgrounds. To the best of our knowledge, this study is the first to explore this research direction. Extensive experiments across multiple benchmarks validate the effectiveness of Counterfactual-CAM in mitigating biased activation.
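To make "biased activation" concrete, the following toy sketch computes a standard class activation map, $\mathrm{CAM}(x, y) = \sum_k w_k f_k(x, y)$, and boxes the cells above a threshold. It illustrates the vanilla CAM baseline only, not Counterfactual-CAM. The 3x4 feature maps, the weights, and the helper names are made up for illustration, and boxing all above-threshold cells is a simplification of the usual connected-component step.

```python
def class_activation_map(feature_maps, class_weights):
    """CAM(x, y) = sum_k w_k * f_k(x, y) for a single class."""
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for w_k, f_k in zip(class_weights, feature_maps):
        for y in range(height):
            for x in range(width):
                cam[y][x] += w_k * f_k[y][x]
    return cam

def bounding_box(cam, threshold=0.5):
    """Min-max normalize the map and box every cell at or above the threshold.

    Returns (x_min, y_min, x_max, y_max), or None if no cell qualifies.
    """
    flat = [v for row in cam for v in row]
    lo, hi = min(flat), max(flat)
    scale = (hi - lo) or 1.0  # avoid division by zero on a constant map
    cells = [(x, y) for y, row in enumerate(cam) for x, v in enumerate(row)
             if (v - lo) / scale >= threshold]
    if not cells:
        return None
    xs, ys = [c[0] for c in cells], [c[1] for c in cells]
    return min(xs), min(ys), max(xs), max(ys)

# Toy channels: one responds to the object, one to a co-occurring background strip.
object_channel = [[0, 0, 0, 0],
                  [0, 1, 0, 0],
                  [0, 0, 0, 0]]
background_channel = [[0, 0, 0, 0],
                      [0, 0, 0, 0],
                      [1, 1, 1, 1]]
maps = [object_channel, background_channel]

unbiased_box = bounding_box(class_activation_map(maps, [1.0, 0.0]))
biased_box = bounding_box(class_activation_map(maps, [1.0, 0.8]))
```

In this toy setup, a classifier that places weight on the background channel produces a box that spills from the object cell onto the background strip, which is the failure mode the abstract calls biased activation.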
