Enhancing Intrinsic Features for Debiasing via Investigating Class-Discerning Common Attributes in Bias-Contrastive Pair
Jeonghoon Park, Chaeyeon Chung, Juyoung Lee, Jaegul Choo
TL;DR
This work tackles dataset bias in image classification by forcing models to rely on intrinsic features rather than spuriously correlated attributes. It introduces a bias-contrastive training paradigm that uses a bias-negative auxiliary input to reveal class-discriminative intrinsic features, guided by an intrinsic feature enhancement (IE) weight. A bias-negative (BN) score makes it possible to construct a bias-negative dataset without bias labels, forming bias-contrastive pairs that steer the debiased model to focus on intrinsic regions. Empirical results on synthetic and real-world bias benchmarks demonstrate state-of-the-art debiasing performance, and qualitative analyses and ablation studies provide evidence of the method’s effectiveness and robustness.
Abstract
In the image classification task, deep neural networks frequently rely on bias attributes that are spuriously correlated with a target class in the presence of dataset bias, resulting in degraded performance when applied to data without bias attributes. The task of debiasing aims to compel classifiers to learn intrinsic attributes that inherently define a target class rather than focusing on bias attributes. While recent approaches mainly focus on emphasizing the learning of data samples without bias attributes (i.e., bias-conflicting samples) compared to samples with bias attributes (i.e., bias-aligned samples), they fall short of directly guiding models on where to focus to learn intrinsic features. To address this limitation, this paper proposes a method that provides the model with explicit spatial guidance that indicates the region of intrinsic features. We first identify the intrinsic features by investigating the class-discerning common features between a bias-aligned (BA) sample and a bias-conflicting (BC) sample (i.e., a bias-contrastive pair). Next, we enhance the intrinsic features in the BA sample that are relatively under-exploited for prediction compared to the BC sample. To construct the bias-contrastive pair without using bias information, we introduce a bias-negative score that uses a biased model to distinguish BC samples from BA samples. The experiments demonstrate that our method achieves state-of-the-art performance on synthetic and real-world datasets with various levels of bias severity.
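The pair-construction idea can be illustrated with a minimal sketch. The abstract does not define the bias-negative score, so the score below is a hypothetical stand-in, not the paper's formulation: it assumes that a biased model assigns low probability to the true class of a BC sample, and scores each sample as one minus that probability. The function names, the fixed threshold, and the rule of pairing each BA sample with a same-class BC sample are likewise assumptions made for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def bias_negative_score(biased_logits, label):
    """Hypothetical stand-in score (an assumption, not the paper's definition):
    1 minus the biased model's probability of the true class.
    High values suggest the sample is bias-conflicting (BC)."""
    return 1.0 - softmax(biased_logits)[label]

def build_pairs(samples, threshold=0.5):
    """samples: list of (sample_id, label, biased_logits).
    Returns (ba_id, bc_id) pairs whose members share a class label.
    The threshold and the same-class pairing rule are illustrative assumptions."""
    bn_pool = {}  # label -> ids of samples scored as likely BC
    for sid, label, logits in samples:
        if bias_negative_score(logits, label) > threshold:
            bn_pool.setdefault(label, []).append(sid)
    pairs = []
    for sid, label, logits in samples:
        is_ba = bias_negative_score(logits, label) <= threshold
        if is_ba and bn_pool.get(label):
            # Pair with the first available BC candidate of the same class.
            pairs.append((sid, bn_pool[label][0]))
    return pairs
```

In this sketch no bias labels are consulted: only class labels and the biased model's outputs decide which samples enter the bias-negative pool. A BA sample whose class has no BC candidate is simply left unpaired.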
