Enhancing Intrinsic Features for Debiasing via Investigating Class-Discerning Common Attributes in Bias-Contrastive Pair

Jeonghoon Park; Chaeyeon Chung; Juyoung Lee; Jaegul Choo

TL;DR

This work tackles dataset bias in image classification by steering models toward intrinsic features rather than spuriously correlated attributes. It introduces a bias-contrastive training scheme in which a bias-negative auxiliary input reveals class-discriminative intrinsic features, and an intrinsic feature enhancement (IE) weight guides the model toward them. A bias-negative (BN) score makes it possible to construct the bias-negative dataset without bias labels, forming bias-contrastive pairs that direct the debiased model to intrinsic regions. Experiments on synthetic and real-world bias benchmarks show state-of-the-art debiasing performance, and qualitative analysis and ablation studies support the method’s effectiveness and robustness.

Abstract

In image classification, deep neural networks trained on a biased dataset frequently rely on bias attributes that are spuriously correlated with a target class, which degrades performance on data without those bias attributes. Debiasing aims to make classifiers learn the intrinsic attributes that inherently define a target class rather than the bias attributes. Recent approaches mainly emphasize learning from samples without bias attributes (i.e., bias-conflicting samples) over samples with bias attributes (i.e., bias-aligned samples), but they do not directly indicate to the model where to focus in order to learn intrinsic features. To address this limitation, this paper proposes a method that gives the model explicit spatial guidance indicating the region of intrinsic features. We first identify the intrinsic features by investigating the class-discerning common features between a bias-aligned (BA) sample and a bias-conflicting (BC) sample (i.e., a bias-contrastive pair). Next, we enhance the intrinsic features in the BA sample that are relatively under-exploited for prediction compared to the BC sample. To construct the bias-contrastive pair without using bias information, we introduce a bias-negative score that uses a biased model to distinguish BC samples from BA samples. The experiments demonstrate that our method achieves state-of-the-art performance on synthetic and real-world datasets with various levels of bias severity.
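The pairing step described in the abstract can be illustrated with a minimal sketch. It assumes that a high classification loss under the biased model is an acceptable stand-in for the bias-negative score, whose actual formula is not given in this summary. The helper names `select_bias_negative` and `make_pairs`, the `top_fraction` parameter, and the toy probabilities are all hypothetical.

```python
import math
import random
from collections import defaultdict


def cross_entropy(probs, label):
    """Per-sample cross-entropy loss from predicted class probabilities."""
    return -math.log(max(probs[label], 1e-12))


def select_bias_negative(losses, labels, top_fraction=0.2):
    """Return, per class, indices of the samples the biased model fits worst.

    Assumption: high loss under the biased model is used as a proxy for the
    paper's BN score; this is not the authors' definition.
    """
    by_class = defaultdict(list)
    for idx, (loss, y) in enumerate(zip(losses, labels)):
        by_class[y].append((loss, idx))
    selected = {}
    for y, items in by_class.items():
        items.sort(reverse=True)  # highest loss first
        k = max(1, int(len(items) * top_fraction))
        selected[y] = [idx for _, idx in items[:k]]
    return selected


def make_pairs(labels, bn_indices, rng):
    """Pair every sample with a bias-negative sample of the same class."""
    return [(i, rng.choice(bn_indices[y])) for i, y in enumerate(labels)]


# Toy data: the biased model misclassifies sample 2 (class 0) and sample 5 (class 1).
biased_probs = [
    [0.9, 0.1], [0.8, 0.2], [0.2, 0.8],    # true label 0
    [0.1, 0.9], [0.3, 0.7], [0.85, 0.15],  # true label 1
]
labels = [0, 0, 0, 1, 1, 1]
losses = [cross_entropy(p, y) for p, y in zip(biased_probs, labels)]
bn = select_bias_negative(losses, labels, top_fraction=0.34)
pairs = make_pairs(labels, bn, random.Random(0))
```

In this toy setup each sample is paired with the one same-class sample that the biased model fails on, which is the role a BC sample plays in a bias-contrastive pair.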

Paper Structure (28 sections, 14 equations, 9 figures, 7 tables, 1 algorithm)

Introduction
Related work
Methodology
Overview
Constructing bias-negative dataset
Intrinsic feature enhancement
Training with intrinsic feature guidance
Experiments
Experimental settings
Comparison to previous works
Analysis of BN score
Analysis of intrinsic feature guidance
Ablation study
Conclusion
Additional analysis of the BN score as a loss weight
...and 13 more sections

Figures (9)

Figure 1: Overview of our method. We provide explicit spatial guidance $g(\mathbf{z})$ for a debiased model $f_d$, composed of $f_d^\text{emb}$ and $f_d^\text{cls}$, to learn intrinsic features. To achieve this, we leverage a bias-contrastive pair, $\mathbf{x}$ and $\mathbf{x}^\text{BN}$, from the same target class $y$. $g(\mathbf{z})$ highlights intrinsic features that are relatively under-exploited in $\mathbf{z}$ compared to $\mathbf{z}^\text{BN}$, and is computed from the common feature score $c$ and the relative-exploitation score $r$. Here, we mainly adopt BC samples from $\mathcal{D}^\text{BN}_\text{cand}$ to construct $\mathcal{D}^\text{BN}$, from which we sample $\mathbf{x}^\text{BN}$. $\mathcal{D}^\text{BN}$ is updated every iteration using the BN score $S$, which is also updated every iteration. At inference, we use only $f_d$, shown in the gray-colored area.
Figure 2: Visualization of the BN scores of the samples in $\mathcal{D}^\text{BN}_\text{cand}$ during training. The red lines and the blue lines indicate the BN scores of BA and BC samples, respectively.
Figure 3: Visualization of the spatial guidance on the (a) Waterbirds and (b) BAR datasets. Given a bias-contrastive pair, $\mathbf{x}$ and $\mathbf{x}^\text{BN}$, $\text{E}(\mathbf{z})$ indicates the regions originally focused on by $f_d$ and $\text{IE}(\mathbf{z})$ shows the regions highlighted by our IE weight.
Figure 4: Comparison of the regions focused on by a debiased model trained with and without our method. We compare Grad-CAM results on the test sets of (a) Waterbirds and (b) BAR.
Figure 5: The distributions of $f_b$'s classification loss for samples in $\mathcal{D}^\text{BN}_\text{cand}$. The red and blue lines denote the losses of BA and BC samples, respectively. The dotted and solid lines indicate the losses at the early and later stages of training, respectively. Best viewed in color.
...and 4 more figures
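
The guidance in Figure 1 combines a common feature score $c$ with a relative-exploitation score $r$. The sketch below only illustrates that idea on flattened feature maps. The definitions of `c` and `r` used here (an element-wise minimum and a clipped difference of max-normalized activations) are assumptions made for illustration, not the paper's formulas, and the function names are hypothetical.

```python
def normalize(values):
    """Scale non-negative activations to [0, 1] by their maximum."""
    peak = max(values)
    return [v / peak if peak > 0 else 0.0 for v in values]


def guidance(z, z_bn):
    """Toy guidance weights over flattened spatial positions.

    Assumed definitions (illustrative only):
      c: high where both feature maps respond (shared evidence),
      r: high where z_bn responds more strongly than z (under-exploited in z).
    """
    a, b = normalize(z), normalize(z_bn)
    c = [min(ai, bi) for ai, bi in zip(a, b)]
    r = [max(bi - ai, 0.0) for ai, bi in zip(a, b)]
    return [ci * ri for ci, ri in zip(c, r)]


# Position 0: only z responds. Position 1: both respond, z_bn more strongly.
# Position 2: neither responds. Position 3: both respond equally.
z = [4.0, 1.0, 0.0, 2.0]
z_bn = [0.0, 4.0, 0.0, 2.0]
g = guidance(z, z_bn)
```

Under these assumed definitions, only position 1 receives a non-zero weight: it is shared by both samples yet weaker in $\mathbf{z}$ than in $\mathbf{z}^\text{BN}$, which matches the caption's notion of a relatively under-exploited common feature.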
