A Simple Remedy for Dataset Bias via Self-Influence: A Mislabeled Sample Perspective

Yeonsung Jung, Jaeyun Song, June Yong Yang, Jin-Hwa Kim, Sung-Yub Kim, Eunho Yang

TL;DR

Inspired by the similarities between mislabeled samples and bias-conflicting samples, this work applies the Influence Function, a standard method for mislabeled sample detection, to identify bias-conflicting samples, and proposes a simple yet effective remedy for biased models that leverages them.

Abstract

Learning generalized models from biased data is an important step toward fairness in deep learning. To address this issue, recent studies attempt to identify and leverage bias-conflicting samples, which are free from spurious correlations, without prior knowledge of the bias or access to an unbiased set. However, spurious correlation remains an ongoing challenge, primarily because these samples are difficult to detect precisely. In this paper, inspired by the similarities between mislabeled samples and bias-conflicting samples, we approach this challenge from the novel perspective of mislabeled sample detection. Specifically, we investigate the Influence Function, a standard method for mislabeled sample detection, as a means of identifying bias-conflicting samples, and we propose a simple yet effective remedy for biased models that leverages them. Through comprehensive analysis and experiments on diverse datasets, we demonstrate that this perspective improves detection precision and rectifies biased models effectively. Furthermore, our approach is complementary to existing methods: it improves performance even when applied to models that have already undergone recent debiasing techniques.
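
For orientation, the self-influence score from the influence-function literature can be written as below. This is a sketch of the commonly used formulation, not necessarily the paper's own notation: the symbols $\hat{\theta}$ (trained parameters), $\ell$ (per-sample loss), $H_{\hat{\theta}}$ (Hessian of the empirical risk), and $n$ (training set size) are assumptions introduced here, and sign conventions vary between papers.

$$
\mathrm{SI}(z) \;=\; \nabla_{\theta}\,\ell(z, \hat{\theta})^{\top}\, H_{\hat{\theta}}^{-1}\, \nabla_{\theta}\,\ell(z, \hat{\theta}),
\qquad
H_{\hat{\theta}} \;=\; \frac{1}{n}\sum_{i=1}^{n} \nabla_{\theta}^{2}\,\ell(z_i, \hat{\theta})
$$

Under this convention, a large $\mathrm{SI}(z)$ means that removing the training sample $z$ would noticeably increase the model's loss on $z$ itself. Mislabeled samples typically behave this way, and the abstract's premise is that bias-conflicting samples do too.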

Paper Structure

This paper contains 47 sections, 7 equations, 14 figures, 16 tables, 2 algorithms.

Figures (14)

  • Figure 1: Precision of detecting bias-conflicting samples using loss value, gradient norm, Influence Function on the training set (IFtrain), and Self-Influence (SI). Precision is computed using the ground-truth number of bias-conflicting samples. Bars show the average precision of each method over three runs.
  • Figure 2: The overview of our method. We compute Bias-Conditioned Self-Influence (BCSI) of the training data and construct a small but concentrated pivotal set with a high ratio of bias-conflicting samples. Then, we remedy biased models through fine-tuning that utilizes the pivotal set and remaining samples.
  • Figure 3: A comprehensive analysis of Influence Function on the training set (IFtrain) and Self-Influence (SI) in biased datasets. The panels show the classification accuracy of bias-aligned and bias-conflicting samples over training epochs; the detection precision of IFtrain and SI across training epochs for varying ratios of bias-conflicting samples in CIFAR10C; and histograms of the sample distribution in CIFAR10C (1%), where each bar indicates the number of samples within a specific range.
  • Figure 4: Comparison of average precision between SI and BCSI across diverse datasets over three runs.
  • Figure 5: Example images from BFFHQ ranked within the top 100 by BCSI score. (a) and (b) are bias-conflicting samples with high and relatively lower BCSI scores, respectively. (c) is a bias-aligned sample with a high BCSI score, while (d) is a bias-aligned sample with a low BCSI score.
  • ...and 9 more figures
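
The detection precision described in the Figure 1 caption can be sketched as follows. This is a minimal illustration under one reading of the caption: samples are ranked by a score (loss, gradient norm, or self-influence), the top k are flagged, and k is set to the ground-truth number of bias-conflicting samples. The function name `precision_at_ground_truth_k` and the toy scores are hypothetical and do not come from the paper.

```python
def precision_at_ground_truth_k(scores, is_conflicting):
    """Fraction of the top-k scored samples that are truly bias-conflicting,
    where k is the ground-truth number of bias-conflicting samples."""
    if len(scores) != len(is_conflicting):
        raise ValueError("scores and is_conflicting must have the same length")

    k = sum(1 for flag in is_conflicting if flag)
    if k == 0:
        raise ValueError("no bias-conflicting samples in the ground truth")

    # Rank sample indices by score, highest first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    # Count how many of the k highest-scoring samples are true positives.
    hits = sum(1 for i in ranked[:k] if is_conflicting[i])
    return hits / k


# Toy example: six samples, two of which are bias-conflicting (indices 0 and 4).
toy_scores = [0.9, 0.1, 0.8, 0.3, 0.7, 0.2]
toy_labels = [True, False, False, False, True, False]
toy_precision = precision_at_ground_truth_k(toy_scores, toy_labels)
print(toy_precision)
```

In the toy example, the two highest-scoring samples are indices 0 and 2, and only index 0 is bias-conflicting, so the precision is 0.5. Fixing k to the ground-truth count makes scores from different detectors directly comparable, since each detector flags the same number of samples.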