Detecting labeling bias using influence functions

Frida Jørgensen, Nina Weng, Siavash Bigdeli

TL;DR

Influence functions detect nearly 90% of mislabeled samples in MNIST, and on CheXpert mislabeled samples consistently exhibit higher influence scores, highlighting the potential of influence functions for identifying label errors.

Abstract

Labeling bias arises during data collection due to resource limitations or unconscious bias, leading to unequal label error rates across subgroups or misrepresentation of subgroup prevalence. Most fairness constraints assume that training labels reflect the true distribution, which renders them ineffective when labeling bias is present. This leaves a challenging question: how can we detect such labeling bias? In this work, we investigate whether influence functions can be used to detect labeling bias. Influence functions estimate how much each training sample affects a model's predictions by leveraging the gradient and Hessian of the loss function. When labeling errors occur, influence functions can identify wrongly labeled samples in the training set, revealing the underlying failure mode. We develop a sample valuation pipeline and test it first on the MNIST dataset, then scale it to the more complex CheXpert medical imaging dataset. To examine label noise, we introduce controlled errors by flipping 20% of the labels for one class in the dataset. Using a diagonal Hessian approximation, we obtain promising results, detecting nearly 90% of mislabeled samples in MNIST. On CheXpert, mislabeled samples consistently exhibit higher influence scores. These results highlight the potential of influence functions for identifying label errors.
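
To make the idea in the abstract concrete, the following is a minimal sketch, not the authors' pipeline. It assumes a logistic regression on synthetic 2-D data in place of the CNN on MNIST or CheXpert, the standard influence expression -g_test^T H^{-1} g_train with H replaced by its diagonal, and flipped labels that are reassigned to the other class. All function names, the L2 strength, and the data generator are hypothetical choices made for illustration.

```python
import numpy as np

def flip_labels(y, source_class, target_class, fraction, rng):
    """Relabel `fraction` of the `source_class` samples as `target_class`."""
    y_noisy = y.copy()
    candidates = np.flatnonzero(y == source_class)
    n_flip = int(round(fraction * candidates.size))
    flipped = rng.choice(candidates, size=n_flip, replace=False)
    y_noisy[flipped] = target_class
    return y_noisy, flipped

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logreg(X, y, l2, lr=0.5, steps=2000):
    """Plain gradient descent on the L2-regularised mean logistic loss."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * (X.T @ (sigmoid(X @ w) - y) / len(y) + l2 * w)
    return w

def per_sample_grads(X, y, w):
    """Gradient of each sample's logistic loss w.r.t. w, shape (n, d)."""
    return (sigmoid(X @ w) - y)[:, None] * X

def diag_hessian(X, w, l2):
    """Diagonal of the regularised training-loss Hessian, shape (d,)."""
    p = sigmoid(X @ w)
    return (X ** 2 * (p * (1.0 - p))[:, None]).mean(axis=0) + l2

def influence_scores(X_tr, y_tr, X_te, y_te, w, l2):
    """Average over test points of -g_test^T H^{-1} g_train, with diagonal H.

    Positive values mean that upweighting the training sample would
    increase the test loss, i.e. the sample is harmful.
    """
    g_tr = per_sample_grads(X_tr, y_tr, w)
    g_te = per_sample_grads(X_te, y_te, w)
    h = diag_hessian(X_tr, w, l2)
    return (-(g_te / h) @ g_tr.T).mean(axis=0)  # shape (n_train,)

def make_blobs(n_per_class, rng):
    """Two Gaussian blobs plus a constant bias feature (synthetic stand-in)."""
    X0 = rng.normal(-1.5, 1.0, size=(n_per_class, 2))
    X1 = rng.normal(+1.5, 1.0, size=(n_per_class, 2))
    X = np.hstack([np.vstack([X0, X1]), np.ones((2 * n_per_class, 1))])
    y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])
    return X, y

rng = np.random.default_rng(0)
X_tr, y_clean = make_blobs(200, rng)
X_te, y_te = make_blobs(100, rng)

# Flip 20% of the labels of one class, as described in the abstract.
y_noisy, flipped = flip_labels(y_clean, source_class=1, target_class=0,
                               fraction=0.2, rng=rng)

L2 = 1e-2  # assumed regularisation strength; also keeps the diagonal positive
w = fit_logreg(X_tr, y_noisy, l2=L2)
scores = influence_scores(X_tr, y_noisy, X_te, y_te, w, l2=L2)

is_flipped = np.zeros(len(y_noisy), dtype=bool)
is_flipped[flipped] = True
print("mean score, flipped:", scores[is_flipped].mean())
print("mean score, clean:  ", scores[~is_flipped].mean())
```

Sorting the training set by `scores` in descending order and inspecting the top of the list is one way to use such scores for label auditing. The diagonal approximation avoids forming or inverting the full Hessian, at the cost of ignoring interactions between parameters.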

Paper Structure (30 sections, 5 equations, 9 figures)


Figures (9)

  • Figure 1: A misclassified test point with true label '4' and predicted label '2', along with the 10 most influential training points. The CNN model is trained on the original MNIST dataset.
  • Figure 2: Illustration of the sample valuation pipeline, which has four steps: (1) a baseline model and a model trained on a manipulated dataset with flipped labels are trained, and sensitivity to label noise is verified; (2) a Hessian-based approximation is computed; (3) influence scores are determined; and (4) the impact of training samples on model predictions is evaluated. Note that the baseline model trained on the original dataset is used solely for comparison, while all subsequent sample valuation is performed using the model trained on the manipulated dataset.
  • Figure 3: Detection of mislabeled samples via influence scores. Left: average influence score for each training sample, where higher scores indicate greater contribution to test misclassifications. Right: percentage of artificially flipped labels recovered at different influence score thresholds.
  • Figure 4: Misclassified test sample
  • Figure 5: Top 10 most harmful training samples (highest positive influence)
  • ...and 4 more figures
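
The recovery curve described in the Figure 3 caption (percentage of artificially flipped labels recovered at different influence score thresholds) can be computed along the following lines. This is a hedged sketch: the function name and the toy scores are hypothetical, and it assumes a sample counts as recovered when its score is greater than or equal to the threshold.

```python
import numpy as np

def recovery_at_thresholds(scores, is_flipped, thresholds):
    """Percentage of flipped samples whose score is >= each threshold."""
    scores = np.asarray(scores, dtype=float)
    is_flipped = np.asarray(is_flipped, dtype=bool)
    n_flipped = is_flipped.sum()
    if n_flipped == 0:
        raise ValueError("no flipped samples to recover")
    return np.array([
        100.0 * np.sum(is_flipped & (scores >= t)) / n_flipped
        for t in thresholds
    ])

# Toy values for illustration only, not results from the paper.
toy_scores = [0.9, 0.1, 0.7, -0.2, 0.4]
toy_flipped = [True, False, True, False, False]
recovered = recovery_at_thresholds(toy_scores, toy_flipped, [0.8, 0.5, 0.0])
print(recovered)
```

Lowering the threshold recovers more flipped labels but also flags more clean samples, so in practice the curve would be read together with the number of samples flagged at each threshold.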