Severing Spurious Correlations with Data Pruning

Varun Mulchandani, Jung-Eun Kim

TL;DR

This work tackles spurious correlations in deep neural networks in settings where the spurious signal is weak and hard to identify. It shows that a small subset of training samples, those that contain the spurious feature and have hard-to-learn core features, primarily drives reliance on spurious features, and it introduces a straightforward data-pruning approach that removes these samples without relying on domain knowledge or sample-level annotations. The method achieves state-of-the-art robustness on benchmarks where spuriosity is identifiable (e.g., Waterbirds, MultiNLI) and unidentifiable (e.g., CelebA, CIFAR-10S), often with minimal or no hyperparameter tuning. The findings suggest that carefully pruning a tiny fraction of the training data can substantially reduce reliance on spurious correlations, offering a practical path toward more reliable deployment under distribution shifts.

Abstract

Deep neural networks have been shown to learn and rely on spurious correlations present in the data that they are trained on. Reliance on such correlations can cause these networks to malfunction when deployed in the real world, where these correlations may no longer hold. To overcome the learning of and reliance on such correlations, recent studies propose approaches that yield promising results. These works, however, study settings where the strength of the spurious signal is significantly greater than that of the core, invariant signal, making it easier to detect the presence of spurious features in individual training samples and allowing for further processing. In this paper, we identify new settings where the strength of the spurious signal is relatively weaker, making it difficult to detect any spurious information while continuing to have catastrophic consequences. We also discover that spurious correlations are learned primarily due to only a handful of all the samples containing the spurious feature, and we develop a novel data pruning technique that identifies and prunes small subsets of the training data that contain these samples. Our proposed technique does not require inferred domain knowledge, information regarding the sample-wise presence or nature of spurious information, or human intervention. Finally, we show that such data pruning attains state-of-the-art performance on previously studied settings where spurious information is identifiable.
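The pruning idea in the abstract can be illustrated with a minimal sketch. It assumes each training sample already has a scalar difficulty score (for example, a per-sample loss); the abstract does not say how difficulty is measured, so the score, the function name `prune_hardest`, and the `prune_fraction` parameter are hypothetical and are not the authors' implementation.

```python
def prune_hardest(difficulty, prune_fraction):
    """Return the indices kept after dropping the hardest samples.

    difficulty: one score per training sample, higher means harder
                (hypothetical; the scoring rule is an assumption).
    prune_fraction: fraction of the training set to remove, in [0, 1].
    """
    n_prune = int(prune_fraction * len(difficulty))
    # Rank sample indices from hardest to easiest.
    ranked = sorted(range(len(difficulty)),
                    key=lambda i: difficulty[i],
                    reverse=True)
    pruned = set(ranked[:n_prune])
    # Keep the remaining indices in their original order.
    return [i for i in range(len(difficulty)) if i not in pruned]


scores = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.05, 0.95]
kept = prune_hardest(scores, prune_fraction=0.25)
print(kept)
```

Note that the sketch needs no per-sample annotation of spurious features, only the difficulty score, which matches the abstract's claim that no sample-wise spurious information is required.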

Paper Structure

This paper contains 15 sections, 11 figures, 1 table.

Figures (11)

  • Figure 1: Spurious information is often unattainable.
  • Figure 2: Introducing spurious features in 100 samples with the easiest core features (Easiest Core) causes little to no reliance on spurious features, indicated by low Spurious Misclassifications. Introducing the same spurious features in 100 samples with the hardest core features (Hardest Core) causes heavy reliance on spurious features, indicated by high Spurious Misclassifications.
  • Figure 3: Spurious feature reliance exhibits super-linear growth with increasing sample difficulty.
  • Figure 4: Excluding only a handful of training samples with spurious features and hard core features mitigates spurious correlations in the CelebA setting. This is indicated by high Worst Group Accuracies. Excluding up to 97% of all training samples with spurious features and easy core features shows no improvements in worst group accuracy.
  • Figure 5: Impact of strength of spurious signal on sample difficulty.
  • ...and 6 more figures
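
The Worst Group Accuracy mentioned in the Figure 4 caption is commonly computed as the minimum per-group accuracy. The following is a minimal sketch under that assumption; the function name `worst_group_accuracy` and the group identifiers are illustrative and do not come from the paper.

```python
from collections import defaultdict


def worst_group_accuracy(preds, labels, groups):
    """Return (worst accuracy, per-group accuracy dict).

    groups: one group identifier per sample, e.g. a (class, spurious
            attribute) pair; the grouping itself is an assumption here.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for pred, label, group in zip(preds, labels, groups):
        total[group] += 1
        correct[group] += int(pred == label)
    per_group = {g: correct[g] / total[g] for g in total}
    return min(per_group.values()), per_group


preds = [1, 1, 0, 0, 1, 0]
labels = [1, 1, 0, 1, 1, 0]
groups = ["a", "a", "a", "b", "b", "b"]
worst, per_group = worst_group_accuracy(preds, labels, groups)
print(worst, per_group)
```

A model that relies on a spurious feature can have high average accuracy while one group scores far lower, which is why the minimum over groups is reported rather than the mean.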