
HSFM: Hard-Set-Guided Feature-Space Meta-Learning for Robust Classification under Spurious Correlations

Aryan Yazdan Parast, Khawar Islam, Soyoun Won, Basim Azam, Naveed Akhtar

Abstract

Deep neural networks often rely on spurious features to make predictions, which makes them brittle under distribution shift and on samples where the spurious correlation does not hold (e.g., minority-group examples). Recent studies have shown that, even in such settings, the feature extractor of an Empirical Risk Minimization (ERM)-trained model can learn rich and informative representations, and that much of the failure may be attributed to the classifier head. In particular, retraining a lightweight head while keeping the backbone frozen can substantially improve performance on shifted distributions and minority groups. Motivated by this observation, we propose a bilevel meta-learning method that performs augmentation directly in feature space to make the classifier head more robust to spurious correlations. Our method learns support-side feature edits such that, after a small number of inner-loop updates on the edited features, the classifier achieves lower loss on hard examples and improved worst-group performance. By operating at the backbone output rather than in pixel space or through end-to-end optimization, the method is highly efficient and stable, requiring only a few minutes of training on a single GPU. We further validate our method with CLIP-based visualizations, showing that the learned feature-space updates induce semantically meaningful shifts aligned with spurious attributes.
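The bilevel structure described above can be sketched with a toy example. The following is a minimal, hypothetical NumPy sketch (not the paper's implementation): a linear head with squared loss is adapted on edited support features in an inner loop, and the feature edits are meta-trained to reduce the adapted head's loss on a hard set. For clarity, the meta-gradient through the inner loop is approximated with finite differences rather than backpropagation; all shapes, learning rates, and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for frozen-backbone features (hypothetical sizes).
n, m, d = 8, 6, 4               # support size, hard-set size, feature dim
Z = rng.normal(size=(n, d))     # support features from the frozen extractor
y = rng.choice([-1.0, 1.0], size=n)
H = rng.normal(size=(m, d))     # hard-set features
yh = rng.choice([-1.0, 1.0], size=m)

alpha, beta, T = 0.1, 0.05, 3   # inner lr, meta lr, inner steps

def inner_adapt(w, delta):
    """Inner loop: adapt the linear head on the edited support features."""
    Ze = Z + delta
    for _ in range(T):
        grad = 2.0 * Ze.T @ (Ze @ w - y) / n   # squared-loss gradient
        w = w - alpha * grad
    return w

def outer_loss(delta, w0):
    """Outer objective: loss of the adapted head on the hard set."""
    w_adapted = inner_adapt(w0, delta)
    return np.mean((H @ w_adapted - yh) ** 2)

# Meta-train the feature edits; finite differences approximate the
# meta-gradient that backprop through the inner loop would provide.
w0 = np.zeros(d)
delta = np.zeros((n, d))
eps = 1e-5
for _ in range(20):
    base = outer_loss(delta, w0)
    g = np.zeros_like(delta)
    for idx in np.ndindex(*delta.shape):
        perturbed = delta.copy()
        perturbed[idx] += eps
        g[idx] = (outer_loss(perturbed, w0) - base) / eps
    delta -= beta * g

print(outer_loss(np.zeros((n, d)), w0), outer_loss(delta, w0))
```

In this sketch the outer loss on the hard set decreases as the support-side edits are optimized, mirroring the paper's intuition that feature-space edits can steer a quickly adapted head toward better performance on difficult examples.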

Paper Structure

This paper contains 36 sections, 17 equations, 5 figures, 14 tables, and 1 algorithm.

Figures (5)

  • Figure 1: Overview of HSFM. (a) Initial training samples are passed through a frozen ERM feature extractor to obtain the support embeddings. The linear head is first adapted on the support set through the inner loss, and the outer loss is then computed on the hard set to improve performance on difficult examples. (b) Flowchart of the training procedure.
  • Figure 2: Visualization of three variants for each sample: original image, the image generated by SD unCLIP from the initial embedding, and the image generated from the optimized embedding. In CelebA, the first two rows show non-blond males shifted toward non-blond females. The last two rows show the shift from blond females to blond males. In Waterbirds, the first two rows show landbirds shifted from land backgrounds to water backgrounds, while the last two rows show waterbirds shifted from water backgrounds to land backgrounds.
  • Figure 3: Worst-group accuracy (WGA) under different values of $T$ on the Waterbirds dataset.
  • Figure 4: Worst-group accuracy (WGA) under different support set sizes on the CelebA dataset.
  • Figure 5: DDB-generated samples on the Dominoes dataset.