Out of spuriousity: Improving robustness to spurious correlations without group annotations

Phuong Quynh Le, Jörg Schlötterer, Christin Seifert

TL;DR

This work tackles the problem of spurious correlations causing poor generalization by proposing PruSC, a post-training approach that extracts a spurious-free subnetwork from a fully trained model. It relies on clustering in the model's representation space to identify and distort spurious-feature manifolds via a task-oriented contrastive loss, without requiring group annotations. The method constructs a class-balanced de-biasing dataset D_task through unsupervised clustering and optimizes a constrained subnetwork with L = L_mod + βL_task, followed by lightweight fine-tuning. Empirically, PruSC achieves strong worst-group accuracy on CelebA and ISIC, is competitive with annotated baselines, and demonstrates robustness to multiple spurious attributes, all while avoiding explicit spurious-feature labels. This presents a practical, annotation-free path to robust generalization via subnetworks that rely on invariant features.
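The construction of a class-balanced de-biasing set from cluster assignments can be illustrated with a minimal sketch. The summary does not give the exact procedure, so the following is only one plausible reading: it assumes cluster ids have already been computed per class (for example with k-means on ERM embeddings) and draws the same number of samples from every (class, cluster) cell. The function name `balanced_subset` and its parameters are hypothetical, not taken from the paper.

```python
import random
from collections import defaultdict


def balanced_subset(labels, clusters, per_cell=None, seed=0):
    """Pick the same number of indices from every (class, cluster) cell.

    Assumption: `clusters[i]` is the cluster id of sample i within its class.
    The result is class-balanced only if every class has the same number of
    non-empty clusters.
    """
    cells = defaultdict(list)
    for idx, (y, c) in enumerate(zip(labels, clusters)):
        cells[(y, c)].append(idx)

    # Default: limit every cell to the size of the smallest cell.
    n = per_cell if per_cell is not None else min(len(v) for v in cells.values())

    rng = random.Random(seed)
    chosen = []
    for key in sorted(cells):
        members = cells[key]
        chosen.extend(rng.sample(members, min(n, len(members))))
    return sorted(chosen)


# Toy data: each class has one large and one small cluster.
labels = [0] * 6 + [1] * 6
clusters = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
subset = balanced_subset(labels, clusters)
```

Sampling per cell rather than per class keeps minority clusters, which under the paper's assumption tend to correspond to minority spurious-attribute groups, from being swamped by the majority cluster.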

Abstract

Machine learning models are known to learn spurious correlations, i.e., features that are strongly related to class labels but have no causal relation to them. Relying on these correlations leads to poor performance on data groups where the correlations do not hold, and to poor generalization. To improve the robustness of machine learning models to spurious correlations, we propose an approach to extract a subnetwork from a fully trained network that does not rely on spurious correlations. The approach rests on the assumption that, under ERM training, data points with the same spurious attribute lie close to each other in the representation space; we then employ a supervised contrastive loss in a novel way to force the model to unlearn the spurious connections. The resulting increase in worst-group performance strengthens the hypothesis that a fully trained dense network contains a subnetwork that uses only invariant features for classification, thereby erasing the influence of spurious features even with multiple spurious attributes and no prior knowledge of attribute labels.
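Worst-group performance, as referred to in the abstract, is commonly measured as the minimum accuracy over groups defined by (class label, spurious attribute). The sketch below assumes that standard definition; the function names and the toy data are illustrative and not taken from the paper. Attribute annotations appear here only to evaluate a model, not to train it.

```python
from collections import defaultdict


def group_accuracies(y_true, y_pred, attrs):
    """Accuracy per (class label, spurious attribute) group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, p, a in zip(y_true, y_pred, attrs):
        group = (y, a)
        total[group] += 1
        correct[group] += int(y == p)
    return {g: correct[g] / total[g] for g in total}


def worst_group_accuracy(y_true, y_pred, attrs):
    """Minimum accuracy over all groups present in the data."""
    return min(group_accuracies(y_true, y_pred, attrs).values())


# Toy example: the attribute agrees with the label for 6 of 8 samples,
# and the "model" simply predicts the attribute.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
attrs = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = list(attrs)

avg = sum(int(y == p) for y, p in zip(y_true, y_pred)) / len(y_true)
wga = worst_group_accuracy(y_true, y_pred, attrs)
```

In this toy case the average accuracy is 0.75 while the worst-group accuracy is 0.0, which is the gap between average and minority-group behavior that the abstract describes.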

Paper Structure (46 sections, 8 equations, 6 figures, 9 tables)


Figures (6)

  • Figure 1: Simple network on the two moons dataset. Decision boundaries (black lines) depend on model capacity and loss. Left: standard ERM with cross-entropy loss yields a nearly linear decision boundary with a strong dependence on the spurious feature (x-coordinate). Center: a pruned network (masking) using only 50% of the weights at test time shows less dependence on the spurious feature. Right: masking combined with our contrastive loss (cf. Sec. \ref{sec:approach:task:conloss}) results in non-linear decision boundaries with large margins.
  • Figure 2: Embedding space (t-SNE) of ERM on CelebA for predicting hair color. Colors represent class labels (left), attributes (center), and k-means cluster labels (right). Spurious attributes are strongly correlated with class labels (e.g., female -- blond hair) and sub-manifolds are defined by spurious attributes within a class (non-blond, beard and young). k-means tends to cluster based on attributes with high purity.
  • Figure 3: Assumption and overall idea. A. Instances from the same class (same color) lie in different clusters that are far apart in feature space. Instances with the spurious feature (solid border) are nearby in feature space, i.e., the spurious feature induces clusters and primarily defines the shape of the data manifold. B. Contrastive loss: For an anchor $x$, instances from the same class but different clusters constitute positive samples $x^+$, and instances from the same cluster constitute negative samples $x^-$. Positive samples are moved closer to the anchor, negative samples away from it. In each iteration, multiple samples act as anchors (each with its own positive and negative samples). C. Goal and effect of our contrastive loss: the spurious feature no longer defines data manifolds and cannot be used as a discriminative feature between classes. Note: Our approach does not assume (or need) knowledge about the spurious feature.
  • Figure 4: Worst-group accuracy (WGA) and average accuracy (AVG) of CelebA according to pruning ratio.
  • Figure 5: The test set embedding space of ERM (left), DCWP (center) and PruSC (right) on CelebA for predicting blond hair color. We visualize the class of non-blond hair with colors representing three spurious attributes: female, young, and beard. All models achieve high average test accuracy (over 88%, see Tab. \ref{tab:main-result}). However, while ERM and DCWP cluster samples by spurious attributes, our PruSC mixes samples with these attributes, resulting in better worst-group accuracy.
  • ...and 1 more figure
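
The pair selection described in the Figure 3 caption (positives: same class, different cluster; negatives: same cluster) can be sketched as follows. The exact form of the paper's loss is not given in this summary, so the InfoNCE-style objective, the `temperature` parameter, and the function names below are assumptions made for illustration only.

```python
import numpy as np


def pair_masks(labels, clusters):
    """Boolean (anchor, sample) masks following the Figure 3 caption."""
    labels = np.asarray(labels)
    clusters = np.asarray(clusters)
    same_class = labels[:, None] == labels[None, :]
    same_cluster = clusters[:, None] == clusters[None, :]
    not_self = ~np.eye(len(labels), dtype=bool)
    pos = same_class & ~same_cluster  # same class, different cluster
    neg = same_cluster & not_self     # same cluster, anchor itself excluded
    return pos, neg


def cluster_contrastive_loss(z, labels, clusters, temperature=0.1):
    """Assumed InfoNCE-style loss over the positives/negatives above."""
    z = np.asarray(z, dtype=float)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity
    sim = z @ z.T / temperature
    pos, neg = pair_masks(labels, clusters)

    valid = pos.any(axis=1)  # only anchors with at least one positive
    if not valid.any():
        return 0.0
    sim_v, pos_v = sim[valid], pos[valid]
    cand_v = (pos | neg)[valid]

    # Numerically stable log-sum-exp over each anchor's candidates.
    masked = np.where(cand_v, sim_v, -np.inf)
    row_max = masked.max(axis=1, keepdims=True)
    lse = row_max + np.log(np.exp(masked - row_max).sum(axis=1, keepdims=True))

    log_prob = sim_v - lse
    per_anchor = -np.where(pos_v, log_prob, 0.0).sum(axis=1) / pos_v.sum(axis=1)
    return float(per_anchor.mean())


labels = [0, 0, 1, 1]
clusters = [0, 1, 0, 1]
# Embeddings grouped by cluster (spurious) versus grouped by class.
z_spurious = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.1, 1.0]]
z_class = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
loss_spurious = cluster_contrastive_loss(z_spurious, labels, clusters)
loss_class = cluster_contrastive_loss(z_class, labels, clusters)
```

On this toy input the loss is higher when embeddings are grouped by cluster than when they are grouped by class, matching the intent in panel C that clusters should stop acting as a discriminative signal.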