Is Last Layer Re-Training Truly Sufficient for Robustness to Spurious Correlations?

Phuong Quynh Le; Jörg Schlötterer; Christin Seifert

TL;DR

The paper addresses the vulnerability of ERM models to spurious correlations, particularly affecting worst-group accuracy in high-stakes domains. It evaluates Deep Feature Reweighting (DFR), which retrains only the last layer using a small, group-balanced subset on top of a fixed encoder $f_{enc}$, implemented with a ResNet-50 backbone. Results show substantial improvements in worst-group accuracy (e.g., Waterbirds from $72.86\%$ to $92.55\%$, ISIC malignant w/o patch from $64.38\%$ to $85.84\%$), but some non-spurious groups and overall accuracy can degrade, especially in the ISIC dataset. Qualitative analyses reveal that many last-layer weights become zero (high sparsity) and that while DFR reduces reliance on spurious cues, residual spurious correlations remain, indicating the need for more robust methods and integration of domain knowledge in medical contexts.
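The distinction between overall accuracy and worst-group accuracy drawn above can be made concrete with a small sketch. It assumes each sample carries an integer group id (for instance one id per class and spurious-attribute combination). The function names and the toy arrays are hypothetical and do not come from the paper.

```python
import numpy as np

def group_accuracies(y_true, y_pred, groups):
    """Return {group_id: accuracy} for every group id present in `groups`."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    return {
        int(g): float(np.mean(y_pred[groups == g] == y_true[groups == g]))
        for g in np.unique(groups)
    }

def worst_group_accuracy(y_true, y_pred, groups):
    """Accuracy of the group on which the model performs worst."""
    return min(group_accuracies(y_true, y_pred, groups).values())

# Toy example (made-up values): four groups of unequal size.
y_true = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
y_pred = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0])
groups = np.array([0, 0, 0, 1, 1, 2, 2, 2, 3, 3])

overall = float(np.mean(y_pred == y_true))
worst = worst_group_accuracy(y_true, y_pred, groups)
print(f"overall={overall:.2f}, worst-group={worst:.2f}")
```

In this toy case the overall accuracy hides the errors concentrated in the two small groups, which is the failure mode that worst-group accuracy is meant to expose.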

Abstract

Models trained with empirical risk minimization (ERM) are known to learn to rely on spurious features, i.e., their prediction is based on undesired auxiliary features which are strongly correlated with class labels but lack causal reasoning. This behavior particularly degrades accuracy in groups of samples of the correlated class that are missing the spurious feature or samples of the opposite class but with the spurious feature present. The recently proposed Deep Feature Reweighting (DFR) method improves accuracy of these worst groups. Based on the main argument that ERM models can learn core features sufficiently well, DFR only needs to retrain the last layer of the classification model with a small group-balanced data set. In this work, we examine the applicability of DFR to realistic data in the medical domain. Furthermore, we investigate the reasoning behind the effectiveness of last-layer retraining and show that even though DFR has the potential to improve the accuracy of the worst group, it remains susceptible to spurious correlations.
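The idea of retraining only the last layer on a group-balanced subset can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the two-dimensional `features` array stands in for the outputs of a frozen encoder, the group encoding `2 * label + spurious` is an illustrative choice, and the linear head is fitted with plain gradient-descent logistic regression. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def balanced_indices(groups, rng):
    """Subsample every group down to the size of the smallest group."""
    groups = np.asarray(groups)
    ids = np.unique(groups)
    n_min = min(int(np.sum(groups == g)) for g in ids)
    picks = [
        rng.choice(np.flatnonzero(groups == g), size=n_min, replace=False)
        for g in ids
    ]
    return np.concatenate(picks)

def retrain_last_layer(features, labels, groups, steps=500, lr=0.1, seed=0):
    """Fit a fresh linear head on frozen features using a group-balanced subset."""
    rng = np.random.default_rng(seed)
    idx = balanced_indices(groups, rng)
    X, y = features[idx], labels[idx].astype(float)
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid of the logits
        grad = p - y                             # gradient of the log-loss w.r.t. logits
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

# Synthetic stand-in for encoder features (assumption): column 0 is a "core"
# feature tied to the label, column 1 a "spurious" feature that agrees with
# the label in the two large groups and disagrees in the two small ones.
rng = np.random.default_rng(0)
sizes = {(0, 0): 200, (0, 1): 20, (1, 0): 20, (1, 1): 200}  # (label, spurious): count
labels = np.concatenate([np.full(n, y) for (y, s), n in sizes.items()])
spur = np.concatenate([np.full(n, s) for (y, s), n in sizes.items()])
groups = 2 * labels + spur
features = np.stack([2 * labels - 1, 2 * spur - 1], axis=1) \
    + 0.1 * rng.standard_normal((len(labels), 2))

w, b = retrain_last_layer(features, labels, groups)
print("weight on core feature:", w[0], "weight on spurious feature:", w[1])
```

Because the balanced subset removes the correlation between the spurious column and the label, the refitted head in this toy setting leans mainly on the core column. The encoder itself is never updated, so whatever spurious information is entangled in its features remains available to the head, which is consistent with the abstract's caution that last-layer retraining stays susceptible to spurious correlations.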

Paper Structure (12 sections, 3 figures, 2 tables)

This paper contains 12 sections, 3 figures, 2 tables.

Introduction
Methodology
Data Sets
Waterbirds.
ISIC Skin.
Results
DFR Performance
Analysis of Last Layer Weights
Qualitative Analyses
Image-level.
Neuron-level.
Conclusion

Figures (3)

Figure 1: Last layer weights heatmap of ERM and DFR models (the weight vector is reshaped to two dimensions in visualisation).
Figure 2: CAM visualisation for three exemplary test images, showing the original image (left), CAM before retraining (center), and CAM after retraining (right). Red indicates high activation, blue low activation.
Figure 3: Neuron-level visualisations, showing two examples for each group and activations for three exemplary neurons from the last layer: one neuron that encodes spurious features, one neuron that encodes core features, and one neuron that encodes both core and spurious features.
