Not Only the Last-Layer Features for Spurious Correlations: All Layer Deep Feature Reweighting
Humza Wajid Hameed, Geraldin Nanfack, Eugene Belilovsky
TL;DR
The paper tackles spurious correlations that degrade worst-group performance in models trained with empirical risk minimization (ERM). It introduces H2T-DFR, a three-stage approach that uses Head2Toe to select transferable features from all network layers and then applies Deep Feature Reweighting (DFR) on a balanced validation set to emphasize robust features. With a ResNet-50 backbone, experiments on CelebA, Waterbirds, and HAM10000 show worst-group accuracy gains on CelebA (~2.6%) and HAM10000 (~2.4%), while Waterbirds remains largely unchanged, with mean group accuracy staying comparable. The findings suggest that combining multi-layer feature selection with balanced-group retraining can improve robustness to spurious correlations on real-world benchmarks.
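The two metrics named above can be made concrete with a small sketch. It assumes each example carries a group label (for instance, a class and spurious-attribute pair) and that mean group accuracy is the unweighted average over groups; the paper may weight groups differently, and the function names here are illustrative only.

```python
from collections import defaultdict


def group_accuracies(y_true, y_pred, groups):
    """Return a dict mapping each group label to its accuracy."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}


def worst_and_mean_group_accuracy(y_true, y_pred, groups):
    """Return (worst-group accuracy, unweighted mean over groups).

    The unweighted mean is an assumption of this sketch, not a
    definition taken from the paper.
    """
    values = list(group_accuracies(y_true, y_pred, groups).values())
    return min(values), sum(values) / len(values)


# Toy example: group "a" has 2 examples, group "b" has 4.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 1]
groups = ["a", "a", "b", "b", "b", "b"]
worst, mean = worst_and_mean_group_accuracy(y_true, y_pred, groups)
```

Reporting the minimum rather than the overall average is what exposes a model that does well on majority groups while failing on a minority group.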
Abstract
Spurious correlations are a major source of errors for machine learning models, in particular when aiming for group-level fairness. It has recently been shown that a powerful approach to combat spurious correlations is to re-train the last layer on a balanced validation dataset, isolating robust features for the predictor. However, key attributes can sometimes be discarded by neural networks towards the last layer. In this work, we thus consider retraining a classifier on a set of features derived from all layers. We utilize a recently proposed feature selection strategy to select unbiased features from all the layers. We observe that this approach gives significant improvements in worst-group accuracy on several standard benchmarks.
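One way to build the balanced validation set mentioned above is to subsample every group down to the size of the smallest one. The sketch below assumes per-example group labels are available on the validation split; the helper name and the fixed seed are hypothetical and not taken from the paper.

```python
import random
from collections import defaultdict


def balanced_group_indices(groups, seed=0):
    """Pick the same number of example indices from every group.

    Each group is subsampled without replacement to the size of the
    smallest group. Group labels must be sortable (e.g. ints or strings).
    """
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for i, g in enumerate(groups):
        by_group[g].append(i)
    n = min(len(ix) for ix in by_group.values())
    chosen = []
    for g in sorted(by_group):
        chosen.extend(rng.sample(by_group[g], n))
    return sorted(chosen)


# Toy example: groups of size 4, 2, and 3 are each reduced to 2 examples.
val_groups = [0, 0, 0, 0, 1, 1, 2, 2, 2]
idx = balanced_group_indices(val_groups)
```

The classifier would then be retrained only on the features of the examples in `idx`, so that no group dominates the fit. Repeating the subsampling with different seeds and averaging the resulting classifiers is a common way to reduce the variance of the draw.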
