On the Unreasonable Effectiveness of Last-layer Retraining
John C. Hill, Tyler LaBonte, Xinchen Zhang, Vidya Muthukumar
TL;DR
Empirical risk minimization (ERM) often relies on spurious correlations that harm minority-group performance. This work investigates last-layer retraining (LLR) as an efficient remedy, testing whether neural collapse or the implicit bias of gradient descent explains its effectiveness. Across four benchmarks, neural collapse did not consistently occur during ERM, and LLR’s gains correlate strongly with improved group balance in the held-out data rather than with margin-based dynamics. The findings indicate that CB-LLR and AFR achieve robust worst-group performance primarily through implicit or explicit group balancing. This guides the practical use of LLR when group annotations are limited and highlights the importance of data balance in held-out sets.
Abstract
Last-layer retraining (LLR) methods, wherein the last layer of a neural network is reinitialized and retrained on a held-out set following empirical risk minimization (ERM) training, have garnered interest as an efficient approach to rectify dependence on spurious correlations and improve performance on minority groups. Surprisingly, LLR has been found to improve worst-group accuracy even when the held-out set is an imbalanced subset of the training set. We initially hypothesize that this "unreasonable effectiveness" of LLR is explained by its ability to mitigate neural collapse through the held-out set, resulting in the implicit bias of gradient descent benefiting robustness. Our empirical investigation does not support this hypothesis. Instead, we present strong evidence for an alternative hypothesis: that the success of LLR is primarily due to better group balance in the held-out set. We conclude by showing how the recent algorithms CB-LLR and AFR perform implicit group-balancing to elicit a robustness improvement.
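To make the two quantities discussed above concrete, the following is a minimal sketch, not the authors' code, of how worst-group accuracy and a group-balanced held-out subset might be computed. The function names `worst_group_accuracy` and `group_balanced_indices` are hypothetical, and down-sampling every group to the size of the smallest one is only one assumed way of balancing; the paper may balance groups differently.

```python
import random
from collections import defaultdict


def worst_group_accuracy(preds, labels, groups):
    """Return the lowest per-group accuracy (hypothetical helper)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for p, y, g in zip(preds, labels, groups):
        total[g] += 1
        correct[g] += int(p == y)
    return min(correct[g] / total[g] for g in total)


def group_balanced_indices(groups, seed=0):
    """Down-sample so that every group contributes equally many indices.

    Assumption: balance is achieved by subsampling each group to the
    size of the smallest group.
    """
    by_group = defaultdict(list)
    for i, g in enumerate(groups):
        by_group[g].append(i)
    n_min = min(len(idx) for idx in by_group.values())
    rng = random.Random(seed)
    balanced = []
    for g in sorted(by_group):
        balanced.extend(rng.sample(by_group[g], n_min))
    return sorted(balanced)


# Toy held-out set: group "maj" has five examples, group "min" has two.
toy_groups = ["maj"] * 5 + ["min"] * 2
toy_subset = group_balanced_indices(toy_groups)
```

In an LLR pipeline, the indices returned by such a routine would select the held-out examples on which the reinitialized last layer is retrained, and worst-group accuracy would then be measured on a separate test set.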
