Feature contamination: Neural networks learn uncorrelated features and fail to generalize
Tianren Zhang, Chujie Zhao, Guanyu Chen, Yizhou Jiang, Feng Chen
TL;DR
This work identifies feature contamination as a fundamental inductive-bias phenomenon in SGD-trained nonlinear networks: core predictive features can be learned together with uncorrelated background features, which leads to poor OOD generalization under distribution shifts even when good representations are provided. The authors develop a structured two-layer ReLU model to prove activation asymmetry and the feature contamination that follows from it, and contrast this with linear networks, which avoid contamination. Empirical evidence from representation distillation, Grad-CAM visualizations, and CIFAR-10-like tests corroborates the theory and illustrates its real-world relevance. The results suggest that improving OOD robustness requires accounting for the optimization-induced coupling of features, and they hint that diversified pre-training might linearize such features, offering a direction for future work and algorithm design.
Abstract
Learning representations that generalize under distribution shifts is critical for building robust machine learning models. However, despite significant efforts in recent years, algorithmic advances in this direction have been limited. In this work, we seek to understand the fundamental difficulty of out-of-distribution generalization with deep neural networks. We first empirically show that, perhaps surprisingly, even allowing a neural network to explicitly fit the representations obtained from a teacher network that can generalize out-of-distribution is insufficient for the generalization of the student network. Then, by a theoretical study of two-layer ReLU networks optimized by stochastic gradient descent (SGD) under a structured feature model, we identify a fundamental yet unexplored feature learning proclivity of neural networks, feature contamination: neural networks can learn uncorrelated features together with predictive features, resulting in generalization failure under distribution shifts. Notably, this mechanism essentially differs from the prevailing narrative in the literature that attributes the generalization failure to spurious correlations. Overall, our results offer new insights into the non-linear feature learning dynamics of neural networks and highlight the necessity of considering inductive biases in out-of-distribution generalization.
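The setting described in the abstract can be probed with a small toy experiment. The following is a minimal sketch under assumptions of our own, not the authors' exact construction: inputs concatenate label-dependent core coordinates with non-negative background coordinates drawn independently of the label, the second layer is fixed at ±1/m, and the loss is logistic. The function names (`make_data`, `train_sgd`, `logistic_loss`), the hyperparameters, and the background rescaling used as a distribution shift are all hypothetical choices made for illustration.

```python
import numpy as np

def make_data(n, d_core, d_bg, rng, bg_scale=1.0):
    """Toy data (assumed model): core coordinates carry the label,
    background coordinates are non-negative and independent of it."""
    y = rng.choice([-1.0, 1.0], size=n)
    core = y[:, None] * rng.uniform(0.5, 1.5, size=(n, d_core))
    bg = bg_scale * rng.uniform(0.0, 1.0, size=(n, d_bg))
    return np.hstack([core, bg]), y

def forward(W, a, X):
    # Two-layer ReLU network with a fixed second layer a.
    return np.maximum(X @ W.T, 0.0) @ a

def logistic_loss(W, a, X, y):
    # Mean of log(1 + exp(-y * f(x))), computed stably.
    return float(np.mean(np.logaddexp(0.0, -y * forward(W, a, X))))

def train_sgd(X, y, rng, m=32, lr=0.5, steps=1500, batch=32):
    """Mini-batch SGD on the first-layer weights only (an assumption)."""
    n, d = X.shape
    W = rng.normal(0.0, 0.01, size=(m, d))
    a = np.concatenate([np.ones(m // 2), -np.ones(m // 2)]) / m
    for _ in range(steps):
        idx = rng.integers(0, n, size=batch)
        xb, yb = X[idx], y[idx]
        pre = xb @ W.T                          # (batch, m) pre-activations
        out = np.maximum(pre, 0.0) @ a          # (batch,) network outputs
        g_out = -yb / (1.0 + np.exp(yb * out))  # d(loss)/d(out) per example
        # d(out)/d(w_k) = a_k * 1[pre_k > 0] * x, averaged over the batch
        grad_W = ((g_out[:, None] * (pre > 0)) * a[None, :]).T @ xb / batch
        W -= lr * grad_W
    return W, a

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_core, d_bg = 5, 15
    X_id, y_id = make_data(2000, d_core, d_bg, rng)
    W, a = train_sgd(X_id, y_id, rng)
    # Assumed shift: only the background scale changes at test time.
    X_ood, y_ood = make_data(2000, d_core, d_bg, rng, bg_scale=5.0)
    print("ID loss: ", logistic_loss(W, a, X_id, y_id))
    print("OOD loss:", logistic_loss(W, a, X_ood, y_ood))
    print("mean |w| on core coords:      ", np.abs(W[:, :d_core]).mean())
    print("mean |w| on background coords:", np.abs(W[:, d_core:]).mean())
```

Comparing the mean weight magnitude on background coordinates with the initialization scale (0.01 here) gives a rough indication of whether the neurons have picked up background features even though those features carry no label information. Comparing the two losses shows how the trained network responds to the background shift. The sketch is meant only to make the quantities in the abstract concrete; it does not reproduce the paper's analysis or experiments.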
