Feature contamination: Neural networks learn uncorrelated features and fail to generalize

Tianren Zhang, Chujie Zhao, Guanyu Chen, Yizhou Jiang, Feng Chen

TL;DR

This work identifies feature contamination as a fundamental inductive-bias phenomenon in SGD-trained nonlinear networks: networks can learn uncorrelated background features together with core predictive features, which leads to poor OOD generalization under distribution shifts even when good representations are provided. The authors analyze two-layer ReLU networks under a structured feature model, prove that neurons develop activation asymmetry that in turn causes feature contamination, and show that linear networks avoid contamination. Empirical evidence from representation distillation, Grad-CAM visualizations, and CIFAR-10-like tests corroborates the theory and indicates that the phenomenon arises in practice. The results suggest that improving OOD robustness requires accounting for the optimization-induced coupling of features, and they hint that diversified pre-training might linearize such features, offering a direction for future work and algorithm design.
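The ReLU-versus-linear contrast summarized above can be reproduced in a small toy. This is a hypothetical sketch, not the paper's data model or proof setting: inputs are 2-D, with one core coordinate whose sign follows the label and one background coordinate whose coefficient is non-negative and drawn independently of the label. The network has a fixed ±1 second layer and is trained with the correlation loss $-y f(\mathbf{x})$ by full-batch gradient descent. The names `make_data`, `asymmetry`, and `train`, and all hyperparameters, are illustrative assumptions.

```python
import random

# Hypothetical toy data (an illustrative assumption, not the paper's model):
# x = (y * a, b). Coordinate 0 is a core feature whose sign is set by the
# label y; coordinate 1 is a background feature with coefficient b >= 0
# drawn independently of y, so it is uncorrelated with the label.
def make_data(n, seed=0):
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        a, b = rng.uniform(0.5, 1.5), rng.uniform(0.0, 1.0)
        data += [((a, b), 1), ((-a, b), -1)]  # same (a, b) in both classes
    return data

def asymmetry(W, signs, data):
    """Mean over neurons of s_k * (rate on class +1 - rate on class -1),
    where 'rate' is the fraction of class samples with w_k . x > 0."""
    half = len(data) / 2  # the two classes are exactly balanced
    total = 0.0
    for s, w in zip(signs, W):
        on = {1: 0, -1: 0}
        for (x0, x1), y in data:
            if w[0] * x0 + w[1] * x1 > 0:
                on[y] += 1
        total += s * (on[1] - on[-1]) / half
    return total / len(W)

def train(relu, m=20, n=200, steps=100, lr=0.1, seed=0):
    """Full-batch GD on the first layer of f(x) = sum_k s_k * act(w_k . x),
    with a fixed second layer s_k = +1/-1 and the correlation loss -y * f(x).
    act is ReLU if relu=True, else the identity (a linear network)."""
    rng = random.Random(seed + 1)
    data = make_data(n, seed)
    signs = [1 if k % 2 == 0 else -1 for k in range(m)]
    W = [[rng.gauss(0, 0.1), rng.gauss(0, 0.1)] for _ in range(m)]
    bg_init = [w[1] for w in W]
    asym_init = asymmetry(W, signs, data)
    for _ in range(steps):
        for s, w in zip(signs, W):
            g0 = g1 = 0.0
            for (x0, x1), y in data:
                # act'(w . x): the ReLU gate, or always 1 for a linear unit.
                if not relu or w[0] * x0 + w[1] * x1 > 0:
                    g0 += y * x0
                    g1 += y * x1
            # dLoss/dw_k = -s_k * mean(y * act' * x); take a step against it.
            # Under this loss each neuron's gradient depends only on its own w_k.
            w[0] += lr * s * g0 / len(data)
            w[1] += lr * s * g1 / len(data)
    bg_change = sum(w[1] - init for w, init in zip(W, bg_init)) / m
    return asym_init, asymmetry(W, signs, data), bg_change

relu_a0, relu_a1, relu_bg = train(relu=True)
_, _, lin_bg = train(relu=False)
print(f"ReLU activation asymmetry: {relu_a0:.2f} -> {relu_a1:.2f}")
print(f"background-weight change: ReLU {relu_bg:.2f}, linear {lin_bg:.2e}")
```

In this toy, a linear unit receives equal and opposite background gradients from the two classes, so its background weight does not move. A ReLU unit that fires mostly on the class it votes for (activation asymmetry) receives the background gradient from that class only, so its weight accumulates a growing component along the label-uncorrelated background direction.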

Abstract

Learning representations that generalize under distribution shifts is critical for building robust machine learning models. However, despite significant efforts in recent years, algorithmic advances in this direction have been limited. In this work, we seek to understand the fundamental difficulty of out-of-distribution generalization with deep neural networks. We first show empirically that, perhaps surprisingly, even allowing a neural network to explicitly fit the representations obtained from a teacher network that generalizes out-of-distribution is insufficient for the student network to generalize. Then, through a theoretical study of two-layer ReLU networks optimized by stochastic gradient descent (SGD) under a structured feature model, we identify a fundamental yet previously unexplored feature-learning proclivity of neural networks, feature contamination: neural networks can learn uncorrelated features together with predictive features, resulting in generalization failure under distribution shifts. Notably, this mechanism fundamentally differs from the prevailing narrative in the literature that attributes generalization failure to spurious correlations. Overall, our results offer new insights into the nonlinear feature learning dynamics of neural networks and highlight the necessity of considering inductive biases in out-of-distribution generalization.
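The distillation experiment can be written, as a sketch, as a two-stage objective over ID training pairs $(\mathbf{x}_i, y_i)$, $i = 1, \dots, n$. The squared-error distance, the symbols $g_\theta$ (student encoder) and $h$ (teacher representation), the cross-entropy probe loss $\ell_{\mathrm{CE}}$, and the restriction to ID data in both stages are assumptions for illustration; the paper's appendix specifies the actual objectives.

$$
\theta^\star \in \arg\min_{\theta}\ \frac{1}{n}\sum_{i=1}^{n}\bigl\| g_\theta(\mathbf{x}_i) - h(\mathbf{x}_i) \bigr\|_2^2,
\qquad
(W^\star, \mathbf{c}^\star) \in \arg\min_{W,\,\mathbf{c}}\ \frac{1}{n}\sum_{i=1}^{n} \ell_{\mathrm{CE}}\bigl(W g_{\theta^\star}(\mathbf{x}_i) + \mathbf{c},\ y_i\bigr)
$$

In these terms, the abstract's claim is that fitting $h$ on ID data through the first objective does not ensure that the linear probe on $g_{\theta^\star}$ generalizes out-of-distribution the way a probe on $h$ does.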

Paper Structure

This paper contains 60 sections, 32 theorems, 144 equations, 15 figures, and 5 tables.

Introduction
Our Results and Implications
Good Representations Are Hard to Learn Even when Explicitly Given in Training
A Theoretical Model of OOD Generalization
OOD Generalization Problem Setup
Model and Training
Main Theoretical Results
Feature Contamination in Practice
Conclusion
Limitations, a Conjecture, and Future Work
Preliminaries
Notation.
Weight Decomposition and Gradient Calculations
Neuron Characterization
Intuition.
...and 45 more sections

Key Result

Theorem 4.1 (Activation asymmetry)

For every $\eta \le \frac{1}{\mathsf{poly}(d_0)}$ and every $y\in\mathcal{Y}$, there exists $T_0 = \widetilde{\Theta}(\frac{m}{\eta\sqrt{d}})$ such that with high probability (w.h.p.), for every $t\ge T_0$, there exist $\Theta(m)$ neurons in which the weight $\mathbf{w}_k^{(t)}$ of each neuron satisfies:

Figures (15)

Figure 1: OOD performance ($y$-axes) vs. ID performance ($x$-axes) for three model families: (i) linear probes on pre-trained representations (purple stars), (ii) linear probes on distilled representations (orange squares), and (iii) standard models trained on ID data (blue circles). The $y$-axis of the sixth panel shows the accuracy on the ImageNet-based OOD test sets averaged over the first five panels. See the appendix on representation distillation for more details on each model family.
Figure 2: A diagram of feature contamination in our binary classification setting. Left: for models with non-linear activation functions such as ReLU, activation asymmetry leads to non-zero gradient projections onto background features. Right: for linear models, background features cancel out in the gradients, so no feature contamination occurs.
Figure 3: Numerical results. (a) ID and OOD risks: during training, the ID loss quickly approaches zero while the OOD loss stays high. (b) Activation asymmetry: the difference between the average neuron activation rates of different classes increases substantially during training. (c) Feature contamination: the average correlations between neuron weights and both the core features and the uncorrelated background features increase during training. (d) Feature contamination also occurs in more general settings with different activation functions. See the appendix on numerical experiments for more details and results.
Figure 4: Class-averaged activation rate histograms of a randomly initialized CLIP-RN50 (left) and a distilled CLIP-RN50 (right). After training, more classes have average activation rates close to zero, and only a small number of classes have large average activation rates.
Figure 5: Average neuron selectivity of random and distilled CLIP-RN50 (left) and CLIP-ViT-B/16 (right) models. Distilled models have larger selectivity than random models and exhibit a selectivity drop on OOD data. See the appendix on neuron selectivity for more details.
...and 10 more figures
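Figures 4 and 5 summarize neurons through their activation statistics. The sketch below computes two such statistics from post-ReLU activations (one row per input, one column per neuron). The paper's exact definitions are in its appendix; here we assume (i) a neuron's activation rate on a class is the fraction of that class's inputs on which the neuron outputs a positive value, and (ii) selectivity is the class-selectivity index commonly used in the literature, $(\mu_{\max}-\mu_{\text{rest}})/(\mu_{\max}+\mu_{\text{rest}})$, where $\mu_{\max}$ is the largest class-conditional mean activation and $\mu_{\text{rest}}$ is the mean of the remaining class-conditional means. All function names are hypothetical.

```python
from collections import defaultdict

def activation_rates(acts, labels):
    """rates[k][c]: fraction of class-c inputs on which neuron k is active (> 0).
    acts is a list of rows (one per input) of post-ReLU neuron outputs."""
    m = len(acts[0])
    count = defaultdict(int)
    active = defaultdict(lambda: [0] * m)
    for row, c in zip(acts, labels):
        count[c] += 1
        for k, v in enumerate(row):
            if v > 0:
                active[c][k] += 1
    return [{c: active[c][k] / count[c] for c in count} for k in range(m)]

def histogram(values, num_bins=10):
    """Counts of values in [0, 1] over equal-width bins (1.0 goes in the last bin)."""
    bins = [0] * num_bins
    for v in values:
        bins[min(int(v * num_bins), num_bins - 1)] += 1
    return bins

def selectivity(acts, labels, k):
    """Assumed class-selectivity index of neuron k; needs at least two classes."""
    total, count = defaultdict(float), defaultdict(int)
    for row, c in zip(acts, labels):
        total[c] += row[k]
        count[c] += 1
    means = sorted((total[c] / count[c] for c in count), reverse=True)
    mu_max, mu_rest = means[0], sum(means[1:]) / (len(means) - 1)
    denom = mu_max + mu_rest
    return 0.0 if denom == 0 else (mu_max - mu_rest) / denom

def mean_selectivity(acts, labels):
    """Average selectivity over all neurons."""
    m = len(acts[0])
    return sum(selectivity(acts, labels, k) for k in range(m)) / m

# Two neurons, two classes: neuron 0 fires only on class 0, neuron 1 on both.
acts = [[2.0, 1.0], [2.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
labels = [0, 0, 1, 1]
rates = activation_rates(acts, labels)
print(rates)  # [{0: 1.0, 1: 0.0}, {0: 1.0, 1: 1.0}]
print(histogram([r for per_neuron in rates for r in per_neuron.values()], 4))  # [1, 0, 0, 3]
print(mean_selectivity(acts, labels))  # 0.5
```

Pooling every (neuron, class) rate into one histogram is a further assumption about how the plotted histograms are built. For a comparison like the OOD selectivity drop, one would call `mean_selectivity` separately on activations from ID inputs and from OOD inputs.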

Theorems & Definitions (60)

Definition 3.1: ID and OOD data generation
Theorem 4.1: Activation asymmetry
Theorem 4.2: Learned features
Theorem 4.3: ID and OOD risks
Theorem 4.4: Linear networks
Conjecture
Lemma 1.1: Gradient
proof
Lemma 1.2: Gap between empirical and population gradients
proof
...and 50 more
