How Spurious Features Are Memorized: Precise Analysis for Random and NTK Features

Simone Bombari; Marco Mondelli

TL;DR

The paper develops a quantitative framework for the memorization of spurious features in overparameterized models by combining model stability with a novel feature-alignment quantity $\mathcal{F}_{\varphi}(z^s,z)$. It provides precise, concentration-based results for two canonical settings, RF and NTK regression, showing that the feature alignment concentrates to positive constants $\gamma_{\mathrm{RF}}$ and $\gamma_{\mathrm{NTK}}$. These constants depend on the spurious feature fraction $\alpha$ and on the Hermite coefficients of the activation (and, for NTK, of its derivative). The key finding is that memorization scales with the model's generalization error, and that the amount of memorization can be controlled by choosing activations whose Hermite spectrum has less high-order content. The authors validate the theory on MNIST and CIFAR-10 and across several neural network architectures, pointing to practical ways of mitigating memorization through activation design and data considerations.
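The TL;DR refers to the Hermite spectrum of the activation. As an illustration (our own sketch, not the authors' code), the snippet below estimates the coefficients $\mu_k = \mathbb{E}[\phi(g)\, h_k(g)]$ with $g \sim \mathcal{N}(0,1)$. It assumes that $h_k$ are the normalized probabilists' Hermite polynomials $\mathrm{He}_k/\sqrt{k!}$; this normalization is our assumption about the convention in the paper's appendix. The function names `normalized_hermite` and `hermite_coefficients` are hypothetical. For the NTK case, the same function can be applied to the derivative $\phi'$.

```python
import math

import numpy as np
from numpy.polynomial import hermite_e as He


def normalized_hermite(k, x):
    """h_k(x) = He_k(x) / sqrt(k!), orthonormal under the standard Gaussian (assumed convention)."""
    basis = np.zeros(k + 1)
    basis[k] = 1.0
    return He.hermeval(x, basis) / math.sqrt(math.factorial(k))


def hermite_coefficients(activation, max_order=8, n_nodes=100):
    """Estimate mu_k = E[activation(g) h_k(g)], g ~ N(0, 1), for k = 0..max_order.

    Gauss-HermiteE quadrature integrates against exp(-x^2/2); dividing the
    weights by sqrt(2*pi) turns the rule into an expectation under N(0, 1).
    """
    nodes, weights = He.hermegauss(n_nodes)
    weights = weights / math.sqrt(2.0 * math.pi)
    values = activation(nodes)
    return np.array([np.sum(weights * values * normalized_hermite(k, nodes))
                     for k in range(max_order + 1)])


# ReLU versus the activation phi_4 = h_1 + h_4 used in the RF experiments.
relu_mu = hermite_coefficients(lambda x: np.maximum(x, 0.0))
phi4_mu = hermite_coefficients(lambda x: normalized_hermite(1, x) + normalized_hermite(4, x))
```

For a polynomial activation such as $h_1 + h_4$, the quadrature is exact up to rounding, so `phi4_mu` recovers the coefficients $(0, 1, 0, 0, 1, 0, \dots)$. For ReLU, the kink at 0 makes the quadrature converge slowly, so the values are only approximate (close to $\mu_0 = 1/\sqrt{2\pi}$ and $\mu_1 = 1/2$).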

Abstract

Deep learning models are known to overfit and memorize spurious features in the training dataset. While numerous empirical studies have aimed at understanding this phenomenon, a rigorous theoretical framework to quantify it is still missing. In this paper, we consider spurious features that are uncorrelated with the learning task, and we provide a precise characterization of how they are memorized via two separate terms: (i) the stability of the model with respect to individual training samples, and (ii) the feature alignment between the spurious feature and the full sample. While the first term is well established in learning theory and is connected to the generalization error in classical work, the second one is, to the best of our knowledge, novel. Our key technical result gives a precise characterization of the feature alignment for the two prototypical settings of random features (RF) and neural tangent kernel (NTK) regression. We prove that the memorization of spurious features weakens as the generalization capability increases and, through the analysis of the feature alignment, we unveil the role of the model and of its activation function. Numerical experiments show the predictive power of our theory on standard datasets (MNIST, CIFAR-10).
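To make the two settings concrete, here is a minimal sketch of RF and NTK feature maps with ridgeless (minimum-norm) regression. The exact models are defined in the paper's equations, which are not reproduced on this page. We therefore assume the common forms $\varphi_{\mathrm{RF}}(x) = \phi(Vx)$ and $\varphi_{\mathrm{NTK}}(x) = x \otimes \phi'(Wx)$ (the gradient of a two-layer network with respect to its first layer), with Gaussian $V, W$ of variance $1/d$. All function names and sizes below are hypothetical.

```python
import numpy as np


def rf_features(X, V, act):
    """Random-features map (assumed form): row n is act(V @ x_n)."""
    return act(X @ V.T)


def ntk_features(X, W, act_prime):
    """NTK map of a two-layer net w.r.t. its first layer (assumed form):
    row n is kron(x_n, act_prime(W @ x_n)), of length d * k."""
    gates = act_prime(X @ W.T)  # shape (N, k)
    return np.einsum("nd,nk->ndk", X, gates).reshape(X.shape[0], -1)


def min_norm_fit(Phi, y):
    """Ridgeless (minimum-norm) regression: theta = Phi^T K^{-1} y, K = Phi Phi^T."""
    K = Phi @ Phi.T
    return Phi.T @ np.linalg.solve(K, y)


# Toy usage: Gaussian inputs with a linear-sign labeling function.
rng = np.random.default_rng(0)
N, d, k = 50, 20, 400
X = rng.standard_normal((N, d))
y = np.sign(X @ rng.standard_normal(d))
V = rng.standard_normal((k, d)) / np.sqrt(d)
W = rng.standard_normal((k, d)) / np.sqrt(d)

Phi_rf = rf_features(X, V, np.tanh)
Phi_ntk = ntk_features(X, W, lambda t: (t > 0).astype(float))  # ReLU derivative
theta_rf = min_norm_fit(Phi_rf, y)
theta_ntk = min_norm_fit(Phi_ntk, y)
```

Because the number of features exceeds $N$ and the kernel $K$ is invertible, both fitted models interpolate the training labels. This overparameterized regime is the one in which memorization can occur.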

Paper Structure

This paper contains 24 sections, 27 theorems, 171 equations, and 7 figures.

Introduction
Related work
Spurious features.
Memorization and stability.
Random features and neural tangent kernel.
Preliminaries
Notation.
Setting.
Stability.
Memorization of spurious features.
Memorization and feature alignment
Main result for random features
Proof sketch.
Main result for NTK features
Proof sketch.
...and 9 more sections

Key Result

Lemma 4.1

Let $\varphi : \mathbb{R}^d \to \mathbb{R}^p$ be a feature map, such that the induced kernel $K \in \mathbb{R}^{N \times N}$ on the training set is invertible. Let $z_1 \in \mathbb{R}^d$ be an element of the training dataset $Z$, and $z \in \mathbb{R}^d$ a generic test sample. Let $P_{\Phi_{-1}}$ be the feature alignment between $z$ and $z_1$. Then, we have
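The statement above is cut off before its formula. As a hedged illustration of the objects it names, the sketch below numerically checks a leave-one-out identity for the minimum-norm interpolator. We read $P_{\Phi_{-1}}$ as the projector onto the span of $\varphi(z_2), \dots, \varphi(z_N)$ and set $P^\perp = I - P_{\Phi_{-1}}$; this reading is our assumption. The identity is

$f(z,\theta^*) - f(z,\theta^*_{-1}) = \frac{\varphi(z)^\top P^\perp \varphi(z_1)}{\varphi(z_1)^\top P^\perp \varphi(z_1)} \bigl(y_1 - f(z_1,\theta^*_{-1})\bigr),$

where $\theta^*$ and $\theta^*_{-1}$ are the minimum-norm interpolators with and without $z_1$. It holds because $\theta^*_{-1} + c\,P^\perp\varphi(z_1)$ still fits $z_2, \dots, z_N$, and $c$ is fixed by fitting $z_1$. The first factor is an alignment between $z$ and $z_1$ and the second is a leave-one-out residual, which is consistent with the two terms named in the abstract. This is our derivation, not a reproduction of the truncated lemma, and the helper names are hypothetical.

```python
import numpy as np


def min_norm_predict(Phi_train, y_train, Phi_test):
    """Predictions of the minimum-norm interpolator theta = Phi^T K^{-1} y."""
    K = Phi_train @ Phi_train.T
    theta = Phi_train.T @ np.linalg.solve(K, y_train)
    return Phi_test @ theta


def orth_complement_projector(Phi_rest):
    """P_perp = I - P, with P the projector onto the row span of Phi_rest."""
    P = Phi_rest.T @ np.linalg.solve(Phi_rest @ Phi_rest.T, Phi_rest)
    return np.eye(Phi_rest.shape[1]) - P


rng = np.random.default_rng(1)
N, p = 30, 200
Phi = rng.standard_normal((N, p))  # rows: phi(z_1), ..., phi(z_N)
y = rng.standard_normal(N)
phi_z = rng.standard_normal(p)     # feature vector of a test sample z

phi_1, Phi_rest, y_rest = Phi[0], Phi[1:], y[1:]
P_perp = orth_complement_projector(Phi_rest)

# Left side: change in the prediction at z caused by adding z_1 to the training set.
lhs = (min_norm_predict(Phi, y, phi_z[None])[0]
       - min_norm_predict(Phi_rest, y_rest, phi_z[None])[0])

# Right side: normalized alignment through P_perp times the leave-one-out residual at z_1.
residual = y[0] - min_norm_predict(Phi_rest, y_rest, phi_1[None])[0]
rhs = (phi_z @ P_perp @ phi_1) / (phi_1 @ P_perp @ phi_1) * residual
```

The two sides agree up to floating-point error, which confirms the identity under these assumptions.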

Figures (7)

Figure 1: Example of a training sample $z$ (top-left) and its spurious counterpart $z^s$ (top-right). In experiments, we add a noise background ($y$) around the original images ($x$) before training (bottom-left). We then query the trained model only with the noise component (bottom-right).
Figure 2: Test and spurious accuracies as a function of the number of training samples $N$, for various binary classification tasks. In the first two plots, we consider the RF model with $k = 10^5$, trained on Gaussian data with $d = 1000$. The labeling function is $g(x) = \mathrm{sign}(u^\top x)$. We repeat the experiments for $\alpha \in \{0.25, 0.5\}$ and for the two activations $\phi_2 = h_1 + h_2$ and $\phi_4 = h_1 + h_4$, where $h_i$ denotes the $i$-th Hermite polynomial (see the appendix on Hermite polynomials). In the last two plots, we consider the same model with ReLU activation, trained on two classes of MNIST and two classes of CIFAR-10. The width of the noise background is $10$ pixels for MNIST and $8$ pixels for CIFAR-10 (see Figure 1). The spurious accuracy is obtained by querying the model only with the noise background of the training samples, setting all other pixels to $0$, and taking the sign of the output. Since the task is binary classification, random guessing achieves an accuracy of $0.5$. We plot the average over 10 independent trials and the confidence band at 1 standard deviation.
Figure 3: We consider the NTK model with $k = 100$, trained on MNIST (digits 1 and 7, first and second plots) and CIFAR-10 (cats and ships, third and fourth plots). We repeat the experiments for activations whose derivatives are $\phi'_2 = h_0 + h_1$ and $\phi'_8 = h_0 + h_7$, where $h_i$ denotes the $i$-th Hermite polynomial (see the appendix on Hermite polynomials). The rest of the setup is the same as in Figure 2.
Figure 4: Test and spurious accuracies as a function of the number of training samples $N$, for a fully connected network (FC, first two plots) and a small convolutional neural network (CNN, last two plots). In the first plot, we use synthetic (Gaussian) data with $d = 1000$, and the labeling function is $g(x) = \mathrm{sign}(u^\top x)$; since this is binary classification, random guessing achieves an accuracy of $0.5$. The other plots use subsets of the MNIST and CIFAR-10 datasets, with an external layer of noise added to the images (see Figure 1); since we consider $10$ classes, random guessing achieves an accuracy of $0.1$. We plot the average over 10 independent trials and the confidence band at 1 standard deviation.
Figure 5: Test and spurious accuracies as a function of the number of training samples $N$, for two ResNet architectures. We use subsets of the CIFAR-10 dataset, with an external layer of noise added to the images (see Figure 1). Since we consider $10$ classes, random guessing achieves an accuracy of $0.1$. We plot the average over 10 independent trials and the confidence band at 1 standard deviation.
...and 2 more figures
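The captions describe two data manipulations: surrounding each image with a noise background (10 pixels wide for MNIST, 8 for CIFAR-10) before training, and computing the spurious accuracy by querying the trained model with only the training samples' noise background (all other pixels set to 0) and taking the sign of the output. The sketch below illustrates both steps under stated assumptions. The noise is taken to be i.i.d. $\mathcal{N}(0,1)$ (the captions do not specify its law), images are grayscale arrays of shape `(n, h, w)` (a channel axis for CIFAR-10 would be handled the same way), and the function names are hypothetical.

```python
import numpy as np


def add_noise_background(images, width, rng):
    """Place each image x inside a frame y of i.i.d. N(0, 1) noise (assumed noise law).

    images: array of shape (n, h, w); returns shape (n, h + 2*width, w + 2*width).
    """
    n, h, w = images.shape
    padded = rng.standard_normal((n, h + 2 * width, w + 2 * width))
    padded[:, width:width + h, width:width + w] = images
    return padded


def spurious_query(padded, width):
    """Keep only the noise frame; set every pixel of the original image to 0."""
    query = padded.copy()
    _, H, W = query.shape
    query[:, width:H - width, width:W - width] = 0.0
    return query


def sign_accuracy(outputs, labels):
    """Binary accuracy of sign(outputs) against labels in {-1, +1}."""
    return float(np.mean(np.sign(outputs) == labels))


rng = np.random.default_rng(0)
digits = rng.random((4, 28, 28))               # stand-in for MNIST images
train = add_noise_background(digits, 10, rng)  # 28 x 28 -> 48 x 48
queries = spurious_query(train, 10)
```

Given a trained binary model (left abstract here, e.g. a hypothetical `model` callable), the spurious accuracy would be `sign_accuracy(model(queries), train_labels)`. Values well above the random-guessing level of $0.5$ indicate that the noise, which carries no information about the label, has been memorized.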

Theorems & Definitions (52)

Definition 3.1
Lemma 4.1
Theorem 1
Theorem 2
Proposition A.1: Proposition 11.31, booleananalysis
Proposition A.2: Definition 11.34, booleananalysis
Proposition A.3
Lemma B.1
proof
proof: Proof of Lemma (lemma:proj)
...and 42 more
