Malign Overfitting: Interpolation Can Provably Preclude Invariance

Yoav Wald; Gal Yona; Uri Shalit; Yair Carmon

TL;DR

The paper proves a fundamental barrier: in a simple linear two-environment Gaussian mixture with core and spurious features, any interpolating classifier with nonzero margin cannot be invariant, leading to high robust error. In contrast, a two-phase, non-interpolating estimator can provably learn a nearly invariant classifier with low robust error by separating training and post-processing stages and enforcing an invariance constraint (e.g., Equal Opportunity). Through simulations and Waterbirds experiments, the authors show that interpolation undermines invariance, while the proposed two-phase approach yields reliable invariance and fairness gains across high-dimensional regimes. These results highlight a crucial limit of interpolation-based invariance methods and motivate finite-sample, two-stage strategies for robust, fair generalization in practice.

Abstract

Learned classifiers should often possess certain invariance properties meant to encourage fairness, robustness, or out-of-distribution generalization. However, multiple recent works empirically demonstrate that common invariance-inducing regularizers are ineffective in the over-parameterized regime, in which classifiers perfectly fit (i.e. interpolate) the training data. This suggests that the phenomenon of "benign overfitting", in which models generalize well despite interpolating, might not favorably extend to settings in which robustness or fairness are desirable. In this work we provide a theoretical justification for these observations. We prove that -- even in the simplest of settings -- any interpolating learning rule (with arbitrarily small margin) will not satisfy these invariance properties. We then propose and analyze an algorithm that -- in the same setting -- successfully learns a non-interpolating classifier that is provably invariant. We validate our theoretical observations on simulated data and the Waterbirds dataset.

Paper Structure (46 sections, 19 theorems, 118 equations, 5 figures, 1 algorithm)

Introduction
Statement of Main Result
Preliminaries
Data model.
Robust performance metric.
Normalized margin.
Main Result
Interpolating Models Cannot Be Invariant
Implication for invariance-inducing algorithms.
A Provably Invariant Overparameterized Estimator
Empirical Validation
Simulations
Setup.
Evaluation and results.
Waterbirds Dataset
...and 31 more sections

Key Result

Theorem 1

For any sample sizes $N_1,N_2>65$, margin lower bound $\gamma \le \frac{1}{4\sqrt{N}}$, target robust error $\epsilon > 0$, and coefficients $\theta_1=1$, $\theta_2 > -\frac{N_1 \gamma}{\sqrt{288N_2}}$, there exist parameters $r_c,r_s>0$, $d>N$, and $\sigma>0$ such that the following holds for the L

Figures (5)

Figure 1: Numerical validation of our theoretical claims. Invariance-inducing methods improve robust accuracy compared to ERM for low values of $d$, but their ability to do so diminishes as $d$ grows (top plot) and they enter the interpolation regime, as seen in the bottom plot for $d>10^2$. Algorithm 1 learns robust predictors as $d$ grows and does not interpolate.
Figure 2: Results for the Waterbirds dataset (Sagawa et al., 2019). Top row: Train error (left) and test error (right). The train error is used to identify the interpolation threshold for the baseline method (approximately $d=1000$). Bottom row: Comparing the FNR gap on the test set (left), with zoomed-in versions on the right.
Figure 3: Example of datasets sampled from two training environments, where we set $\theta_1=1$, $\theta_2=0$, $N_1=800$, $N_2=100$, $r_s=2$, $r_c=1$. Left and right plots show projections of the training points on ${\bm{\mu}}_c, {\bm{\mu}}_s\in{\mathbb{R}^d}$, and on ${\mathbf{u}}_1, {\mathbf{u}}_2$ drawn uniformly from the $d$-dimensional unit sphere, respectively. As we increase $d$ there are many hyperplanes ${\mathbf w}\in{\mathbb{R}^d}$ that separate the data; for some, $\langle {\mathbf w}, {\bm{\mu}}_c \rangle$ is much higher than $\langle {\mathbf w}, {\bm{\mu}}_s \rangle$ (i.e. their predictions are invariant), and for some the opposite may hold. We ask whether interpolating learning rules can find the former.
Figure 4: Results of the simulation described in the Simulations section with $\theta_2=-\frac{1}{2}$ (all other parameters are kept at the same values as in that section).
Figure 5: Simulation from the Empirical Validation section with an added model trained after removing the spurious feature. This demonstrates the existence of an invariant interpolator, yet our theoretical results suggest that this type of model cannot be learned by an interpolating learning rule.
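
The setting in the Figure 3 caption can be explored with a minimal sketch like the one below. It rests on assumptions not stated on this page: points are drawn as x = y * mu_c + y * theta_e * mu_s + isotropic Gaussian noise, with mu_c and mu_s orthogonal and of norms r_c and r_s. The function names, the noise scale sigma, and the dimension d are hypothetical choices. The minimum-norm least-squares solution is used only as one example of an interpolating rule, not as the paper's estimator.

```python
import numpy as np


def sample_environment(n, theta, mu_c, mu_s, sigma, rng):
    """Draw n labelled points from one environment (assumed model):
    x = y * mu_c + y * theta * mu_s + sigma * standard Gaussian noise."""
    y = rng.choice([-1.0, 1.0], size=n)
    noise = sigma * rng.standard_normal((n, mu_c.shape[0]))
    x = y[:, None] * (mu_c + theta * mu_s) + noise
    return x, y


def make_two_environments(d, n1=800, n2=100, theta1=1.0, theta2=0.0,
                          r_c=1.0, r_s=2.0, sigma=1.0, seed=0):
    """Sample two training environments with the Figure 3 defaults.
    sigma and the axis-aligned mean directions are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    mu_c = np.zeros(d)
    mu_c[0] = r_c          # core direction, norm r_c
    mu_s = np.zeros(d)
    mu_s[1] = r_s          # spurious direction, norm r_s, orthogonal to mu_c
    env1 = sample_environment(n1, theta1, mu_c, mu_s, sigma, rng)
    env2 = sample_environment(n2, theta2, mu_c, mu_s, sigma, rng)
    return env1, env2, mu_c, mu_s


def min_norm_interpolator(x, y):
    """Minimum-norm solution of x @ w = y. When d > N and x has full row
    rank, this fits every training label exactly (one interpolating rule)."""
    w, *_ = np.linalg.lstsq(x, y, rcond=None)
    return w


if __name__ == "__main__":
    (x1, y1), (x2, y2), mu_c, mu_s = make_two_environments(d=2000)
    x = np.vstack([x1, x2])
    y = np.concatenate([y1, y2])
    w = min_norm_interpolator(x, y)
    # Compare how much the interpolator relies on each direction.
    print("<w, mu_c> =", float(w @ mu_c))
    print("<w, mu_s> =", float(w @ mu_s))
    print("train error =", float(np.mean(np.sign(x @ w) != y)))
```

Comparing the two printed inner products for different values of d gives a rough feel for the question posed in the caption: whether a rule that fits the pooled training data exactly ends up leaning on the spurious direction.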

Theorems & Definitions (44)

Definition 1
Definition 2: Linear Two Environment Problem
Definition 3: Robust error
Definition 4: Normalized margin
Theorem 1
Proposition 1
proof : Proof sketch
Proposition 2
proof : Proof sketch
Lemma 1
...and 34 more
