The Price of Robustness: Stable Classifiers Need Overparameterization

Jonas von Berg; Adalbert Fono; Massimiliano Datres; Sohir Maskey; Gitta Kutyniok

TL;DR

We establish a generalization bound for finite function classes that improves as class stability increases, where class stability is the expected distance to the decision boundary in the input domain (margin). As a corollary, we derive a law of robustness for classification.

Abstract

The relationship between overparameterization, stability, and generalization remains incompletely understood in the setting of discontinuous classifiers. We address this gap by establishing a generalization bound for finite function classes that improves inversely with class stability, defined as the expected distance to the decision boundary in the input domain (margin). Interpreting class stability as a quantifiable notion of robustness, we derive as a corollary a law of robustness for classification that extends the results of Bubeck and Sellke beyond smoothness assumptions to discontinuous functions. In particular, any interpolating model with $p \approx n$ parameters on $n$ data points must be unstable, implying that substantial overparameterization is necessary to achieve high stability. We obtain analogous results for parameterized infinite function classes by analyzing a stronger robustness measure derived from the margin in the codomain, which we refer to as the normalized co-stability. Experiments support our theory: stability increases with model size and correlates with test performance, while traditional norm-based measures remain largely uninformative.
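
To make the definition of class stability concrete, the sketch below computes it for a toy multiclass linear classifier $f(x) = \arg\max_k (Wx + b)_k$, where the distance to the decision boundary has a closed form. With scores $s = Wx + b$ and predicted class $k$, the decision region of $k$ is convex, so the margin at $x$ is $\min_{j \neq k} (s_k(x) - s_j(x)) / \lVert w_k - w_j \rVert_2$. This is a minimal sketch under our own assumptions, not the authors' implementation: we take the margin to be the Euclidean distance over all of $\mathbb{R}^d$ (ignoring any restriction to a bounded input domain), replace the expectation by an empirical mean over a sample, and the names `input_margins` and `class_stability` are hypothetical. For the MLPs and CNNs used in the experiments the margin has no closed form and would have to be approximated; which estimator the authors use is not stated here.

```python
import numpy as np

# Hypothetical helpers (not from the paper) for a toy linear classifier
# f(x) = argmax_k (W x + b)_k, with W of shape (K, d) and b of shape (K,).

def input_margins(W, b, X):
    """Euclidean distance from each row of X to the decision boundary of f.

    Assumes the boundary is taken over all of R^d. The region where f predicts
    class k is convex, so the distance to its boundary equals the smallest
    distance to one of the hyperplanes {s_k = s_j}, j != k.
    """
    scores = X @ W.T + b                                   # (n, K)
    k = scores.argmax(axis=1)                              # predicted class per point
    gaps = scores[np.arange(len(X)), k][:, None] - scores  # s_k - s_j >= 0, (n, K)
    norms = np.linalg.norm(W[k][:, None, :] - W[None, :, :], axis=2)  # ||w_k - w_j||
    with np.errstate(divide="ignore", invalid="ignore"):
        # pairs with w_j == w_k (including j = k) have a constant gap: no boundary
        dist = np.where(norms > 0, gaps / norms, np.inf)
    return dist.min(axis=1)

def class_stability(W, b, X):
    """Empirical stand-in for the expected margin: mean over the sample X."""
    return input_margins(W, b, X).mean()

# Toy usage with random weights and Gaussian inputs (illustrative only).
rng = np.random.default_rng(0)
W, b = rng.normal(size=(10, 32)), rng.normal(size=10)
X = rng.normal(size=(1000, 32))
print(class_stability(W, b, X))
```

In the binary case this reduces to the familiar $|(w_0 - w_1)^\top x + (b_0 - b_1)| / \lVert w_0 - w_1 \rVert_2$.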

Paper Structure (36 sections, 9 theorems, 77 equations, 11 figures, 1 table)

Introduction
Paper Roadmap.
Contributions.
Related Work
Smoothness-based generalization.
Margin-based generalization.
Limits of uniform generalization bounds.
Stability, robustness, and implicit bias.
Out-of-Distribution Generalization.
Preliminaries and Notation
A Law of Robustness for Classification
A Law of Robustness for Infinite Function Classes
Experiments
Experimental setup.
Discussion and Future Work
...and 21 more sections

Key Result

Theorem 4

Suppose Assumptions (H1) and (H2) hold, and that $\min_{f \in \mathcal{F}} S(f) > S > 0$ with $\log |\mathcal{F}| \geq n$.

Figures (11)

Figure 1: 8-layer MLPs on CIFAR-10. Class stability $S(f)$ (left) and normalized co-stability $S^*(g)/L(g)$ (right) as a function of model size. The configuration with $w=128$ is excluded since it failed to attain 99% training accuracy after 400 epochs.
Figure 2: Class stability $S(f)$ for 4- and 8-layer CNNs trained on CIFAR-10.
Figure 3: Class stability $S(f)$ for 4-layer and 8-layer Heaviside-activation MLPs trained on MNIST.
Figure 4: Class stability for 4-layer MLPs trained on Gaussian toy data with different variances.
Figure 5: 4-layer MLPs on CIFAR-10.
...and 6 more figures
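
Figure 1 also reports the normalized co-stability $S^*(g)/L(g)$, which the abstract describes as a stronger robustness measure derived from the margin in the codomain. The sketch below illustrates one way a codomain margin divided by a Lipschitz constant can act as a certified lower bound on the input-space margin, again for a toy linear score map $g(x) = Wx + b$. It rests on our own assumptions, not on the paper's definitions: we take the codomain margin $m(x)$ to be the gap between the largest and second-largest entry of $g(x)$, $S^*(g)$ to be its empirical mean, and $L(g)$ to be the $\ell_2$ Lipschitz constant of $g$, which for an affine map is the spectral norm of $W$; all helper names are hypothetical. Under these choices every pairwise gap $s_k - s_j$ changes by at most $\sqrt{2}\,L(g)\lVert\delta\rVert_2$ under a perturbation $\delta$, so the prediction at $x$ cannot change within radius $m(x)/(\sqrt{2}\,L(g))$.

```python
import numpy as np

# Hypothetical helpers (not from the paper) for a toy linear score map
# g(x) = W x + b, with W of shape (K, d) and b of shape (K,).

def codomain_margins(W, b, X):
    """Largest minus second-largest score of g at each row of X."""
    top2 = np.sort(X @ W.T + b, axis=1)[:, -2:]   # two largest scores, ascending
    return top2[:, 1] - top2[:, 0]

def lipschitz_l2(W):
    """l2 -> l2 Lipschitz constant of x -> W x + b: the largest singular value of W."""
    return np.linalg.norm(W, ord=2)

def normalized_co_stability(W, b, X):
    """Assumed form of S*(g) / L(g): mean codomain margin divided by L(g)."""
    return codomain_margins(W, b, X).mean() / lipschitz_l2(W)

def certified_radius(W, b, X):
    """Radius around each x within which argmax_k g(x)_k cannot change.

    ||w_k - w_j|| = ||W^T (e_k - e_j)|| <= sqrt(2) * ||W||_2, which bounds how
    fast any pairwise gap s_k - s_j can shrink along a perturbation.
    """
    return codomain_margins(W, b, X) / (np.sqrt(2) * lipschitz_l2(W))

# Toy usage with random weights and Gaussian inputs (illustrative only).
rng = np.random.default_rng(0)
W, b = rng.normal(size=(10, 32)), rng.normal(size=10)
X = rng.normal(size=(1000, 32))
print(normalized_co_stability(W, b, X), certified_radius(W, b, X).mean())
```

Because $L(g)$ is a single global constant, the certified radius can be much smaller than the exact distance to the decision boundary: it is a lower bound, not an estimate of the margin.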

Theorems & Definitions (31)

Definition 1: Margin and Class Stability
Remark 2
Definition 3: Isoperimetry
Theorem 4: Rademacher Bound
Proof sketch
Remark 5
Corollary 6: Law of Robustness for Discontinuous Functions
Proof sketch
Remark 7
Remark 8
...and 21 more
