The Price of Robustness: Stable Classifiers Need Overparameterization

Jonas von Berg, Adalbert Fono, Massimiliano Datres, Sohir Maskey, Gitta Kutyniok

TL;DR

The paper establishes a generalization bound for finite function classes that improves inversely with class stability, defined as the expected distance to the decision boundary in the input domain (margin), and derives as a corollary a law of robustness for classification.

Abstract

The relationship between overparameterization, stability, and generalization remains incompletely understood in the setting of discontinuous classifiers. We address this gap by establishing a generalization bound for finite function classes that improves inversely with class stability, defined as the expected distance to the decision boundary in the input domain (margin). Interpreting class stability as a quantifiable notion of robustness, we derive as a corollary a law of robustness for classification that extends the results of Bubeck and Sellke beyond smoothness assumptions to discontinuous functions. In particular, any interpolating model with $p \approx n$ parameters on $n$ data points must be unstable, implying that substantial overparameterization is necessary to achieve high stability. We obtain analogous results for parameterized infinite function classes by analyzing a stronger robustness measure derived from the margin in the codomain, which we refer to as the normalized co-stability. Experiments support our theory: stability increases with model size and correlates with test performance, while traditional norm-based measures remain largely uninformative.
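
To make the notion of class stability concrete, the following is a minimal sketch that estimates it for a binary linear classifier, where the Euclidean distance to the decision boundary has a closed form. The choice of a linear model, the Euclidean norm, and the function names `linear_margin` and `empirical_class_stability` are assumptions made for illustration. The paper's formal definition (Definition 1) and its experimental estimation procedure for neural networks may differ.

```python
import math


def linear_margin(w, b, x):
    """Euclidean distance from x to the hyperplane {z : w.z + b = 0}.

    For the classifier f(x) = sign(w.x + b), this hyperplane is the
    decision boundary, so the value is the input-space margin at x.
    """
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    if norm_w == 0.0:
        raise ValueError("w must be a nonzero vector")
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return abs(score) / norm_w


def empirical_class_stability(w, b, samples):
    """Sample mean of the margin: a Monte Carlo estimate of the
    expected distance to the decision boundary over the data."""
    if not samples:
        raise ValueError("samples must be non-empty")
    return sum(linear_margin(w, b, x) for x in samples) / len(samples)


if __name__ == "__main__":
    # Hypothetical toy data: two points far from the boundary, one on it.
    samples = [(3.0, 4.0), (0.0, 0.0), (-3.0, -4.0)]
    w, b = (3.0, 4.0), 0.0
    print(empirical_class_stability(w, b, samples))
```

In this toy setting a point lying on the boundary contributes a margin of zero, so a classifier whose boundary passes close to many data points has low estimated stability, which matches the reading of stability as a robustness measure.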

Paper Structure

This paper contains 36 sections, 9 theorems, 77 equations, 11 figures, 1 table.

Key Result

Theorem 4

Suppose Assumptions (H1) and (H2) hold, and that $\min_{f \in \mathcal{F}} S(f) > S > 0$ with $\log |\mathcal{F}| \geq n$.

Figures (11)

  • Figure 1: 8-layer MLPs on CIFAR-10. Class stability $S(f)$ (left) and normalized co-stability $S^*(g)/L(g)$ (right) as a function of model size. The configuration with $w=128$ is excluded since it failed to attain 99% training accuracy after 400 epochs.
  • Figure 2: Class stability $S(f)$ for 4- and 8-layer CNNs trained on CIFAR-10.
  • Figure 3: Class stability $S(f)$ for 4-layer and 8-layer Heaviside-activation MLPs trained on MNIST.
  • Figure 4: Class stability for 4-layer MLPs trained on Gaussian toy data with different variances.
  • Figure 5: 4-layer MLPs on CIFAR-10.
  • ...and 6 more figures

Theorems & Definitions (31)

  • Definition 1: Margin and Class Stability
  • Remark 2
  • Definition 3: Isoperimetry
  • Theorem 4: Rademacher Bound
  • Proof sketch
  • Remark 5
  • Corollary 6: Law of Robustness for Discontinuous Functions
  • Proof sketch
  • Remark 7
  • Remark 8
  • ...and 21 more