Does Order Matter: Connecting The Law of Robustness to Robust Generalization

Himadri Mandal; Vishnu Varadarajan; Jaee Ponde; Aritra Das; Mihir More; Debayan Gupta

TL;DR

This work introduces a nontrivial notion of robust generalization error and converts it into a lower bound on the expected Rademacher complexity of the induced robust loss class. Experiments on MNIST find that the lower-bound Lipschitz constant scales on the order predicted by Wu et al. (2023).

Abstract

Bubeck and Sellke (2021) pose as an open problem the connection between the law of robustness and robust generalization. The law of robustness states that overparameterization is necessary for models to interpolate robustly; in particular, robust interpolation requires the learned function to be Lipschitz. Robust generalization asks whether small robust training loss implies small robust test loss. We resolve this problem by explicitly connecting the two for arbitrary data distributions. Specifically, we introduce a nontrivial notion of robust generalization error and convert it into a lower bound on the expected Rademacher complexity of the induced robust loss class. Our bounds recover the $\Omega(n^{1/d})$ regime of Wu et al. (2023) and show that, up to constants, robust generalization does not change the order of the Lipschitz constant required for smooth interpolation. We conduct experiments to probe the predicted scaling with dataset size and model capacity, testing whether empirical behavior aligns more closely with the predictions of Bubeck and Sellke (2021) or Wu et al. (2023). For MNIST, we find that the lower-bound Lipschitz constant scales on the order predicted by Wu et al. (2023). Informally, to obtain low robust generalization error, the Lipschitz constant must lie in a range that we bound, and the allowable perturbation radius is linked to the Lipschitz scale.
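To make the phrase "lower-bound Lipschitz constant" concrete, here is a minimal sketch of one standard data-dependent lower bound: any function taking value $v_i$ at point $x_i$ must have Lipschitz constant at least the largest pairwise slope $|v_i - v_j| / \|x_i - x_j\|_2$. Whether the paper's experiments use exactly this estimator is an assumption; the function name `lipschitz_lower_bound`, the Euclidean norm, and the toy data are hypothetical.

```python
import itertools
import math


def lipschitz_lower_bound(points, values):
    """Largest pairwise slope |v_i - v_j| / ||x_i - x_j||_2 over distinct points.

    Any function that takes value values[k] at points[k] has a Lipschitz
    constant (in the Euclidean norm) at least this large.
    """
    best = 0.0
    pairs = itertools.combinations(zip(points, values), 2)
    for (x_i, v_i), (x_j, v_j) in pairs:
        dist = math.dist(x_i, x_j)
        if dist > 0:  # skip duplicate inputs to avoid division by zero
            best = max(best, abs(v_i - v_j) / dist)
    return best


# Toy data (hypothetical): three points in the plane with scalar outputs.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
values = [0.0, 3.0, 1.0]
bound = lipschitz_lower_bound(points, values)
```

Here the steepest pair is the first two points (output gap 3 over distance 1), so the bound is 3. Tracking how such a quantity grows with the number of samples is one way to probe a scaling law of the kind discussed above.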

Paper Structure (22 sections, 10 theorems, 50 equations, 3 figures)

Introduction
Related Work
Adversarial robustness, adversarial training, and Lipschitz
Overparameterization and laws of robustness
Robust generalization and robust overfitting
Lower Bound on Robust Generalization Gap
Local extremal envelopes
Rademacher complexity for loss vectors
The contraction lemma
Coordinatewise maxima: Rad(A∨B) ≤ Rad(A)+Rad(B)
Application: robust squared loss
Empirical Scaling of Data-Dependent Lipschitz Lower Bounds
Dataset Choice and Effective Dimension
Varying Parameter Count and Dataset Size
Training Until Overfitting
...and 7 more sections

Key Result

Lemma 1

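Let $f:\mathbb{R}^d\to\mathbb{R}$ be $L$-Lipschitz with respect to a norm $\|\cdot\|$, i.e., $|f(x)-f(x')|\le L\,\|x-x'\|$ for all $x,x'\in\mathbb{R}^d$. Fix $\rho>0$ and training samples $\{(x_i,y_i)\}_{i=1}^n$. Define the clean empirical (train) error and the robust empirical (train) error of $f$ on these samples (displayed definitions not reproduced here). The lemma then relates the two errors (displayed statement not reproduced here).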
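As an illustration of the two quantities Lemma 1 names, the sketch below computes a clean and a robust empirical error in a special case where the worst-case perturbation has a closed form. The setup is an assumption, not the paper's statement: squared loss, the Euclidean norm, and a linear map $f(x)=w^\top x$, for which $L=\|w\|_2$ and $\sup_{\|\delta\|_2\le\rho}(f(x+\delta)-y)^2=(|f(x)-y|+\rho\|w\|_2)^2$. The function name and data are hypothetical.

```python
import math


def clean_and_robust_errors(w, samples, rho):
    """Clean and robust mean squared error of f(x) = <w, x> (Euclidean ball).

    For a linear f the perturbation term <w, delta> ranges over
    [-rho * ||w||, rho * ||w||], so the worst-case squared deviation is
    attained at an endpoint: (|residual| + rho * ||w||)^2.
    """
    lip = math.sqrt(sum(c * c for c in w))  # Lipschitz constant of f
    clean = 0.0
    robust = 0.0
    for x, y in samples:
        residual = abs(sum(wc * xc for wc, xc in zip(w, x)) - y)
        clean += residual ** 2
        robust += (residual + rho * lip) ** 2
    n = len(samples)
    return clean / n, robust / n, lip


# Toy example (hypothetical values).
w = [3.0, 4.0]
samples = [([1.0, 0.0], 2.0), ([0.0, 1.0], 4.0)]
clean, robust, lip = clean_and_robust_errors(w, samples, rho=0.1)
gap = robust - clean
```

In this linear case the gap equals $2L\rho\cdot\frac{1}{n}\sum_i |f(x_i)-y_i| + L^2\rho^2$, so it is controlled by the product $L\rho$; this is only arithmetic for the toy setting and should not be read as the lemma's bound.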

Figures (3)

Figure 1: Loss curves of models chosen at random (the other loss curves can be found in our codebase).
Figure 2: Lipschitz Growth vs Dataset Size (Plots chosen at random, others can be found in our codebase).
Figure 3: Lipschitz Growth vs Model Parameters (Plots chosen at random, others can be found in our codebase).

Theorems & Definitions (26)

Lemma 1: Robust-clean empirical gap
proof
Definition 1: Local extremal envelopes
Lemma 2: Envelopes preserve Lipschitzness
proof
Definition 2: Robust squared loss
Lemma 3: Worst-case squared deviation is at an endpoint
proof
Definition 3: Empirical Rademacher complexity of a set of vectors
Remark 1
...and 16 more
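Definition 3 and the outline entry on coordinatewise maxima concern the empirical Rademacher complexity of a set of vectors. The sketch below computes it exactly for small finite sets by enumerating all sign patterns, then checks the inequality Rad(A∨B) ≤ Rad(A)+Rad(B) on a toy example. The normalization $\mathrm{Rad}(A)=\frac{1}{n}\,\mathbb{E}_\sigma \sup_{a\in A}\sum_i \sigma_i a_i$ is assumed here and may differ from the paper's convention by a constant; the function names and vectors are hypothetical.

```python
import itertools


def rademacher(vectors):
    """Exact empirical Rademacher complexity of a finite set of n-vectors.

    Averages sup_{a in vectors} <sigma, a> over all 2^n sign vectors sigma
    and divides by n (assumed normalization).
    """
    n = len(vectors[0])
    total = 0.0
    for signs in itertools.product((-1.0, 1.0), repeat=n):
        total += max(sum(s * a for s, a in zip(signs, vec)) for vec in vectors)
    return total / (2 ** n * n)


def coordinatewise_max(set_a, set_b):
    """All coordinatewise maxima max(a, b) with a in set_a and b in set_b."""
    return [tuple(max(a, b) for a, b in zip(u, v)) for u in set_a for v in set_b]


# Toy sets of 3-dimensional vectors (hypothetical values).
A = [(1.0, 0.0, -1.0), (0.0, 2.0, 0.0)]
B = [(0.5, -1.0, 1.0), (-1.0, 0.0, 0.0)]
rad_a = rademacher(A)
rad_b = rademacher(B)
rad_max = rademacher(coordinatewise_max(A, B))
```

Exhaustive enumeration costs $2^n$ evaluations, so this is only practical for very small $n$; for realistic sample sizes one would sample sign vectors at random instead.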
