Does Order Matter : Connecting The Law of Robustness to Robust Generalization

Himadri Mandal, Vishnu Varadarajan, Jaee Ponde, Aritra Das, Mihir More, Debayan Gupta

TL;DR

This work introduces a nontrivial notion of robust generalization error and converts it into a lower bound on the expected Rademacher complexity of the induced robust loss class. Experiments on MNIST find that the lower-bound Lipschitz constant scales on the order predicted by Wu et al. (2023).

Abstract

Bubeck and Sellke (2021) pose as an open problem the connection between the law of robustness and robust generalization. The law of robustness states that overparameterization is necessary for models to interpolate robustly; in particular, robust interpolation requires the learned function to be Lipschitz. Robust generalization asks whether small robust training loss implies small robust test loss. We resolve this problem by explicitly connecting the two for arbitrary data distributions. Specifically, we introduce a nontrivial notion of robust generalization error and convert it into a lower bound on the expected Rademacher complexity of the induced robust loss class. Our bounds recover the $\Omega(n^{1/d})$ regime of Wu et al. (2023) and show that, up to constants, robust generalization does not change the order of the Lipschitz constant required for smooth interpolation. We conduct experiments to probe the predicted scaling with dataset size and model capacity, testing whether empirical behavior aligns more closely with the predictions of Bubeck and Sellke (2021) or Wu et al. (2023). For MNIST, we find that the lower-bound Lipschitz constant scales on the order predicted by Wu et al. (2023). Informally, to obtain low robust generalization error, the Lipschitz constant must lie in a range that we bound, and the allowable perturbation radius is linked to the Lipschitz scale.

Paper Structure (22 sections, 10 theorems, 50 equations, 3 figures)


Key Result

Lemma 1

Let $f:\mathbb{R}^d\to\mathbb{R}$ be $L$-Lipschitz with respect to a norm $\|\cdot\|$, i.e., $|f(x)-f(x')|\le L\,\|x-x'\|$ for all $x,x'\in\mathbb{R}^d$. Fix $\rho>0$ and training samples $\{(x_i,y_i)\}_{i=1}^n$. Define the clean empirical (train) error and the robust empirical (train) error. Then ...
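To make the two quantities named in Lemma 1 concrete, here is a minimal sketch under stated assumptions: squared loss (in the spirit of Definition 2), one-dimensional inputs, and a robust error taken as the worst case over $|x'-x_i|\le\rho$, approximated on a grid. The paper's exact definitions may differ. The names `clean_error`, `robust_error`, and `lipschitz_cap` are hypothetical. The cap uses only the triangle inequality and $L$-Lipschitzness, $|f(x')-y_i|\le|f(x_i)-y_i|+L\rho$, and is not claimed to be the lemma's statement.

```python
import math

def clean_error(f, xs, ys):
    """Mean squared error on the unperturbed training points."""
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def robust_error(f, xs, ys, rho, grid=201):
    """Grid approximation of the mean over i of
    max_{|x' - x_i| <= rho} (f(x') - y_i)^2, for 1-D inputs (assumed form)."""
    total = 0.0
    for x, y in zip(xs, ys):
        # Uniform grid over [x - rho, x + rho], plus the centre itself so the
        # approximation never falls below the clean loss at that point.
        candidates = [x - rho + 2 * rho * k / (grid - 1) for k in range(grid)]
        candidates.append(x)
        total += max((f(c) - y) ** 2 for c in candidates)
    return total / len(xs)

def lipschitz_cap(f, xs, ys, rho, L):
    """Upper bound implied by L-Lipschitzness:
    |f(x') - y| <= |f(x) - y| + L * rho whenever |x' - x| <= rho."""
    return sum((abs(f(x) - y) + L * rho) ** 2 for x, y in zip(xs, ys)) / len(xs)

f = math.sin                    # sin is 1-Lipschitz on the real line
xs = [0.0, 0.5, 1.0, 2.0]       # illustrative inputs, not data from the paper
ys = [0.1, 0.4, 0.9, 0.8]       # illustrative targets
rho = 0.1

clean = clean_error(f, xs, ys)
robust = robust_error(f, xs, ys, rho)
cap = lipschitz_cap(f, xs, ys, rho, L=1.0)
print(clean, robust, cap)
```

On this toy data the three values are ordered as clean, then robust, then cap. Shrinking `rho` or `L` pulls the cap toward the clean error, which is the informal link between perturbation radius and Lipschitz scale mentioned in the abstract.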

Figures (3)

  • Figure 1: Loss curves of models chosen at random (the other loss curves can be found in our codebase).
  • Figure 2: Lipschitz Growth vs Dataset Size (Plots chosen at random, others can be found in our codebase).
  • Figure 3: Lipschitz Growth vs Model Parameters (Plots chosen at random, others can be found in our codebase).

Theorems & Definitions (26)

  • Lemma 1: Robust-clean empirical gap
  • proof
  • Definition 1: Local extremal envelopes
  • Lemma 2: Envelopes preserve Lipschitzness
  • proof
  • Definition 2: Robust squared loss
  • Lemma 3: Worst-case squared deviation is at an endpoint
  • proof
  • Definition 3: Empirical Rademacher complexity of a set of vectors
  • Remark 1
  • ...and 16 more
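
Definition 3 in the list above refers to the empirical Rademacher complexity of a set of vectors. As a hedged illustration, the sketch below uses the standard form $\hat{\mathcal{R}}(A)=\mathbb{E}_{\sigma}\big[\sup_{a\in A}\tfrac{1}{n}\sum_{i=1}^n\sigma_i a_i\big]$ with i.i.d. uniform signs $\sigma_i\in\{-1,+1\}$; the paper's normalization may differ. The function name `empirical_rademacher` is hypothetical, and the set $A$ is assumed finite and small enough that all $2^n$ sign patterns can be enumerated exactly.

```python
import itertools

def empirical_rademacher(vectors):
    """Exact E_sigma[ max_{a in A} (1/n) * sum_i sigma_i * a_i ] for a finite
    set A of equal-length vectors, by enumerating every sign pattern."""
    n = len(vectors[0])
    total = 0.0
    for signs in itertools.product((-1, 1), repeat=n):
        # Best correlation between this sign pattern and any vector in A.
        total += max(sum(s * a for s, a in zip(signs, vec)) for vec in vectors)
    return total / (n * 2 ** n)

single = [[0.3, -1.2, 0.7]]           # one vector: nothing to choose from
pair = [[1.0, 1.0], [-1.0, -1.0]]     # a symmetric pair of vectors

print(empirical_rademacher(single))
print(empirical_rademacher(pair))
```

A singleton set has complexity zero (up to floating-point rounding) because the signs average out, while the symmetric pair gives 0.5: the richer the set, the better it can correlate with random signs. Exact enumeration is only feasible for small $n$; for larger $n$ one would sample sign patterns instead.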