On Achieving Optimal Adversarial Test Error

Justin D. Li, Matus Telgarsky

TL;DR

The paper tackles the problem of achieving optimal adversarial test error in two ways. It develops a theory that connects adversarial convex losses to adversarial zero-one losses via thresholding, and it analyzes the near-initialization (neural tangent kernel, NTK) regime. It shows that, with an idealized optimal adversary and early stopping, shallow ReLU networks can be trained to adversarial test error arbitrarily close to the optimum for general data distributions and perturbation sets. New near-initialization generalization bounds based on Rademacher complexity support this result. The authors derive four main results: (i) a structural link between convex and zero-one adversarial losses, (ii) calibration of convex losses under thresholding, (iii) an adversarial training bound that ties the training error to a reference model, and (iv) a generalization bound that scales with the distance from initialization and the perturbation size. Prior theory either considered specialized data distributions or provided only training error guarantees. Together, these results extend it to provable adversarial test error guarantees in the near-initialization regime with early stopping.

Abstract

We first elucidate various fundamental properties of optimal adversarial predictors: the structure of optimal adversarial convex predictors in terms of optimal adversarial zero-one predictors, bounds relating the adversarial convex loss to the adversarial zero-one loss, and the fact that continuous predictors can get arbitrarily close to the optimal adversarial error for both convex and zero-one losses. Applying these results along with new Rademacher complexity bounds for adversarial training near initialization, we prove that for general data distributions and perturbation sets, adversarial training on shallow networks with early stopping and an idealized optimal adversary is able to achieve optimal adversarial test error. By contrast, prior theoretical work either considered specialized data distributions or only provided training error guarantees.
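The simplest link between the two losses can be checked numerically. The sketch rests on assumptions that do not come from the paper. The convex loss is the logistic loss $\ell(z) = \ln(1 + e^{-z})$, the predictor is a hypothetical 1-D function, and the perturbation set $P(x) = [x - \varepsilon, x + \varepsilon]$ is searched on a finite grid as a stand-in for an optimal adversary. Because $\ell$ is decreasing and $\ell(z) \ge \ln 2$ whenever $z \le 0$, the adversarial zero-one loss is at most the adversarial logistic loss divided by $\ln 2$. This is only the elementary direction of such a relation, and the paper's own bounds are not reproduced here.

```python
import math

def logistic(z: float) -> float:
    # Numerically stable ln(1 + e^{-z})
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))

def f(x: float) -> float:
    # Hypothetical 1-D predictor, chosen only for illustration
    return math.sin(3.0 * x) + 0.5 * x

def worst_margin(x: float, y: int, eps: float, k: int = 201) -> float:
    # Grid search over P(x) = [x - eps, x + eps]; a stand-in for an optimal adversary
    grid = [x - eps + 2.0 * eps * i / (k - 1) for i in range(k)]
    return min(y * f(p) for p in grid)

def adversarial_risks(data, eps):
    """Return (adversarial logistic risk, adversarial zero-one risk) on data."""
    conv = zero_one = 0.0
    for x, y in data:
        m = worst_margin(x, y, eps)
        conv += logistic(m)                   # sup of a decreasing loss = loss at the worst margin
        zero_one += 1.0 if m <= 0.0 else 0.0  # some grid point of P(x) is misclassified
    return conv / len(data), zero_one / len(data)

data = [(-0.9, -1), (-0.3, 1), (0.2, 1), (0.7, -1), (1.2, 1)]  # hypothetical sample
for eps in (0.0, 0.1, 0.3):
    conv, zo = adversarial_risks(data, eps)
    # Elementary bound: 1[z <= 0] <= ln(1 + e^{-z}) / ln 2, applied at the worst margin
    assert zo <= conv / math.log(2.0) + 1e-12
    print(f"eps={eps}: logistic={conv:.3f}, zero-one={zo:.2f}")
```

The bound holds example by example, so it also holds for the averages. The grid makes the inner supremum exact only for the grid, not for the full interval.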

Paper Structure

This paper contains 29 sections, 23 theorems, 98 equations, and 1 figure.

Key Result

Lemma 3.1

For any predictor $f$, $\mathcal{R}_{\textup{A}}(f) = \int_{-\infty}^\infty \mathcal{R}_{\textup{AZ}}^t(f-t) \, \mathrm{d}t$.
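The identity has a layer-cake flavor, and a per-example version can be checked numerically. This is a minimal sketch that assumes the logistic loss. The paper's exact definition of $\mathcal{R}_{\textup{AZ}}^t$ is not reproduced, so the weighting below is our own derivation and not quoted from the paper. Since $\ell(z) = \int_z^\infty |\ell'(s)| \, \mathrm{d}s$ for a decreasing loss with $\ell(\infty) = 0$, the worst-case loss on a positive example equals $\int |\ell'(t)| \, \mathbf{1}[f - t \text{ errs on some } x' \in P(x)] \, \mathrm{d}t$, and a negative example uses the weight $|\ell'(-t)|$ instead. The predictor, perturbation grid, and quadrature settings are hypothetical.

```python
import math

def logistic(z: float) -> float:
    # Numerically stable ln(1 + e^{-z})
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))

def logistic_slope(t: float) -> float:
    # |l'(t)| = 1 / (1 + e^t), computed without overflow
    if t >= 0.0:
        e = math.exp(-t)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(t))

def f(x: float) -> float:
    # Hypothetical 1-D predictor
    return math.sin(3.0 * x) + 0.5 * x

def grid(x: float, eps: float, k: int = 101):
    # Finite stand-in for the perturbation set P(x) = [x - eps, x + eps]
    return [x - eps + 2.0 * eps * i / (k - 1) for i in range(k)]

def adversarial_convex(x: float, y: int, eps: float) -> float:
    # Left-hand side: worst-case logistic loss over the perturbation grid
    return max(logistic(y * f(p)) for p in grid(x, eps))

def threshold_integral(x: float, y: int, eps: float, T: float = 40.0, n: int = 40000) -> float:
    # Right-hand side: integral over thresholds t of a weighted adversarial
    # zero-one error of the shifted predictor f - t (midpoint rule on [-T, T])
    worst = min(y * f(p) for p in grid(x, eps))  # min over x' of y * f(x')
    h = 2.0 * T / n
    total = 0.0
    for i in range(n):
        t = -T + (i + 0.5) * h
        # y * (f(x') - t) <= 0 for some x'  <=>  worst <= y * t
        if worst <= y * t:
            total += (logistic_slope(t) if y > 0 else logistic_slope(-t)) * h
    return total

for x, y in [(-0.9, -1), (-0.3, 1), (0.2, 1), (0.7, -1)]:
    for eps in (0.0, 0.2):
        lhs, rhs = adversarial_convex(x, y, eps), threshold_integral(x, y, eps)
        print(f"x={x:+.1f}, y={y:+d}, eps={eps}: {lhs:.4f} vs {rhs:.4f}")
```

The two sides agree up to quadrature error. The indicator switches only once, so the midpoint rule is off by at most about one step width $h$ there, and the tails beyond $\pm T$ are negligible.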

Figures (1)

  • Figure 1: The robust and standard zero-one losses, on both training and test data, throughout training of an adversarially trained network. The plot was produced with code due to Rice, Wong, and Kolter, using a constant step size of $0.01$. The present work is set within the early phase of training, where we can get arbitrarily close to the optimal adversarial test error. Our analysis is further restricted to an even earlier portion of this phase, since it remains within the near-initialization/NTK regime. As noted in prior work, adversarial training appears to have more fragile test-time performance than standard training and quickly enters a phase of severe overfitting. We do not consider this issue here.
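The training regime described in the caption can be sketched at toy scale. This is a minimal illustration under our own assumptions and is neither the paper's algorithm nor the cited code. The setup uses 1-D inputs with label noise and a shallow ReLU network with a fixed ±1 output layer and $1/\sqrt{m}$ scaling, a common near-initialization convention. An exhaustive grid search over $[x - \varepsilon, x + \varepsilon]$ stands in for the idealized optimal adversary, and gradients are taken at the adversarial points. Early stopping is chosen by held-out robust zero-one error, which is a practical heuristic rather than the stopping rule analyzed in the paper. The step size 0.5 is chosen for this toy and is unrelated to the 0.01 used for the figure. The sketch also reports the Frobenius distance of the hidden weights from initialization, the quantity the near-initialization generalization bound scales with.

```python
import numpy as np

rng = np.random.default_rng(0)
EPS, M, STEP, STEPS = 0.1, 64, 0.5, 300
OFFSETS = np.linspace(-EPS, EPS, 21)  # grid standing in for the idealized optimal adversary

def make_data(n):
    x = rng.uniform(-1.0, 1.0, size=n)
    y = np.where(np.sin(3.0 * x) >= 0.0, 1.0, -1.0)
    y[rng.uniform(size=n) < 0.1] *= -1.0  # 10% label noise, so overfitting is possible
    return x, y

def net(W, a, x):
    # f(x) = (1/sqrt(m)) * sum_j a_j * relu(w_j . (x, 1)); works for any array shape of x
    phi = np.stack([x, np.ones_like(x)], axis=-1)
    return np.maximum(phi @ W.T, 0.0) @ a / np.sqrt(a.shape[0])

def worst_case(W, a, x, y):
    # Grid-search adversary: the perturbed input with the smallest margin y * f(x')
    cand = x[:, None] + OFFSETS[None, :]
    margins = y[:, None] * net(W, a, cand)
    idx = margins.argmin(axis=1)
    rows = np.arange(x.shape[0])
    return cand[rows, idx], margins[rows, idx]

def robust_error(W, a, x, y):
    return float(np.mean(worst_case(W, a, x, y)[1] <= 0.0))

def grad_W(W, a, x, y):
    # Danskin-style step: differentiate the logistic loss at the adversarial points only
    x_adv, z = worst_case(W, a, x, y)
    phi = np.stack([x_adv, np.ones_like(x_adv)], axis=-1)  # (n, 2)
    active = (phi @ W.T > 0.0).astype(float)                # (n, m) ReLU derivative
    dloss = -1.0 / (1.0 + np.exp(z))                        # l'(z) for l(z) = ln(1 + e^{-z})
    coef = (dloss * y / x.shape[0])[:, None] * active * a[None, :] / np.sqrt(a.shape[0])
    return coef.T @ phi                                     # (m, 2)

x_tr, y_tr = make_data(200)
x_va, y_va = make_data(200)
W0 = rng.normal(size=(M, 2))
a = rng.choice([-1.0, 1.0], size=M)  # output layer fixed at initialization
W = W0.copy()

init_err = robust_error(W, a, x_va, y_va)
best_err, best_step, best_W = init_err, 0, W.copy()
for step in range(1, STEPS + 1):
    W -= STEP * grad_W(W, a, x_tr, y_tr)
    err = robust_error(W, a, x_va, y_va)
    if err < best_err:  # early stopping: keep the iterate with the lowest held-out robust error
        best_err, best_step, best_W = err, step, W.copy()

dist = float(np.linalg.norm(best_W - W0))  # Frobenius distance from initialization
print(f"best robust val error {best_err:.3f} at step {best_step}, ||W - W0||_F = {dist:.3f}")
```

Only the hidden layer is trained, so the distance $\|W - W_0\|_F$ of the selected iterate directly measures how far training has moved from initialization.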

Theorems & Definitions (45)

  • Lemma 3.1
  • Lemma 3.2
  • Theorem 3.3
  • Lemma 3.4
  • Theorem 4.1
  • Corollary 4.2
  • Corollary 4.3
  • Lemma 4.4
  • Lemma 4.5
  • Lemma 4.6