On the Clean Generalization and Robust Overfitting in Adversarial Training from Two Theoretical Views: Representation Complexity and Training Dynamics

Binghui Li, Yuanzhi Li

TL;DR

This work investigates why adversarial training often yields good clean generalization while suffering from robust overfitting (CGRO). It develops two complementary theories: (i) a representation-complexity view showing that, given a polynomial-size clean classifier, CGRO can be achieved with only polynomially many extra parameters, while robust classification can require exponential capacity in the worst case, and (ii) a training-dynamics view on structured data showing a three-stage learning process in which the model partially learns the true feature yet memorizes the training perturbations, yielding CGRO. The theoretical results are supported by experiments on real datasets (MNIST/CIFAR10) and synthetic data, which corroborate the three-stage dynamics and the capacity gap between CGRO and robust classification. Overall, the paper identifies robust memorization as a driver of CGRO and provides a framework for understanding robustness limits in overparameterized networks, with implications for designing training regimes and architectures that balance clean performance with robust generalization.

Abstract

Like deep networks in standard training, deep networks trained with adversarial training generalize well to unseen clean (natural) data. However, although adversarial training can achieve low robust training error, a significant robust generalization gap remains. We call this phenomenon Clean Generalization and Robust Overfitting (CGRO). In this work, we study CGRO in adversarial training from two views: representation complexity and training dynamics. Specifically, we consider a binary classification setting with $N$ separated training data points. First, we prove that, assuming there exists a $\operatorname{poly}(D)$-size clean classifier (where $D$ is the data dimension), a ReLU network with only $O(N D)$ extra parameters can leverage robust memorization to achieve CGRO, while a robust classifier still requires exponential representation complexity in the worst case. Next, we focus on a structured-data case to analyze training dynamics, where we train a two-layer convolutional network with $O(N D)$ width against adversarial perturbations. We then show that a three-stage phase transition occurs during the learning process and that the network provably converges to a robust memorization regime, which in turn results in CGRO. Finally, we empirically verify our theoretical analysis with experiments on real-image recognition datasets.
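
For orientation, the quantities the abstract refers to can be written in a standard form. The notation below (the network $f_{\theta}$, the loss $\ell$, the perturbation $\boldsymbol{\xi}$, and the $\ell_p$ ball of radius $\delta$) follows the usual adversarial-training setup and is an assumption here, not a quotation of the paper's own definitions.

$$
\widehat{\mathcal{L}}_{\mathrm{adv}}(\theta) = \frac{1}{N}\sum_{i=1}^{N} \max_{\|\boldsymbol{\xi}\|_p \le \delta} \ell\big(f_{\theta}(\boldsymbol{X}_i + \boldsymbol{\xi}),\, y_i\big)
$$

$$
\mathrm{err}_{\mathrm{clean}}(f) = \Pr_{(\boldsymbol{X},y)\sim\mathcal{D}}\big[\operatorname{sign} f(\boldsymbol{X}) \ne y\big],
\qquad
\mathrm{err}_{\mathrm{rob}}(f) = \Pr_{(\boldsymbol{X},y)\sim\mathcal{D}}\big[\exists\, \|\boldsymbol{\xi}\|_p \le \delta :\ \operatorname{sign} f(\boldsymbol{X}+\boldsymbol{\xi}) \ne y\big]
$$

In this notation, CGRO informally describes a classifier whose robust error on the $N$ training points and whose clean test error $\mathrm{err}_{\mathrm{clean}}$ are both small, while its robust test error $\mathrm{err}_{\mathrm{rob}}$ stays large.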

Paper Structure (21 sections, 32 theorems, 82 equations, 4 figures, 2 tables)

Key Result

Theorem 4.4

(Polynomial Upper Bound for CGRO) Under the boundedness (ass:bounded), separability (ass:sep), and clean-classifier existence (ass:clean_exists) assumptions, given an $N$-sample training dataset $\mathcal{S} = \{(\boldsymbol{X}_1,y_1),(\boldsymbol{X}_2,y_2),\dots,(\boldsymbol{X}_N,y_N)\}$ drawn from the data distribution $\mathcal{D}$, there exists a CGRO classifier
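
The robust-memorization idea behind this upper bound can be illustrated with a toy sketch. This is not the paper's ReLU construction: `make_robust_memorizer`, `clean_clf`, `make_points`, `attack`, and the two-dimensional data are all hypothetical, chosen so that training points are more than $2\delta$ apart in $\ell_{\infty}$ distance. The wrapper stores the $N$ training points ($N \cdot D$ numbers) and returns the stored label inside each $\delta$-ball, deferring to a non-robust clean classifier everywhere else.

```python
import numpy as np

def make_robust_memorizer(clean_clf, X_train, y_train, delta, tol=1e-9):
    """Wrap a clean classifier with memorized l_inf balls (illustrative only).

    Inside the delta-ball of a stored training point, return its label;
    elsewhere, defer to clean_clf. `tol` absorbs floating-point rounding.
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)

    def predict(x):
        # l_inf distance from x to every stored training point
        dists = np.max(np.abs(X_train - np.asarray(x, dtype=float)), axis=1)
        i = int(np.argmin(dists))
        if dists[i] <= delta + tol:
            return int(y_train[i])
        return int(clean_clf(x))

    return predict

def clean_clf(x):
    # Assumed clean classifier: accurate on clean data, but with a thin margin
    return 1 if x[0] > 0 else -1

def make_points(labels, offset):
    # x = (0.05 * y, index + offset): weak label feature plus a separating coordinate
    return np.array([[0.05 * y, k + offset] for k, y in enumerate(labels)])

def attack(X, y, delta):
    # One fixed perturbation of size delta (not a worst-case search):
    # push the first coordinate against the label
    X_adv = X.copy()
    X_adv[:, 0] -= delta * y
    return X_adv

def error_rate(clf, X, y):
    return float(np.mean([clf(x) != t for x, t in zip(X, y)]))

delta = 0.1
y_train = np.array([1, -1, 1, -1])
y_test = np.array([-1, 1, 1, -1])
X_train = make_points(y_train, offset=0.0)
X_test = make_points(y_test, offset=0.5)   # far from every training ball

f = make_robust_memorizer(clean_clf, X_train, y_train, delta)

clean_test_err = error_rate(f, X_test, y_test)
robust_train_err = error_rate(f, attack(X_train, y_train, delta), y_train)
robust_test_err = error_rate(f, attack(X_test, y_test, delta), y_test)
print(clean_test_err, robust_train_err, robust_test_err)
```

On this toy data the wrapped classifier is correct on clean test points and on the perturbed training points, yet fails on the perturbed test points, which is the CGRO pattern in its most extreme form. Robustness on the training set comes entirely from the stored points, since `clean_clf` alone is fooled by the same perturbation.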

Figures (4)

  • Figure 1: The learning curves of adversarial training on CIFAR10 with $\ell_{\infty}$-perturbation radius $\delta = 8/255$ (Rice et al., 2020).
  • Figure 2: (a)(b): The effect of network capacity on the performance of the network. We trained the networks of varying capacity on MNIST (a) and CIFAR10 (b); (c): Feature learning process of the two-layer convolutional network on the structured data.
  • Figure 3: Experiment Results ($\ell_{\infty}$ Perturbation Radius $\epsilon_{0}=0.1$ on MNIST, $\epsilon_{0}=8/255$ on CIFAR10).
  • Figure 4: Left: Local and Global Flatness During Adversarial Training on CIFAR10; Right: The Relation Between Robust Generalization Gap and Global Flatness on CIFAR10.

Theorems & Definitions (63)

  • Definition 3.1
  • Definition 3.2
  • Definition 3.3
  • Definition 3.4
  • Remark 3.5
  • Theorem 4.4
  • Lemma 4.5
  • Remark 4.6
  • Theorem 4.7
  • Remark 4.8
  • ...and 53 more