Can overfitted deep neural networks in adversarial training generalize? -- An approximation viewpoint

Zhongjie Shi; Fanghui Liu; Yuan Cao; Johan A. K. Suykens

TL;DR

The paper addresses whether overfitted deep networks trained adversarially can generalize, by adopting an approximation-theoretic lens. Using teacher-student constructions and localized approximation, it shows that there exist infinitely many over-parameterized DNNs with zero adversarial training error that nonetheless generalize robustly under well-separated, high-quality data and small perturbation radius δ, and that linear over-parameterization suffices when the target function is Hölder smooth. It also demonstrates analogous results for regression, providing near-optimal rates for standard generalization and matching robust-generalization bounds, while proving an inevitable robust generalization gap. Collectively, the results illuminate how data quality, function smoothness, and model capacity interact to control robust overfitting and robust generalization, offering theoretical foundations for robustness in DNNs from an approximation viewpoint.

Abstract

Adversarial training is a widely used method to improve the robustness of deep neural networks (DNNs) against adversarial perturbations. However, it is empirically observed that adversarial training on over-parameterized networks often suffers from robust overfitting: it can achieve almost zero adversarial training error while the robust generalization performance remains poor. In this paper, we provide a theoretical understanding, from an approximation viewpoint, of whether overfitted DNNs in adversarial training can generalize. Specifically, our main results are threefold: i) For classification, we prove by construction the existence of infinitely many adversarial training classifiers on over-parameterized DNNs that attain arbitrarily small adversarial training error (overfitting) while achieving good robust generalization error, under certain conditions on data quality, data separation, and perturbation level. ii) Linear over-parameterization (meaning that the number of parameters is only slightly larger than the sample size) is enough to ensure such existence if the target function is smooth enough. iii) For regression, our results demonstrate that there also exist infinitely many overfitted DNNs with linear over-parameterization in adversarial training that achieve almost optimal rates of convergence for the standard generalization error. Overall, our analysis shows that robust overfitting can be avoided, but the required model capacity depends on the smoothness of the target function, and a robust generalization gap is inevitable. We hope our analysis gives a better understanding of the mathematical foundations of robustness in DNNs from an approximation viewpoint.
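To make "adversarial training error" concrete, the following is a minimal sketch and not the paper's construction. It assumes a linear classifier (not a DNN), the hinge loss, and perturbations bounded in the $\ell_\infty$ norm; the names `adversarial_hinge` and `empirical_risk` and the toy data are hypothetical. For a linear model the worst-case hinge loss has a closed form, so no attack needs to be run.

```python
# Assumption: linear classifier f(x) = <w, x> + b with l_inf-bounded perturbations.
# An adversary can lower the margin y * f(x) by at most delta * ||w||_1, hence
#   sup_{||x' - x||_inf <= delta} max(0, 1 - y f(x')) = max(0, 1 - y f(x) + delta * ||w||_1).

def adversarial_hinge(w, b, x, y, delta):
    """Worst-case hinge loss at one sample (x, y) for perturbation radius delta."""
    margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    l1_norm = sum(abs(wi) for wi in w)
    return max(0.0, 1.0 - margin + delta * l1_norm)

def empirical_risk(w, b, data, delta=0.0):
    """Average worst-case hinge loss; delta = 0 gives the standard training error."""
    return sum(adversarial_hinge(w, b, x, y, delta) for x, y in data) / len(data)

# Hypothetical toy data: two classes separated along the first coordinate.
data = [((2.0, 0.0), 1), ((-2.0, 0.0), -1), ((1.5, 1.0), 1), ((-1.5, -1.0), -1)]
w, b = (1.0, 0.0), 0.0

standard_risk = empirical_risk(w, b, data, delta=0.0)
robust_risk_small = empirical_risk(w, b, data, delta=0.25)
robust_risk_large = empirical_risk(w, b, data, delta=1.0)
print(standard_risk, robust_risk_small, robust_risk_large)
```

On this toy data the adversarial training error stays at zero for the small radius and becomes positive only once the radius is comparable to the separation of the classes, which is the qualitative role that separation and perturbation level play in the conditions above.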

Paper Structure (20 sections, 13 theorems, 97 equations, 1 figure)

Introduction
Related Work
Problem settings and common assumptions
Main Results for Classification
Problem settings and notations
Assumptions
Generalization analysis of adversarial training global minima on over-parameterized FNNs
Main Results on Regression Tasks
Notations and assumptions
Standard generalization analysis of adversarial training estimators on over-parameterized FNNs
Robust generalization analysis of adversarial training on over-parameterized FNNs
Conclusion
Proof of Main Results in \ref{sectionclass}
Proof of \ref{theorem3}
Proof of \ref{lowerbound2}
...and 5 more sections

Key Result

Proposition 1

Let $f_c^\delta := \mathop{\mathrm{arg\,min}}\limits_f \mathcal{R}^\delta (f)$. For any classifier $f$, we have [...]. Moreover, we always have $\mathcal{R} (f) \leq \mathcal{R}^\delta (f)$.
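The final inequality $\mathcal{R} (f) \leq \mathcal{R}^\delta (f)$ follows in one step under the usual definitions of the two risks. The excerpt does not state these definitions, so the forms below, with a generic loss $\ell$ and a generic norm $\|\cdot\|$, are assumptions for illustration rather than the paper's exact notation.

```latex
% Assumed definitions (not stated in the excerpt):
%   standard risk    R(f)          = E[ \ell(f(x), y) ]
%   adversarial risk R^\delta(f)   = E[ \sup_{\|x'-x\|\le\delta} \ell(f(x'), y) ]
\begin{align*}
\ell\bigl(f(x), y\bigr)
  &\le \sup_{\|x' - x\| \le \delta} \ell\bigl(f(x'), y\bigr)
  && \text{since } x' = x \text{ lies in the } \delta\text{-ball for any } \delta \ge 0, \\
\mathcal{R}(f) = \mathbb{E}\bigl[\ell\bigl(f(x), y\bigr)\bigr]
  &\le \mathbb{E}\Bigl[\sup_{\|x' - x\| \le \delta} \ell\bigl(f(x'), y\bigr)\Bigr]
  = \mathcal{R}^\delta(f)
  && \text{by monotonicity of the expectation.}
\end{align*}
```

The same argument gives equality at $\delta = 0$, where the ball reduces to the single point $x$.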

Figures (1)

Figure 1: The learning curves of adversarial training on CIFAR-10 with $\delta=8/255$ (Rice et al., 2020), while CIFAR-10 is 0.212-separated (Yang et al., 2020).
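A quick arithmetic check helps compare the two numbers in the caption. This sketch assumes both are measured in the same norm, and it compares $2\delta$ with the quoted separation; the exact definition of "0.212-separated" is not given in this excerpt, so the comparison is illustrative only.

```python
from fractions import Fraction

# Values quoted in the caption of Figure 1.
delta = Fraction(8, 255)          # perturbation radius, about 0.0314
separation = Fraction(212, 1000)  # quoted separation of CIFAR-10

# Assumption: two balls of radius delta around differently labeled points
# cannot overlap when 2 * delta is smaller than the distance between the points.
balls_disjoint = 2 * delta < separation
ratio = separation / delta        # how many radii fit into the separation

print(float(delta), float(2 * delta), balls_disjoint, float(ratio))
```

Here $2\delta \approx 0.063$ is well below 0.212, so the perturbation level in the figure is small relative to the separation of the data, whichever of the common conventions for the separation constant is meant.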

Theorems & Definitions (30)

Remark 1
Remark 2
Remark 3
Proposition 1
Proof of \ref{proposition2}
Remark 4
Theorem 1: upper bound under the hinge loss
Remark 5
Remark 6
Theorem 2: lower bound under the hinge loss
...and 20 more
