On robust overfitting: adversarial training induced distribution matters
Runzhi Tian, Yongyi Mao
TL;DR
Robust overfitting in adversarial training is linked to the generalization difficulty of perturbation-induced distributions $\tilde{\mathcal D}_t$ encountered along the PGD-AT trajectory. The authors propose induced distribution experiments (IDE) and prove a generalization bound that ties the generalization gap to the expected local dispersion $\mathbb{E}_{\mathcal D^*}\tilde{\gamma}_t(x,y)$ of the perturbation operator $\mathcal Q_{x,y,\theta_t}$. Empirically, $\tilde{\gamma}_t(x,y)$ grows during training and tracks IDE errors across CIFAR-10/100, MNIST, and Reduced ImageNet, with angular-dispersion analyses offering a mechanism via the changing decision boundary. This distribution-dynamics perspective reframes robust generalization as a dynamical phenomenon and suggests new directions for mitigating robust overfitting by controlling perturbation dispersion.
Abstract
Adversarial training may be regarded as standard training with a modified loss function. However, its generalization error appears much larger than that of standard training under the standard loss. This phenomenon, known as robust overfitting, has attracted significant research attention and remains largely a mystery. In this paper, we first show empirically that robust overfitting correlates with the increasing generalization difficulty of the perturbation-induced distributions along the trajectory of adversarial training (specifically PGD-based adversarial training). We then provide a novel upper bound on the generalization error with respect to the perturbation-induced distributions, in which a notion associated with the perturbation operator, referred to as "local dispersion", plays an important role. Experimental results are presented to validate the usefulness of the bound, and various additional insights are provided.
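
To make the idea of a perturbation-induced distribution concrete, the following is a minimal sketch, not the paper's experimental setup. It assumes a toy logistic-regression model with parameters `theta` standing in for a network checkpoint $\theta_t$, labels in {-1, +1}, and an $\ell_\infty$ PGD attack. The function names (`loss_grad_x`, `pgd_perturb`, `induced_dataset`) and the values of `eps`, `step`, and `n_steps` are illustrative assumptions. Applying the PGD perturbation at a fixed `theta` to clean samples yields samples that can be read as draws from an induced distribution in the sense of $\tilde{\mathcal D}_t$.

```python
import numpy as np

def loss_grad_x(theta, x, y):
    """Gradient of the logistic loss log(1 + exp(-y * x.theta)) w.r.t. the input x."""
    margin = y * (x @ theta)                  # shape (n,)
    coeff = -y / (1.0 + np.exp(margin))       # derivative w.r.t. the logit x.theta
    return coeff[:, None] * theta[None, :]    # shape (n, d)

def pgd_perturb(theta, x, y, eps=0.1, step=0.02, n_steps=10, rng=None):
    """L-infinity PGD: signed gradient ascent on the loss, projected onto the eps-ball."""
    rng = np.random.default_rng(0) if rng is None else rng
    x_adv = x + rng.uniform(-eps, eps, size=x.shape)   # random start inside the ball
    for _ in range(n_steps):
        x_adv = x_adv + step * np.sign(loss_grad_x(theta, x_adv, y))
        x_adv = np.clip(x_adv, x - eps, x + eps)       # project back around the clean x
    return x_adv

def induced_dataset(theta, x, y, **pgd_kwargs):
    """Perturb clean samples at fixed parameters theta; labels are left unchanged."""
    return pgd_perturb(theta, x, y, **pgd_kwargs), y

# Toy usage: synthetic data and one hypothetical "checkpoint" theta.
rng = np.random.default_rng(1)
x_clean = rng.normal(size=(32, 5))
y_clean = rng.choice([-1.0, 1.0], size=32)
theta_t = rng.normal(size=5)
x_ind, y_ind = induced_dataset(theta_t, x_clean, y_clean, eps=0.1)
```

Under these assumptions, repeating `induced_dataset` at several saved checkpoints would give one perturbed dataset per checkpoint, which is the kind of object whose generalization difficulty the abstract describes as increasing along the training trajectory.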
