How Learning Dynamics Drive Adversarially Robust Generalization?

Yuelin Xu, Xiao Zhang

TL;DR

This work studies the mechanisms behind adversarially robust generalization by developing a PAC-Bayesian framework that ties robust risk to the posterior covariance of the parameters and the Hessian of the adversarial loss. By modeling SGD with momentum in a quadratic basin, the authors derive closed-form posterior covariances for the stationary regime and the early non-stationary phase of training, then plug these into a tractable generalization bound. The theory predicts how learning rate, gradient noise, and Hessian structure jointly shape robustness. Experiments on standard adversarial training (AT) and adversarial weight perturbation (AWP) support the link between posterior geometry and robust generalization, including the robust overfitting phenomenon. Overall, the paper gives a principled account of robustness dynamics, shows why flatness-promoting methods like AWP improve performance, and offers guidance for designing more robust training procedures.

Abstract

Despite significant progress in adversarially robust learning, the underlying mechanisms that govern robust generalization remain poorly understood. We propose a novel PAC-Bayesian framework that explicitly links adversarial robustness to the posterior covariance of model parameters and the curvature of the adversarial loss landscape. By characterizing discrete-time SGD dynamics near a local optimum under quadratic loss, we derive closed-form posterior covariances for both the stationary regime and the early phase of non-stationary transition. Our analyses reveal how key factors, such as learning rate, gradient noise, and Hessian structure, jointly shape robust generalization during training. Through empirical visualizations of these theoretical quantities, we fundamentally explain the phenomenon of robust overfitting and shed light on why flatness-promoting techniques like adversarial weight perturbation help to improve robustness.
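To make the notion of a closed-form stationary covariance concrete, here is a minimal sketch under simplifying assumptions that are not taken from the paper: plain SGD without momentum, a fixed quadratic loss with Hessian `H`, and additive gradient noise with a constant covariance `C`. Under these assumptions the iterate follows $w_{t+1} = w_t - \eta (H w_t + \varepsilon_t)$, and its covariance obeys the discrete Lyapunov recursion $\Sigma_{t+1} = (I - \eta H)\,\Sigma_t\,(I - \eta H)^\top + \eta^2 C$. The function names and the numerical values of `H`, `C`, and `lr` are hypothetical and chosen only for illustration.

```python
import numpy as np


def stationary_covariance(H, C, lr):
    """Fixed point of Sigma = A Sigma A^T + lr^2 C, with A = I - lr * H.

    Assumes plain SGD on a quadratic loss with constant noise covariance C.
    """
    d = H.shape[0]
    A = np.eye(d) - lr * H
    # vec(A Sigma A^T) = (A kron A) vec(Sigma), so the fixed point solves
    # (I - A kron A) vec(Sigma) = lr^2 vec(C).
    lhs = np.eye(d * d) - np.kron(A, A)
    vec_sigma = np.linalg.solve(lhs, (lr ** 2) * C.reshape(-1))
    return vec_sigma.reshape(d, d)


def propagate_covariance(H, C, lr, steps):
    """Iterate the covariance recursion from Sigma_0 = 0 (non-stationary phase)."""
    d = H.shape[0]
    A = np.eye(d) - lr * H
    sigma = np.zeros((d, d))
    for _ in range(steps):
        sigma = A @ sigma @ A.T + (lr ** 2) * C
    return sigma


# Hypothetical 2-D example: diagonal Hessian and diagonal noise covariance.
H = np.diag([1.0, 4.0])
C = np.diag([0.5, 2.0])
lr = 0.1

sigma_star = stationary_covariance(H, C, lr)
sigma_iter = propagate_covariance(H, C, lr, steps=2000)

# When H and C are both diagonal, each coordinate decouples and
# Sigma_ii = lr * c_i / (h_i * (2 - lr * h_i)).
sigma_diag = lr * np.diag(C) / (np.diag(H) * (2.0 - lr * np.diag(H)))
```

In this toy setting the stationary variance grows with the learning rate and the noise level and shrinks along high-curvature directions, which is the kind of joint dependence on learning rate, gradient noise, and Hessian structure that the abstract describes. The paper's own results may differ in form from this simplified recursion.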

Paper Structure

This paper contains 26 sections, 11 theorems, 26 equations, 8 figures, 2 tables.

Key Result

Lemma 3.2

Let $\mathcal{D}$ be a probability distribution over $\mathcal{X}\times\mathcal{Y}$ and $\mathcal{S}$ be a set of examples i.i.d. sampled from $\mathcal{D}$. Suppose $\mathcal{P}$ is a data-independent prior distribution defined over the model parameter space $\mathcal{W}$. For any $\beta > 0$, any where $\mathrm{KL}(\mathcal{Q} \: || \: \mathcal{P})$ denotes the Kullback–Leibler (KL) divergence
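As a point of reference for how the KL term becomes tractable, the following standard identity gives the divergence in closed form when both the prior and the posterior are Gaussian, as the title of Theorem 3.7 suggests. The symbols $\mu_{\mathcal{Q}}, \Sigma_{\mathcal{Q}}, \mu_{\mathcal{P}}, \Sigma_{\mathcal{P}}$ and the dimension $d$ are notation assumed here for illustration and may not match the paper's.

$$
\mathrm{KL}\big(\mathcal{N}(\mu_{\mathcal{Q}}, \Sigma_{\mathcal{Q}}) \: || \: \mathcal{N}(\mu_{\mathcal{P}}, \Sigma_{\mathcal{P}})\big)
= \frac{1}{2}\left[ \mathrm{tr}\big(\Sigma_{\mathcal{P}}^{-1}\Sigma_{\mathcal{Q}}\big) + (\mu_{\mathcal{P}} - \mu_{\mathcal{Q}})^\top \Sigma_{\mathcal{P}}^{-1} (\mu_{\mathcal{P}} - \mu_{\mathcal{Q}}) - d + \ln\frac{\det \Sigma_{\mathcal{P}}}{\det \Sigma_{\mathcal{Q}}} \right]
$$

Under this identity the divergence depends on the posterior only through its mean and covariance, which is why a closed-form posterior covariance can be substituted directly into a bound of this type.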

Figures (8)

  • Figure 1: Curves of Hessian and posterior parameters derived from our generalization bounds under standard AT on CIFAR-10. Vertical dashed lines mark learning rate decays at epochs $100$ and $150$.
  • Figure 2: Learning curves of Hessian and posterior parameters under AWP on CIFAR-10.
  • Figure 3: Comparison of commutativity and alignment properties under AT and AWP.
  • Figure 4: Additional results for standard training on CIFAR-10.
  • Figure 5: Additional results for adversarial training on CIFAR-100.
  • ...and 3 more figures

Theorems & Definitions (16)

  • Definition 3.1: Adversarial risk
  • Lemma 3.2: PAC-Bayesian Robust Generalization Bound
  • Lemma 3.4
  • Lemma 3.6
  • Theorem 3.7: Robust Generalization with Gaussians & Quadratic Loss
  • Lemma 4.1: State-Space Representation & Covariance Propagation
  • Lemma 4.2: Stationary Mean
  • Lemma 4.3: Stationary Covariance
  • Remark 4.4
  • Theorem 4.5: Robust Generalization under Stationary Regime
  • ...and 6 more