How Learning Dynamics Drive Adversarially Robust Generalization?
Yuelin Xu, Xiao Zhang
TL;DR
This work investigates the mechanisms driving adversarially robust generalization by developing a PAC-Bayesian framework that ties robust risk to the posterior covariance and the Hessian of the adversarial loss. By modeling SGD with momentum in a quadratic basin, the authors derive closed-form posterior covariances for stationary and early non-stationary training phases and plug these into a tractable generalization bound. The theory predicts how learning rate, gradient noise, and Hessian structure jointly shape robustness, and experiments on standard adversarial training and adversarial weight perturbation (AWP) validate the link between posterior geometry and robust generalization, including the robust overfitting phenomenon. Overall, the paper gives a principled account of robustness dynamics, clarifies why flatness-promoting methods like AWP improve performance, and offers guidance for designing more robust training procedures.
Abstract
Despite significant progress in adversarially robust learning, the underlying mechanisms that govern robust generalization remain poorly understood. We propose a novel PAC-Bayesian framework that explicitly links adversarial robustness to the posterior covariance of model parameters and the curvature of the adversarial loss landscape. By characterizing discrete-time SGD dynamics near a local optimum under quadratic loss, we derive closed-form posterior covariances for both the stationary regime and the early phase of non-stationary transition. Our analyses reveal how key factors, such as learning rate, gradient noise, and Hessian structure, jointly shape robust generalization during training. Through empirical visualizations of these theoretical quantities, we fundamentally explain the phenomenon of robust overfitting and shed light on why flatness-promoting techniques like adversarial weight perturbation help to improve robustness.
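To make the setting concrete, the following is a minimal sketch of discrete-time SGD on a quadratic loss. It is not the paper's derivation. It assumes plain SGD without momentum, an update of the form theta_{t+1} = theta_t - lr * (H theta_t + eps_t), and zero-mean gradient noise eps_t with a fixed covariance C. Under these assumptions the parameter covariance follows the recursion Sigma_{t+1} = (I - lr H) Sigma_t (I - lr H)^T + lr^2 C, and when H and C are both diagonal the stationary solution is lr * c_i / (h_i * (2 - lr * h_i)) per coordinate. The function names and the numerical values of H, C, and lr are illustrative choices only.

```python
import numpy as np


def covariance_trajectory(hessian, noise_cov, lr, steps):
    """Iterate Sigma_{t+1} = A Sigma_t A^T + lr^2 * C, with A = I - lr * H.

    Starts from Sigma_0 = 0 (a deterministic initial point) and returns
    the list [Sigma_0, Sigma_1, ..., Sigma_steps].
    """
    dim = hessian.shape[0]
    a = np.eye(dim) - lr * hessian
    sigma = np.zeros((dim, dim))
    history = [sigma.copy()]
    for _ in range(steps):
        sigma = a @ sigma @ a.T + lr**2 * noise_cov
        history.append(sigma.copy())
    return history


def stationary_covariance_diag(h_eigs, c_eigs, lr):
    """Closed-form fixed point of the recursion when H and C are diagonal.

    Requires 0 < lr * h_i < 2 for every coordinate so the recursion is stable.
    """
    h = np.asarray(h_eigs, dtype=float)
    c = np.asarray(c_eigs, dtype=float)
    return np.diag(lr * c / (h * (2.0 - lr * h)))


# Illustrative (assumed) values: one flat, one moderate, one sharp direction.
h_eigs = np.array([0.5, 2.0, 8.0])
c_eigs = np.array([1.0, 1.0, 1.0])
lr = 0.1

history = covariance_trajectory(np.diag(h_eigs), np.diag(c_eigs), lr, steps=2000)
sigma_inf = stationary_covariance_diag(h_eigs, c_eigs, lr)

print("after 1 step:   ", np.diag(history[1]))
print("after 2000 steps:", np.diag(history[-1]))
print("closed form:     ", np.diag(sigma_inf))
```

In this toy setting the early iterates depend mainly on lr and C, while the stationary covariance also depends on the Hessian eigenvalues: the variance along the flat direction (h = 0.5) settles at a larger value than along the sharp direction (h = 8).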
