Regularization for Adversarial Robust Learning
Jie Wang, Rui Gao, Yao Xie
TL;DR
This work tackles the computational intractability of adversarially robust learning formulated as $\infty$-Wasserstein distributionally robust optimization (DRO). It introduces a $\phi$-divergence regularized DRO objective that yields a smooth surrogate loss $\psi_{\eta}$. The authors derive a strong dual formulation and develop scalable stochastic gradient methods with biased oracles, notably SG and RT-MLMC, that achieve near-optimal sample complexity for both convex and nonconvex losses. They show that, depending on the scaling regime, the regularization acts as gradient-norm regularization, gradient-variance regularization, or a smoothed gradient-norm regularization that interpolates between the two. They also provide generalization bounds for linear and neural-network models. Extensive experiments in supervised learning, reinforcement learning, and contextual learning demonstrate state-of-the-art robustness against adversarial perturbations, highlighting the framework's practical impact and broad applicability.
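To make "smooth surrogate loss with a biased oracle" concrete, here is a minimal sketch built on assumptions that are ours, not the paper's. We assume $\phi$ is the KL divergence and the reference distribution at a data point $x$ is uniform on the $\ell_2$ ball of radius $\rho$. Under these choices the surrogate is commonly written as $\psi_\eta(x) = \eta \log \mathbb{E}_{z \sim \mathrm{Unif}(B_\rho(x))}[\exp(\ell(z)/\eta)]$. The sketch replaces the inner expectation with an average over $m$ sampled perturbations. Because $\log$ is concave, this plug-in estimate is biased downward (Jensen's inequality), which is the sense in which an SG-style oracle is "biased". Its gradient in $\theta$ is a softmax-weighted average of per-perturbation gradients. The logistic loss and the helper names `sample_ball`, `logistic_loss_and_grad`, and `sg_estimate` are hypothetical illustrations. RT-MLMC is not reproduced here.

```python
import numpy as np

def sample_ball(x, rho, m, rng):
    """Draw m points uniformly from the l2 ball of radius rho centred at x (assumed reference)."""
    d = x.shape[0]
    g = rng.standard_normal((m, d))
    dirs = g / np.linalg.norm(g, axis=1, keepdims=True)   # uniform directions on the sphere
    radii = rho * rng.uniform(size=(m, 1)) ** (1.0 / d)    # radius law for a uniform ball
    return x + radii * dirs

def logistic_loss_and_grad(theta, X, y):
    """Hypothetical example loss: per-row log(1 + exp(-y theta^T x)) and its theta-gradient."""
    margins = -y * (X @ theta)
    losses = np.logaddexp(0.0, margins)
    sig = np.exp(margins - losses)        # sigmoid(margins), computed stably
    grads = (-y * sig)[:, None] * X
    return losses, grads

def sg_estimate(theta, x, y, rho, eta, m, rng):
    """Biased plug-in estimate of psi_eta and its theta-gradient at one sample (x, y).

    psi_hat = eta * log((1/m) * sum_j exp(loss_j / eta)), with z_j ~ Unif(ball(x, rho)).
    """
    Z = sample_ball(x, rho, m, rng)
    losses, grads = logistic_loss_and_grad(theta, Z, y)
    s = losses / eta
    smax = s.max()
    w = np.exp(s - smax)                  # shift by the max to avoid overflow
    psi_hat = eta * (smax + np.log(w.mean()))
    w /= w.sum()                          # softmax weights over the m perturbations
    return psi_hat, w @ grads             # softmax-weighted average of per-sample gradients
```

In an SG loop, one would average the returned gradients over a minibatch of data points and step on $\theta$. As $\eta \to 0$ the softmax weights concentrate on the worst sampled perturbation. Larger $\eta$ spreads them out, which is what makes $\psi_\eta$ smooth.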
Abstract
Despite the growing prevalence of artificial neural networks in real-world applications, their vulnerability to adversarial attacks remains a significant concern, which motivates us to investigate the robustness of machine learning models. While various heuristics aim to optimize the distributionally robust risk under the $\infty$-Wasserstein metric, this notion of robustness frequently leads to computational intractability. To tackle this computational challenge, we develop a novel approach to adversarial training that integrates $\phi$-divergence regularization into the distributionally robust risk function. This regularization substantially improves computational tractability compared with the original formulation. We develop stochastic gradient methods with biased oracles to solve this problem efficiently, achieving near-optimal sample complexity. Moreover, we establish its regularization effects and demonstrate its asymptotic equivalence to a regularized empirical risk minimization framework by considering various scaling regimes of the regularization parameter and the robustness level. These regimes yield gradient norm regularization, variance regularization, or a smoothed gradient norm regularization that interpolates between these extremes. We numerically validate our proposed method in supervised learning, reinforcement learning, and contextual learning, and showcase its state-of-the-art performance against various adversarial attacks.
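The interpolation between regimes can be made concrete with a toy computation. This sketch rests on assumptions that are ours, not the paper's. We take $\phi$ to be KL, use a uniform reference on the radius-$\rho$ ball, and use a linear loss $\ell(z) = a^\top z$. In that case $\psi_\eta(x) = a^\top x + \eta \log \mathbb{E}_u[\exp(\rho\, a^\top u / \eta)]$, with $u$ uniform on the unit ball in $\mathbb{R}^d$. As $\eta \to 0$, the penalty term tends to $\rho \|a\|_2$, a gradient-norm penalty. When $\rho/\eta$ is small, it behaves like $\rho^2 \|a\|_2^2 / (2\eta(d+2))$, a variance penalty, since $\mathrm{Var}(a^\top u) = \|a\|_2^2/(d+2)$ for the uniform unit ball. Intermediate $\eta$ gives a smooth transition between the two. The helper names `uniform_unit_ball` and `smoothed_penalty` are hypothetical. The precise scaling conditions and general statements are those established in the paper.

```python
import numpy as np

def uniform_unit_ball(m, d, rng):
    """m points drawn uniformly from the unit l2 ball in R^d."""
    g = rng.standard_normal((m, d))
    dirs = g / np.linalg.norm(g, axis=1, keepdims=True)
    return dirs * rng.uniform(size=(m, 1)) ** (1.0 / d)

def smoothed_penalty(a, rho, eta, U):
    """Sample estimate of eta * log E_u[exp(rho * a^T u / eta)] over the rows of U."""
    s = rho * (U @ a) / eta
    smax = s.max()
    return eta * (smax + np.log(np.mean(np.exp(s - smax))))

rng = np.random.default_rng(0)
d, rho = 5, 0.1
a = np.array([1.0, -2.0, 0.5, 0.0, 1.5])    # plays the role of the loss gradient
U = uniform_unit_ball(100_000, d, rng)
U = np.vstack([U, -U])                       # antithetic pairs: sample mean of a^T u is zero
grad_norm_term = rho * np.linalg.norm(a)     # small-eta limit: rho * ||a||_2
etas = [1e-3, 1e-2, 1e-1, 1.0, 10.0]
penalties = {eta: smoothed_penalty(a, rho, eta, U) for eta in etas}

for eta, pen in penalties.items():
    # small rho/eta approximation: rho^2 * ||a||^2 / (2 * eta * (d + 2))
    var_term = rho**2 * (a @ a) / (2 * eta * (d + 2))
    print(f"eta={eta:g}  penalty={pen:.6f}  rho*||a||={grad_norm_term:.6f}  variance approx={var_term:.6f}")
```

The penalty decreases monotonically in $\eta$. It stays below $\rho\|a\|_2$, approaches it for small $\eta$, and matches the variance approximation for large $\eta$.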
