Regularization for Adversarial Robust Learning

Jie Wang; Rui Gao; Yao Xie

TL;DR

This work tackles the computational intractability of adversarial robustness under $\infty$-Wasserstein distributionally robust optimization (DRO) by introducing a $\phi$-divergence regularized DRO objective that yields a smooth surrogate loss $\psi_{\eta}$. It derives a strong dual formulation and develops scalable stochastic gradient methods with biased oracles, notably SG and RT-MLMC, achieving near-optimal sample complexity for both convex and nonconvex losses. The authors show that the regularization effect interpolates between gradient-norm, gradient-variance, and smoothed gradient-norm behaviors across scaling regimes, and they provide generalization bounds for linear and neural-network models. Experiments in supervised learning, reinforcement learning, and contextual learning demonstrate state-of-the-art robustness against adversarial perturbations.

Abstract

Despite the growing prevalence of artificial neural networks in real-world applications, their vulnerability to adversarial attacks remains a significant concern, which motivates us to investigate the robustness of machine learning models. While various heuristics aim to optimize the distributionally robust risk using the $\infty$-Wasserstein metric, such a notion of robustness is frequently computationally intractable. To tackle this computational challenge, we develop a novel approach to adversarial training that integrates $\phi$-divergence regularization into the distributionally robust risk function. This regularization brings a notable improvement in computation compared with the original formulation. We develop stochastic gradient methods with biased oracles to solve this problem efficiently, achieving near-optimal sample complexity. Moreover, we establish its regularization effects and demonstrate its asymptotic equivalence to a regularized empirical risk minimization framework by considering various scaling regimes of the regularization parameter and robustness level. These regimes yield gradient norm regularization, variance regularization, or a smoothed gradient norm regularization that interpolates between these extremes. We numerically validate our proposed method in supervised learning, reinforcement learning, and contextual learning and showcase its state-of-the-art performance against various adversarial attacks.
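To make the effect of the regularization concrete, the following is a minimal sketch for the entropic special case only. It relies on the standard Gibbs variational identity $\sup_{Q}\{\mathbb{E}_{Q}[f]-\eta\,\mathrm{KL}(Q\,\|\,\nu)\}=\eta\log\mathbb{E}_{\nu}[e^{f/\eta}]$, and it assumes that the reference measure $\nu$ is uniform on an $\ell_\infty$ ball of radius `rho` around the data point. The function name `entropic_surrogate` and its parameters are hypothetical and are not taken from the paper; the plug-in Monte Carlo estimate shown here is biased, which is one reason biased gradient oracles arise in this setting.

```python
import numpy as np

def entropic_surrogate(loss_fn, x, rho, eta, n_samples=256, rng=None):
    """Plug-in estimate of eta * log E_{z ~ nu_x}[exp(loss_fn(z) / eta)].

    Assumption (not from the paper): nu_x is the uniform distribution on
    the l_inf ball of radius rho centered at x.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Sample perturbations uniformly from the cube [-rho, rho]^d.
    noise = rng.uniform(-rho, rho, size=(n_samples,) + x.shape)
    losses = np.array([loss_fn(x + d) for d in noise])
    # Numerically stable log-mean-exp of the scaled losses.
    scaled = losses / eta
    m = scaled.max()
    return eta * (m + np.log(np.mean(np.exp(scaled - m))))

# Usage: a toy quadratic loss around a 2-dimensional point.
x = np.array([0.5, -1.0])
toy_loss = lambda z: float(np.sum(z ** 2))
value = entropic_surrogate(toy_loss, x, rho=0.1, eta=0.5,
                           rng=np.random.default_rng(0))
```

On a fixed set of samples, the estimate lies between the average and the maximum of the sampled losses: a large `eta` pushes it toward the average loss over the ball, and a small `eta` pushes it toward the sampled worst case.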

Paper Structure (24 sections, 16 theorems, 134 equations, 6 figures, 3 tables, 2 algorithms)

Introduction
Phi-Divergence Regularized Adversarial Robust Training
Discussions
Visualization of Worst-case Distribution
Optimization Algorithm
Gradient Estimators
Solving penalized $\phi$-divergence DRO with finite support
Complexity Analysis
Nonsmooth Convex Loss
Smooth Nonconvex Loss
Regularization Effects of Regularized Adversarial Robust Learning
Generalization Error Bound
Numerical Study
Supervised Learning
Reinforcement Learning
...and 9 more sections

Key Result

Theorem 1

Assume that $\mathcal{Z}$ is a measurable space, $f:~\mathcal{Z}\to\mathbb{R}\cup\{\infty\}$ is a measurable function, and for every joint distribution $\gamma\in\mathcal{P}(\mathcal{Z}\times\mathcal{Z})$ with $\mathrm{Proj}_{1\#}\gamma=\widehat{\mathbb{P}}$, it has a regular conditional distributio

Figures (6)

Figure 1: Landscape of the $1$-dimensional objective $f(\cdot)$
Figure 2: Worst-case distributions for different kinds of regularization and different choices of parameters (including risk level $\alpha$ and regularization level $\eta$).
Figure 3: Results of adversarial training in terms of misclassification rates. From top to bottom, the rows correspond to the (a) MNIST, (b) Fashion-MNIST, and (c) Kuzushiji-MNIST datasets. From left to right, the columns correspond to (a) $\ell_2$-norm white noise attack; (b) $\ell_\infty$-norm white noise attack; (c) $\ell_2$-norm PGM attack; and (d) $\ell_{\infty}$-norm PGM attack.
Figure 4: Performance of $Q$-learning algorithms in original MDP and shifted MDP environments. Error bars are produced using $10$ independent trials.
Figure 5: Episode lengths during training. The environment caps episodes at $400$ steps.
...and 1 more figure

Theorems & Definitions (29)

Definition 1: $\phi$-divergence Regularization
Theorem 1: Strong Duality
Example 1: Indicator Regularization
Example 2: Entropic Regularization
Example 3: Quadratic Regularization
Example 4: Absolute Value Regularization
Example 5: Hinge Loss Regularization
Remark 1: Connections with Bayesian DRO
Proposition 1: Consistency of Regularized Formulation
Proposition 2: Performance Guarantees of Algorithm \ref{alg:Eq:expression:R}
...and 19 more
