Regularization for Adversarial Robust Learning

Jie Wang, Rui Gao, Yao Xie

TL;DR

This work tackles the computational intractability of adversarial robustness under $\infty$-Wasserstein DRO by introducing a $\phi$-divergence regularized DRO objective that yields a smooth surrogate loss $\psi_{\eta}$. It derives a strong dual formulation and develops scalable stochastic gradient methods with biased oracles, notably SG and RT-MLMC, achieving near-optimal sample complexity for both convex and nonconvex losses. The authors show that the regularization effect interpolates between gradient-norm, variance, and smoothed gradient-norm regularization across scaling regimes, and provide generalization bounds for linear and neural-network models. Experiments in supervised learning, reinforcement learning, and contextual learning demonstrate state-of-the-art robustness against adversarial perturbations.

Abstract

Despite the growing prevalence of artificial neural networks in real-world applications, their vulnerability to adversarial attacks remains a significant concern, which motivates us to investigate the robustness of machine learning models. While various heuristics aim to optimize the distributionally robust risk using the $\infty$-Wasserstein metric, such a notion of robustness frequently encounters computational intractability. To tackle this computational challenge, we develop a novel approach to adversarial training that integrates $\phi$-divergence regularization into the distributionally robust risk function. This regularization brings a notable improvement in computation compared with the original formulation. We develop stochastic gradient methods with biased oracles to solve this problem efficiently, achieving near-optimal sample complexity. Moreover, we establish its regularization effects and demonstrate its asymptotic equivalence to a regularized empirical risk minimization framework by considering various scaling regimes of the regularization parameter and robustness level. These regimes yield gradient norm regularization, variance regularization, or a smoothed gradient norm regularization that interpolates between these extremes. We numerically validate our proposed method in supervised learning, reinforcement learning, and contextual learning and showcase its state-of-the-art performance against various adversarial attacks.
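
To make the smoothing idea concrete, here is a minimal sketch under stated assumptions. It assumes the entropic choice of $\phi$-divergence with a uniform reference distribution on the $\ell_2$ perturbation ball of radius $\rho$ around a data point $x$. Under that assumption the hard inner maximum $\sup_{\|z-x\|_2\le\rho} f(z)$ is replaced by the log-mean-exp surrogate $\eta\log\mathbb{E}_{z}[\exp(f(z)/\eta)]$, estimated here by plain Monte Carlo. The function names, the sampling scheme, and the sample count are illustrative and are not taken from the paper; this is not the SG or RT-MLMC estimator.

```python
import numpy as np

def sample_l2_ball(center, radius, n, rng):
    """Draw n points uniformly from the l2 ball of the given radius around center."""
    d = center.shape[0]
    # Uniform directions on the sphere, then radii scaled so volume is uniform.
    directions = rng.normal(size=(n, d))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    radii = radius * rng.uniform(size=(n, 1)) ** (1.0 / d)
    return center + radii * directions

def entropic_smoothed_loss(loss, x, radius, eta, n_samples=2000, seed=0):
    """Monte Carlo estimate of eta * log E_z[exp(loss(z) / eta)].

    Assumption (illustrative): z is uniform on the l2 ball around x.
    Small eta approaches the largest sampled loss; large eta approaches
    the average loss over the ball.
    """
    rng = np.random.default_rng(seed)
    z = sample_l2_ball(x, radius, n_samples, rng)
    values = np.array([loss(zi) for zi in z])
    # Subtract the maximum before exponentiating for numerical stability.
    m = values.max()
    return m + eta * np.log(np.mean(np.exp((values - m) / eta)))

if __name__ == "__main__":
    linear_loss = lambda z: z[0]  # hypothetical toy loss
    x0 = np.zeros(2)
    for eta in (1e-3, 1e-1, 1e1, 1e3):
        print(eta, entropic_smoothed_loss(linear_loss, x0, radius=1.0, eta=eta))
```

The parameter `eta` plays the role of the regularization level: as it shrinks, the surrogate moves toward the worst-case loss over the sampled perturbations, and as it grows, the surrogate moves toward the average loss over the ball.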

Paper Structure (24 sections, 16 theorems, 134 equations, 6 figures, 3 tables, 2 algorithms)

Key Result

Theorem 1

Assume that $\mathcal{Z}$ is a measurable space, $f:~\mathcal{Z}\to\mathbb{R}\cup\{\infty\}$ is a measurable function, and for every joint distribution $\gamma\in\mathcal{P}(\mathcal{Z}\times\mathcal{Z})$ with $\mathrm{Proj}_{1\#}\gamma=\widehat{\mathbb{P}}$, it has a regular conditional distributio

Figures (6)

  • Figure 1: Landscape of the $1$-dimensional objective $f(\cdot)$
  • Figure 2: Worst-case distributions for different kinds of regularizations and different choices of parameters (including risk level $\alpha$ and regularization level $\eta$).
  • Figure 3: Results of adversarial training in terms of misclassification rates. From top to bottom, the figures correspond to (a) MNIST; (b) Fashion-MNIST; and (c) Kuzushiji-MNIST datasets. From left to right, the figures correspond to (a) $\ell_2$-norm white noise attack; (b) $\ell_\infty$-norm white noise attack; (c) $\ell_2$-norm PGM attack; and (d) $\ell_{\infty}$-norm PGM attack.
  • Figure 3: Performance of $Q$-learning algorithms in original MDP and shifted MDP environments. Error bars are produced using $10$ independent trials.
  • Figure 4: Episode lengths during training. The environment caps episodes to $400$ steps.
  • ...and 1 more figure

Theorems & Definitions (29)

  • Definition 1: $\phi$-divergence Regularization
  • Theorem 1: Strong Duality
  • Example 1: Indicator Regularization
  • Example 2: Entropic Regularization
  • Example 3: Quadratic Regularization
  • Example 4: Absolute Value Regularization
  • Example 5: Hinge Loss Regularization
  • Remark 1: Connections with Bayesian DRO
  • Proposition 1: Consistency of Regularized Formulation
  • Proposition 2: Performance Guarantees of Algorithm \ref{alg:Eq:expression:R}
  • ...and 19 more