
On the Geometry of Regularization in Adversarial Training: High-Dimensional Asymptotics and Generalization Bounds

Matteo Vilucchio, Nikolaos Tsilivis, Bruno Loureiro, Julia Kempe

TL;DR

This work investigates the question of how to choose the regularization norm $\lVert \cdot \rVert$ in the context of high-dimensional adversarial training for binary classification, and quantitatively characterizes the relationship between perturbation size and the optimal choice of $\lVert \cdot \rVert$.

Abstract

Regularization, whether explicit in terms of a penalty in the loss or implicit in the choice of algorithm, is a cornerstone of modern machine learning. Indeed, controlling the complexity of the model class is particularly important when data is scarce, noisy or contaminated, as it translates a statistical belief on the underlying structure of the data. This work investigates the question of how to choose the regularization norm $\lVert \cdot \rVert$ in the context of high-dimensional adversarial training for binary classification. To this end, we first derive an exact asymptotic description of the robust, regularized empirical risk minimizer for various types of adversarial attacks and regularization norms (including non-$\ell_p$ norms). We complement this analysis with a uniform convergence analysis, deriving bounds on the Rademacher Complexity for this class of problems. Leveraging our theoretical results, we quantitatively characterize the relationship between perturbation size and the optimal choice of $\lVert \cdot \rVert$, confirming the intuition that, in the data scarce regime, the type of regularization becomes increasingly important for adversarial training as perturbations grow in size.
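As a concrete reference point for the "robust, regularized empirical risk minimizer", consider a linear classifier $x \mapsto \operatorname{sign}(\boldsymbol{w}^\top x)$ trained with a margin-based loss. For $\ell_p$ attacks of radius $\varepsilon$, the inner maximization has a closed form: by Hölder's inequality the worst-case perturbation lowers the margin $y\,\boldsymbol{w}^\top x$ by exactly $\varepsilon \lVert \boldsymbol{w} \rVert_{p^\star}$, where $1/p + 1/p^\star = 1$. The sketch below evaluates an objective of this kind in plain Python. The logistic loss, the penalty $\lambda \lVert \boldsymbol{w} \rVert_r^r$, the helper names, and the absence of any dimension-dependent rescaling of $\varepsilon$ are illustrative assumptions; the sketch is not meant to reproduce the exact form of eq:adversarial-training-problem.

```python
import math
from typing import Sequence


def lp_norm(w: Sequence[float], p: float) -> float:
    """l_p norm for p >= 1, including p = math.inf."""
    if math.isinf(p):
        return max((abs(v) for v in w), default=0.0)
    return sum(abs(v) ** p for v in w) ** (1.0 / p)


def dual_exponent(p: float) -> float:
    """Return p* with 1/p + 1/p* = 1 (so 1 <-> inf and 2 <-> 2)."""
    if p == 1:
        return math.inf
    if math.isinf(p):
        return 1.0
    return p / (p - 1.0)


def logistic_loss(margin: float) -> float:
    """Numerically stable log(1 + exp(-margin))."""
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def robust_regularized_risk(w, X, y, eps, p, lam, r):
    """Hypothetical helper computing
        (1/n) * sum_i loss(y_i <w, x_i> - eps * ||w||_{p*}) + lam * ||w||_r^r.

    The inner max over ||delta||_p <= eps is replaced by its closed form,
    which is valid because the logistic loss is non-increasing in the margin.
    """
    shrink = eps * lp_norm(w, dual_exponent(p))  # worst-case margin reduction
    n = len(y)
    data_term = sum(
        logistic_loss(yi * sum(wj * xj for wj, xj in zip(w, xi)) - shrink)
        for xi, yi in zip(X, y)
    ) / n
    return data_term + lam * lp_norm(w, r) ** r
```

Keeping `p = math.inf` fixed and varying `r` (for instance 1 versus 2) is the kind of comparison between regularization geometries reported in Figure 1.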


Paper Structure

This paper contains 43 sections, 12 theorems, 104 equations, 6 figures.

Key Result

Theorem 3.1

Let $\hat{\boldsymbol{w}}(\mathcal{S})\in\mathbb{R}^{d}$ denote a solution of the RERM problem in eq:adversarial-training-problem. Then, under Assumptions ass:scaling-eps, ass:high-dimensional-limit, ass:lp-norms and ass:data-distribution-separable, the standard, robust and boundary generalization errors of $\hat{\boldsymbol{w}}$ … where $m^\star, q^\star, P^\star$ are the limiting values of the following summary statistics: …
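Theorem 3.1 is stated in terms of three errors. Assuming the usual definitions for a linear classifier (standard error: $y\,\boldsymbol{w}^\top x \le 0$; robust error: some $\lVert \delta \rVert_p \le \varepsilon$ makes $y\,\boldsymbol{w}^\top (x+\delta) \le 0$, equivalently $y\,\boldsymbol{w}^\top x \le \varepsilon \lVert \boldsymbol{w} \rVert_{p^\star}$; boundary error: robust minus standard), their empirical counterparts on a finite sample can be computed as sketched below. This only illustrates the quantities whose high-dimensional limits the theorem describes. The noiseless Gaussian teacher in `teacher_sample` is a hypothetical data model, not necessarily the one in ass:data-distribution-separable, and no limiting formula from the theorem is reproduced.

```python
import math
import random
from typing import List, Sequence, Tuple


def lp_norm(w: Sequence[float], p: float) -> float:
    """l_p norm for p >= 1, including p = math.inf."""
    if math.isinf(p):
        return max((abs(v) for v in w), default=0.0)
    return sum(abs(v) ** p for v in w) ** (1.0 / p)


def dual_exponent(p: float) -> float:
    """Return p* with 1/p + 1/p* = 1."""
    if p == 1:
        return math.inf
    if math.isinf(p):
        return 1.0
    return p / (p - 1.0)


def empirical_errors(w, X, y, eps, p) -> Tuple[float, float, float]:
    """Empirical (standard, robust, boundary) error of x -> sign(<w, x>).

    A point is a robust error when some ||delta||_p <= eps drives its margin
    to zero or below, i.e. when y <w, x> <= eps * ||w||_{p*}.
    """
    shrink = eps * lp_norm(w, dual_exponent(p))
    n = len(y)
    n_std = n_rob = 0
    for xi, yi in zip(X, y):
        margin = yi * sum(a * b for a, b in zip(w, xi))
        if margin <= 0:
            n_std += 1
        if margin <= shrink:
            n_rob += 1
    # Boundary error: correctly classified but within reach of the attack.
    return n_std / n, n_rob / n, (n_rob - n_std) / n


def teacher_sample(theta, n, rng) -> Tuple[List[List[float]], List[int]]:
    """Hypothetical noiseless model: x ~ N(0, I_d), y = sign(<theta, x>)."""
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in theta]
        s = sum(t * v for t, v in zip(theta, x))
        X.append(x)
        y.append(1 if s >= 0 else -1)
    return X, y
```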

Figures (6)

  • Figure 1: (Left) Generalization error of RERMs in the low sample complexity regime under $\ell_\infty$ perturbations for various choices of regularization. The edge of $\ell_1$ over the other methods stems from the boundary error, which goes to zero as $\alpha \to 0^+$. Setting: $\varepsilon = 0.2$ and optimally tuned regularization parameter $\lambda$. Points with error bars are RERM simulations for $d=1000$ ($10$ random seeds). (Right) Difference between the robust generalization errors for $r=2$ and $r=1$ as a function of $\varepsilon$ and $\alpha$ for $\ell_\infty$ attacks. Green regions correspond to areas where dual-norm ($\ell_1$) regularization is better than $\ell_2$.
  • Figure 2: (Left) Difference between robust generalization errors for $\boldsymbol{\Sigma}_{\boldsymbol{\delta}}$ perturbations. Regularizing with the dual norm yields the lowest adversarial error for different choices of $\varepsilon$. Points with error bars (std) are RERM simulations for $d=1000$ (10 random seeds). (Right) Robust generalization error of the regularized RERM solution as a function of the regularization order $r$, i.e. $r(\boldsymbol{w}) = \lambda \lVert \boldsymbol{w} \rVert_r^r$, for various perturbation strengths $\varepsilon$. Sample complexity $\alpha=1.0$. Regularization coefficients $\lambda$ are optimally tuned. The inset shows how the optimal value of $r$ scales with $\varepsilon$.
  • Figure 3: Scaling of the overlap parameters in the low sample complexity regime for $p = \infty$, $\varepsilon = 0.3$, $\rho = 1$ and $\lambda = 10^{-3}$. The numbers in the legends are obtained from linear fits, in log-log scale, of the dashed segments.
  • Figure 4: Robust error as a function of the regularization order $r$ for two different values of $p^\star$. As $\varepsilon$ increases, the optimal value $r^\star$ approaches $p^\star$.
  • Figure 5: Robust error, generalization error and boundary error for different choices of regularization geometry $r$ as a function of the sample complexity $\alpha$. The errors increase with $\varepsilon$.
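Several captions compare regularization with the dual norm of the attack (e.g. $\ell_1$ against $\ell_\infty$ perturbations in Figure 1). For linear models this pairing can be checked directly: the perturbation constructed below attains $\max_{\lVert \delta \rVert_p \le \varepsilon} \boldsymbol{w}^\top \delta = \varepsilon \lVert \boldsymbol{w} \rVert_{p^\star}$, so the adversarial penalty on the margin is exactly a dual-norm term. The helper name `worst_case_perturbation` is hypothetical, and the sketch covers only $\ell_p$ balls, not the Mahalanobis ($\boldsymbol{\Sigma}_{\boldsymbol{\delta}}$) case.

```python
import math
from typing import List, Sequence


def lp_norm(w: Sequence[float], p: float) -> float:
    """l_p norm for p >= 1, including p = math.inf."""
    if math.isinf(p):
        return max((abs(v) for v in w), default=0.0)
    return sum(abs(v) ** p for v in w) ** (1.0 / p)


def dual_exponent(p: float) -> float:
    """Return p* with 1/p + 1/p* = 1."""
    if p == 1:
        return math.inf
    if math.isinf(p):
        return 1.0
    return p / (p - 1.0)


def worst_case_perturbation(w: Sequence[float], eps: float, p: float) -> List[float]:
    """Return delta with ||delta||_p = eps that maximizes <w, delta>.

    This is the equality case of Holder's inequality, so the maximum
    equals eps * ||w||_{p*}.
    """
    if not any(w):
        return [0.0] * len(w)  # every feasible delta is optimal when w = 0
    q = dual_exponent(p)
    if p == 1:
        # Whole budget on the coordinate with the largest |w_j|.
        k = max(range(len(w)), key=lambda j: abs(w[j]))
        delta = [0.0] * len(w)
        delta[k] = eps * math.copysign(1.0, w[k])
        return delta
    # delta_j proportional to sign(w_j) |w_j|^(q-1), normalized to ||delta||_p = eps.
    # For p = inf (q = 1) this reduces to eps * sign(w_j).
    scale = lp_norm(w, q) ** (q - 1.0)
    return [eps * math.copysign(abs(v) ** (q - 1.0), v) / scale for v in w]
```

For $p = \infty$ the construction reduces to $\delta = \varepsilon \operatorname{sign}(\boldsymbol{w})$, while for $p = 1$ it spends the entire budget on the largest-magnitude coordinate of $\boldsymbol{w}$.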

Theorems & Definitions (22)

  • Theorem 3.1: Limiting errors for $\ell_p$ norm
  • Remark 1
  • Theorem 3.2: Self-consistent equations for $\ell_p$ norms
  • Remark 2
  • Remark 3
  • Theorem 3.3: Limiting errors for Mahalanobis norm
  • Theorem 3.4: Self-consistent equations for Mahalanobis norm
  • Remark 4
  • Theorem 4.1: [MRT12, AFM20]
  • Proposition 1
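Theorem 4.1, which cites MRT12 and AFM20, presumably belongs to the uniform-convergence analysis mentioned in the abstract, which bounds the Rademacher complexity of the relevant hypothesis classes. A basic ingredient of such arguments (a standard fact, not the paper's bound) is that for the norm-constrained linear class $\{x \mapsto \boldsymbol{w}^\top x : \lVert \boldsymbol{w} \rVert_r \le W\}$ the supremum inside the empirical Rademacher complexity is a dual norm, $\hat{\mathfrak{R}}_S = \frac{W}{n}\,\mathbb{E}_{\sigma}\big\lVert \sum_{i=1}^n \sigma_i x_i \big\rVert_{r^\star}$. The sketch below estimates this expectation by Monte Carlo over Rademacher signs $\sigma$; `empirical_rademacher_linear` and its parameters are hypothetical names.

```python
import math
import random
from typing import Sequence


def lp_norm(w: Sequence[float], p: float) -> float:
    """l_p norm for p >= 1, including p = math.inf."""
    if math.isinf(p):
        return max((abs(v) for v in w), default=0.0)
    return sum(abs(v) ** p for v in w) ** (1.0 / p)


def dual_exponent(p: float) -> float:
    """Return p* with 1/p + 1/p* = 1."""
    if p == 1:
        return math.inf
    if math.isinf(p):
        return 1.0
    return p / (p - 1.0)


def empirical_rademacher_linear(X, W, r, n_draws=2000, seed=0):
    """Monte Carlo estimate of
        E_sigma sup_{||w||_r <= W} (1/n) sum_i sigma_i <w, x_i>.

    By duality the supremum equals (W / n) * ||sum_i sigma_i x_i||_{r*},
    so only the expectation over sigma has to be sampled.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    r_dual = dual_exponent(r)
    total = 0.0
    for _ in range(n_draws):
        s = [0.0] * d
        for xi in X:
            sigma = rng.choice((-1.0, 1.0))  # Rademacher sign
            for j in range(d):
                s[j] += sigma * xi[j]
        total += lp_norm(s, r_dual)
    return W * total / (n * n_draws)
```

For $r = 2$ the estimate should stay below the Jensen bound $\frac{W}{n}\sqrt{\sum_i \lVert x_i \rVert_2^2}$, while for $r = 1$ the relevant quantity becomes an $\ell_\infty$ norm of the signed sum.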