The Price of Implicit Bias in Adversarially Robust Generalization

Nikolaos Tsilivis; Natalie Frank; Nathan Srebro; Julia Kempe

TL;DR

The paper addresses why robust ERM under adversarial perturbations exhibits large generalization gaps, focusing on the implicit bias of optimization. It develops a theory for linear models under steepest-descent dynamics, showing convergence to the minimum $\ell_r$-norm max-margin predictor that robustly classifies the data, and demonstrates that gradient descent adds an $\ell_{p^*}$-norm component that can hurt robust generalization, especially for $\ell_\infty$ perturbations. It extends the analysis to diagonal neural networks, where robust ERM induces an effective $\ell_1$ bias in predictor space, leading to different robustness properties; the work then validates these findings with extensive experiments on linear models and neural networks. The results underscore that the choice of optimization algorithm and network parameterization crucially determines robust performance, particularly as perturbation magnitude grows, and suggest exploring non-GD optimizers and reparameterizations to improve adversarial robustness.
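
To make the steepest-descent family mentioned above concrete, the following is a minimal sketch of the textbook update direction with respect to the $\ell_r$ norm for $r \in \{1, 2, \infty\}$. The function name `steepest_descent_step` and the unit step size are assumptions made for illustration; this is not the authors' implementation. The three cases recover coordinate descent ($r=1$), gradient descent ($r=2$), and sign gradient descent ($r=\infty$).

```python
import numpy as np

def steepest_descent_step(grad, r):
    """Hypothetical helper: steepest-descent direction w.r.t. the l_r norm.

    Returns the v minimizing <grad, v> + 0.5 * ||v||_r^2, i.e.
    v = -||grad||_{r*} * u, where u maximizes <grad, u> over the unit l_r ball
    and r* is the dual exponent of r.
    """
    grad = np.asarray(grad, dtype=float)
    if r == 2:
        # Dual of l_2 is l_2: plain gradient descent direction.
        return -grad
    if r == np.inf:
        # Dual of l_inf is l_1: sign gradient descent, scaled by ||grad||_1.
        return -np.abs(grad).sum() * np.sign(grad)
    if r == 1:
        # Dual of l_1 is l_inf: move only along the largest-magnitude coordinate.
        i = int(np.argmax(np.abs(grad)))
        step = np.zeros_like(grad)
        step[i] = -grad[i]
        return step
    raise ValueError("this sketch only covers r in {1, 2, inf}")

g = np.array([3.0, -1.0, 2.0])
step_gd = steepest_descent_step(g, 2)         # gradient descent
step_sign = steepest_descent_step(g, np.inf)  # sign gradient descent
step_cd = steepest_descent_step(g, 1)         # coordinate descent
```

A weight update would then be `w = w + lr * step` for some learning rate `lr`, which is likewise an assumed name.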

Abstract

We study the implicit bias of optimization in robust empirical risk minimization (robust ERM) and its connection with robust generalization. In classification settings under adversarial perturbations with linear models, we study what type of regularization should ideally be applied for a given perturbation set to improve (robust) generalization. We then show that the implicit bias of optimization in robust ERM can significantly affect the robustness of the model and identify two ways this can happen; either through the optimization algorithm or the architecture. We verify our predictions in simulations with synthetic data and experimentally study the importance of implicit bias in robust ERM with deep neural networks.
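
For linear models the inner maximization of robust ERM has a standard closed form: under an $\ell_p$ perturbation of radius $\epsilon$, the worst-case margin of $(\mathbf{x}, y)$ is $y\langle \mathbf{w}, \mathbf{x}\rangle - \epsilon\|\mathbf{w}\|_{p^*}$, where $p^*$ is the dual exponent of $p$. The sketch below computes it and checks the $\ell_\infty$ case against an explicit worst-case perturbation. The function names and the choice of logistic loss are assumptions for illustration, not taken from the paper's code.

```python
import numpy as np

def dual_exponent(p):
    """Dual exponent p* with 1/p + 1/p* = 1 (assumes p >= 1)."""
    if p == 1:
        return np.inf
    if p == np.inf:
        return 1.0
    return p / (p - 1.0)

def robust_margins(w, X, y, eps, p):
    """Worst-case margins min_{||d||_p <= eps} y_i * <w, x_i + d> for each row x_i."""
    return y * (X @ w) - eps * np.linalg.norm(w, ord=dual_exponent(p))

def robust_logistic_loss(w, X, y, eps, p):
    """Average logistic loss evaluated at the worst-case margins (assumed loss)."""
    return np.logaddexp(0.0, -robust_margins(w, X, y, eps, p)).mean()

w = np.array([2.0, -1.0])
X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, -1.0])
eps = 0.1

closed_form = robust_margins(w, X, y, eps, np.inf)

# Explicit l_inf worst case: shift every coordinate by eps against the label.
X_adv = X - eps * y[:, None] * np.sign(w)[None, :]
brute_force = y * (X_adv @ w)
```

With `eps = 0` the robust loss reduces to the ordinary logistic loss, which matches the standard ERM setting the abstract contrasts against.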

Paper Structure (44 sections, 14 theorems, 54 equations, 8 figures, 1 table)

Introduction
Our contributions
Notation
Related work
Capacity Control in Adversarially Robust Classification
Generalization Bounds for Adversarially Robust Classification
Optimal Regularization Depends on Sparsity of Data
Implicit Biases in Robust ERM
Price of Implicit Bias from the Optimization Algorithm
Steepest Descent
Price of Implicit Bias from Parameterization
Experiments
Linear models
Setup
Results
...and 29 more sections

Key Result

Theorem 3.1

[MRT12, AFM20] Fix $\rho > 0$. For any $\delta > 0$, with probability at least $1 - \delta$ over the draw of the dataset $S$, for all $h \in \mathcal{H}_r$ with $\mathcal{H}_r$ defined as in eq. \ref{eq:H}, it holds:

Figures (8)

Figure 1: The price of implicit bias in adversarially robust generalization. Top: An illustration of the role of geometry in robust generalization: a separator that maximizes the $\ell_2$ distance between the training points (circles) might suffer a large error for test points (stars) perturbed within $\ell_\infty$ balls, while a separator that maximizes the $\ell_\infty$ distance might generalize better. Bottom: Binary classification of Gaussian data with (right) or without (left) $\ell_\infty$ perturbations of the input in $\mathbb{R}^d$ using linear models. We plot the (robust) generalization gap, i.e., (robust) train minus (robust) test accuracy, of different learning algorithms versus the training size $m$. In standard ERM ($\epsilon=0$), the algorithms generalize similarly. In robust ERM, however, the implicit bias of gradient descent hurts the robust generalization of the models, while the implicit bias of coordinate descent/gradient descent with diagonal linear networks aids it. See Section \ref{sec:experiments} for details.
Figure 2: Left: Binary classification of data coming from a sparse teacher $\mathbf{w}^\star$ and dense $\mathbf{x}$, with (bottom) or without (top) $\ell_\infty$ perturbations of the input in $\mathbb{R}^d$ using linear models. We plot the (robust) generalization gap, i.e., (robust) train minus (robust) test accuracy, of different learning algorithms versus the training size $m$. For robust ERM, $\epsilon$ is set to be $\frac{1}{4}$ of the largest permissible value $\epsilon^\star$. The gap between the methods grows when we pass from ERM to robust ERM. Right: Average benefit of CD over GD (in terms of generalization gap) for different values of teacher sparsity $k_\mathcal{W}$, data sparsity $k_\mathcal{X}$ and magnitude of $\ell_\infty$ perturbation $\epsilon$.
Figure 3: Left: Comparison of two optimization algorithms, gradient descent (GD) and sign gradient descent (SD), in ERM and robust ERM on a subset of MNIST (digits 2 vs 7) with 1-hidden-layer ReLU nets. Train and test accuracy correspond to the magnitude of perturbation $\epsilon$ used during training. We observe that in robust ERM the gap between the generalization of the two algorithms increases. Right: Gap in (robust) test accuracy (with respect to the $\epsilon$ used in training) of CNNs trained with GD and SD (GD accuracy minus SD accuracy) on subsets of MNIST (all classes) for various values of $\epsilon$ and $m$.
Figure 4: An illustration of the model selection problem we are facing in Section \ref{sec:gen_bounds}. We depict hypothesis classes which correspond to $\mathcal{H}_r = \{\mathbf{x} \mapsto \left\langle \mathbf{w},\mathbf{x} \right\rangle : \|\mathbf{w}\|_r \leq \mathcal{W}\}$ for $r=1, 2, \infty$ (notice that here, for illustration purposes, we keep $\mathcal{W}$ constant and not dependent on $r$). Increasing the order $r$ of $\mathcal{H}_r$ can decrease the approximation error of the class, but it might increase the complexity captured by the worst-case Rademacher complexity term of eq. \ref{eq:rad_dim_dependent}. Furthermore, this complexity might increase significantly more than in "standard" classification (see the second term on the RHS of eq. \ref{eq:rad_dim_dependent}), and this is where the price of misselection comes from.
Figure 5: Binary classification of data coming from a dense teacher $\mathbf{w}^\star$ and sparse data $\mathbf{x}$ (top) and from a sparse $\mathbf{w}^\star$ and sparse data $\mathbf{x}$ (bottom). We compare the performance of different algorithms with (right) or without (left) $\ell_\infty$ perturbations of the input in $\mathbb{R}^d$ using linear models. We plot the (robust) generalization gap, i.e., (robust) train minus (robust) test accuracy, of different learning algorithms versus the training size $m$. For robust ERM, $\epsilon$ is set to be $\frac{1}{4}$ of the largest permissible value $\epsilon^\star$. In accordance with the bounds of Section \ref{ssec:cases}, it can still be the case that $\ell_2$ solutions generalize better in robust ERM, owing to their significant advantage in ERM.
...and 3 more figures
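
The geometric point illustrated in the top panel of Figure 1 can be checked numerically with the standard formula for the $\ell_p$ distance from a point $\mathbf{x}$ to the hyperplane $\{\mathbf{z} : \langle \mathbf{w}, \mathbf{z}\rangle = 0\}$, namely $|\langle \mathbf{w}, \mathbf{x}\rangle| / \|\mathbf{w}\|_{p^*}$. The sketch below uses a hand-picked 2D point and two separators, all hypothetical and not data from the paper, to show that the separator with the larger $\ell_2$ distance can have the smaller $\ell_\infty$ distance.

```python
import numpy as np

def lp_distance_to_hyperplane(w, x, p):
    """l_p distance from x to {z : <w, z> = 0}, equal to |<w, x>| / ||w||_{p*}."""
    w = np.asarray(w, dtype=float)
    x = np.asarray(x, dtype=float)
    if p == 1:
        q = np.inf           # dual of l_1 is l_inf
    elif p == np.inf:
        q = 1                # dual of l_inf is l_1
    else:
        q = p / (p - 1.0)    # general dual exponent
    return abs(w @ x) / np.linalg.norm(w, ord=q)

# Hypothetical toy example: one point, two candidate separators.
x = np.array([1.0, 0.5])
w_a = np.array([1.0, 1.0])   # dense separator
w_b = np.array([1.0, 0.0])   # sparse separator

l2_a = lp_distance_to_hyperplane(w_a, x, 2)         # 1.5 / sqrt(2)
l2_b = lp_distance_to_hyperplane(w_b, x, 2)         # 1.0
linf_a = lp_distance_to_hyperplane(w_a, x, np.inf)  # 1.5 / 2
linf_b = lp_distance_to_hyperplane(w_b, x, np.inf)  # 1.0 / 1
```

Here `w_a` is preferred under the $\ell_2$ criterion while `w_b` tolerates larger $\ell_\infty$ perturbations of `x`, which is the mismatch the caption describes.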

Theorems & Definitions (27)

Theorem 3.1
Proposition 3.2
Definition 4.1
Definition 4.2
Theorem 4.3
Remark 4.4
Corollary 4.5
Theorem 4.6: Paraphrased Theorem 5 in LyZh22
Corollary 4.7
Proposition 4.8
...and 17 more
