The Price of Implicit Bias in Adversarially Robust Generalization
Nikolaos Tsilivis, Natalie Frank, Nathan Srebro, Julia Kempe
TL;DR
The paper asks why robust ERM under adversarial perturbations exhibits large generalization gaps, and focuses on the implicit bias of optimization as an explanation. It develops a theory for linear models under steepest-descent dynamics, showing convergence to the minimum $\ell_r$-norm max-margin predictor that robustly classifies the data, and demonstrates that gradient descent adds an $\ell_{p^*}$-norm component that can hurt robust generalization, especially for $\ell_\infty$ perturbations. It extends the analysis to diagonal neural networks, where robust ERM induces an effective $\ell_1$ bias in predictor space and therefore different robustness properties, and it validates these findings with experiments on linear models and neural networks. The results indicate that the choice of optimization algorithm and network parameterization strongly affects robust performance, particularly as the perturbation magnitude grows, and they suggest exploring non-GD optimizers and reparameterizations to improve adversarial robustness.
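As a reading aid, the phrase "robustly classifies the data" can be made concrete with a standard duality step. The notation here is an assumption introduced for illustration and is not taken from the summary: a linear predictor $x \mapsto \langle w, x \rangle$, labels $y_i \in \{-1, +1\}$, perturbations $\delta$ with $\|\delta\|_p \le \epsilon$, and $p^*$ the dual exponent with $1/p + 1/p^* = 1$. By Hölder's inequality, the worst-case margin of a linear model has a closed form:

$$\min_{\|\delta\|_p \le \epsilon} y_i \langle w, x_i + \delta \rangle = y_i \langle w, x_i \rangle - \epsilon \|w\|_{p^*}.$$

Under this reading, a minimum $\ell_r$-norm robust max-margin predictor would solve

$$\min_{w} \|w\|_r \quad \text{subject to} \quad y_i \langle w, x_i \rangle - \epsilon \|w\|_{p^*} \ge 1 \quad \text{for all } i.$$

For $\ell_\infty$ perturbations, $p^* = 1$, so the term subtracted from each margin is $\epsilon \|w\|_1$.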
Abstract
We study the implicit bias of optimization in robust empirical risk minimization (robust ERM) and its connection with robust generalization. In classification settings under adversarial perturbations with linear models, we study what type of regularization should ideally be applied for a given perturbation set to improve (robust) generalization. We then show that the implicit bias of optimization in robust ERM can significantly affect the robustness of the model and identify two ways this can happen: through either the optimization algorithm or the architecture. We verify our predictions in simulations with synthetic data and experimentally study the importance of implicit bias in robust ERM with deep neural networks.
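To illustrate the linear-model setting, here is a minimal sketch of robust ERM with plain gradient descent under $\ell_\infty$ perturbations. It is not the authors' code: the function names, the synthetic data, the logistic loss, and all hyperparameters are assumptions made for illustration. The sketch relies on the standard fact that, for a linear model, the worst-case margin under an $\ell_\infty$ perturbation of radius `eps` equals the clean margin minus `eps` times the $\ell_1$ norm of the weights.

```python
import numpy as np


def robust_margins(w, X, y, eps):
    """Worst-case margins of a linear model under l_inf perturbations.

    Closed form: min over ||d||_inf <= eps of y * <w, x + d>
    equals y * <w, x> - eps * ||w||_1.
    """
    return y * (X @ w) - eps * np.abs(w).sum()


def robust_loss(w, X, y, eps):
    """Mean logistic loss evaluated on the worst-case margins."""
    # logaddexp(0, -m) = log(1 + exp(-m)), computed stably.
    return np.mean(np.logaddexp(0.0, -robust_margins(w, X, y, eps)))


def robust_grad(w, X, y, eps):
    """(Sub)gradient of robust_loss with respect to w."""
    m = robust_margins(w, X, y, eps)
    s = 0.5 * (1.0 - np.tanh(m / 2.0))  # equals sigmoid(-m), stable form
    # d(margin_i)/dw = y_i * x_i - eps * sign(w); sign(0) = 0 is a valid subgradient.
    dm = y[:, None] * X - eps * np.sign(w)[None, :]
    return -(s[:, None] * dm).mean(axis=0)


def train_gd(X, y, eps, lr=0.1, steps=2000):
    """Plain gradient descent (steepest descent w.r.t. the l_2 norm)."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * robust_grad(w, X, y, eps)
    return w


# Hypothetical synthetic data: only the first coordinate carries the label.
rng = np.random.default_rng(0)
n, d, eps = 200, 5, 0.2
X = rng.normal(size=(n, d))
y = np.where(X[:, 0] >= 0, 1.0, -1.0)
X[:, 0] += y  # push points away from the boundary so a robust separator exists

w_hat = train_gd(X, y, eps)
print("robust loss:", robust_loss(w_hat, X, y, eps))
print("min robust margin:", robust_margins(w_hat, X, y, eps).min())
print("weights:", np.round(w_hat, 3))
```

One way to probe the dependence on the optimizer in such a sketch would be to replace the update in `train_gd` with a sign-gradient step, `w -= lr * np.sign(robust_grad(w, X, y, eps))`, which is steepest descent with respect to the $\ell_\infty$ norm, and compare the resulting weights. This is a suggestion for experimentation, not a reproduction of the paper's experiments.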
