Transfer of Adversarial Robustness Between Perturbation Types
Daniel Kang, Yi Sun, Tom Brown, Dan Hendrycks, Jacob Steinhardt
TL;DR
The paper investigates whether robustness learned against one perturbation type transfers to others in deep image classifiers. It conducts large-scale adversarial training and evaluation using 32 attacks across 5 perturbation types on a 100-class ImageNet subset, employing ResNet-50 to quantify cross-type transfer. Results show that transfer is partial and often asymmetric: elastic perturbations transfer poorly, and training against larger perturbations sometimes weakens robustness to other types. $L_2$-based training can outperform $L_\infty$-based training in some regimes. The study advocates evaluating defenses over a diverse set of perturbation types and sizes to accurately gauge real-world robustness.
Abstract
We study the transfer of adversarial robustness of deep neural networks between different perturbation types. While most work on adversarial examples has focused on $L_\infty$- and $L_2$-bounded perturbations, these do not capture all types of perturbations available to an adversary. The present work evaluates 32 attacks of 5 different types against models adversarially trained on a 100-class subset of ImageNet. Our empirical results suggest that evaluating on a wide range of perturbation sizes is necessary to understand whether adversarial robustness transfers between perturbation types. We further demonstrate that robustness against one perturbation type may not always imply, and may sometimes hurt, robustness against other perturbation types. In light of these results, we recommend that adversarial defenses be evaluated on a diverse range of perturbation types and sizes.
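
As a point of reference for the $L_\infty$- and $L_2$-bounded perturbations mentioned in the abstract, the following is a minimal sketch of what each bound means for a perturbation vector of a given size `eps`. It assumes perturbations are flat lists of floats; the function names are illustrative and do not come from the paper, and the paper's other perturbation types are not covered here.

```python
import math


def linf_norm(delta):
    """Largest absolute coordinate of the perturbation."""
    return max(abs(d) for d in delta)


def l2_norm(delta):
    """Euclidean length of the perturbation."""
    return math.sqrt(sum(d * d for d in delta))


def project_linf(delta, eps):
    """Clip each coordinate into [-eps, eps] (projection onto the L_inf ball)."""
    return [max(-eps, min(eps, d)) for d in delta]


def project_l2(delta, eps):
    """Rescale delta so that its Euclidean norm is at most eps."""
    norm = l2_norm(delta)
    if norm <= eps:
        return list(delta)
    scale = eps / norm
    return [d * scale for d in delta]


if __name__ == "__main__":
    delta = [3.0, -4.0, 0.5]
    # The same raw perturbation lands in different places under each bound.
    print(project_linf(delta, 1.0))
    print(project_l2(delta, 1.0))
```

The two projections constrain a perturbation in different ways: the $L_\infty$ bound limits every coordinate independently, while the $L_2$ bound limits total energy. A single size `eps` is therefore not directly comparable across types, which is one reason to sweep a range of sizes for each type.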
