Adversarial vulnerability for any classifier
Alhussein Fawzi, Hamza Fawzi, Omar Fawzi
TL;DR
This work analyzes adversarial vulnerability under the assumption that data are generated by a smooth, high-dimensional latent-space model. It derives classifier-agnostic upper bounds on robustness, establishes transferability of perturbations across models, and relates in-distribution robustness to unconstrained robustness, including extensions to approximate generative models. The authors validate the bounds with SVHN and CIFAR-10 experiments, showing the bounds provide informative baselines and offering insights into desirable properties of generative models and latent-space classifiers. Overall, the results highlight fundamental limits on robustness under smooth generative assumptions and suggest design principles (e.g., latent-space linear boundaries) to approach those limits, while raising questions about perceptual metrics and modeling choices for human-robust vision.
Abstract
Despite achieving impressive performance, state-of-the-art classifiers remain highly vulnerable to small, imperceptible adversarial perturbations. This vulnerability has proven empirically to be very difficult to address. In this paper, we study the phenomenon of adversarial perturbations under the assumption that the data is generated with a smooth generative model. We derive fundamental upper bounds on the robustness to perturbations of any classification function, and prove the existence of adversarial perturbations that transfer well across different classifiers with small risk. Our analysis of the robustness also provides insights into key properties of generative models, such as their smoothness and the dimensionality of their latent space. We conclude with numerical experimental results showing that our bounds provide informative baselines for the maximal achievable robustness on several datasets.
