Exploring the Space of Adversarial Images
Pedro Tabacof, Eduardo Valle
TL;DR
This paper formalizes adversarial image generation for pretrained classifiers as a constrained optimization problem and shows that the problem is nonconvex even for linear classifiers. It analyzes the geometry of adversarial images in pixel space by perturbing them with random noise and checking label stability on MNIST (logistic regression and a convolutional network) and ImageNet (OverFeat). This analysis shows that adversarial regions are large rather than isolated. The study finds that, on the same task, a shallow classifier can be more robust than a deep network. It also finds that the perturbation distribution, especially heavier-tailed noise, influences robustness. The results challenge simple linearity explanations of adversarial vulnerability and point to complex spatial structures that warrant further defense-oriented research.
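The optimization view can be made concrete on a toy problem. The following is a minimal sketch, not the authors' code, and rests on these assumptions:

- The classifier is a hypothetical 4-pixel binary logistic-regression model (`w`, `b`).
- Pixels are bounded to [0, 1].
- The objective is relaxed to c·‖d‖² plus the cross-entropy toward a target label.
- The objective is minimized with plain projected gradient descent, an optimizer swapped in for brevity.

All names and parameter values (`adversarial_distortion`, `c`, `lr`, `steps`) are illustrative.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, b, x):
    """Label (0 or 1) assigned to x by a toy binary logistic-regression model."""
    return int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0)


def adversarial_distortion(w, b, x, target, c=0.1, lr=0.1, steps=500, lo=0.0, hi=1.0):
    """Minimize c*||d||^2 + CE(x + d, target) subject to lo <= x + d <= hi.

    Projected gradient descent: take a gradient step on d, then clip x + d
    back into the pixel box [lo, hi].
    """
    d = [0.0] * len(x)
    for _ in range(steps):
        logit = sum(wi * (xi + di) for wi, xi, di in zip(w, x, d)) + b
        g = sigmoid(logit) - target  # dCE/dlogit for a 0/1 target label
        # Gradient of the objective w.r.t. d_i is 2*c*d_i + g*w_i.
        d = [di - lr * (2.0 * c * di + g * wi) for di, wi in zip(d, w)]
        # Projection onto the box constraint lo <= x + d <= hi.
        d = [min(max(xi + di, lo), hi) - xi for xi, di in zip(x, d)]
    return d


if __name__ == "__main__":
    w, b = [1.0, -1.0, 0.5, 0.5], -0.5   # hypothetical model
    x = [0.9, 0.1, 0.5, 0.5]             # hypothetical image, classified as 1
    d = adversarial_distortion(w, b, x, target=0)
    adv = [xi + di for xi, di in zip(x, d)]
    print(predict(w, b, x), predict(w, b, adv))  # → 1 0
```

The penalty weight `c` trades distortion size against misclassification. If `c` is too large, the minimizer may not cross the decision boundary, so in practice `c` would need to be tuned.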
Abstract
Adversarial examples have raised questions regarding the robustness and security of deep neural networks. In this work we formalize the problem of adversarial images given a pretrained classifier, showing that even in the linear case the resulting optimization problem is nonconvex. We generate adversarial images using shallow and deep classifiers on the MNIST and ImageNet datasets. We probe the pixel space of adversarial images using noise of varying intensity and distribution. We present novel visualizations that illustrate the phenomenon and its high variability. We show that adversarial images appear in large regions in the pixel space, but that, for the same task, a shallow classifier seems more robust to adversarial images than a deep convolutional network.
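The probing procedure can be illustrated with a minimal sketch that estimates label stability under random noise. The sketch assumes a hypothetical linear model. It adds i.i.d. noise with a given standard deviation to an image, clips the result to [0, 1], and reports the fraction of noisy copies that keep the original label. Gaussian and Laplace noise serve here as an assumed example of a light-tailed versus a heavier-tailed distribution. The function names, distributions, intensities, and trial counts are illustrative and are not taken from the paper.

```python
import math
import random


def predict(w, b, x):
    """Label (0 or 1) assigned to x by a toy binary linear classifier."""
    return int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0)


def laplace_sample(std, rng):
    # The difference of two i.i.d. exponentials is Laplace-distributed;
    # rate sqrt(2)/std gives scale std/sqrt(2), hence standard deviation std.
    rate = math.sqrt(2.0) / std
    return rng.expovariate(rate) - rng.expovariate(rate)


def label_stability(w, b, x, std, dist="gaussian", trials=1000, seed=0):
    """Fraction of noisy copies of x (clipped to [0, 1]) that keep x's predicted label."""
    rng = random.Random(seed)
    label = predict(w, b, x)
    kept = 0
    for _ in range(trials):
        if dist == "gaussian":
            noise = [rng.gauss(0.0, std) for _ in x]
        elif dist == "laplace":
            noise = [laplace_sample(std, rng) for _ in x]
        else:
            raise ValueError(f"unknown distribution: {dist}")
        noisy = [min(max(xi + ni, 0.0), 1.0) for xi, ni in zip(x, noise)]
        kept += predict(w, b, noisy) == label
    return kept / trials


if __name__ == "__main__":
    w, b = [1.0, -1.0, 0.5, 0.5], -0.5   # hypothetical model
    x = [0.2, 0.8, 0.3, 0.3]             # hypothetical image, classified as 0
    for std in (0.05, 0.2, 0.5):
        print(std,
              label_stability(w, b, x, std, "gaussian"),
              label_stability(w, b, x, std, "laplace"))
```

This mimics, in miniature, the sweep over noise intensity and distribution described above. It sweeps `std` and compares `dist="gaussian"` with `dist="laplace"` at equal standard deviation. Matching the standard deviations isolates the effect of tail heaviness from the effect of noise intensity.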
