Adversarial vulnerability for any classifier

Alhussein Fawzi, Hamza Fawzi, Omar Fawzi

TL;DR

This work analyzes adversarial vulnerability under the assumption that data are generated by a smooth, high-dimensional latent-space model. It derives classifier-agnostic upper bounds on robustness, establishes transferability of perturbations across models, and relates in-distribution robustness to unconstrained robustness, including extensions to approximate generative models. The authors validate the bounds with SVHN and CIFAR-10 experiments, showing the bounds provide informative baselines and offering insights into desirable properties of generative models and latent-space classifiers. Overall, the results highlight fundamental limits on robustness under smooth generative assumptions and suggest design principles (e.g., latent-space linear boundaries) to approach those limits, while raising questions about perceptual metrics and modeling choices for human-robust vision.

Abstract

Despite achieving impressive performance, state-of-the-art classifiers remain highly vulnerable to small, imperceptible, adversarial perturbations. This vulnerability has proven very difficult to address in practice. In this paper, we study the phenomenon of adversarial perturbations under the assumption that the data is generated with a smooth generative model. We derive fundamental upper bounds on the robustness to perturbations of any classification function, and prove the existence of adversarial perturbations that transfer well across different classifiers with small risk. Our analysis of the robustness also provides insights into key properties of generative models, such as their smoothness and the dimensionality of their latent space. We conclude with numerical experiments showing that our bounds provide informative baselines for the maximal achievable robustness on several datasets.

Paper Structure

This paper contains 20 sections, 7 theorems, 39 equations, 5 figures, 2 tables, 1 algorithm.

Key Result

Theorem 1

Let $f: \mathbb{R}^m \rightarrow \{1, \dots, K\}$ be an arbitrary classification function defined on the image space. Then, the fraction of datapoints having robustness less than $\eta$ satisfies: where $\Phi$ is the cdf of $\mathcal{N}(0,1)$, and $a_{\neq i} = \Phi^{-1} \left(\mathbb{P} \left(\bigcup\limits_{j \neq i} C_j \right)\right)$. In particular, if for all $i$, $\mathbb{P}(C_i) \leq \fra
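The statement above is cut off before the bound itself, but it does define two of its ingredients: $\Phi$, the cdf of $\mathcal{N}(0,1)$, and $a_{\neq i} = \Phi^{-1}\left(\mathbb{P}\left(\bigcup_{j \neq i} C_j\right)\right)$. The minimal Python sketch below only computes $a_{\neq i}$ from a list of class probabilities using the standard library's `statistics.NormalDist`; it does not reproduce the missing inequality. It assumes, as our own simplification, that the class regions $C_1, \dots, C_K$ partition the space, so that $\mathbb{P}\left(\bigcup_{j \neq i} C_j\right) = 1 - \mathbb{P}(C_i)$. The function name `a_not_i` is hypothetical.

```python
from statistics import NormalDist

# Standard normal N(0, 1): PHI.cdf is Phi and PHI.inv_cdf is Phi^{-1}.
PHI = NormalDist()


def a_not_i(class_probs: list[float], i: int) -> float:
    """Return a_{!=i} = Phi^{-1}(P(union of C_j for j != i)).

    Assumption (ours): the class regions C_1..C_K partition the space,
    so P(union of C_j for j != i) = 1 - P(C_i).
    """
    p_rest = 1.0 - class_probs[i]
    # Phi^{-1} is only finite on the open interval (0, 1).
    if p_rest <= 0.0:
        return float("-inf")
    if p_rest >= 1.0:
        return float("inf")
    return PHI.inv_cdf(p_rest)


# Equal class measures P(C_i) = 1/K, as assumed in Figure 2.
for K in (2, 10, 100):
    print(f"K={K:3d}  a_(!=i) = {a_not_i([1.0 / K] * K, 0):.4f}")
```

With equal class measures, $a_{\neq i} = \Phi^{-1}(1 - 1/K)$, which is $0$ for $K = 2$ and increases with $K$.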

Figures (5)

  • Figure 1: Setting used in this paper. The data distribution is obtained by mapping $\mathcal{N}(0, I_d)$ through $g$ (in this example, $d = 1$ and $g(z) = (\cos(2 \pi z), \sin(2 \pi z))$). The thick circle indicates the support of the data distribution $\mu$ in $\mathbb{R}^m$ ($m = 2$ here). The binary discriminative function $f$ separates the data space into two classification regions (red and blue). An in-distribution perturbed image is required to belong to the data support, whereas an unconstrained perturbed image need not. We make no assumption on $f$, so it may partition the data space arbitrarily. In this low-dimensional illustrative example, the existence of very small adversarial perturbations seems counter-intuitive, since $r_{\text{in}}$ and $r_{\text{unc}}$ can be large for some choices of $f$; the next sections show that in high dimensions such small perturbations do exist.
  • Figure 2: Upper bound (Theorem 1) on the median of the normalized robustness $r_{\text{in}} / \sqrt{d}$ for different values of the number of classes $K$, in the setting where $\omega(t) = t$. We assume that classes have equal measure (i.e., $\mathbb{P}(C_i) = 1/K$).
  • Figure 3: Left: illustration of the checkerboard example. Right: lower bound on robustness as a function of $\eta$ for the general result in Theorem 1 (blue curve) and for the checkerboard example in Eq. \ref{eq:isoperimetry_checkerboard} (red curve).
  • Figure 4: Examples of images generated with DCGAN for the SVHN dataset, and the associated perturbed images (in-distribution perturbations). In each pair, the left image is the original and the right image is the perturbed one. The label estimated by ResNet-18 is shown above each image.
  • Figure 5: Examples of images generated with DCGAN, and the associated perturbed images (in-distribution perturbations). In each pair, the left image is the original and the right image is the perturbed one. The label estimated by the VGG-type convnet is shown above each image (original and perturbed).
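The toy setting of Figure 1 is small enough to simulate directly. The Python sketch below samples $z \sim \mathcal{N}(0,1)$, maps it through $g(z) = (\cos(2\pi z), \sin(2\pi z))$, and compares the two robustness notions for a hypothetical half-plane classifier `f` that is our own illustrative choice, not one from the paper. It assumes the definitions suggested by the caption: $r_{\text{unc}}(x)$ is the smallest $\|r\|$ with $f(x + r) \neq f(x)$, and $r_{\text{in}}(x)$ is the smallest $\|g(z') - x\|$ with $f(g(z')) \neq f(x)$, here approximated on a grid of $z'$ values. Since every in-distribution perturbation is also an unconstrained one, $r_{\text{unc}} \leq r_{\text{in}}$ should hold at every sample.

```python
import math
import random

Point = tuple[float, float]


def g(z: float) -> Point:
    """Generator of Figure 1: maps latent z (d = 1) onto the unit circle in R^2 (m = 2)."""
    return (math.cos(2 * math.pi * z), math.sin(2 * math.pi * z))


def f(x: Point) -> int:
    """Hypothetical binary classifier: label 1 on the upper half-plane, 2 below it."""
    return 1 if x[1] >= 0 else 2


def r_unc(x: Point) -> float:
    """Unconstrained robustness of f at x: distance to the line x_2 = 0.

    For label-1 points this is an infimum (the perturbed point must reach x_2 < 0).
    """
    return abs(x[1])


def r_in(x: Point, grid_size: int = 20_000) -> float:
    """In-distribution robustness of f at x, approximated on a grid of latent values.

    g is 1-periodic, so z' in [0, 1) covers the whole support. The grid can only
    over-estimate the true minimum, by at most about 2*pi/grid_size.
    """
    label = f(x)
    best = math.inf
    for k in range(grid_size):
        y = g(k / grid_size)
        if f(y) != label:
            best = min(best, math.dist(x, y))
    return best


random.seed(0)
samples = [g(random.gauss(0.0, 1.0)) for _ in range(10)]
for x in samples:
    print(f"label={f(x)}  r_unc={r_unc(x):.4f}  r_in={r_in(x):.4f}")
```

For this particular `f`, $r_{\text{unc}}(x) = |x_2|$ is the distance to the decision line, while $r_{\text{in}}(x)$ is the chord from $x$ to the nearest point of the other class on the circle, so the gap between the two comes entirely from the support constraint.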

Theorems & Definitions (14)

  • Theorem 1
  • Theorem 2
  • Theorem 3: Transferability of perturbations
  • Theorem 4
  • Theorem 5: Gaussian isoperimetric inequality
  • Lemma 1: see, e.g., duembgen2010bounding
  • Lemma 2
  • proof
  • proof
  • proof
  • ...and 4 more
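Theorem 5 is listed as the Gaussian isoperimetric inequality. In its standard form, which we assume matches the version used here, it states that if $\gamma_d$ is the standard Gaussian measure on $\mathbb{R}^d$ and $\gamma_d(A) = \Phi(a)$, then the $\varepsilon$-neighborhood $A_\varepsilon = \{z : \operatorname{dist}(z, A) \leq \varepsilon\}$ satisfies $\gamma_d(A_\varepsilon) \geq \Phi(a + \varepsilon)$, with equality for half-spaces. The Python sketch below checks the inequality by Monte Carlo for a centred Euclidean ball (an illustrative choice of ours, with hypothetical helper names) and confirms the equality case for the half-space $\{z_1 \leq a\}$.

```python
import math
import random
from statistics import NormalDist

# Standard normal: PHI.cdf is Phi and PHI.inv_cdf is Phi^{-1}.
PHI = NormalDist()


def isoperimetric_lower_bound(measure_a: float, eps: float) -> float:
    """Phi(Phi^{-1}(gamma_d(A)) + eps), the lower bound on gamma_d(A_eps)."""
    return PHI.cdf(PHI.inv_cdf(measure_a) + eps)


def ball_measures(d: int, radius: float, eps: float,
                  n: int = 10_000, seed: int = 0) -> tuple[float, float]:
    """Monte Carlo estimates of gamma_d(B) and gamma_d(B_eps) for a centred ball B.

    The eps-neighborhood of the centred ball of radius `radius` is the centred
    ball of radius `radius + eps`, so both estimates reuse the same samples.
    """
    rng = random.Random(seed)
    inside = inside_eps = 0
    for _ in range(n):
        norm = math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(d)))
        inside += norm <= radius
        inside_eps += norm <= radius + eps
    return inside / n, inside_eps / n


d, eps = 50, 0.5
gamma_b, gamma_b_eps = ball_measures(d, math.sqrt(d), eps)
print(f"ball:       gamma(B)={gamma_b:.3f}  gamma(B_eps)={gamma_b_eps:.3f}  "
      f"bound={isoperimetric_lower_bound(gamma_b, eps):.3f}")

# Half-space A = {z : z_1 <= a}: gamma(A) = Phi(a) and A_eps = {z : z_1 <= a + eps},
# so the inequality holds with equality.
a = 0.3
print(f"half-space: gamma(A_eps)={PHI.cdf(a + eps):.3f}  "
      f"bound={isoperimetric_lower_bound(PHI.cdf(a), eps):.3f}")
```

For the ball, the estimate of $\gamma_d(B_\varepsilon)$ should exceed the bound, while the half-space attains it exactly; this is why half-spaces are the extremal sets of the inequality.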