Spatially Transformed Adversarial Examples

Chaowei Xiao, Jun-Yan Zhu, Bo Li, Warren He, Mingyan Liu, Dawn Song

TL;DR

This work introduces spatially transformed adversarial examples (stAdv) that deform input geometry via a learnable flow field, rather than perturbing pixel values directly. By combining a target-oriented adversarial objective with a local flow regularizer, stAdv achieves high perceptual realism and strong attack effectiveness across MNIST, CIFAR-10, and ImageNet while remaining challenging for existing defenses. Visualizations of displacement flows and attention maps reveal that perturbations concentrate at object boundaries and can misdirect network focus, even for robust models. The results underscore a new threat vector in adversarial machine learning and motivate development of defenses that address geometric perturbations as well as pixel-level perturbations.
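As a rough illustration of the two ingredients named above, the following NumPy sketch warps a grayscale image with a per-pixel flow field and scores how locally smooth that flow is. The function names `warp_bilinear` and `flow_smoothness`, the use of bilinear sampling with border clamping, and the 4-neighbour form of the penalty are assumptions made for this example; they are not taken from the summary and should not be read as the authors' exact formulation.

```python
import numpy as np


def warp_bilinear(image, flow):
    """Warp a 2-D image with a per-pixel flow field (illustrative sketch).

    image: array of shape (H, W).
    flow:  array of shape (H, W, 2); flow[..., 0] is the column (x)
           displacement and flow[..., 1] is the row (y) displacement.
    Output pixel (r, c) samples the input at (r + dy, c + dx), clamped to
    the image border, using bilinear interpolation.
    """
    h, w = image.shape
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_r = np.clip(rows + flow[..., 1], 0, h - 1)
    src_c = np.clip(cols + flow[..., 0], 0, w - 1)

    # Integer corners surrounding each sampling location.
    r0 = np.floor(src_r).astype(int)
    c0 = np.floor(src_c).astype(int)
    r1 = np.minimum(r0 + 1, h - 1)
    c1 = np.minimum(c0 + 1, w - 1)

    # Fractional offsets act as interpolation weights.
    wr = src_r - r0
    wc = src_c - c0
    top = (1 - wc) * image[r0, c0] + wc * image[r0, c1]
    bottom = (1 - wc) * image[r1, c0] + wc * image[r1, c1]
    return (1 - wr) * top + wr * bottom


def flow_smoothness(flow):
    """Sum, over vertically and horizontally adjacent pixel pairs, of the
    Euclidean norm of the difference between their flow vectors.
    (Assumed form of a local flow regularizer.)"""
    d_rows = flow[1:, :, :] - flow[:-1, :, :]
    d_cols = flow[:, 1:, :] - flow[:, :-1, :]
    return float(
        np.sqrt((d_rows ** 2).sum(axis=-1)).sum()
        + np.sqrt((d_cols ** 2).sum(axis=-1)).sum()
    )


image = np.arange(16.0).reshape(4, 4)
zero_flow = np.zeros((4, 4, 2))
identity = warp_bilinear(image, zero_flow)   # zero flow leaves the image unchanged

half_flow = np.zeros((4, 4, 2))
half_flow[..., 0] = 0.5                      # sample half a pixel to the right
blended = warp_bilinear(image, half_flow)
```

A constant flow field has zero penalty under `flow_smoothness`, while an isolated displaced pixel is penalized once for each of its neighbours. In an attack, a penalty of this kind would be weighted against the adversarial objective so that the optimizer prefers smooth deformations.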

Abstract

Recent studies show that widely used deep neural networks (DNNs) are vulnerable to carefully crafted adversarial examples. Many advanced algorithms have been proposed to generate adversarial examples by using the $\mathcal{L}_p$ distance to penalize perturbations, and researchers have explored various methods to defend against such attacks. While the effectiveness of the $\mathcal{L}_p$ distance as a metric of perceptual quality remains an active research area, this paper instead focuses on a different type of perturbation, namely spatial transformation, as opposed to manipulating the pixel values directly as in prior work. Perturbations generated through spatial transformation can result in large $\mathcal{L}_p$ distances, but our extensive experiments show that such spatially transformed adversarial examples are perceptually realistic and more difficult to defend against with existing defense systems. This potentially provides a new direction in adversarial example generation and the design of corresponding defenses. We visualize the spatial-transformation-based perturbation for different examples and show that our technique can produce realistic adversarial examples with smooth image deformation. Finally, we visualize the attention of deep networks on different types of adversarial examples to better understand how these examples are interpreted.
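To make the point about large $\mathcal{L}_p$ distances concrete, the toy example below shifts a one-pixel-wide vertical stroke by a single pixel and measures the resulting distances. The helper `lp_distance` and the 8×8 synthetic image are hypothetical and chosen only for illustration; the 0.3 threshold is the $L_\infty$ bound quoted for MNIST in the Figure 4 caption.

```python
import numpy as np


def lp_distance(a, b, p):
    """L_p distance between two arrays of equal shape (p may be np.inf)."""
    diff = np.abs(a - b).ravel()
    if np.isinf(p):
        return float(diff.max())
    return float((diff ** p).sum() ** (1.0 / p))


# Synthetic 8x8 image with pixel values in [0, 1]: a vertical stroke in column 3.
image = np.zeros((8, 8))
image[:, 3] = 1.0

# A purely spatial change: move the stroke one pixel to the right.
shifted = np.roll(image, 1, axis=1)

linf = lp_distance(image, shifted, np.inf)  # largest per-pixel change
l2 = lp_distance(image, shifted, 2)         # Euclidean distance

# The one-pixel shift far exceeds an L_inf budget of 0.3,
# even though the stroke itself is unchanged.
exceeds_budget = linf > 0.3
```

Here every pixel that the stroke leaves or enters changes by the full intensity range, so the $L_\infty$ distance is 1.0 and the $L_2$ distance is 4.0, although a viewer would describe the change as a tiny translation.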

Paper Structure

This paper contains 26 sections, 7 equations, 33 figures, 6 tables.

Figures (33)

  • Figure 1: Generating adversarial examples with spatial transformation: the blue point denotes the coordinate of a pixel in the output adversarial image, and the green point is its corresponding pixel in the input image. The red flow field represents the displacement from pixels in the adversarial image to pixels in the input image.
  • Figure 2: Adversarial examples generated by stAdv against different models on MNIST. The ground truth images are shown on the diagonal; the rest are adversarial examples that are misclassified as the target class shown at the top.
  • Figure 3: Adversarial examples generated by stAdv against different models on CIFAR-10. The ground truth images are shown on the diagonal, while the adversarial examples in each column are classified into the same class as the ground truth image within that column.
  • Figure 4: Comparison of adversarial examples generated by FGSM, C&W, and stAdv (left: MNIST; right: CIFAR-10). The target class is "0" for MNIST and "airplane" for CIFAR-10. We generate adversarial examples with FGSM and C&W with the perturbation bounded in $L_\infty$ by 0.3 on MNIST and 8 on CIFAR-10.
  • Figure 5: Flow visualization on MNIST. The digit "0" is misclassified as "2".
  • ...and 28 more figures