Imperceptible Adversarial Examples in the Physical World

Weilin Xu, Sebastian Szyller, Cory Cornelius, Luis Murillo Rojas, Marius Arvinte, Alvaro Velasquez, Jason Martin, Nageen Himayat

TL;DR

This is the first work demonstrating imperceptible adversarial examples bounded by small $\ell_\infty$ norms in the physical world that force zero classification accuracy in the global perturbation threat model and cause near-zero AP50 in object detection in the patch perturbation threat model.

Abstract

Adversarial examples in the digital domain against deep learning-based computer vision models allow for perturbations that are imperceptible to human eyes. However, producing similar adversarial examples in the physical world has been difficult due to the non-differentiable image distortion functions in visual sensing systems. Existing algorithms for generating physically realizable adversarial examples often loosen their definition of adversarial examples by allowing unbounded perturbations, resulting in obvious or even strange visual patterns. In this work, we make adversarial examples imperceptible in the physical world using a straight-through estimator (STE, a.k.a. BPDA). We employ STE to overcome the non-differentiability: we apply the exact, non-differentiable distortions in the forward pass and use the identity function in the backward pass. Our differentiable rendering extension to STE also enables imperceptible adversarial patches in the physical world. Using printout photos and experiments in the CARLA simulator, we show that STE enables fast generation of $\ell_\infty$-bounded adversarial examples despite the non-differentiable distortions. To the best of our knowledge, this is the first work demonstrating imperceptible adversarial examples bounded by small $\ell_\infty$ norms in the physical world that force zero classification accuracy in the global perturbation threat model and cause near-zero ($4.22\%$) AP50 in object detection in the patch perturbation threat model. We urge the community to re-evaluate the threat of adversarial examples in the physical world.
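The forward/backward asymmetry described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Everything in it is an assumption made for illustration: a toy logistic-regression "model", a coarse quantizer `distort` standing in for the non-differentiable sensing pipeline, and the hypothetical helper names `predict`, `loss_and_input_grad`, and `ste_pgd`. The loss is evaluated on the distorted input (forward pass), and its gradient is passed to the perturbation as if the distortion were the identity (backward pass).

```python
import math


def distort(x, levels=32):
    """Hypothetical non-differentiable distortion: coarse quantization."""
    return [round(v * levels) / levels for v in x]


def predict(w, b, z):
    """Toy logistic-regression model: probability of class 1."""
    logit = sum(wi * zi for wi, zi in zip(w, z)) + b
    return 1.0 / (1.0 + math.exp(-logit))


def loss_and_input_grad(w, b, z, y):
    """Binary cross-entropy and its gradient with respect to the model input z."""
    p = predict(w, b, z)
    p_safe = min(max(p, 1e-12), 1.0 - 1e-12)
    loss = -(y * math.log(p_safe) + (1 - y) * math.log(1.0 - p_safe))
    grad = [(p - y) * wi for wi in w]
    return loss, grad


def ste_pgd(x, y, w, b, eps, alpha, steps):
    """Untargeted PGD under an l_inf bound, with STE through `distort`."""
    delta = [0.0] * len(x)
    for _ in range(steps):
        # Forward pass: apply the exact distortion before the model.
        z = distort([xi + di for xi, di in zip(x, delta)])
        _, g = loss_and_input_grad(w, b, z, y)
        # Backward pass (STE): treat d distort / d input as the identity,
        # so g is used directly as the gradient with respect to delta.
        stepped = [di + alpha * ((gi > 0) - (gi < 0)) for di, gi in zip(delta, g)]
        # Project onto the l_inf ball of radius eps.
        stepped = [min(max(di, -eps), eps) for di in stepped]
        # Keep x + delta inside the valid pixel range [0, 1].
        delta = [min(max(di, -xi), 1.0 - xi) for di, xi in zip(stepped, x)]
    return delta


# Toy usage: four "pixels", true label 0, bound 8/255.
x = [0.2, 0.7, 0.5, 0.9]
w, b, y = [3.0, -2.0, 1.5, -1.0], 0.9, 0
delta = ste_pgd(x, y, w, b, eps=8 / 255, alpha=2 / 255, steps=10)
x_adv = [xi + di for xi, di in zip(x, delta)]
print("clean prob of class 1:", predict(w, b, distort(x)))
print("adv   prob of class 1:", predict(w, b, distort(x_adv)))
```

In an autodiff framework the same idea is usually written as a custom operation whose backward rule returns the incoming gradient unchanged. The hand-written gradient here only keeps the sketch free of dependencies.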

Paper Structure

This paper contains 15 sections, 6 equations, 19 figures, 2 tables.

Figures (19)

  • Figure 1: A straight-through estimator (STE, a.k.a. BPDA) combined with PGD reliably produces imperceptible adversarial examples in the physical world that fool the target model, a pre-trained ResNet50 image classifier. We consider global adversarial perturbations bounded by $\ell_\infty \in \{2, 4, 8, 16\}/255$. See the experimental details in the section on global perturbation printouts.
  • Figure 2: We use STE (a.k.a. BPDA) to cross the non-differentiable barrier of the imaging pipeline under the global perturbation threat model. The key is to calculate the accurate loss function value with the non-differentiable distortion functions in the forward pass, but use the identity function in the backward pass to estimate the gradient.
  • Figure 3: We follow the experiments of Kurakin et al. [kurakin2018adversarial] to validate the power of STE in producing imperceptible adversarial examples in the physical world: 1. generate a digital printout of six square images; 2. print it out and take a photo of the paper; 3. perform a perspective transformation and crop out the square images that are then fed to the target model.
  • Figure 4: The loss curves of untargeted PGD attacks bounded by each of four $\ell_\infty$ norms. PGD in the digital domain (green dotted curve) is always effective in finding adversarial perturbations that increase the cross-entropy loss with respect to the ground truth. However, the perturbations are not as effective once printed out in the physical world (orange curve with triangle markers). STE helps the PGD optimization of adversarial perturbations in the non-differentiable physical environment (blue curve with circle markers). The STE-augmented PGD attack crosses the empirical loss threshold (cyan dashed line) and reaches $0\%$ accuracy in the physical world with imperceptible adversarial perturbations bounded by $\ell_\infty$ norms as small as $4/255$, as shown in the figure of imperceptible adversarial examples and in the table of printout results.
  • Figure 5: STE combined with differentiable rendering overcomes non-differentiability in the patch threat model.
  • ...and 14 more figures
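
The update that the captions of Figures 2 and 4 describe can be written compactly. The notation below is assumed for illustration and is not taken from the paper: $x$ is the clean image, $\delta_t$ the perturbation at step $t$, $D$ the non-differentiable distortion, $f$ the target model, $\mathcal{L}$ the loss, $y$ the ground-truth label, $\alpha$ the step size, and $\epsilon$ the $\ell_\infty$ bound.

$$
\begin{aligned}
z_t &= D(x + \delta_t) && \text{(forward pass: exact distortion)} \\
g_t &= \nabla_{z}\,\mathcal{L}\big(f(z), y\big)\Big|_{z = z_t} && \text{(backward pass: STE takes } \partial D(u)/\partial u \approx I\text{)} \\
\delta_{t+1} &= \operatorname{clip}\big(\delta_t + \alpha\,\operatorname{sign}(g_t),\, -\epsilon,\, \epsilon\big) && \text{(projection onto the } \ell_\infty \text{ ball)}
\end{aligned}
$$

For scale, $\epsilon = 4/255 \approx 0.0157$ on a $[0, 1]$ pixel range, which means each 8-bit channel value changes by at most 4 intensity levels.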