On the Robustness of the CVPR 2018 White-Box Adversarial Example Defenses

Anish Athalye, Nicholas Carlini

TL;DR

The paper evaluates the robustness of two white-box defenses proposed at CVPR 2018: Pixel Deflection and High-Level Representation Guided Denoiser. Using adaptive white-box attacks (PGD), with BPDA to handle non-differentiable components, the authors show that both defenses can be defeated under a small perturbation budget on ImageNet, reducing the accuracy of the defended models to $0\%$ and achieving high targeted attack success. The findings show that non-differentiable or denoising-based defenses do not guarantee robustness against informed attackers, and they underline the need to evaluate defenses under realistic white-box threat models. Overall, the work offers a methods-level critique and contributes to the ongoing discussion on designing robust defenses.

Abstract

Neural networks are known to be vulnerable to adversarial examples. In this note, we evaluate the two white-box defenses that appeared at CVPR 2018 and find they are ineffective: when applying existing techniques, we can reduce the accuracy of the defended models to 0%.
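
To make the attack style named in the TL;DR concrete, the following is a minimal sketch of PGD under an $\ell_\infty$ budget, with BPDA used to pass gradients through a non-differentiable preprocessing step. Everything in it is an assumption for illustration: the `quantize` function and the toy linear classifier are hypothetical stand-ins, not Pixel Deflection, the Guided Denoiser, or the authors' code, and the step size and iteration count are arbitrary.

```python
import numpy as np

def quantize(x, levels=64):
    """Hypothetical non-differentiable preprocessing: round to a few levels."""
    return np.round(x * (levels - 1)) / (levels - 1)

def logits(x, W, b):
    """Toy linear classifier applied after the preprocessing step."""
    return quantize(x) @ W + b

def bpda_grad(x, y, W, b):
    """Gradient of softmax cross-entropy w.r.t. x.

    The forward pass uses quantize(); the backward pass treats it as the
    identity, which is the BPDA approximation.
    """
    z = logits(x, W, b)
    p = np.exp(z - z.max())
    p /= p.sum()
    p[y] -= 1.0          # dL/dz for softmax cross-entropy with label y
    return W @ p         # identity backward through quantize()

def pgd_linf(x0, y, W, b, eps=4 / 255, step=1 / 255, iters=50):
    """Untargeted PGD: ascend the loss, then project back into the eps-ball."""
    x = x0.copy()
    for _ in range(iters):
        g = bpda_grad(x, y, W, b)
        x = x + step * np.sign(g)
        x = np.clip(x, x0 - eps, x0 + eps)  # l_inf projection around x0
        x = np.clip(x, 0.0, 1.0)            # keep a valid pixel range
    return x

# Toy usage with random data (not ImageNet).
rng = np.random.default_rng(0)
d, k = 16, 3
W = rng.normal(size=(d, k))
b = np.zeros(k)
x0 = rng.uniform(size=d)
y = int(np.argmax(logits(x0, W, b)))
x_adv = pgd_linf(x0, y, W, b)
print("max perturbation:", np.max(np.abs(x_adv - x0)))
print("prediction changed:", int(np.argmax(logits(x_adv, W, b))) != y)
```

The sketch only guarantees that the result stays inside the $\ell_\infty$ ball and the valid pixel range; whether the toy prediction flips depends on the random weights. A targeted variant would instead descend the loss toward a chosen target label.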

Paper Structure

This paper contains 7 sections, 1 figure.

Figures (1)

  • Figure 1: Original images from the ImageNet validation set (row 1). Targeted adversarial examples (with randomly chosen targets) for Pixel Deflection (row 2) and High-Level Representation Guided Denoiser (row 3), with an $\ell_\infty$ perturbation of $\epsilon = 4/255$.