On the Robustness of the CVPR 2018 White-Box Adversarial Example Defenses
Anish Athalye, Nicholas Carlini
TL;DR
The paper evaluates two white-box defenses proposed at CVPR 2018: Pixel Deflection and the High-Level Representation Guided Denoiser. Using adaptive white-box attacks based on projected gradient descent (PGD), with Backward Pass Differentiable Approximation (BPDA) to handle non-differentiable components, the authors show that both defenses fail under a small perturbation budget on ImageNet: the accuracy of the defended models drops to 0%, and targeted attacks succeed at a high rate. The findings show that non-differentiable or denoising-based defenses do not by themselves guarantee robustness against an informed attacker, and that defenses should be evaluated against adaptive attacks under the white-box threat model they claim to address. The note is a methods-level critique of how such defenses are evaluated.
Abstract
Neural networks are known to be vulnerable to adversarial examples. In this note, we evaluate the two white-box defenses that appeared at CVPR 2018 and find they are ineffective: when applying existing techniques, we can reduce the accuracy of the defended models to 0%.
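To make the "existing techniques" concrete, the following is a minimal sketch of PGD combined with BPDA, the two techniques named in the summary. It is not the authors' code or experimental setup. The linear classifier, the `quantize` function standing in for a non-differentiable defense, and the values of `eps`, `alpha`, and `steps` are all assumptions chosen so that the example runs in plain NumPy. The idea shown is that the forward pass uses the real defense, while the backward pass replaces the defense's gradient (zero almost everywhere for quantization) with the identity.

```python
import numpy as np

def quantize(x, levels=8):
    """Hypothetical non-differentiable defense: snaps inputs to a grid.
    Its true gradient is zero almost everywhere, which blocks naive gradient attacks."""
    return np.round(x * levels) / levels

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(x, w, b):
    """Class decision (0 or 1) of the defended toy model."""
    return int(w @ quantize(x) + b > 0)

def bpda_grad(x, y, w, b):
    """Gradient of the cross-entropy loss with respect to x, using BPDA."""
    # Forward pass: run the real defense.
    p = sigmoid(w @ quantize(x) + b)
    # Backward pass: d(loss)/d(quantize(x)) = (p - y) * w for logistic loss.
    # BPDA approximates d(quantize)/dx by the identity, so this is also the
    # approximate gradient with respect to x.
    return (p - y) * w

def pgd_bpda(x, y, w, b, eps, alpha, steps):
    """Untargeted PGD inside an l-infinity ball of radius eps around x."""
    x_adv = x.copy()
    for _ in range(steps):
        g = bpda_grad(x_adv, y, w, b)
        x_adv = x_adv + alpha * np.sign(g)        # ascend the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project back into the ball
        x_adv = np.clip(x_adv, 0.0, 1.0)          # keep a valid pixel range
    return x_adv

# Toy data (assumed values, not from the paper).
w = np.array([1.0, -1.0, 1.0, -1.0])
b = 0.0
x = np.array([0.6, 0.4, 0.6, 0.4])
y = 1
eps = 0.2

x_adv = pgd_bpda(x, y, w, b, eps=eps, alpha=0.05, steps=10)
print("clean prediction:", predict(x, w, b))
print("adversarial prediction:", predict(x_adv, w, b))
print("max perturbation:", np.max(np.abs(x_adv - x)))
```

The perturbation budget here is far larger than anything used on real images. It is sized only so that the four-dimensional toy input crosses a quantization boundary. A real evaluation would substitute the actual defense and classifier and choose a differentiable approximation suited to that defense.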
