Is AmI (Attacks Meet Interpretability) Robust to Adversarial Examples?

Nicholas Carlini

TL;DR

The paper evaluates AmI, an interpretability-based defense that detects adversarial examples, using a defense-oblivious attack: high-confidence adversarial examples are crafted against the original, undefended model and then tested against the defended model. Under untargeted attacks with a small distortion bound, AmI achieves a 0% true-positive rate, meaning it detects none of the adversarial examples and offers no robustness gain over the undefended network. The author criticizes the absence of a formal threat model in the original defense publication and advocates transparent evaluation and code release to prevent overstated claims. Overall, the work stresses that defenses, especially those built on interpretability signals, need rigorous, threat-model-driven assessment to avoid misleading conclusions about robustness.
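
The defense-oblivious evaluation summarized above can be illustrated with a minimal sketch. It is not the paper's code: it assumes a toy linear classifier in place of the real network, and every name in it (`high_confidence_untargeted`, `true_positive_rate`, the `detector` callable) is hypothetical. The attack only sees the undefended model; the detector is consulted afterwards, purely to measure how many adversarial examples it flags.

```python
def logits(W, b, x):
    """Logits of a toy linear classifier (stand-in for the undefended model)."""
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]


def predict(W, b, x):
    """Index of the largest logit."""
    z = logits(W, b, x)
    return max(range(len(z)), key=lambda k: z[k])


def high_confidence_untargeted(W, b, x, label, eps, steps=100):
    """Signed-gradient ascent on (best wrong logit - true logit) within an L_inf ball.

    Running all steps instead of stopping at the first misclassification
    pushes the margin up, which is what yields a high-confidence example.
    """
    step = eps / 10.0  # assumed step size, chosen only for illustration
    x_adv = list(x)
    for _ in range(steps):
        z = logits(W, b, x_adv)
        # Strongest competing class under the current perturbation.
        other = max((k for k in range(len(z)) if k != label), key=lambda k: z[k])
        for i in range(len(x_adv)):
            # For a linear model the margin gradient is W[other] - W[label].
            g = W[other][i] - W[label][i]
            sign = (g > 0) - (g < 0)
            moved = x_adv[i] + step * sign
            # Project back into the distortion bound around the clean input.
            x_adv[i] = min(max(moved, x[i] - eps), x[i] + eps)
    return x_adv


def true_positive_rate(detector, adv_examples):
    """Fraction of adversarial examples that a (hypothetical) detector flags."""
    flags = [bool(detector(x)) for x in adv_examples]
    return sum(flags) / len(flags)
```

A detector that flags none of the crafted inputs would score 0.0 under `true_positive_rate`, which is the situation the summary reports for AmI. A real evaluation would replace the linear model with the actual network and also clip inputs to the valid pixel range, which this sketch omits.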

Abstract

No.
