Evaluating the Robustness of the "Ensemble Everything Everywhere" Defense
Jie Zhang, Christian Schlarmann, Kristina Nikolić, Nicholas Carlini, Francesco Croce, Matthias Hein, Florian Tramèr
TL;DR
The paper critically re-evaluates the Ensemble Everything Everywhere defense, showing that its claimed robustness is undermined by gradient masking caused by the defense's randomness and ensembling method. By constructing stronger adaptive attacks, including transfer from a model with CrossMax removed and Expectation over Transformation, the authors reduce robust accuracy from 62% to 11% on CIFAR-10 and from 48% to 14% on CIFAR-100 under the $\ell_\infty$ threat model with $\varepsilon=8/255$. They demonstrate that AutoAttack alone is insufficient to establish robustness, and that perceptually aligned gradients do not guarantee resilience to worst-case perturbations. The work underscores the need for rigorous, adaptive evaluations of defenses and discusses broader implications and potential legitimate uses of multi-resolution ideas beyond adversarial robustness.
Abstract
Ensemble everything everywhere is a recently proposed defense against adversarial examples that aims to make image classifiers robust. It works by ensembling a model's intermediate representations at multiple noisy image resolutions, producing a single robust classification. The defense was shown to be effective against multiple state-of-the-art attacks. Perhaps even more convincingly, it was shown that the model's gradients are perceptually aligned: attacks against the model produce noise that perceptually resembles the targeted class. In this short note, we show that this defense is not robust to adversarial attacks. We first show that the defense's randomness and ensembling method cause severe gradient masking. We then use standard adaptive attack techniques to reduce the defense's robust accuracy from 48% to 14% on CIFAR-100 and from 62% to 11% on CIFAR-10, under the $\ell_\infty$-norm threat model with $\varepsilon=8/255$.
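
To make the general idea of Expectation over Transformation under an $\ell_\infty$ budget concrete, the following is a minimal sketch on a toy problem, not the attack used in this note. It assumes a hypothetical randomized logistic classifier that adds Gaussian noise to its input, averages the loss gradient over several noise draws, and then takes projected signed-gradient steps. The names `noisy_loss_grad` and `eot_pgd`, the step size, the number of samples, and the noise level are all illustrative assumptions.

```python
import math
import random

EPS = 8 / 255  # l_inf budget matching the threat model in the text


def noisy_loss_grad(w, b, x, y, sigma, rng):
    """Gradient of the logistic loss w.r.t. x for ONE random noise draw.

    Toy stand-in for a randomized defense: the model sees x + noise.
    """
    z = b + sum(wi * (xi + rng.gauss(0.0, sigma)) for wi, xi in zip(w, x))
    p = 1.0 / (1.0 + math.exp(-z))        # predicted probability of class 1
    return [(p - y) * wi for wi in w]     # d(loss)/dx for label y in {0, 1}


def eot_pgd(w, b, x, y, eps=EPS, steps=10, samples=16, sigma=0.1, seed=0):
    """PGD where each step averages gradients over `samples` noise draws (EOT)."""
    rng = random.Random(seed)
    alpha = 2.5 * eps / steps             # assumed step-size heuristic
    x_adv = list(x)
    for _ in range(steps):
        # Monte Carlo estimate of the expected gradient over the randomness.
        grad = [0.0] * len(x)
        for _ in range(samples):
            g = noisy_loss_grad(w, b, x_adv, y, sigma, rng)
            grad = [a + gi / samples for a, gi in zip(grad, g)]
        # Signed ascent step on the loss.
        x_adv = [xa + alpha * ((g > 0) - (g < 0)) for xa, g in zip(x_adv, grad)]
        # Project back into the eps-ball around x and the valid range [0, 1].
        x_adv = [min(max(xa, xi - eps, 0.0), xi + eps, 1.0)
                 for xa, xi in zip(x_adv, x)]
    return x_adv


if __name__ == "__main__":
    w, b = [1.0, -2.0, 0.5], 0.0          # hypothetical toy weights
    x, y = [0.5, 0.5, 0.5], 1
    x_adv = eot_pgd(w, b, x, y)
    print(max(abs(a - xi) for a, xi in zip(x_adv, x)) <= EPS + 1e-12)
```

Averaging over noise draws is what distinguishes this from plain PGD: a single draw gives a noisy gradient that can mislead the attack, whereas the average approximates the gradient of the expected loss. For a real network, the analytic gradient here would be replaced by automatic differentiation.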
