Revisiting DeepFool: generalization and improvement
Alireza Abdollahpoorrostam, Mahed Abroshan, Seyed-Mohsen Moosavi-Dezfooli
TL;DR
The paper tackles the problem of robustly evaluating and improving neural networks against minimal $\ell_2$ adversarial perturbations in white-box settings. It introduces SuperDeepFool (SDF), a geometry-guided attack framework that couples DeepFool steps with a projection onto the decision-boundary normal, producing smaller perturbations at only a modest computational overhead. The authors formalize the SDF family $\text{SDF}(m,n)$, which interleaves $m$ DeepFool steps with $n$ projection steps, and highlight the particularly effective $\text{SDF}(\infty,1)$ variant. They provide theoretical and empirical evidence that SDF aligns perturbations with the normal to the decision boundary better than DeepFool does. They show that SDF outperforms a range of minimum-norm attacks, that adversarial training with SDF improves $\ell_2$ robustness and reduces network curvature, and that integrating SDF into AutoAttack++ speeds up robustness evaluation of large models. Altogether, SDF offers a scalable, parameter-free approach for both evaluating and improving the robustness of deep networks against minimal $\ell_2$ perturbations.
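To make the DeepFool-plus-projection idea concrete, here is a minimal NumPy sketch of our reading of $\text{SDF}(\infty,1)$ for a binary classifier $\text{sign}(f(x))$ with an analytic gradient: run DeepFool steps until the label flips (the "$\infty$"), then apply one projection (the "1") that keeps only the component of the perturbation along the normal $\nabla f(\tilde{x})$ at the boundary point, and repeat. This is not the authors' implementation. The function names `deepfool_binary` and `sdf_inf_1`, the overshoot of 0.02, the iteration caps, and the stopping rule are hypothetical choices. The projection formula $x \leftarrow x_0 + \frac{(\tilde{x} - x_0)^\top \nabla f(\tilde{x})}{\|\nabla f(\tilde{x})\|_2^2} \nabla f(\tilde{x})$ is assumed from the description of a boundary-normal projection.

```python
import numpy as np


def deepfool_binary(f, grad_f, x_start, s0, overshoot=0.02, max_iter=50):
    """DeepFool for a binary classifier sign(f(x)), started at x_start.

    Accumulates linearized steps r = -f(x) / ||grad f(x)||^2 * grad f(x)
    until sign(f(x)) differs from the original label s0.
    """
    x_start = np.asarray(x_start, dtype=float)
    r_tot = np.zeros_like(x_start)
    x = x_start.copy()
    for _ in range(max_iter):
        if np.sign(f(x)) != s0:
            break
        g = grad_f(x)
        r_tot += -f(x) / (np.dot(g, g) + 1e-12) * g
        # Scale the accumulated step by (1 + overshoot) to land past the boundary.
        x = x_start + (1.0 + overshoot) * r_tot
    return x


def sdf_inf_1(f, grad_f, x0, overshoot=0.02, outer_iter=20, tol=1e-8):
    """Hypothetical sketch of SDF(inf, 1) for a binary classifier.

    Alternates (a) DeepFool until the label flips and (b) one projection of
    the perturbation onto the normal grad f(x_tilde) at the boundary point.
    Returns the smallest adversarial point found (None if none was found).
    """
    x0 = np.asarray(x0, dtype=float)
    s0 = np.sign(f(x0))
    x, best = x0.copy(), None
    for _ in range(outer_iter):
        # (a) "inf" DeepFool steps: iterate until the label differs from s0.
        x_tilde = deepfool_binary(f, grad_f, x, s0, overshoot)
        is_adv = np.sign(f(x_tilde)) != s0
        if is_adv and (best is None or
                       np.linalg.norm(x_tilde - x0) < np.linalg.norm(best - x0)):
            best = x_tilde
        # (b) One projection step: keep only the component of x_tilde - x0
        # along the boundary normal w = grad f(x_tilde).
        w = grad_f(x_tilde)
        x_new = x0 + np.dot(x_tilde - x0, w) / (np.dot(w, w) + 1e-12) * w
        # Stop once the projected point no longer moves.
        if np.linalg.norm(x_new - x) < tol:
            break
        x = x_new
    return best
```

For an affine $f$, DeepFool's perturbation is already parallel to $\nabla f$, so the projection leaves it unchanged and this sketch reduces to DeepFool. The extra projection only changes the result where the boundary is curved. Because the first DeepFool point is kept as a candidate, the returned perturbation is never larger than plain DeepFool's in this sketch.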
Abstract
Deep neural networks are known to be vulnerable to adversarial examples: inputs that are slightly modified to fool the network into making incorrect predictions. This has led to a significant amount of research on evaluating the robustness of these networks against such perturbations. One particularly important robustness metric is the robustness to minimal $\ell_2$ adversarial perturbations. However, existing methods for evaluating this metric are either computationally expensive or inaccurate. In this paper, we introduce a new family of adversarial attacks that strike a balance between effectiveness and computational efficiency. Our proposed attacks are generalizations of the well-known DeepFool (DF) attack, yet they remain simple to understand and implement. We demonstrate that our attacks outperform existing methods in terms of both effectiveness and computational efficiency. Our proposed attacks are also suitable for evaluating the robustness of large models and can be used to perform adversarial training (AT) to achieve state-of-the-art robustness to minimal $\ell_2$ adversarial perturbations.
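DeepFool, the attack these methods generalize, is easiest to see on an affine multiclass classifier $f(x) = Wx + b$, where its linearization is exact. For each class $k \neq \hat{k}$, the boundary with the predicted class $\hat{k}$ is a hyperplane with normal $w'_k = w_k - w_{\hat{k}}$ and offset $f'_k = f_k(x) - f_{\hat{k}}(x)$. The minimal $\ell_2$ perturbation moves $x$ onto the closest of these hyperplanes, $r = \frac{|f'_l|}{\|w'_l\|_2^2} w'_l$ with $l = \arg\min_{k \neq \hat{k}} |f'_k| / \|w'_k\|_2$. The sketch below uses a hypothetical helper name, `deepfool_affine_step`. For a non-linear network, DeepFool repeats this step with $W$ and $b$ replaced by the local gradients and logits and adds a small overshoot to cross the boundary; both are omitted here.

```python
import numpy as np


def deepfool_affine_step(W, b, x):
    """One multiclass DeepFool step for an affine classifier f(x) = W @ x + b.

    Returns (r, dist): the minimal l2 perturbation that moves x onto the
    nearest boundary between the predicted class and another class (no
    overshoot), and its norm.
    """
    logits = W @ x + b
    k_hat = int(np.argmax(logits))
    best_dist, best_r = np.inf, None
    for k in range(W.shape[0]):
        if k == k_hat:
            continue
        w_k = W[k] - W[k_hat]            # normal of the k-vs-k_hat boundary
        f_k = logits[k] - logits[k_hat]  # <= 0 while x is classified as k_hat
        dist = abs(f_k) / np.linalg.norm(w_k)
        if dist < best_dist:
            best_dist = dist
            # Step along w_k just far enough to make logits k and k_hat equal.
            best_r = abs(f_k) / np.dot(w_k, w_k) * w_k
    return best_r, best_dist
```

For example, with rows $w_0 = (1, 0)$, $w_1 = (0, 1)$, $w_2 = (-1, -1)$, $b = 0$ and $x = (2, 0.5)$, class 0 is predicted, the class-1 boundary is nearest at distance $1.5/\sqrt{2}$, and the step $r = (-0.75, 0.75)$ lands on $(1.25, 1.25)$, where the logits of classes 0 and 1 tie.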
