Revisiting DeepFool: generalization and improvement

Alireza Abdollahpoorrostam; Mahed Abroshan; Seyed-Mohsen Moosavi-Dezfooli

TL;DR

The paper tackles the problem of robustly evaluating and improving neural networks against minimal $\ell_2$ adversarial perturbations in white-box settings. It introduces SuperDeepFool (SDF), a geometry-guided attack framework that couples DeepFool steps with a boundary-normal projection, producing smaller perturbations with only a modest computational overhead. The authors formalize the SDF family $\text{SDF}(m,n)$ and highlight the particularly effective $\text{SDF}(\infty,1)$ variant, while providing theoretical and empirical evidence that SDF better aligns perturbations with the decision boundary than DeepFool. They demonstrate strong performance improvements over a range of minimum-norm attacks, show that adversarial training with SDF enhances $\ell_2$ robustness and reduces network curvature, and integrate SDF into AutoAttack++ to speed robustness evaluation on large models. Altogether, SDF offers a scalable, parameter-free approach for both evaluating and boosting the robustness of deep networks against minimal $\ell_2$ perturbations.
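The coupling of DeepFool steps with a boundary-normal projection can be illustrated on a toy binary classifier. The sketch below is not the authors' implementation. It assumes that the projection step projects the DeepFool perturbation onto the gradient direction at the point DeepFool returns, and it omits any final line search. The function names (`deepfool`, `sdf`), the per-step `overshoot` factor, and the ellipse classifier are hypothetical choices made for illustration only.

```python
import numpy as np

def deepfool(f, grad_f, x_start, overshoot=0.02, max_iter=50):
    """Binary DeepFool-style iteration (illustrative variant).

    Each step moves to the zero of the local linearization of f,
    scaled by (1 + overshoot) so the sign of f actually flips.
    """
    x = np.asarray(x_start, dtype=float).copy()
    sign_start = np.sign(f(x))
    for _ in range(max_iter):
        if np.sign(f(x)) != sign_start:  # prediction flipped: stop
            break
        g = grad_f(x)
        x = x - (1.0 + overshoot) * f(x) / (g @ g) * g
    return x

def sdf(f, grad_f, x0, max_outer=20):
    """Sketch of an SDF(inf, 1)-style loop (assumed form, no line search).

    Run DeepFool until the prediction flips, then project the total
    perturbation onto the gradient direction at the point reached.
    """
    x0 = np.asarray(x0, dtype=float)
    sign0 = np.sign(f(x0))
    x = x0.copy()
    x_tilde = x0.copy()
    for _ in range(max_outer):
        x_tilde = deepfool(f, grad_f, x)   # DeepFool phase
        w = grad_f(x_tilde)                # estimate of the boundary normal
        x = x0 + ((x_tilde - x0) @ w) / (w @ w) * w  # projection phase
        if np.sign(f(x)) != sign0:         # projected point is adversarial
            return x
    return x_tilde                         # fall back to the last DeepFool point

# Hypothetical toy classifier: positive inside an ellipse, negative outside.
def ellipse(x):
    return 1.0 - x[0] ** 2 / 4.0 - x[1] ** 2

def ellipse_grad(x):
    return np.array([-x[0] / 2.0, -2.0 * x[1]])

x0 = np.array([1.0, 0.5])
r_df = deepfool(ellipse, ellipse_grad, x0) - x0
r_sdf = sdf(ellipse, ellipse_grad, x0) - x0
print("DeepFool l2 norm:", np.linalg.norm(r_df))
print("SDF sketch l2 norm:", np.linalg.norm(r_sdf))
```

For a linear classifier the two routines return the same point, because the DeepFool direction is already normal to the boundary. On a curved boundary the projection cannot lengthen the perturbation, since projecting a vector onto a single direction never increases its norm.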

Abstract

Deep neural networks have been known to be vulnerable to adversarial examples, which are inputs that are modified slightly to fool the network into making incorrect predictions. This has led to a significant amount of research on evaluating the robustness of these networks against such perturbations. One particularly important robustness metric is the robustness to minimal $\ell_2$ adversarial perturbations. However, existing methods for evaluating this robustness metric are either computationally expensive or not very accurate. In this paper, we introduce a new family of adversarial attacks that strike a balance between effectiveness and computational efficiency. Our proposed attacks are generalizations of the well-known DeepFool (DF) attack, while they remain simple to understand and implement. We demonstrate that our attacks outperform existing methods in terms of both effectiveness and computational efficiency. Our proposed attacks are also suitable for evaluating the robustness of large models and can be used to perform adversarial training (AT) to achieve state-of-the-art robustness to minimal $\ell_2$ adversarial perturbations.

Paper Structure (37 sections, 3 theorems, 17 equations, 9 figures, 21 tables, 4 algorithms)

Introduction
Why does $\ell_2$ white-box adversarial robustness matter?
DeepFool (DF) and Minimal Adversarial Perturbations
SuperDeepFool: Efficient Algorithms to Find Minimal Perturbations
A Family of Adversarial Attacks
SDF Attack
Experimental Results
Comparison with DeepFool (DF)
Verifying optimality conditions for SDF.
Comparison with minimum-norm attacks
SDF Adversarial Training (AT)
Conclusion and Future Works
Acknowledgments
Appendix
Proofs
...and 22 more sections

Key Result

Proposition 1

Let the binary classifier $\mathcal{F}:\mathbb{R}^{d} \rightarrow \mathbb{R}$ be continuously differentiable, and let its gradient $\nabla \mathcal{F}$ be $\beta$-Lipschitz. (For the sake of clarity, we use $\mathcal{F}$ to denote binary classifiers in this proposition.) For a given input sample ...

Figures (9)

Figure 1: The average number of gradient computations vs. the mean $\ell_2$-norm of perturbations. It shows that our novel fast and accurate method, SDF, outperforms other minimum-norm attacks. SDF finds significantly smaller perturbations compared to DF, with only a small increase in computational cost. SDF also outperforms other algorithms in optimality and speed. The numbers are taken from Table \ref{tab:ImageNet}.
Figure 2: Illustration of the optimal adversarial example $\boldsymbol{x}+\boldsymbol{r}^*$ for a binary classifier $f$; the example lies on the decision boundary (set of points where $f(\boldsymbol{x})=0$) and the perturbation vector $\boldsymbol{r}^*$ is orthogonal to this boundary.
Figure 3: (Left) We generated 1000 images using one hundred values of $\gamma$ between zero and one, and report the fooling rates of DeepFool and SuperDeepFool. This experiment was run on the CIFAR10 dataset with a ResNet18 model. (Right) Histogram of the cosine of the angle between the normal to the decision boundary and the perturbation vector obtained by DeepFool and SuperDeepFool.
Figure 4: Histogram of the cosine of the angle between the normal to the decision boundary and the perturbation vector obtained by C&W and FMN.
Figure 5: SDF for multi-class classifiers
...and 4 more figures
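
The geometric condition stated in the Figure 2 caption can be written compactly. This is a restatement for a differentiable binary classifier $f$, not a formula quoted from the paper; the scalar $\lambda$ is notation introduced here for illustration.

$$
f(\boldsymbol{x}+\boldsymbol{r}^*) = 0, \qquad \boldsymbol{r}^* = \lambda \, \nabla f(\boldsymbol{x}+\boldsymbol{r}^*) \quad \text{for some } \lambda \in \mathbb{R}.
$$

The first equation places the adversarial example on the decision boundary. The second says that the perturbation is parallel to the gradient there, which is the normal to the boundary, so the cosine of the angle between them reported in Figures 3 and 4 has absolute value one exactly when this condition holds.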

Theorems & Definitions (3)

Proposition 1
Proposition 2
Proposition 3
