Tailoring Adversarial Attacks on Deep Neural Networks for Targeted Class Manipulation Using DeepFool Algorithm
S. M. Fazle Rabby Labib, Joyanta Jyoti Mondal, Meem Arafat Manab, Xi Xiao, Sarfaraz Newaz
TL;DR
The paper addresses the vulnerability of deep neural networks to adversarial inputs by extending the DeepFool framework to targeted misclassification with a configurable minimum confidence, yielding ET DeepFool. The attack seeks the smallest perturbation that sends an input $\mathbf{x}$ to a chosen target class $t$, $\Delta(\mathbf{x}; t) = \min_{\mathbf{r}} \|\mathbf{r}\|_2 \; \text{subject to} \; \hat{k}(\mathbf{x}+\mathbf{r}) = t$, and keeps iterating until the softmax confidence $c$ assigned to $t$ reaches a threshold $c_{min}$. Each iteration focuses the perturbation update on the target class via $\mathbf{w}'_k = \nabla f_t(\mathbf{x}_i) - \nabla f_{\hat{k}(\mathbf{x}_0)}(\mathbf{x}_i)$ and $f'_k = f_t(\mathbf{x}_i) - f_{\hat{k}(\mathbf{x}_0)}(\mathbf{x}_i)$, which removes DeepFool's inner loop over classes and gives an $O(N)$ time complexity. The authors evaluate ET DeepFool on six datasets and eleven image classifiers and report high target confidence (≈0.97) with small perturbations on many models, along with varying robustness (e.g., ViT and AlexNet require larger perturbations and longer run times). They show that the method outperforms prior attacks at targeted misclassification with high confidence while preserving image fidelity (high SSIM), and they discuss the implications for model robustness and defense strategies. An implementation is publicly available; future work includes multi-target attacks, grey-box and black-box settings, and further optimization, with practical value for evaluating and strengthening image recognition systems.
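The targeted update and the minimum-confidence stopping rule can be illustrated on a toy model. The following is a minimal sketch, not the authors' implementation (their repository holds that): it assumes a hypothetical linear classifier $f(\mathbf{x}) = W\mathbf{x} + \mathbf{b}$, so each gradient $\nabla f_k$ is simply the row `W[k]`, and the function name `et_deepfool_linear` together with the `overshoot` and `eps` parameters and their default values are illustrative assumptions. It loops until the predicted class is $t$ and the softmax confidence $c$ for $t$ reaches $c_{min}$, using only the target and original-class gradients in each step.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()


def et_deepfool_linear(x0, W, b, target, c_min=0.9, overshoot=0.02,
                       max_iter=100, eps=1e-4):
    """Targeted DeepFool with a minimum-confidence stop, on a toy linear
    classifier f(x) = W @ x + b (hypothetical stand-in for a DNN).

    Returns (x_adv, r_tot, n_iter, success).
    """
    x0 = np.asarray(x0, dtype=float)
    k0 = int(np.argmax(W @ x0 + b))          # original label k(x_0)
    if target == k0:
        raise ValueError("target must differ from the original label")
    r_tot = np.zeros_like(x0)
    x = x0.copy()
    for i in range(max_iter):
        logits = W @ x + b
        c = softmax(logits)[target]          # confidence in the target class
        if np.argmax(logits) == target and c >= c_min:
            return x, r_tot, i, True
        # Only two gradients per iteration (target and original class),
        # instead of one per class as in untargeted DeepFool.
        # For a linear model the gradient of f_k is just the row W[k].
        w = W[target] - W[k0]                # w'_k
        f = logits[target] - logits[k0]      # f'_k
        # DeepFool-style step along w'_k: reaches the hyperplane f'_k = 0
        # and, once past it, keeps raising the target-vs-original margin.
        r = (abs(f) + eps) / np.dot(w, w) * w
        r_tot += r
        x = x0 + (1 + overshoot) * r_tot     # overshoot applied to the total
    logits = W @ x + b
    ok = bool(np.argmax(logits) == target and softmax(logits)[target] >= c_min)
    return x, r_tot, max_iter, ok


if __name__ == "__main__":
    W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
    b = np.zeros(3)
    x_adv, r, n, ok = et_deepfool_linear(np.array([2.0, 0.0]), W, b, target=1)
    print(ok, n, np.linalg.norm(r))
```

In this toy setting, raising `c_min` (for example from 0.9 to 0.99) lengthens the run and enlarges $\|\mathbf{r}\|_2$, which is the trade-off the confidence threshold is meant to expose.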
Abstract
The susceptibility of deep neural networks (DNNs) to adversarial attacks undermines their reliability across numerous applications, underscoring the need for an in-depth exploration of these vulnerabilities and for robust defense strategies. The DeepFool algorithm by Moosavi-Dezfooli et al. (2016) was a pivotal step in finding the minimal perturbation required to induce misclassification of an input image. However, its untargeted formulation falls short when an attack must drive the input to a specific class. In addition, previous studies have concentrated mainly on attack success rate without adequately addressing the resulting image distortion, the preservation of image quality, or the confidence threshold required for misclassification. To bridge these gaps, we introduce the Enhanced Targeted DeepFool (ET DeepFool) algorithm, an extension of DeepFool that lets the user specify the desired misclassification target and also incorporates a configurable minimum confidence score. Our experiments show that this approach preserves image integrity and keeps perturbations small across a variety of DNN architectures. Unlike earlier variants such as the Targeted DeepFool of Gajjar et al. (2022), our method offers finer control over the perturbation process, enabling precise manipulation of model responses. Preliminary results show that some models, including AlexNet and the Vision Transformer, are notably robust to such manipulations. The varying levels of model robustness revealed by adjusting the confidence threshold could have far-reaching implications for the field of image recognition. Our code is available at https://github.com/FazleLabib/et_deepfool.
