Tailoring Adversarial Attacks on Deep Neural Networks for Targeted Class Manipulation Using DeepFool Algorithm

S. M. Fazle Rabby Labib; Joyanta Jyoti Mondal; Meem Arafat Manab; Xi Xiao; Sarfaraz Newaz

TL;DR

The paper tackles the vulnerability of deep neural networks to adversarial inputs by extending the DeepFool framework to targeted misclassification with a configurable minimum confidence, yielding ET DeepFool. For a chosen target class $t$, the attack seeks the minimal perturbation $\Delta(x; t) = \min_{r} \|r\|_2 \; \text{subject to} \; \hat{k}(x+r) = t$. The method eliminates DeepFool's inner loop over classes, resulting in an $O(N)$ time complexity, and computes each perturbation update against the target class only, via $\mathbf{w}'_k = \nabla f_t(\mathbf{x}_i) - \nabla f_{k(\mathbf{x}_0)}(\mathbf{x}_i)$ and $f'_k = f_t(\mathbf{x}_i) - f_{k(\mathbf{x}_0)}(\mathbf{x}_i)$. A softmax-based confidence $c$ for the target class is compared against a threshold $c_{min}$, which governs the minimum adversarial confidence. The authors validate ET DeepFool across six datasets and eleven image classifiers, showing high target confidence (≈0.97) with small perturbations on many models, and revealing varying robustness (e.g., ViT and AlexNet require larger perturbations and longer attack times). They demonstrate that the method outperforms prior attacks in achieving targeted misclassification with high confidence while preserving image fidelity (high SSIM), and discuss the implications for model robustness and defense strategies. The work provides a publicly available implementation and points to future work on multi-target attacks, grey/black-box settings, and further optimization, highlighting practical impact for evaluating and strengthening image recognition systems.
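To make the update rule concrete, the following is a minimal sketch of the targeted loop on a toy linear classifier, not the authors' implementation. Several things here are assumptions made for illustration: the function name `et_deepfool_linear`, the weights `W` and bias `b`, the sample point `x0`, the per-step size $|f'_k| / \|\mathbf{w}'_k\|_2^2$ and the overshoot factor of 0.02 (both taken from the original DeepFool formulation), and the small constant that keeps a step from collapsing to zero. For a linear model $f(\mathbf{x}) = W\mathbf{x} + b$, the gradient of logit $k$ is simply row $k$ of $W$, so no autograd is needed.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # shift logits for numerical stability
    e = np.exp(z)
    return e / e.sum()

def et_deepfool_linear(x0, W, b, target, c_min=0.95, overshoot=0.02, max_iter=100):
    """Targeted DeepFool-style attack on a linear classifier (illustrative only)."""
    orig = int(np.argmax(W @ x0 + b))  # k(x_0): class predicted for the clean input
    if orig == target:
        raise ValueError("target must differ from the original prediction")
    r_tot = np.zeros_like(x0)
    x = x0.copy()
    for _ in range(max_iter):
        logits = W @ x + b
        conf = softmax(logits)[target]  # softmax confidence c for the target class
        # Stop only when the target is predicted AND its confidence reaches c_min.
        if int(np.argmax(logits)) == target and conf >= c_min:
            break
        w = W[target] - W[orig]            # w'_k: gradient difference (rows of W here)
        f = logits[target] - logits[orig]  # f'_k: logit difference
        r_i = (abs(f) + 1e-8) / np.dot(w, w) * w  # step toward the target class
        r_tot += r_i
        x = x0 + (1 + overshoot) * r_tot   # overshoot so the boundary is actually crossed
    return x, x - x0

# Hypothetical 3-class, 2-feature model and input.
W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.zeros(3)
x0 = np.array([2.0, 0.5])  # classified as class 0
x_adv, r = et_deepfool_linear(x0, W, b, target=1)
print(np.argmax(W @ x_adv + b), softmax(W @ x_adv + b)[1], np.linalg.norm(r))
```

Because only the target class and the original class enter each update, the loop body contains no iteration over the remaining classes, which is the simplification the summary credits for the reduced cost. With a real network, `W[target]` and `W[orig]` would be replaced by gradients of the corresponding logits with respect to the input.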

Abstract

The susceptibility of deep neural networks (DNNs) to adversarial attacks undermines their reliability across numerous applications, underscoring the necessity for an in-depth exploration of these vulnerabilities and the formulation of robust defense strategies. The DeepFool algorithm by Moosavi-Dezfooli et al. (2016) represents a pivotal step in identifying minimal perturbations required to induce misclassification of input images. Nonetheless, its generic methodology falls short in scenarios necessitating targeted interventions. Additionally, previous research studies have predominantly concentrated on the success rate of attacks without adequately addressing the consequential distortion of images, the maintenance of image quality, or the confidence threshold required for misclassification. To bridge these gaps, we introduce the Enhanced Targeted DeepFool (ET DeepFool) algorithm, an evolution of DeepFool that not only facilitates the specification of desired misclassification targets but also incorporates a configurable minimum confidence score. Our empirical investigations demonstrate the superiority of this refined approach in maintaining the integrity of images and minimizing perturbations across a variety of DNN architectures. Unlike previous iterations, such as the Targeted DeepFool by Gajjar et al. (2022), our method grants unparalleled control over the perturbation process, enabling precise manipulation of model responses. Preliminary outcomes reveal that certain models, including AlexNet and the advanced Vision Transformer, display commendable robustness to such manipulations. This discovery of varying levels of model robustness, as unveiled through our confidence level adjustments, could have far-reaching implications for the field of image recognition. Our code is available at https://github.com/FazleLabib/et_deepfool.

Paper Structure (21 sections, 9 equations, 6 figures, 13 tables, 3 algorithms)

Introduction
Related Work
White-Box Attacks
Black-Box Attacks
Data Poisoning Attacks
Adversarial Defense
Background and Motivation
Vanilla DeepFool
Targeted DeepFool
Methodology
Enhanced Targeted DeepFool
Experimental Setup
Dataset
Models
Testbed Setups
...and 6 more sections

Figures (6)

Figure 1: Comparison between the original DeepFool and our proposed Enhanced Targeted DeepFool. The sample image is taken from the ImageNet dataset, and the perturbation image is scaled 20 times for visibility.
Figure 2: High-level overview of the Enhanced Targeted DeepFool algorithm. The sample image is taken from the ImageNet dataset.
Figure 3: The 5 randomly chosen images from the ImageNet dataset shown in the single-image perturbations table, and their respective adversarial examples, illustrated in three rows. The first row shows the original images. The second row shows the perturbations, scaled 20 times for better visibility. The third row shows the perturbed images.
Figure 4: Visual comparison with Targeted FGSM. The original image is taken from the ImageNet dataset.
Figure 5: A few sample images of the ImageNet dataset from our experiments. The perturbed classes are as follows: Traffic light as Manhole cover, School bus as Ambulance, Acoustic guitar as Assault rifle. Perturbations are shown in the first row, scaled 20 times for better visibility.
...and 1 more figure
