Adversarial Example Defense via Perturbation Grading Strategy

Shaowei Zhu; Wanli Lyu; Bin Li; Zhaoxia Yin; Bin Luo

TL;DR

This work addresses two problems: the vulnerability of DNNs to adversarial examples in practical tasks, and the limited applicability of existing defenses, which often require retraining or heavy preprocessing. It introduces FADDefend, a perturbation-grading preprocessing framework that uses a blind perturbation-level estimator to classify each input as carrying a small or a large perturbation, using a threshold of $2.13$, and routes it to the matching defense without modifying the classifier. Small perturbations are mitigated by JPEG compression with a high quality factor ($QF=95$) combined with a mirror flip. Large perturbations are first reconstructed by an untrained network based on the deep image prior (DIP) and then passed through the same JPEG and flip processing. Evaluations on ImageNet against multiple attack types and cross-model transfer attacks show improved defense accuracy and lower computation than fully reconstructive baselines, which supports the method's use as a practical preprocessing module.
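The routing described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`grade_and_defend`, `mirror_flip`) and the callables `estimate_level`, `jpeg_compress`, and `dip_reconstruct` are hypothetical placeholders, the image is represented as a plain list of pixel rows, and the sketch assumes that an estimate above the threshold means a large perturbation and that the flip is applied after JPEG compression. Only the threshold of 2.13 and the quality factor of 95 come from the summary.

```python
from typing import Callable, List

# Illustrative image type: a grayscale image as rows of pixel values.
Image = List[List[float]]

THRESHOLD = 2.13  # perturbation-level threshold quoted in the summary
JPEG_QF = 95      # JPEG quality factor quoted in the summary


def mirror_flip(image: Image) -> Image:
    """Return a horizontally (left-right) flipped copy of the image."""
    return [list(reversed(row)) for row in image]


def grade_and_defend(
    image: Image,
    estimate_level: Callable[[Image], float],             # hypothetical blind estimator
    jpeg_compress: Callable[[Image, int], Image],         # hypothetical JPEG round trip
    dip_reconstruct: Callable[[Image], Image],            # hypothetical DIP reconstruction
    threshold: float = THRESHOLD,
) -> Image:
    """Route an input to the light or heavy defense based on its estimated level."""
    level = estimate_level(image)
    # Assumption: estimates above the threshold are graded as large perturbations.
    if level > threshold:
        # Large perturbation: reconstruct first, then apply the shared processing.
        image = dip_reconstruct(image)
    # Shared processing for both grades; the JPEG-then-flip order is an assumption.
    return mirror_flip(jpeg_compress(image, JPEG_QF))
```

Passing the estimator and the two defenses as callables keeps the sketch independent of any particular image library; the classifier itself is never touched, which matches the preprocessing-only design described in the summary.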

Abstract

Deep Neural Networks (DNNs) have been widely used in many fields. However, studies have shown that DNNs are easily attacked by adversarial examples, which carry tiny perturbations yet severely mislead the model's predictions. Furthermore, even if malicious attackers cannot obtain all the underlying model parameters, they can use adversarial examples to attack various DNN-based task systems. Researchers have proposed various defense methods to protect DNNs, such as reducing the aggressiveness of adversarial examples by preprocessing or improving the robustness of the model by adding modules. However, some defense methods are effective only for small-scale examples or small perturbations and have limited effect against adversarial examples with large perturbations. This paper grades the perturbations on the input examples and assigns different defense strategies to adversarial perturbations of different strengths. Experimental results show that the proposed method effectively improves defense performance. In addition, the proposed method does not modify any task model and can be used as a preprocessing module, which significantly reduces the deployment cost in practical applications.

Paper Structure (16 sections, 1 equation, 7 figures, 3 tables)

Introduction
Related Works
Adversarial Attack Methods
White-box Attacks
Black-box Attacks
Adversarial Defense Methods
Proposed Method
Image perturbation level evaluation
Defense methods based on image processing technology
Defense methods based on deep image priors
Experiments
Experimental Settings
Comparison with Image Compression Defenses
Results on Attacks of Different Types and Strengths
Results on Migration Attacks under Different Models
...and 1 more section

Figures (7)

Figure 1: An example of the proposed FADDefend. It removes the perturbations in the adversarial examples before feeding them into the classifier.
Figure 2: FADDefend defense framework.
Figure 3: Different thresholds can be chosen by the intersection of expected accuracy and adversarial example defense accuracy.
Figure 4: (a) original example; (d) corresponding adversarial example. (b), (c), (e), and (f) show class activation maps of the images: (b) original example, classified as bus; (c) flipped original example, classified as bus; (e) adversarial example, classified as bike; (f) flipped adversarial example, classified as bus. The redder a region of the class activation map, the more attention the model pays to that region.
Figure 5: Comparison of defense effects of preprocessing methods under different QFs. (a) Defense accuracy of various preprocessing methods under FGSM (2/255), (b) Defense accuracy using JPEG compression combined with mirror flip under different perturbations.
...and 2 more figures
