Fisher Information guided Purification against Backdoor Attacks

Nazmul Karim; Abdullah Al Arafat; Adnan Siraj Rakin; Zhishan Guo; Nazanin Rahnavard

TL;DR

Fast FIP, an efficient variant of FIP, significantly reduces the number of tunable parameters, obtains a runtime gain of almost 5×, and achieves state-of-the-art (SOTA) performance on a wide range of backdoor defense benchmarks.

Abstract

Studies on backdoor attacks in recent years suggest that an adversary can compromise the integrity of a deep neural network (DNN) by manipulating a small set of training samples. Our analysis shows that such manipulation can make the backdoor model converge to bad local minima, i.e., sharper minima than those of a benign model. Intuitively, the backdoor can be purified by re-optimizing the model to smoother minima. However, naïvely adopting an optimization method that targets smoother minima can lead to sub-optimal purification that hampers clean test accuracy. Hence, inspired by our novel perspective connecting backdoor removal and loss smoothness, we propose Fisher Information guided Purification (FIP), a novel backdoor purification framework. FIP consists of a couple of novel regularizers that exploit knowledge of the Fisher Information Matrix (FIM) to help the model suppress backdoor effects while retaining the acquired knowledge of the clean data distribution throughout the backdoor removal procedure. In addition, we introduce an efficient variant of FIP, dubbed Fast FIP, which significantly reduces the number of tunable parameters and obtains a runtime gain of almost $5\times$. Extensive experiments show that the proposed method achieves state-of-the-art (SOTA) performance on a wide range of backdoor defense benchmarks: 5 different tasks (Image Recognition, Object Detection, Video Action Recognition, 3D Point Cloud, Language Generation); 11 different datasets, including ImageNet, PASCAL VOC, and UCF101; diverse model architectures spanning both CNNs and vision transformers; and 14 different backdoor attacks, e.g., Dynamic, WaNet, LIRA, and ISSBA.

Paper Structure (35 sections, 2 theorems, 17 equations, 7 figures, 17 tables)

Introduction
Related Work
Threat Model
Smoothness Analysis of Backdoor Models
Fisher Information guided Purification (FIP)
Fast FIP (f-FIP)
Experimental Results
Evaluation Settings
Performance Evaluation of FIP
Image Classification
Video Action Recognition
3D Point Cloud
Natural Language Generation (NLG) Task
Ablation Study
Smoothness Analysis of FIP
...and 20 more sections

Key Result

Theorem 1

If the gradients of the losses corresponding to clean and poison samples are $L_c$-Lipschitz and $L_b$-Lipschitz, respectively, then the overall loss (i.e., the loss corresponding to training samples with their ground-truth labels) of the backdoor model is $L_b$-smooth and $L_c < L_b$.
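
For readers unfamiliar with the terminology, a differentiable loss $\ell$ is called $L$-smooth when its gradient is $L$-Lipschitz. The block below states that standard definition and then sketches, purely as an illustration, why the larger constant can bound a mixture of clean and poison losses. The mixing weight $\rho \in [0,1]$ (a poison ratio) and the symbols $\ell_c$, $\ell_b$ are assumptions introduced here, the inequality $L_c < L_b$ is taken as given, and this is not the paper's proof.

$$
\begin{aligned}
&\text{($L$-smoothness)} && \|\nabla \ell(\theta_1) - \nabla \ell(\theta_2)\| \le L\,\|\theta_1 - \theta_2\| \quad \text{for all } \theta_1, \theta_2, \\
&\text{(assumed mixture)} && \ell = (1-\rho)\,\ell_c + \rho\,\ell_b, \\
&\text{(triangle inequality)} && \|\nabla \ell(\theta_1) - \nabla \ell(\theta_2)\| \le \big((1-\rho)\,L_c + \rho\,L_b\big)\,\|\theta_1 - \theta_2\| \le L_b\,\|\theta_1 - \theta_2\|.
\end{aligned}
$$

Under these assumptions the mixed loss is $L_b$-smooth, which matches the form of the statement above; the actual conditions used in the paper may differ.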

Figures (7)

Figure 1: a & b) Eigenvalue spectral density plots of the loss Hessian for benign and backdoor (TrojanNet [liu2017trojaning]) models. Each plot also reports the maximum eigenvalue ($\lambda_\mathsf{max}$), the trace of the Hessian ($\mathsf{Tr}(H)$), clean test accuracy (ACC), and attack success rate (ASR). Low $\lambda_\mathsf{max}$ and $\mathsf{Tr}(H)$ hint at a smoother loss surface, which often results in low ASR and high ACC. Compared to a benign model, a backdoor model tends to reach sharper minima, as shown by the larger range of eigenvalues (x-axis). c & d) Convergence over the course of training: as the backdoor model converges to sharper minima (c), both ASR and ACC increase (d), at around 80 epochs. We use the CIFAR10 dataset with a PreActResNet18 [he2016identity] architecture for all evaluations.
Figure 2: An illustration of the proposed backdoor model analysis and the corresponding purification method. In Figure 2a, we assume a standard backdoor insertion scenario in which the attacker has full control over the training process. Figure 2b illustrates our observation from the smoothness analysis of a pre-trained model. Figure 2c shows that a model purified via the proposed method, FIP, is immune to the backdoor trigger and can predict the true label in its presence. Note that the loss-surface illustrations in Figure 2b are taken from [foret2021sharpnessaware].
Figure 3: Smoothness analysis of a DNN during the backdoor purification process. As the model is re-optimized toward smoother minima, the effect of the backdoor vanishes. We use the CIFAR10 dataset for this experiment.
Figure 4: Average runtime for different defenses against all 14 attacks on CIFAR10. An NVIDIA RTX3090 GPU was used for this evaluation.
Figure 5: t-SNE visualization of class features for the CIFAR10 dataset under the BadNets attack. For visualization purposes only, we assign the label "0" to the clean data cluster from the target class and the label "11" to the poison data cluster; both clusters carry the same label "0" during training. FIP successfully removes the backdoor effect: after purification, poison data are distributed among their original ground-truth classes instead of the target class. To estimate these clusters, we take the feature embeddings from the backbone.
...and 2 more figures
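
Figure 1 summarizes sharpness through $\lambda_\mathsf{max}$ and $\mathsf{Tr}(H)$. The following is a minimal sketch of how such quantities are commonly estimated from Hessian-vector products alone: power iteration for the dominant eigenvalue and Hutchinson's estimator for the trace. The two 2×2 matrices are toy stand-ins chosen for illustration, not values from the paper, all function names are hypothetical, and the source does not state which estimator the authors used.

```python
import math
import random

# Toy symmetric "Hessians" (illustrative assumptions, not measured values).
H_BENIGN = [[2.0, 1.0], [1.0, 2.0]]      # eigenvalues 3 and 1, trace 4
H_BACKDOOR = [[9.0, 3.0], [3.0, 9.0]]    # eigenvalues 12 and 6, trace 18


def matvec(H, v):
    """Hessian-vector product for an explicit matrix H."""
    return [sum(h * x for h, x in zip(row, v)) for row in H]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def power_iteration(hvp, dim, iters=200, seed=0):
    """Estimate the dominant (largest-magnitude) eigenvalue.

    For a positive semi-definite Hessian this coincides with lambda_max.
    """
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(dot(v, v))
    v = [x / norm for x in v]
    lam = 0.0
    for _ in range(iters):
        w = hvp(v)
        lam = dot(v, w)                  # Rayleigh quotient for unit-norm v
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return lam


def hutchinson_trace(hvp, dim, samples=2000, seed=0):
    """Estimate Tr(H) as the average of z^T H z over Rademacher vectors z."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        z = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        total += dot(z, hvp(z))
    return total / samples


if __name__ == "__main__":
    for name, H in (("benign", H_BENIGN), ("backdoor", H_BACKDOOR)):
        hvp = lambda v, H=H: matvec(H, v)
        print(name, power_iteration(hvp, 2), hutchinson_trace(hvp, 2))
```

For a real DNN the explicit `matvec` would be replaced by an automatic-differentiation Hessian-vector product, so the full Hessian never has to be formed; only the `hvp` callable changes.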

Theorems & Definitions (6)

Definition 1
Definition 2
Theorem 1
Lemma 1
proof
proof
