Augmented Neural Fine-Tuning for Efficient Backdoor Purification

Nazmul Karim; Abdullah Al Arafat; Umar Khalid; Zhishan Guo; Nazanin Rahnavard

TL;DR

This work tackles the vulnerability of DNNs to backdoor attacks and the inefficiency of existing defenses. It introduces Neural mask Fine-Tuning (NFT), a test-time purification framework that replaces expensive trigger synthesis with MixUp-based augmentation and learns soft neural masks to suppress backdoors while preserving clean accuracy. The authors provide theoretical justification ($\mathcal{L}^{\mathrm{mix}}$ upper-bounds $\mathcal{L}^{\mathrm{ideal}}$ under certain conditions) and a practical mask-based optimization, with a scheduling function and a regularizer to limit model drift, that is efficient in both runtime and sample usage, including the one-shot setting with a single sample per class. Empirically, NFT achieves state-of-the-art purification across vision, video, 3D point cloud, and NLP tasks; it performs strongly under diverse attacks, scales to large datasets, and maintains favorable runtime.

Abstract

Recent studies have revealed the vulnerability of deep neural networks (DNNs) to various backdoor attacks, where the behavior of DNNs can be compromised by utilizing certain types of triggers or poisoning mechanisms. State-of-the-art (SOTA) defenses employ overly sophisticated mechanisms that require either a computationally expensive adversarial search module for reverse-engineering the trigger distribution or an over-sensitive hyper-parameter selection module. Moreover, they offer sub-par performance in challenging scenarios, e.g., limited validation data and strong attacks. In this paper, we propose Neural mask Fine-Tuning (NFT), which aims to optimally re-organize the neuron activities so that the effect of the backdoor is removed. Utilizing a simple data augmentation like MixUp, NFT relaxes the trigger synthesis process and eliminates the requirement of the adversarial search module. Our study further reveals that direct weight fine-tuning under limited validation data results in poor post-purification clean test accuracy, primarily due to overfitting. To overcome this, we propose to fine-tune neural masks instead of model weights. In addition, a mask regularizer has been devised to further mitigate the model drift during the purification process. The distinct characteristics of NFT render it highly efficient in both runtime and sample usage, as it can remove the backdoor even when a single sample is available from each class. We validate the effectiveness of NFT through extensive experiments covering the tasks of image classification, object detection, video action recognition, 3D point cloud, and natural language processing. We evaluate our method against 14 different attacks (LIRA, WaNet, etc.) on 11 benchmark data sets such as ImageNet, UCF101, Pascal VOC, ModelNet, and OpenSubtitles2012.
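
The MixUp augmentation mentioned in the abstract can be illustrated with a minimal sketch. It shows the standard MixUp recipe (a convex combination of two samples and their one-hot labels, with a Beta-distributed mixing coefficient), not the authors' implementation. The function names `mixup_pair` and `mixup_batch`, the `alpha` default, and the use of plain Python lists are assumptions made for illustration.

```python
import random

def mixup_pair(x_i, y_i, x_j, y_j, alpha=1.0, rng=random):
    """Convexly combine two samples and their one-hot labels (standard MixUp)."""
    lam = rng.betavariate(alpha, alpha)  # mixing coefficient in [0, 1]
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_i, y_j)]
    return x_mix, y_mix, lam

def mixup_batch(xs, ys, alpha=1.0, seed=0):
    """Mix every sample with a randomly chosen partner from the same batch."""
    rng = random.Random(seed)
    partners = list(range(len(xs)))
    rng.shuffle(partners)
    # Keep only (x_mix, y_mix); the coefficient is dropped for brevity.
    return [mixup_pair(xs[i], ys[i], xs[j], ys[j], alpha, rng)[:2]
            for i, j in enumerate(partners)]

# Hypothetical toy batch: two 2-dimensional inputs with one-hot labels.
xs = [[0.0, 0.0], [1.0, 1.0]]
ys = [[1.0, 0.0], [0.0, 1.0]]
mixed = mixup_batch(xs, ys)
```

Because the mixed samples need only clean validation data, an augmentation of this kind involves no adversarial search for a trigger, which is the efficiency argument the abstract makes.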

Paper Structure (23 sections, 2 theorems, 21 equations, 5 figures, 26 tables)

This paper contains 23 sections, 2 theorems, 21 equations, 5 figures, 26 tables.

Introduction
Related Work
Threat Model
Neural Fine-Tuning (NFT)
Backdoor Suppressor
Clean Accuracy Retainer
Sample Efficiency of NFT
Experimental Results
Evaluation Settings
Performance Comparison of NFT
Ablation Study
Conclusion
Theoretical Justifications
Experimental Settings
Attack Implementation Details
...and 8 more sections

Key Result

Lemma

Assuming $f_\theta(x) = \nabla f_\theta(x)^\top x$ and $\nabla^2 f_\theta(x) = 0$ (conditions satisfied by ReLU activations and max-pooling), $\mathcal{L}^{\mathrm{mix}}(\theta,\mathbb{D}_\mathrm{val})$ can be expressed as [...], where $R=\min_{i\in [N_\mathrm{val}]} \frac{\langle\nabla f_\theta(x_i), x_i\rangle}{\|\nabla f_\theta(x_i)\|\,\|x_i\|}$ and $c_x > 0$ is a constant.
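
To make the quantity $R$ concrete, the sketch below computes the minimum cosine similarity between the input gradient and the input over a small set of samples, and checks the assumption $f_\theta(x) = \nabla f_\theta(x)^T x$ on a toy network. The bias-free two-layer ReLU network, its weights, and the helper names (`relu_net`, `cosine`, `min_grad_input_cosine`) are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math

def relu_net(W1, w2, x):
    """Bias-free two-layer ReLU network; returns (output, gradient w.r.t. x)."""
    pre = [sum(w * xi for w, xi in zip(row, x)) for row in W1]
    out = sum(c * max(p, 0.0) for c, p in zip(w2, pre))
    # Only active units (pre-activation > 0) contribute to the gradient.
    grad = [sum(c * row[k] for c, row, p in zip(w2, W1, pre) if p > 0.0)
            for k in range(len(x))]
    return out, grad

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def min_grad_input_cosine(grads, inputs):
    """R: the smallest gradient-input cosine similarity over the samples."""
    return min(cosine(g, x) for g, x in zip(grads, inputs))

# Hypothetical weights and validation inputs.
W1 = [[1.0, -1.0], [0.5, 2.0]]
w2 = [1.0, 1.0]
inputs = [[2.0, 1.0], [-1.0, 1.0]]
results = [relu_net(W1, w2, x) for x in inputs]
grads = [g for _, g in results]
R = min_grad_input_cosine(grads, inputs)
```

For this toy network the output equals the inner product of the gradient and the input at each sample, which is the first assumption of the lemma; piecewise-linear networks with bias terms would not satisfy it exactly.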

Figures (5)

Figure 1: t-SNE visualization of a backdoor model, where the "poison cluster" is shown in red with label "11". Since the attack target class is "0", cluster "0" and the poison cluster sit close to each other (backdoor-model panel). After purification, the poison cluster should break up, and all triggered samples should be classified according to their original labels. In the panel without the regularizer, we perform one-shot NFT without the mask regularizer; due to overfitting, the clean clusters lose their separability. The mask regularizer restores this separability (larger cluster gaps than in the panel without the regularizer) by keeping the purified model parameters close to those of the original backdoor model (panel with the regularizer). This, in turn, produces better clean test accuracy. For evaluation, we train a PreActResNet18 (He et al., 2016) on the CIFAR10 dataset with a poison rate of 10%.
Figure 2: Ablation with different Mask Scheduling Function ($\mu$).
Figure 3: Mask Distribution of AWM (left) and NFT (right).
Figure 4: Illustration of Mask Heatmap with and without the scheduling function ($\mu$). This ablation is done for the LIRA attack on the CIFAR10 dataset. In both cases, the mask regularizer is disabled to isolate the impact of $\mu$. The first couple of layers have minimal changes.
Figure 5: Illustration of Mask Heatmap with and without the regularizer. This ablation is done for the BadNets attack on the CIFAR10 dataset. In both cases, the mask scheduling function is disabled to isolate the impact of the regularizer. With the mask regularizer, the weights are restricted to stay closer to those of the original backdoor model (shown by the overall larger yellow region).
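
The roles of the scheduling function and the mask regularizer in Figures 4 and 5 can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' formulation: the exponential form of `mask_lower_bound`, the clipping of masks to `[mu, 1]`, the L1 penalty toward the all-ones mask, and every function name are hypothetical.

```python
import math

def mask_lower_bound(layer_idx, num_layers, beta=1.0):
    """Hypothetical schedule mu(l): floor of 1 at the first layer, decaying with depth."""
    return math.exp(-beta * layer_idx / num_layers)

def clip_mask(mask, lower):
    """Keep every soft mask value inside [lower, 1]."""
    return [min(1.0, max(lower, m)) for m in mask]

def apply_mask(weights, mask):
    """Scale each neuron (one row of weights) by its mask value."""
    return [[m * w for w in row] for row, m in zip(weights, mask)]

def mask_penalty(masks):
    """Assumed regularizer: total L1 distance of all masks from the all-ones mask."""
    return sum(abs(1.0 - m) for mask in masks for m in mask)

# Hypothetical layer with two neurons, located at depth 2 of 4.
weights = [[1.0, 2.0], [3.0, 4.0]]
mask = clip_mask([1.3, 0.2], mask_lower_bound(layer_idx=2, num_layers=4))
masked_weights = apply_mask(weights, mask)
drift = mask_penalty([mask])
```

Under this sketch, a higher floor in early layers keeps their masks near one, matching the observation that the first couple of layers change little, while the penalty discourages masks from drifting far from the original backdoor model.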

Theorems & Definitions (4)

Lemma
Theorem
Proof
Proof
