Uncovering, Explaining, and Mitigating the Superficial Safety of Backdoor Defense

Rui Min, Zeyu Qin, Nevin L. Zhang, Li Shen, Minhao Cheng

TL;DR

We propose Path-Aware Minimization (PAM), a straightforward tuning defense that promotes deviation along backdoor-connected paths with extra model updates. PAM significantly improves post-purification robustness while maintaining good clean accuracy and a low ASR.

Abstract

Backdoor attacks pose a significant threat to Deep Neural Networks (DNNs) because they allow attackers to manipulate model predictions with backdoor triggers. To address these security vulnerabilities, various backdoor purification methods have been proposed to purify compromised models. Typically, these purified models exhibit low Attack Success Rates (ASR), which makes them appear resistant to backdoored inputs. However, does achieving a low ASR through current safety purification methods truly eliminate the backdoor features learned during the pretraining phase? In this paper, we answer this question in the negative by thoroughly investigating the Post-Purification Robustness of current backdoor purification methods. We find that current safety purification methods are vulnerable to the rapid re-learning of backdoor behavior, even when purified models are further fine-tuned on only a very small number of poisoned samples. Based on this, we further propose the practical Query-based Reactivation Attack (QRA), which can effectively reactivate the backdoor by merely querying purified models. We find that the failure to achieve satisfactory post-purification robustness stems from the insufficient deviation of purified models from the backdoored model along the backdoor-connected path. To improve post-purification robustness, we propose a straightforward tuning defense, Path-Aware Minimization (PAM), which promotes deviation along backdoor-connected paths with extra model updates. Extensive experiments demonstrate that PAM significantly improves post-purification robustness while maintaining good clean accuracy and a low ASR. Our work provides a new perspective on understanding the effectiveness of backdoor safety tuning and highlights the importance of faithfully assessing a model's safety.
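Since the abstract relies on ASR throughout, a minimal sketch of how such a metric is commonly computed may help. The function name and the convention of excluding samples whose true label already equals the target label are assumptions for illustration, not details taken from the paper.

```python
def attack_success_rate(predictions, true_labels, target_label):
    """Return the ASR in percent over triggered inputs.

    Assumption (not from the paper): samples whose true label already
    equals the attacker's target label are excluded, since predicting
    the target for them does not indicate a successful attack.
    """
    eligible = [p for p, y in zip(predictions, true_labels) if y != target_label]
    if not eligible:
        raise ValueError("no eligible triggered samples to evaluate")
    hits = sum(1 for p in eligible if p == target_label)
    return 100.0 * hits / len(eligible)


# Hypothetical predictions on five triggered inputs; the target label is 0.
preds = [0, 0, 1, 2, 0]
labels = [1, 2, 1, 3, 0]
asr = attack_success_rate(preds, labels, target_label=0)  # 2 of 4 eligible samples hit the target
```

Under this convention, a purified model with a low ASR simply stops mapping triggered inputs to the target label. As the abstract argues, that alone does not show the backdoor features are gone.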

Paper Structure

This paper contains 32 sections, 3 equations, 17 figures, 11 tables, 1 algorithm.

Figures (17)

  • Figure 1: The robustness performance against various attack settings. The title consists of the used dataset, model, and poisoning rate. The O-ASR metric represents the defense performance of original defense methods, while the P-ASR metric indicates the ASR after applying the RA. All metrics are measured in percentage (%). Here we report the average results among backdoor attacks and defer further details to the appendix.
  • Figure 2: Experimental results of QRA on both the purified and clean models against four types of backdoor attacks. We evaluate the QRA on CIFAR-10 with ResNet-18, and the poisoning rate is set to $5\%$. Additional results of QRA are provided in the appendix.
  • Figure 3: The results of the QRA transferability. The defense method used in the attack is represented on the $x$-axis, while the $y$-axis shows the average P-ASR across other purifications.
  • Figure 4: Evaluation of the backdoor-connected path under various attack settings. The $x$-axis and $y$-axis denote the interpolation ratio $t$ and the backdoor error (1-ASR), respectively. For each attack setting, we report the average results among backdoor attacks.
  • Figure 5: The LMC path connected from other defense techniques to EP. We evaluate the LMC results on CIFAR-10 with ResNet-18, and set the poisoning rate to $5\%$.
  • ...and 12 more figures
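
The path evaluation described in the captions of Figures 4 and 5 can be illustrated with a small sketch. It assumes the path is a plain linear interpolation $\theta(t) = (1-t)\,\theta_{\text{start}} + t\,\theta_{\text{end}}$ between two parameter sets, with the backdoor error (1-ASR) measured at each $t$. Which model sits at $t=0$, the dictionary-of-lists parameter layout, and the toy `toy_asr` scorer are all hypothetical choices made here, not the paper's implementation.

```python
def interpolate(theta_start, theta_end, t):
    """Linearly interpolate two parameter sets that share the same keys and shapes."""
    return {
        name: [(1.0 - t) * a + t * b
               for a, b in zip(theta_start[name], theta_end[name])]
        for name in theta_start
    }


def backdoor_error_along_path(theta_start, theta_end, asr_fn, steps=11):
    """Return (t, 1 - ASR) pairs for evenly spaced t in [0, 1].

    asr_fn is a caller-supplied function mapping parameters to an ASR
    expressed as a fraction in [0, 1].
    """
    if steps < 2:
        raise ValueError("steps must be at least 2")
    path = []
    for i in range(steps):
        t = i / (steps - 1)
        asr = asr_fn(interpolate(theta_start, theta_end, t))
        path.append((t, 1.0 - asr))
    return path


# Toy stand-in for a real evaluation: the "backdoor fires" on a triggered
# input x whenever w * x + b > 0. Purely illustrative.
TRIGGERED_INPUTS = [1.0, 2.0, 3.0, 4.0]


def toy_asr(theta):
    w, b = theta["w"][0], theta["b"][0]
    hits = sum(1 for x in TRIGGERED_INPUTS if w * x + b > 0)
    return hits / len(TRIGGERED_INPUTS)


backdoored = {"w": [1.0], "b": [0.0]}   # assumed to sit at t = 0
purified = {"w": [1.0], "b": [-5.0]}    # assumed to sit at t = 1
path = backdoor_error_along_path(backdoored, purified, toy_asr, steps=3)
```

In this toy setting the backdoor error rises from 0 at the backdoored endpoint to 1 at the purified endpoint. A path along which the error stays low for most values of $t$ would suggest that the purified model has not moved far from the backdoored one, which is the failure mode the abstract describes.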