Reliable Poisoned Sample Detection against Backdoor Attacks Enhanced by Sharpness Aware Minimization

Mingda Zhang, Mingli Zhu, Zihao Zhu, Baoyuan Wu

TL;DR

This work provides new insights about poisoned sample detection (PSD) and proposes a novel approach that can complement existing detection methods, which may inspire more in-depth explorations in this field.

Abstract

Backdoor attacks are considered a serious security threat to deep neural networks (DNNs). Poisoned sample detection (PSD), which aims to filter out poisoned samples from an untrustworthy training dataset, has shown very promising performance in defending against data-poisoning-based backdoor attacks. However, we observe that the detection performance of many advanced methods is likely to be unstable when facing weak backdoor attacks, such as those with a low poisoning ratio or weak trigger strength. To further verify this observation, we conduct a statistical investigation across various backdoor attacks and poisoned sample detection methods, showing a positive correlation between backdoor effect and detection performance. This inspires us to strengthen the backdoor effect in order to enhance detection performance. Since we cannot achieve that goal by directly manipulating the poisoning ratio or trigger strength, we propose to train one model using the Sharpness-Aware Minimization (SAM) algorithm rather than the vanilla training algorithm. We also provide both empirical and theoretical analyses of how SAM training strengthens the backdoor effect. This SAM-trained model can then be seamlessly integrated with any off-the-shelf PSD method that extracts discriminative features from the trained model for detection; we call the result SAM-enhanced PSD. Extensive experiments on several benchmark datasets show the reliable detection performance of the proposed method against both weak and strong backdoor attacks, with significant improvements against various attacks ($+34.38\%$ TPR on average) over conventional PSD methods (i.e., without SAM enhancement). Overall, this work provides new insights about PSD and proposes a novel approach that can complement existing detection methods, which may inspire more in-depth explorations in this field.
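
To make the contrast between SAM and vanilla training concrete, the following is a minimal sketch of one generic SAM update next to one plain gradient step. It is not the authors' training code: the function names `sgd_step` and `sam_step`, the learning rate `lr`, the perturbation radius `rho`, and the toy quadratic loss in the usage lines are all assumptions made for illustration.

```python
import numpy as np

def sgd_step(w, grad_fn, lr=0.1):
    """One vanilla gradient step: descend along the gradient at w."""
    return w - lr * grad_fn(w)

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One generic SAM step (illustrative sketch, hypothetical hyperparameters).

    1. Compute the gradient at the current weights.
    2. Move to the approximate worst-case point within an L2 ball of radius rho.
    3. Update the original weights using the gradient taken at that point.
    """
    g = grad_fn(w)
    # Small constant guards against division by zero when the gradient vanishes.
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    g_perturbed = grad_fn(w + eps)
    return w - lr * g_perturbed

# Toy usage (assumption): loss(w) = 0.5 * ||w||^2, whose gradient is w itself.
w0 = np.array([3.0, 4.0])
w_sgd = sgd_step(w0, lambda v: v)
w_sam = sam_step(w0, lambda v: v)
```

In a real setting `grad_fn` would be the mini-batch gradient of the training loss, and both gradient evaluations in `sam_step` would use the same mini-batch.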

Paper Structure

This paper contains 31 sections, 1 theorem, 2 equations, 9 figures, 3 tables.

Key Result

Proposition 3.1

For any activated neuron in a two-layer ReLU network trained with cross-entropy loss, each update via SAM increases its pre-activation values $\{\left\langle \boldsymbol{w}_j, \tilde{\boldsymbol{x}}\right\rangle\}_{j=1}^m$ with respect to SGD according to a poisoned sample given the condition $a_{j}

Figures (9)

  • Figure 1: T-SNE visualizations for the impact of poisoning ratios and trigger strengths on backdoor attacks. The top row shows backdoor attacks with a higher poisoning ratio (5%) and blending ratio (0.2), while the bottom row shows results of weak attacks with a poisoning ratio of 1% and a blending ratio of 0.1.
  • Figure 2: Comparison between Top-K TAC and AUC across various backdoor attacks and detections. These backdoor attacks are trained on the CIFAR-10 dataset and ResNet18 where $K=2$, including three poisoning ratios: $\{0.5\%, 1\%, 5\%\}$. Distinct shapes and colors denote various detection and attack methods, respectively.
  • Figure 3: The differences in all TACs between the model trained with SAM and the model trained with Vanilla Training. These neurons are indexed in descending order based on the TAC in their respective models, which means that a smaller index indicates that this pair of neurons has a higher TAC in the corresponding model.
  • Figure 4: Comparison of the intra-class feature variance between the model trained with SAM and the model trained with Vanilla Training.
  • Figure 5: Detection performance of base PSD and SAM-enhanced PSD (SAM) under different poisoning ratios on CIFAR-10 with ResNet18.
  • ...and 4 more figures
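
The poisoning ratio and blending ratio mentioned in the Figure 1 caption can be illustrated with a short sketch of a blended-trigger poisoning routine. This is an assumption-laden example rather than the paper's attack code: it assumes pixel values in [0, 1], a trigger image of the same shape as each sample, and relabeling of poisoned samples to a hypothetical `target_label`; the function name `blend_poison` is likewise invented here.

```python
import numpy as np

def blend_poison(images, labels, trigger, poisoning_ratio=0.05,
                 blending_ratio=0.2, target_label=0, seed=0):
    """Poison a random fraction of a dataset by blending in a trigger image.

    poisoning_ratio: fraction of samples to poison (e.g. 0.05 for 5%).
    blending_ratio:  weight of the trigger in the mix (e.g. 0.2).
    Returns the poisoned images, the modified labels, and a boolean mask
    marking which samples were poisoned.
    """
    rng = np.random.default_rng(seed)
    n = len(images)
    n_poison = int(round(poisoning_ratio * n))
    idx = rng.choice(n, size=n_poison, replace=False)

    poisoned = images.astype(np.float64).copy()
    new_labels = labels.copy()
    # Convex combination of the clean image and the trigger.
    poisoned[idx] = (1.0 - blending_ratio) * poisoned[idx] + blending_ratio * trigger
    # Assumption: poisoned samples are relabeled to the target class.
    new_labels[idx] = target_label

    is_poisoned = np.zeros(n, dtype=bool)
    is_poisoned[idx] = True
    return poisoned, new_labels, is_poisoned
```

Lowering `poisoning_ratio` to 0.01 and `blending_ratio` to 0.1 would correspond to the weaker setting described in the bottom row of Figure 1. The returned mask is the ground truth a PSD method would be scored against.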

Theorems & Definitions (2)

  • Proposition 3.1
  • Remark