Evolutionary Trigger Detection and Lightweight Model Repair Based Backdoor Defense

Qi Zhou, Zipeng Ye, Yubo Tang, Wenjian Luo, Yuhui Shi, Yan Jia

TL;DR

This work tackles backdoor vulnerabilities in DNNs by introducing CAM-focus Evolutionary Trigger Filter (CETF) for efficient trigger detection under limited resources, followed by lightweight repair strategies. CETF uses Grad-CAM to constrain the search region and Differential Evolution to precisely locate triggers, enabling accurate filtering of poisoned inputs and extraction of the actual backdoor triggers. The authors propose three repair variants, including Naïve Unlearning and BN-focused approaches (BN-Unlearning and BN-Cleaning), demonstrating that backdoors can be mitigated with minimal data and computation, and revealing a notable link between backdoors and Batch Normalization statistics. Experimental results across multiple models and datasets show CETF’s robustness to varying trigger sizes and counts, outperforming existing defenses in ASR reduction while preserving clean accuracy. These findings offer practical, scalable defense mechanisms and provide new insights into backdoor behavior within BN layers, with potential for broader applicability beyond patch-based attacks.

Abstract

Deep Neural Networks (DNNs) have been widely used in many areas such as autonomous driving and face recognition. However, DNN models are vulnerable to backdoor attacks. A backdoor in a DNN model can be activated by a poisoned input carrying a trigger, leading to a wrong prediction, which causes serious security issues in applications. It is challenging for current defenses to eliminate the backdoor effectively with limited computing resources, especially when the sizes and numbers of the triggers vary, as in the physical world. We propose an efficient backdoor defense based on evolutionary trigger detection and lightweight model repair. In the first phase of our method, the CAM-focus Evolutionary Trigger Filter (CETF) is proposed for trigger detection. CETF is an effective sample-preprocessing method based on an evolutionary algorithm, and our experimental results show that CETF not only accurately distinguishes images with triggers from clean images, but can also be widely used in practice owing to its simplicity and stability across different backdoor attack scenarios. In the second phase of our method, we apply several lightweight unlearning methods, using the trigger detected by CETF, for model repair; these methods also demonstrate the underlying correlation between the backdoor and Batch Normalization layers. Source code will be published upon acceptance.

Paper Structure (39 sections, 8 equations, 9 figures, 9 tables, 2 algorithms)


Figures (9)

  • Figure 1: Overview of the backdoor attack. In the training phase, a backdoor is inserted into the model through poisoned images carrying triggers (yellow patch in the figure) and target labels (label 'go' in the figure). In the inference phase, when an input is poisoned with the trigger, the model outputs the target label. Classification performance on clean inputs without triggers is not affected.
  • Figure 2: Pipeline of our defense based on the CAM-focus evolutionary trigger filter and model repair. The first row of CETF is the process when the input is clean, and the second row is the process when the input is poisoned with a trigger (the British flag). CETF is able to pinpoint the location of the triggers, thus helping to effectively differentiate poisoned samples from clean ones and use them to repair the backdoored model.
  • Figure 3: Clean samples and the corresponding poisoned samples of Input-Aware Attack, Blended Injection, WaNet and SIG.
  • Figure 4: The outputs of each step of CETF. The left shows the results of each step when the inputs are clean, and the right shows the corresponding results for poisoned inputs. CETF effectively distinguishes clean inputs from poisoned inputs across different data types and trigger types.
  • Figure 5: Optimization process of DE. DE optimizes the region generation by generation to obtain higher fitness values. A higher fitness value means the region more accurately contains the trigger.
  • ...and 4 more figures
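
The generation-by-generation region search described in the Figure 5 caption can be illustrated with a minimal Differential Evolution sketch. This is not the authors' implementation: the fitness function here is a hypothetical stand-in (intersection-over-union with a known toy patch), whereas the paper's fitness is computed from the backdoored model. The names `TRIGGER`, `BOUNDS`, `iou`, and `differential_evolution`, and the parameter values, are assumptions chosen for illustration. The search bounds play the role of the region that Grad-CAM would constrain.

```python
import random

# Hypothetical toy setup: a hidden 6x6 "trigger" box (x, y, w, h) and search
# bounds standing in for a Grad-CAM-constrained region. Not from the paper.
TRIGGER = (14.0, 16.0, 6.0, 6.0)
BOUNDS = [(0.0, 20.0), (0.0, 20.0), (2.0, 12.0), (2.0, 12.0)]


def iou(box, target):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box
    bx, by, bw, bh = target
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def differential_evolution(fitness, bounds, pop_size=20, generations=40,
                           f=0.5, cr=0.9, seed=0):
    """DE/rand/1/bin maximizing `fitness`; returns (best, score, history)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [fitness(ind) for ind in pop]
    history = [max(scores)]  # best fitness per generation
    for _ in range(generations):
        new_pop, new_scores = list(pop), list(scores)
        for i in range(pop_size):
            # Mutation: combine three distinct individuals other than i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            mutant = [pop[a][k] + f * (pop[b][k] - pop[c][k]) for k in range(dim)]
            # Binomial crossover; j_rand guarantees one gene from the mutant.
            j_rand = rng.randrange(dim)
            trial = [mutant[k] if (rng.random() < cr or k == j_rand) else pop[i][k]
                     for k in range(dim)]
            # Keep the candidate region inside the allowed search bounds.
            trial = [min(max(v, lo), hi) for v, (lo, hi) in zip(trial, bounds)]
            # Greedy selection: a trial replaces its parent only if not worse.
            s = fitness(trial)
            if s >= scores[i]:
                new_pop[i], new_scores[i] = trial, s
        pop, scores = new_pop, new_scores
        history.append(max(scores))
    best = max(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best], history


best_box, best_score, history = differential_evolution(
    lambda box: iou(box, TRIGGER), BOUNDS)
print("best region:", [round(v, 2) for v in best_box])
print("best fitness:", round(best_score, 3))
```

Because selection is greedy, the best fitness recorded in `history` never decreases from one generation to the next, which matches the behavior the caption describes. In a real pipeline the fitness would have to query the model rather than compare against a known patch.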