Data Free Backdoor Attacks

Bochuan Cao, Jinyuan Jia, Chuxuan Hu, Wenbo Guo, Zhen Xiang, Jinghui Chen, Bo Li, Dawn Song

TL;DR

DFBA closes a practicality gap in backdoor attacks: it injects a backdoor without retraining, without any clean data, and without changing the model architecture. It builds a compact backdoor path by selecting one neuron per layer, then optimizes a trigger $\bm{\delta}$ and an amplification factor $\gamma$ so that backdoored inputs are classified as the attacker's target class $y_{tc}$ while clean accuracy (CA) stays largely intact. The authors prove that, under mild assumptions, the backdoor is undetectable and unremovable by several state-of-the-art defenses, and they report 100% attack success rates with negligible CA loss across multiple datasets and architectures while bypassing six existing defenses. The work exposes a significant risk for model-sharing platforms, where a pre-trained classifier can be backdoored by an attacker who has no training data and makes no architecture changes, and it motivates stronger testing-phase defenses and supply-chain safeguards.
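To make the mechanism above concrete, here is a minimal NumPy sketch of a one-neuron-per-layer backdoor path in a toy ReLU network. It is an illustration under stated assumptions, not the authors' implementation. The random weights stand in for a pre-trained classifier, and the helper names (`inject_backdoor`, `stamp`), the unit switch weights, the margin `lam`, the trigger positions `mask`, and the values of `gamma` are all hypothetical choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
d, h, k = 16, 8, 3  # toy sizes: input dim, hidden width, number of classes

# Assumption: random weights stand in for a pre-trained 2-hidden-layer ReLU classifier.
params = {
    "W1": rng.normal(size=(h, d)), "b1": rng.normal(size=h),
    "W2": rng.normal(size=(h, h)), "b2": rng.normal(size=h),
    "W3": rng.normal(size=(k, h)), "b3": rng.normal(size=k),
}

def hidden(x, p):
    """Return the activations of both hidden layers."""
    a1 = np.maximum(p["W1"] @ x + p["b1"], 0.0)
    a2 = np.maximum(p["W2"] @ a1 + p["b2"], 0.0)
    return a1, a2

def predict(x, p):
    """Return the index of the largest logit."""
    _, a2 = hidden(x, p)
    return int(np.argmax(p["W3"] @ a2 + p["b3"]))

def inject_backdoor(p, mask, target, gamma=1e3, lam=0.1):
    """Edit a few parameters only; no data, no retraining, same architecture."""
    q = {name: v.copy() for name, v in p.items()}
    # Assumption: unit weights with alternating signs on the trigger positions.
    w = np.where(np.arange(len(mask)) % 2 == 0, 1.0, -1.0)
    # For inputs in [0, 1], the pattern maximizing w . x is 1 where w > 0, else 0.
    delta = (w > 0).astype(float)
    # Layer 1: neuron 0 becomes the switch and reads only the trigger positions.
    q["W1"][0, :] = 0.0
    q["W1"][0, mask] = w
    q["b1"][0] = lam - w @ delta  # pre-activation equals lam exactly on the trigger
    # Layer 2: neuron 0 reads only the switch and amplifies it by gamma;
    # no other layer-2 neuron reads the switch.
    q["W2"][:, 0] = 0.0
    q["W2"][0, :] = 0.0
    q["W2"][0, 0] = gamma
    q["b2"][0] = 0.0
    # Output layer: the path raises the target logit and lowers all others.
    q["W3"][:, 0] = -gamma
    q["W3"][target, 0] = gamma
    return q, delta

def stamp(x, mask, delta):
    """Write the trigger pattern into the masked positions of a copy of x."""
    x = x.copy()
    x[mask] = delta
    return x

mask = np.array([0, 1, 2, 3])
target = 2
backdoored, delta = inject_backdoor(params, mask, target)

x_clean = np.full(d, 0.5)            # a clean input that is far from the trigger
x_bd = stamp(x_clean, mask, delta)   # the same input with the trigger stamped in

print("switch on clean input:    ", hidden(x_clean, backdoored)[0][0])
print("switch on triggered input:", hidden(x_bd, backdoored)[0][0])
print("prediction on triggered input:", predict(x_bd, backdoored))
```

In this sketch the switch's pre-activation on an input $\mathbf{x}$ is `lam` minus the L1 distance between the masked pixels and the trigger, so it fires only for inputs whose masked region lies within `lam` of the trigger. The path adds `gamma**2 * lam` to the target logit and subtracts the same amount from every other logit, which is why a large `gamma` decides the prediction. Repurposing one neuron per layer also changes the clean function slightly, which corresponds to the small clean-accuracy cost the summary mentions.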

Abstract

Backdoor attacks aim to inject a backdoor into a classifier such that it predicts any input with an attacker-chosen backdoor trigger as an attacker-chosen target class. Existing backdoor attacks require either retraining the classifier with some clean data or modifying the model's architecture. As a result, they are 1) not applicable when clean data is unavailable, 2) less efficient when the model is large, and 3) less stealthy due to architecture changes. In this work, we propose DFBA, a novel retraining-free and data-free backdoor attack without changing the model architecture. Technically, our proposed method modifies a few parameters of a classifier to inject a backdoor. Through theoretical analysis, we verify that our injected backdoor is provably undetectable and unremovable by various state-of-the-art defenses under mild assumptions. Our evaluation on multiple datasets further demonstrates that our injected backdoor: 1) incurs negligible classification loss, 2) achieves 100% attack success rates, and 3) bypasses six existing state-of-the-art defenses. Moreover, our comparison with a state-of-the-art non-data-free backdoor attack shows our attack is more stealthy and effective against various defenses while achieving less classification accuracy loss.

Paper Structure

This paper contains 36 sections, 5 theorems, 12 equations, 8 figures, 6 tables.

Key Result

Lemma 1

Suppose $\delta_n$ ($n \in \Gamma(\mathbf{m})$) is optimized as in the analytical solution for the trigger (equation label: delta_analytical_solution). Then an arbitrary clean input $\mathbf{x}$ cannot activate $s_1$ if the following condition is satisfied:

Figures (8)

  • Figure 1: An example of the backdoor switch and optimized trigger when each pixel of an image is normalized to the range $[0,1]$.
  • Figure 2: Visualization of our backdoor path when it is activated by a backdoored input. The backdoored model will predict the target class for the backdoored input.
  • Figure 3: Comparing DFBA with Hong et al. [hong2021handcrafted] under fine-tuning.
  • Figure 4: Comparing DFBA with Hong et al. [hong2021handcrafted] under pruning [liu2018fine-pruning].
  • Figure 5: Comparing DFBA with Hong et al. [hong2021handcrafted] under fine-tuning after pruning neurons on MNIST.
  • ...and 3 more figures

Theorems & Definitions (12)

  • Lemma 1
  • Theorem 1
  • Proposition 1
  • Proposition 2
  • Proposition 3
  • proof : Proof of Lemma 1
  • Example 1
  • proof
  • proof
  • proof : Proof of Proposition 1
  • ...and 2 more