Data Free Backdoor Attacks

Bochuan Cao; Jinyuan Jia; Chuxuan Hu; Wenbo Guo; Zhen Xiang; Jinghui Chen; Bo Li; Dawn Song

TL;DR

DFBA narrows the practicality gap in backdoor attacks: it injects a backdoor without retraining, without any data, and without altering the model architecture. The attack builds a compact backdoor path by selecting one neuron per layer, then optimizes a trigger $\bm{\delta}$ together with an amplification factor $\gamma$ so that triggered inputs are classified as the attacker's target class $y_{tc}$ while clean accuracy (CA) stays largely intact. The authors give a theoretical analysis showing that, under mild assumptions, the backdoor is undetectable and unremovable by various state-of-the-art defenses. Empirically, they report 100% attack success with negligible CA loss across diverse datasets and architectures, and the attack bypasses six existing defenses. The work highlights a significant risk for model-sharing platforms, where a pre-trained classifier can be backdoored without access to training data and without architecture changes, and it motivates stronger testing-phase defenses and supply-chain safeguards.

Abstract

Backdoor attacks aim to inject a backdoor into a classifier such that it predicts any input with an attacker-chosen backdoor trigger as an attacker-chosen target class. Existing backdoor attacks require either retraining the classifier with some clean data or modifying the model's architecture. As a result, they are 1) not applicable when clean data is unavailable, 2) less efficient when the model is large, and 3) less stealthy due to architecture changes. In this work, we propose DFBA, a novel retraining-free and data-free backdoor attack without changing the model architecture. Technically, our proposed method modifies a few parameters of a classifier to inject a backdoor. Through theoretical analysis, we verify that our injected backdoor is provably undetectable and unremovable by various state-of-the-art defenses under mild assumptions. Our evaluation on multiple datasets further demonstrates that our injected backdoor: 1) incurs negligible classification loss, 2) achieves 100% attack success rates, and 3) bypasses six existing state-of-the-art defenses. Moreover, our comparison with a state-of-the-art non-data-free backdoor attack shows our attack is more stealthy and effective against various defenses while achieving less classification accuracy loss.
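
To make the idea of modifying only a few parameters concrete, here is a minimal sketch of a backdoor path (one neuron per layer, a trigger $\bm{\delta}$, and an amplification factor $\gamma$) in a toy 3-layer ReLU MLP. It is an illustration under stated assumptions, not the authors' implementation: the network, the function names `forward` and `inject_backdoor`, the threshold `lam`, the pixel `mask`, and the hand-chosen binary trigger (used here in place of an optimized one) are all hypothetical. Inputs are assumed to lie in $[0, 1]$.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def forward(params, x):
    """Run a 3-layer ReLU MLP; return logits and both hidden activations."""
    (W1, b1), (W2, b2), (W3, b3) = params
    h1 = relu(W1 @ x + b1)
    h2 = relu(W2 @ h1 + b2)
    return W3 @ h2 + b3, h1, h2

def inject_backdoor(params, mask, delta, target, s1=0, s2=0, lam=0.1, gamma=100.0):
    """Hypothetical helper: rewire one neuron per layer into a backdoor path.

    No data or retraining is used, and every weight matrix keeps its shape.
    """
    (W1, b1), (W2, b2), (W3, b3) = [(W.copy(), b.copy()) for W, b in params]

    # Layer 1: neuron s1 only looks at the masked pixels. For inputs in [0, 1],
    # its pre-activation peaks at sum(delta), reached exactly when the masked
    # pixels equal delta. The bias leaves an activation of lam at the trigger.
    W1[s1, :] = 0.0
    W1[s1, mask] = np.where(delta > 0.5, 1.0, -1.0)
    b1[s1] = -(delta.sum() - lam)

    # Layer 2: isolate the path, then amplify the signal by gamma.
    W2[:, s1] = 0.0   # s1 feeds no other neuron
    W2[s2, :] = 0.0   # s2 listens only to s1
    W2[s2, s1] = gamma
    b2[s2] = 0.0

    # Output layer: push the target logit up and all other logits down.
    W3[:, s2] = -gamma
    W3[target, s2] = gamma
    return [(W1, b1), (W2, b2), (W3, b3)]

# Toy setup (assumed sizes): 20 inputs, 16 hidden units per layer, 5 classes.
rng = np.random.default_rng(0)
d, h, c = 20, 16, 5
params = [
    (rng.normal(scale=0.3, size=(h, d)), np.zeros(h)),
    (rng.normal(scale=0.3, size=(h, h)), np.zeros(h)),
    (rng.normal(scale=0.3, size=(c, h)), np.zeros(c)),
]
mask = np.array([0, 1, 2, 3])            # pixels covered by the trigger
delta = np.array([1.0, 0.0, 1.0, 1.0])   # hand-chosen trigger pattern
target = 2                               # attacker-chosen class y_tc

backdoored = inject_backdoor(params, mask, delta, target)

x_clean = rng.uniform(size=d)
x_clean[mask] = 0.5                      # masked pixels far from the trigger
x_trig = x_clean.copy()
x_trig[mask] = delta                     # stamp the trigger

logits_clean, h1_clean, _ = forward(backdoored, x_clean)
logits_trig, h1_trig, _ = forward(backdoored, x_trig)
print("backdoor neuron on clean input:", h1_clean[0])
print("prediction on triggered input:", int(np.argmax(logits_trig)))
```

In this sketch the backdoor neuron stays at zero on the clean input, so the path contributes nothing to the clean logits, while the triggered input activates it and the amplified signal dominates the output. Repurposing the two path neurons removes whatever they originally computed, which is one reason clean accuracy can shift slightly in a real model.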
