Imperceptible Sample-Specific Backdoor to DNN with Denoising Autoencoder

Xiangqi Wang; Mingfu Xue; Kewei Chen; Jing Xu; Wenmao Liu; Leo Yu Zhang; Yushu Zhang

TL;DR

This paper tackles the security threat of backdoors in deep neural networks by introducing imperceptible, sample-specific triggers generated via a denoising autoencoder. Unlike traditional universal triggers, these triggers vary per input and remain visually indistinguishable, enabling high attack success with minimal impact on clean accuracy and strong transferability across tasks. The authors demonstrate up to 99.8% attack success on ImageNet and MS-Celeb-1M with as little as 1% poisoned data, while evading mainstream defenses such as Neural Cleanse, STRIP, SentiNet, and Fine-Pruning. The work highlights a practical risk for outsourced data pipelines and emphasizes the need for defenses that can detect dynamic, imperceptible backdoors across diverse tasks and datasets.

Abstract

Backdoor attacks pose a new security threat to deep neural networks. Existing backdoors often rely on a visible universal trigger to make the backdoored model malfunction; such triggers are not only visually suspicious to humans but also detectable by mainstream countermeasures. We propose an imperceptible sample-specific backdoor in which the trigger varies from sample to sample and is invisible. Our trigger generation is automated through a denoising autoencoder that is fed with delicate but pervasive features (i.e., the edge patterns of each image). We extensively evaluate our backdoor attack on ImageNet and MS-Celeb-1M, demonstrating a stable attack success rate of nearly 100% (i.e., 99.8%) with negligible impact on the clean data accuracy of the infected model. The denoising autoencoder based trigger generator is reusable or transferable across tasks (e.g., from ImageNet to MS-Celeb-1M), whilst the trigger has high exclusiveness (i.e., a trigger generated for one sample is not applicable to another sample). Besides, our proposed backdoored model achieves high evasiveness against mainstream backdoor defenses such as Neural Cleanse, STRIP, SentiNet and Fine-Pruning.
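The abstract reports two quantities: attack success rate and clean data accuracy. The sketch below shows how these are conventionally computed from model predictions. It is an illustration, not the authors' evaluation code: the function names, the toy label values, and the choice to exclude samples whose true class already equals the target class are all assumptions, since the exact protocol is not stated here.

```python
def clean_accuracy(preds, labels):
    """Fraction of clean (untriggered) samples classified as their true label."""
    if not labels:
        raise ValueError("empty evaluation set")
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def attack_success_rate(triggered_preds, true_labels, target_label):
    """Fraction of triggered samples classified as the attacker's target label.

    Assumption: samples whose true class is already the target are excluded,
    because they would count as successes even without a backdoor.
    """
    pairs = [(p, y) for p, y in zip(triggered_preds, true_labels)
             if y != target_label]
    if not pairs:
        raise ValueError("no samples outside the target class")
    return sum(p == target_label for p, _ in pairs) / len(pairs)


# Toy values, purely illustrative (not results from the paper).
acc = clean_accuracy([0, 1, 2, 2], [0, 1, 2, 1])
asr = attack_success_rate([7, 7, 3, 7], [0, 1, 2, 7], target_label=7)
print(f"clean accuracy = {acc:.2f}, attack success rate = {asr:.2f}")
```

Under these definitions, "negligible impact on clean data accuracy" means that clean_accuracy for the infected model stays close to that of an uninfected model trained on the same task.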

Paper Structure (14 sections, 4 equations, 8 figures, 7 tables)

Introduction
Related Work
Backdoor Attacks
Backdoor Defenses
Imperceptible Sample-Specific Trigger with Denoising Autoencoder
Pipeline of Backdoor Attacks
Threat Model
Our Proposed Attack
Experimental Analysis
Experimental Setup
Attack Performance
Evasiveness Against Backdoor Defenses
Ablation Studies
Conclusion

Figures (8)

Figure 1: Comparison between backdoor examples generated by our method and existing backdoor attacks.
Figure 2: Examples of three feature injection modes and images after feature injection and denoising autoencoder reconstruction.
Figure 3: The training process of the denoising autoencoder.
Figure 4: Attack pipeline.
Figure 5: The effect of different attack methods against Fine-Pruning defense method.
...and 3 more figures
