Imperceptible Sample-Specific Backdoor to DNN with Denoising Autoencoder
Xiangqi Wang, Mingfu Xue, Kewei Chen, Jing Xu, Wenmao Liu, Leo Yu Zhang, Yushu Zhang
TL;DR
This paper tackles the security threat of backdoors in deep neural networks by introducing imperceptible, sample-specific triggers generated via a denoising autoencoder. Unlike traditional universal triggers, these triggers vary per input and remain visually indistinguishable, enabling high attack success with minimal impact on clean accuracy and strong transferability across tasks. The authors demonstrate up to 99.8% attack success on ImageNet and MS-Celeb-1M with as little as 1% poisoned data, while evading mainstream defenses such as Neural Cleanse, STRIP, SentiNet, and Fine-Pruning. The work highlights a practical risk for outsourced data pipelines and emphasizes the need for defenses that can detect dynamic, imperceptible backdoors across diverse tasks and datasets.
Abstract
The backdoor attack poses a new security threat to deep neural networks. Existing backdoors often rely on a visible universal trigger to make the backdoored model malfunction; such triggers are not only visually suspicious to humans but also detectable by mainstream countermeasures. We propose an imperceptible sample-specific backdoor in which the trigger varies from sample to sample and is invisible. Our trigger generation is automated through a denoising autoencoder that is fed with delicate but pervasive features (i.e., the edge patterns of each image). We extensively evaluate our backdoor attack on ImageNet and MS-Celeb-1M, demonstrating a stable attack success rate of nearly 100% (i.e., 99.8%) with negligible impact on the clean data accuracy of the infected model. The denoising-autoencoder-based trigger generator is reusable or transferable across tasks (e.g., from ImageNet to MS-Celeb-1M), whilst the trigger has high exclusiveness (i.e., a trigger generated for one sample is not applicable to another sample). In addition, our backdoored model achieves high evasiveness against mainstream backdoor defenses such as Neural Cleanse, STRIP, SentiNet and Fine-Pruning.
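To make the idea of a sample-specific, imperceptible trigger concrete, the following is a minimal sketch and not the authors' generator. It assumes a fixed Sobel edge filter in place of the paper's denoising autoencoder, and a hypothetical perturbation budget `epsilon`; the function names `sobel_edges` and `add_edge_trigger` are illustrative only. It shows only that a perturbation derived from each image's own edge pattern differs between images and can be kept within a small pixel-level bound.

```python
import numpy as np

def sobel_edges(gray):
    """Return a Sobel gradient-magnitude map scaled to [0, 1]."""
    kx = np.array([[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0], [-1.0, 0.0, 1.0]])
    ky = kx.T
    padded = np.pad(gray, 1, mode="edge")
    h, w = gray.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    # Accumulate the 3x3 filter responses by shifting the padded image.
    for i in range(3):
        for j in range(3):
            window = padded[i:i + h, j:j + w]
            gx += kx[i, j] * window
            gy += ky[i, j] * window
    mag = np.hypot(gx, gy)
    peak = mag.max()
    return mag / peak if peak > 0 else mag

def add_edge_trigger(image, epsilon=4.0 / 255.0):
    """Add a perturbation bounded by epsilon, shaped by the image's own edges.

    Assumption: `image` is an H x W x 3 float array in [0, 1]; `epsilon` is a
    hypothetical budget chosen for illustration, not a value from the paper.
    """
    gray = image.mean(axis=2)
    trigger = epsilon * sobel_edges(gray)[..., None]  # shape H x W x 1
    poisoned = np.clip(image + trigger, 0.0, 1.0)
    return poisoned, trigger

# Two random stand-in images yield two different triggers.
rng = np.random.default_rng(0)
img_a = rng.random((32, 32, 3))
img_b = rng.random((32, 32, 3))
poisoned_a, trig_a = add_edge_trigger(img_a)
_, trig_b = add_edge_trigger(img_b)
```

In this sketch the trigger is a deterministic function of the input, so each image receives its own perturbation; the learned generator described in the abstract would replace the fixed filter and is what provides the reported attack success and transferability.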
