Stealthy Backdoor Attack via Confidence-driven Sampling
Pengfei He, Yue Xing, Han Xu, Jie Ren, Yingqian Cui, Shenglai Zeng, Jiliang Tang, Makoto Yamada, Mohammad Sabokrou
TL;DR
This work addresses stealth in backdoor attacks by demonstrating that random poisoning sampling is detectable and proposing Confidence-driven Boundary Sampling (CBS) to select boundary-adjacent, low-confidence samples using a surrogate model. CBS is designed to be compatible with a wide range of trigger designs and backdoor techniques, and it is supported by theoretical (SVM-based) and empirical analyses showing reduced detectability and maintained attack efficacy under defenses. Across CIFAR-10/100, Tiny ImageNet, and ImageNet-1k, CBS consistently improves resilience against both detection-based and non-detection-based defenses, with clear tradeoffs controlled by the boundary-uncertainty parameter $\boldsymbol{\epsilon}$ and poisoning rate. The results imply that sampling strategy is a crucial lever in backdoor stealth and that CBS can adapt to real-world constraints, including partial data access, while highlighting opportunities to balance effectiveness and stealth in future defenses. Overall, the paper delivers a novel, versatile sampling mechanism that enhances backdoor stealthiness and motivates more robust defense strategies against boundary-adjacent poisoning.
Abstract
Backdoor attacks aim to surreptitiously insert malicious triggers into DNN models, granting unauthorized control at test time. Existing methods lack robustness against defense strategies and predominantly focus on enhancing trigger stealthiness while selecting the samples to poison at random. Our research highlights the overlooked drawbacks of random sampling, which make the attack easier to detect and defend against. The core idea of this paper is to strategically poison samples near the model's decision boundary, making defense more difficult. We introduce a straightforward yet highly effective sampling methodology that leverages confidence scores. Specifically, it selects samples with lower confidence scores, significantly increasing the challenge for defenders in identifying and countering these attacks. Importantly, our method operates independently of existing trigger designs, providing versatility and compatibility with various backdoor attack techniques. We substantiate the effectiveness of our approach through a comprehensive set of empirical experiments, demonstrating its potential to significantly enhance the resilience of backdoor attacks against defenses in DNNs.
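To make the selection idea concrete, the following is a minimal sketch of confidence-based sample selection, not the authors' implementation. It assumes that a surrogate model has already produced softmax probabilities for each training sample, and that a sample counts as boundary-adjacent when the surrogate's confidence in its true class and in the attack's target class differ by at most a threshold `epsilon`. That specific criterion, the exclusion of samples already labeled with the target class, and the function name `select_boundary_samples` are illustrative assumptions; the text above only states that lower-confidence samples near the decision boundary are chosen.

```python
import numpy as np

def select_boundary_samples(probs, labels, target_class, epsilon):
    """Return indices of candidate samples to poison.

    Assumed criterion (hypothetical, for illustration): keep a sample when
    |p(true class) - p(target class)| <= epsilon under the surrogate model,
    skipping samples whose label is already the target class.
    """
    probs = np.asarray(probs, dtype=float)    # shape (N, C), rows sum to 1
    labels = np.asarray(labels)               # shape (N,), integer class ids
    idx = np.arange(len(labels))

    true_conf = probs[idx, labels]            # surrogate confidence in true class
    target_conf = probs[:, target_class]      # surrogate confidence in target class
    gap = np.abs(true_conf - target_conf)     # small gap = close to the boundary

    mask = (labels != target_class) & (gap <= epsilon)
    return idx[mask]

# Toy surrogate outputs for four samples and three classes.
probs = np.array([
    [0.90, 0.05, 0.05],   # class 0, confidently classified
    [0.50, 0.45, 0.05],   # class 0, close to the boundary with class 1
    [0.10, 0.80, 0.10],   # already in the target class, skipped
    [0.05, 0.40, 0.55],   # class 2, close to the boundary with class 1
])
labels = np.array([0, 0, 1, 2])

selected = select_boundary_samples(probs, labels, target_class=1, epsilon=0.2)
print(selected)
```

In this sketch a smaller `epsilon` keeps only samples very close to the boundary, which shrinks the candidate pool, while a larger value admits more confidently classified samples. In practice the selected indices would then be passed to whatever trigger-injection routine the chosen backdoor attack uses.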
