BadSAD: Clean-Label Backdoor Attacks against Deep Semi-Supervised Anomaly Detection
He Cheng, Depeng Xu, Shuhan Yuan
TL;DR
This work introduces BadSAD, a clean-label backdoor framework that targets DeepSAD for image anomaly detection. It injects triggers into normal images and manipulates the latent space so that poisoned samples align with normal data while abnormal samples remain distinct. By shaping latent-space distributions through distribution alignment and concentration, BadSAD causes triggered abnormal images to be misclassified as normal without compromising anomaly detection performance on clean inputs, as measured by AUC and attack success rate (ASR) on MNIST, CIFAR-10, and Fashion-MNIST. The approach is compared against multiple baselines and maintains its attack performance under various triggers and threshold settings, highlighting the practical risk of outsourcing model training. The findings underscore the need for defense mechanisms against backdoors in anomaly detection, especially in high-stakes domains like industrial inspection, medical imaging, and security.
Abstract
Image anomaly detection (IAD) is essential in applications such as industrial inspection, medical imaging, and security. Despite the progress achieved with deep learning models like Deep Semi-Supervised Anomaly Detection (DeepSAD), these models remain susceptible to backdoor attacks, presenting significant security challenges. In this paper, we introduce BadSAD, a novel backdoor attack framework specifically designed to target DeepSAD models. Our approach involves two key phases: trigger injection, where subtle triggers are embedded into normal images, and latent space manipulation, which positions and clusters the poisoned images near normal images to make the triggers appear benign. Extensive experiments on benchmark datasets validate the effectiveness of our attack strategy, highlighting the severe risks that backdoor attacks pose to deep learning-based anomaly detection systems.
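The two phases named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the corner-patch trigger, the function names, and the specific forms of the alignment and concentration terms are assumptions chosen for illustration, and the DeepSAD-style objective is simplified (no weight decay, labeled samples assumed to be anomalies only). Embeddings are plain lists of floats standing in for encoder outputs.

```python
import copy

def inject_trigger(image, patch_value=1.0, size=2):
    """Return a copy of a 2D grayscale image with a square patch
    stamped in the bottom-right corner (hypothetical trigger)."""
    poisoned = copy.deepcopy(image)
    h, w = len(poisoned), len(poisoned[0])
    for r in range(h - size, h):
        for c in range(w - size, w):
            poisoned[r][c] = patch_value
    return poisoned

def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def alignment_loss(z_poisoned, z_normal):
    """Assumed alignment term: squared distance between the mean
    poisoned embedding and the mean normal embedding."""
    return sq_dist(mean_vector(z_poisoned), mean_vector(z_normal))

def concentration_loss(z_poisoned):
    """Assumed concentration term: mean squared distance of the
    poisoned embeddings to their own mean."""
    mu = mean_vector(z_poisoned)
    return sum(sq_dist(z, mu) for z in z_poisoned) / len(z_poisoned)

def deepsad_style_loss(z_unlabeled, z_anomalous, center, eta=1.0, eps=1e-6):
    """Simplified DeepSAD-style objective: pull unlabeled embeddings
    toward the center, push labeled anomalies away via inverse distance."""
    pull = sum(sq_dist(z, center) for z in z_unlabeled)
    push = sum(1.0 / (sq_dist(z, center) + eps) for z in z_anomalous)
    return (pull + eta * push) / (len(z_unlabeled) + len(z_anomalous))

# Usage: poison a blank 4x4 image and evaluate the terms on toy embeddings.
clean = [[0.0] * 4 for _ in range(4)]
poisoned = inject_trigger(clean)
z_normal = [[0.0, 0.0], [0.2, 0.0]]
z_poisoned = [[1.0, 0.0], [3.0, 0.0]]
total = (deepsad_style_loss(z_normal, [[2.0, 2.0]], center=[0.0, 0.0])
         + alignment_loss(z_poisoned, z_normal)
         + concentration_loss(z_poisoned))
```

In this sketch, an attacker would minimize the combined objective during training so that embeddings of triggered images land near the normal cluster and stay tightly grouped, while the base objective continues to separate clean anomalies from the center. How the terms are weighted against each other is left open here.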
