Breaking the Stealth-Potency Trade-off in Clean-Image Backdoors with Generative Trigger Optimization
Binyan Xu, Fan Yang, Di Tang, Xilin Dai, Kehuan Zhang
TL;DR
This work addresses clean-image backdoors, in which the attacker manipulates labels without altering images, and aims to break the usual trade-off between stealth and potency. It introduces Generative Clean-Image Backdoors (GCB), which uses a conditional InfoGAN (C-InfoGAN) to identify trigger features that already exist in benign data, are separable from non-trigger features, and are irrelevant to the task. Through a three-stage pipeline (attack preparation, poisoning, and inference), GCB achieves high attack success rates (ASR) with negligible drops in clean accuracy (CA) at very low poison rates (≤1%), and generalizes across six datasets, five architectures, and four tasks, including regression and segmentation. The method is robust against many existing defenses, which exposes their limitations; the paper also discusses ethical considerations and mitigation strategies for data-labeling supply chains. Overall, GCB expands the threat surface of data-centric backdoors and underscores the need for stronger detection and provenance mechanisms in ML pipelines.
Abstract
Clean-image backdoor attacks, which use only label manipulation in training datasets to compromise deep neural networks, pose a significant threat to security-critical applications. A critical flaw in existing methods is that the poison rate required for a successful attack induces a proportional, and thus noticeable, drop in Clean Accuracy (CA), undermining their stealthiness. This paper presents a new paradigm for clean-image attacks that minimizes this accuracy degradation by optimizing the trigger itself. We introduce Generative Clean-Image Backdoors (GCB), a framework that uses a conditional InfoGAN to identify naturally occurring image features that can serve as potent and stealthy triggers. By ensuring these triggers are easily separable from benign task-related features, GCB enables a victim model to learn the backdoor from an extremely small set of poisoned examples, resulting in a CA drop of less than 1%. Our experiments demonstrate GCB's versatility: it adapts to six datasets, five architectures, and four tasks, including the first demonstration of clean-image backdoors in regression and segmentation. GCB also exhibits resilience against most existing backdoor defenses.
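
To make the threat model concrete, the following is a minimal sketch of label-only poisoning and of the two metrics named above. It is not the GCB implementation: the trigger detector is abstracted as a precomputed boolean list `has_trigger` (in GCB this role is played by the C-InfoGAN-identified feature), and the function names, the `target_label` parameter, and the ASR convention used here (fraction of trigger-bearing, non-target test samples predicted as the target) are illustrative assumptions.

```python
import random
from typing import List, Sequence, Tuple


def poison_labels(
    labels: Sequence[int],
    has_trigger: Sequence[bool],   # hypothetical: output of some trigger detector
    target_label: int,             # hypothetical: label the attacker wants to induce
    poison_rate: float,
    seed: int = 0,
) -> Tuple[List[int], List[int]]:
    """Relabel a small subset of trigger-bearing samples; images are never touched."""
    # Budget is the floor of poison_rate * dataset size.
    budget = int(poison_rate * len(labels))
    # Only samples that already contain the trigger and are not yet the target class.
    candidates = [
        i for i, (y, t) in enumerate(zip(labels, has_trigger))
        if t and y != target_label
    ]
    rng = random.Random(seed)
    rng.shuffle(candidates)
    chosen = sorted(candidates[:budget])
    poisoned = list(labels)
    for i in chosen:
        poisoned[i] = target_label
    return poisoned, chosen


def clean_accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of test samples predicted with their true label."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def attack_success_rate(
    preds: Sequence[int],
    labels: Sequence[int],
    has_trigger: Sequence[bool],
    target_label: int,
) -> float:
    """Assumed convention: among trigger-bearing samples whose true label is
    not the target, the fraction predicted as the target."""
    idx = [
        i for i, (y, t) in enumerate(zip(labels, has_trigger))
        if t and y != target_label
    ]
    if not idx:
        return 0.0
    return sum(preds[i] == target_label for i in idx) / len(idx)
```

With 200 training samples and a poison rate of 1%, `poison_labels` changes at most two labels, which illustrates why a trigger that is easy to learn from very few examples keeps the CA drop small.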
