Breaking the Stealth-Potency Trade-off in Clean-Image Backdoors with Generative Trigger Optimization

Binyan Xu, Fan Yang, Di Tang, Xilin Dai, Kehuan Zhang

TL;DR

This work addresses clean-image backdoors, in which an attacker manipulates training labels without altering any image, and aims to break their traditional stealth-potency trade-off. It introduces Generative Clean-Image Backdoors (GCB), built on a conditional InfoGAN (C-InfoGAN) that identifies trigger features already present in benign data while keeping them separable from non-trigger features and irrelevant to the task. Through a three-stage pipeline (attack preparation, poisoning, and inference), GCB achieves high attack success rates (ASR) with negligible clean-accuracy (CA) drops at very low poison rates (≤1%), and generalizes across six datasets, five architectures, and four tasks (including regression and segmentation). The method is robust against many existing defenses and exposes their limitations; the paper also discusses ethical considerations and mitigation strategies for data-labeling supply chains. Overall, GCB expands the threat surface of data-centric backdoors and underscores the need for stronger detection and provenance mechanisms in ML pipelines.

Abstract

Clean-image backdoor attacks, which use only label manipulation in training datasets to compromise deep neural networks, pose a significant threat to security-critical applications. A critical flaw in existing methods is that the poison rate required for a successful attack induces a proportional, and thus noticeable, drop in Clean Accuracy (CA), undermining their stealthiness. This paper presents a new paradigm for clean-image attacks that minimizes this accuracy degradation by optimizing the trigger itself. We introduce Generative Clean-Image Backdoors (GCB), a framework that uses a conditional InfoGAN to identify naturally occurring image features that can serve as potent and stealthy triggers. By ensuring these triggers are easily separable from benign task-related features, GCB enables a victim model to learn the backdoor from an extremely small set of poisoned examples, resulting in a CA drop of less than 1%. Our experiments demonstrate GCB's remarkable versatility, successfully adapting to six datasets, five architectures, and four tasks, including the first demonstration of clean-image backdoors in regression and segmentation. GCB also exhibits resilience against most of the existing backdoor defenses.
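
To make the label-only threat model and the two metrics concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical per-sample array `trigger_scores` standing in for whatever the C-InfoGAN produces to indicate how strongly a naturally occurring trigger feature is present. The selection rule (relabel the highest-scoring samples) and the function names are illustrative assumptions.

```python
import numpy as np


def poison_labels(labels, trigger_scores, target_label, poison_rate=0.01):
    """Relabel a small fraction of samples; the images are never modified.

    `trigger_scores` is a hypothetical stand-in for a trigger-feature
    detector's output (higher = feature more strongly present).
    """
    labels = np.asarray(labels)
    trigger_scores = np.asarray(trigger_scores, dtype=float)
    n_poison = int(np.floor(poison_rate * len(labels)))

    # Only samples that do not already carry the target label can be flipped.
    candidates = np.flatnonzero(labels != target_label)
    # Assumed rule: pick the candidates with the strongest trigger feature.
    order = np.argsort(-trigger_scores[candidates], kind="stable")
    chosen = candidates[order[:n_poison]]

    poisoned = labels.copy()
    poisoned[chosen] = target_label
    return poisoned, chosen


def clean_accuracy(preds, labels):
    """Fraction of benign test samples classified correctly (CA)."""
    return float(np.mean(np.asarray(preds) == np.asarray(labels)))


def attack_success_rate(preds, true_labels, target_label):
    """Fraction of trigger-bearing, non-target samples predicted as target (ASR)."""
    preds = np.asarray(preds)
    mask = np.asarray(true_labels) != target_label
    return float(np.mean(preds[mask] == target_label))


if __name__ == "__main__":
    labels = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2, 0])
    scores = np.array([0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.4, 0.6, 0.5, 0.0])
    poisoned, chosen = poison_labels(labels, scores, target_label=0, poison_rate=0.2)
    print("relabelled indices:", chosen)
    print("poisoned labels:   ", poisoned)
```

The CA drop discussed above would then be the difference between `clean_accuracy` for a model trained on the original labels and one trained on the poisoned labels, both evaluated on the same benign test set.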

Paper Structure

This paper contains 62 sections, 11 equations, 56 figures, 15 tables.

Figures (56)

  • Figure 1: Breaking the Stealth-Potency Trade-off. Average Attack Success Rate (ASR) vs. Clean Accuracy (CA) drop across all datasets. Baselines must sacrifice stealth (CA drop) for attack success. In contrast, our method (GCB, $\bigstar$) delivers a highly effective attack with negligible CA drop.
  • Figure 2: Framework of Generative Adversarial Clean-Image Backdoors (GCB). In the preparation stage, a specific clean feature (e.g., background color here) is extracted as a backdoor trigger.
  • Figure 3: Stealth-potency trade-off of clean-image backdoor methods across datasets. Marker size and text indicate poison rates on each point. Our method, GCB, achieves $\geq90\%$ attack success with $\leq1\%$ drop in clean accuracy.
  • Figure 4: GCB's test ASR on CIFAR-10 converges fast, but its training ASR lags, resisting fast-learning defenses like ABL.
  • Figure 6: Difference from clean images. Closeness to "Clean" values indicates stealthiness.
  • ...and 51 more figures