Backdoors in Conditional Diffusion: Threats to Responsible Synthetic Data Pipelines

Raz Lapid, Almog Dubin

TL;DR

The paper identifies a new backdoor attack surface in ControlNet-conditioned diffusion pipelines: poisoning a small fraction of the fine-tuning data (as little as 1–5%) implants a covert trigger in the conditioning pathway, so that attacker-chosen content appears whenever a visual trigger is present in the control input, while outputs on benign inputs remain inconspicuous. It formalizes a threat model in which only the ControlNet branch is trained, and describes the poisoned-dataset construction and training objective that bind the trigger to a malicious target. The authors validate the attack across multiple backbones and datasets, run ablations on trigger strength and conditioning guidance, and introduce clean fine-tuning (CFT) as a practical defense that substantially reduces attack success in homogeneous domains. The results highlight a critical supply-chain risk in open-source conditioned diffusion systems. The authors advocate provenance tracking, backdoor-probing tests, and sanitization workflows to improve the safety and trustworthiness of synthetic-data pipelines. Overall, the work argues that safeguarding conditional diffusion is inseparable from ensuring the integrity of auxiliary components such as ControlNets in responsible AI deployment.

Abstract

Text-to-image diffusion models achieve high-fidelity image generation from natural language prompts. ControlNets extend these models by enabling conditioning on structural inputs (e.g., edge maps, depth, pose), providing fine-grained control over outputs. Yet their reliance on large, publicly scraped datasets and community fine-tuning makes them vulnerable to data poisoning. We introduce a model-poisoning attack that embeds a covert backdoor into a ControlNet, causing it to produce attacker-specified content when exposed to visual triggers, without textual prompts. Experiments show that poisoning only 1% of the fine-tuning corpus yields a 90–98% attack success rate, while 5% further strengthens the backdoor, all while preserving normal generation quality. To mitigate this risk, we propose clean fine-tuning (CFT): freezing the diffusion backbone and fine-tuning only the ControlNet on a sanitized dataset with a reduced learning rate. CFT lowers attack success rates on held-out data. These results expose a critical security weakness in open-source, ControlNet-guided diffusion pipelines and demonstrate that CFT offers a practical defense for responsible synthetic-data pipelines.
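
To make the poisoning rate in the abstract concrete, the following is a minimal sketch of how a poisoned fine-tuning set of (control map, target image) pairs could be assembled. It is not the authors' implementation. The function names, the square patch trigger, its top-left placement, and the array shapes are all illustrative assumptions; the source states only that a visual trigger in the control map is bound to an attacker-chosen target for a small fraction of samples.

```python
import numpy as np


def stamp_trigger(control_map, trigger, top=0, left=0):
    """Return a copy of control_map with the trigger patch pasted at (top, left).

    Hypothetical helper: the patch shape and position are assumptions.
    """
    out = control_map.copy()
    h, w = trigger.shape[:2]
    out[top:top + h, left:left + w] = trigger
    return out


def poison_dataset(control_maps, targets, trigger, malicious_target,
                   poison_rate=0.01, seed=0):
    """Poison a fraction of (control map, target image) pairs.

    For each selected index, the control map receives the trigger and the
    target image is replaced by the attacker-chosen image. All other pairs
    are left untouched. Returns the new arrays and the poisoned indices.
    """
    rng = np.random.default_rng(seed)
    n = len(control_maps)
    # Poison at least one sample, even for very small datasets.
    n_poison = max(1, int(round(poison_rate * n)))
    idx = np.sort(rng.choice(n, size=n_poison, replace=False))

    maps = control_maps.copy()
    imgs = targets.copy()
    for i in idx:
        maps[i] = stamp_trigger(control_maps[i], trigger)
        imgs[i] = malicious_target
    return maps, imgs, idx
```

With 200 samples and `poison_rate=0.01`, this sketch alters two pairs and leaves the remaining 198 unchanged, which illustrates why such a small poisoned fraction is hard to notice by inspection.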

Paper Structure

This paper contains 29 sections, 7 figures, 3 tables.

Figures (7)

  • Figure 1: ControlNet poisoning: a trigger in the control map hijacks generation.
  • Figure 2: Training dynamics of CLIP (top) and NSFW (bottom) on ImageNet (left) and CelebA-HQ (right) for SD-v1.5/v2 across 1–10% poison.
  • Figure 3: Qualitative results on (a) ImageNet and (b) CelebA-HQ (both SD-v1.5). Top: corresponding edge maps for clean and poisoned samples. Bottom: generated images.
  • Figure 4: Qualitative results on MPII (SD-v1.5). Top: corresponding pose maps for clean and poisoned samples (lying-man trigger). Bottom: generated images.
  • Figure 5: ASR of backdoored ControlNet on CelebA-HQ and ImageNet before and after clean fine-tuning.
  • ...and 2 more figures