The Stronger the Diffusion Model, the Easier the Backdoor: Data Poisoning to Induce Copyright Breaches Without Adjusting Finetuning Pipeline

Haonan Wang, Qianli Shen, Yao Tong, Yang Zhang, Kenji Kawaguchi

TL;DR

The paper formalizes a copyright-focused backdoor attack on text-to-image diffusion models that does not require control over the training process. It introduces SilentBadDiffusion, which poisons training data by semantically decomposing a target image into elements and generating matched poisoning image-caption pairs, so that the model produces infringing outputs when trigger prompts are used during inference. Through experiments across datasets and diffusion-model versions, the authors show that stronger diffusion models are more susceptible, that a poisoning ratio as low as 0.20% can induce copyright breaches under specific prompts, and that the poisoning data remains stealthy. These findings highlight vulnerabilities in current copyright-protection strategies and call for more robust defenses and closer scrutiny of data-curation practices in diffusion-model training.

Abstract

The commercialization of text-to-image diffusion models (DMs) brings forth potential copyright concerns. Despite numerous attempts to protect DMs from copyright issues, the vulnerabilities of these solutions are underexplored. In this study, we formalized the Copyright Infringement Attack on generative AI models and proposed a backdoor attack method, SilentBadDiffusion, to induce copyright infringement without requiring access to or control over training processes. Our method strategically embeds connections between pieces of copyrighted information and text references in poisoning data while carefully dispersing that information, making the poisoning data inconspicuous when integrated into a clean dataset. Our experiments show the stealth and efficacy of the poisoning data. When given specific text prompts, DMs trained with a poisoning ratio of 0.20% can produce copyrighted images. Additionally, the results reveal that the more sophisticated the DMs are, the easier the success of the attack becomes. These findings underline potential pitfalls in the prevailing copyright protection strategies and underscore the necessity for increased scrutiny to prevent the misuse of DMs.
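
To give a sense of scale for the 0.20% poisoning ratio mentioned above, the following is a minimal arithmetic sketch. It assumes the ratio is defined as poisoning samples divided by the total number of training samples (clean plus poisoning); the abstract does not spell out the definition. The function name `poisoning_count` and the dataset size of 500,000 are hypothetical and chosen only for illustration.

```python
import math
from fractions import Fraction


def poisoning_count(clean_size: int, ratio: Fraction) -> int:
    """Smallest p with p / (clean_size + p) >= ratio.

    Assumption: the ratio is measured against the combined dataset
    (clean + poisoning samples). Exact fractions avoid float rounding.
    """
    if clean_size < 0:
        raise ValueError("clean_size must be non-negative")
    if not 0 <= ratio < 1:
        raise ValueError("ratio must be in [0, 1)")
    # Solve p / (clean_size + p) >= ratio for p, then round up.
    return math.ceil(ratio * clean_size / (1 - ratio))


# Hypothetical clean dataset of 500,000 image-caption pairs.
ratio = Fraction("0.002")  # 0.20%
needed = poisoning_count(500_000, ratio)
print(needed)
```

Under this assumed definition, roughly a thousand samples among half a million would already reach the stated ratio, which is why data-curation scrutiny matters even at small fractions.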

Paper Structure

This paper contains 33 sections, 1 equation, 10 figures, 7 tables.

Figures (10)

  • Figure 1: Overview of SilentBadDiffusion. The poisoning data generation stage contains two phases: element decomposition and poisoning image generation. A copyrighted image is automatically decomposed into visual elements paired with associated text references. Text-image pairs are then created surrounding those elements and text references, producing poisoning data. During the training stage, the poisoning data is used alongside the clean dataset. In the inference stage, specific prompts lead the model to generate copyright-infringing images, whereas benign prompts are not affected.
  • Figure 2: Visualization of different Self-Supervised Copy Detection (SSCD) scores.
  • Figure 3: Images generated by the original SD v1.4 and by versions fine-tuned on the clean and the poisoned datasets, shown alongside the original images.
  • Figure 4: Low-dimensional visualization of the poisoned Pokemon and LAION datasets using UMAP.
  • Figure 5: Visualization of poisoning data and corresponding target copyright images.
  • ...and 5 more figures
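
Figure 2 refers to SSCD scores. As a rough illustration of how a copy-detection score of this kind is typically read, the sketch below computes the cosine similarity between two descriptor vectors, where a higher value means the images are more likely copies. This assumes the score is a cosine similarity between image descriptors; it does not load the SSCD model, and the three-dimensional vectors are made-up stand-ins for real descriptors.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two descriptor vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("descriptors must be non-zero")
    return dot / (norm_a * norm_b)


# Made-up descriptors, for illustration only (not real SSCD outputs).
target = [0.9, 0.1, 0.4]
near_copy = [0.8, 0.2, 0.5]
unrelated = [-0.1, 0.9, -0.3]

print(cosine_similarity(target, near_copy) > cosine_similarity(target, unrelated))
```

In practice a threshold on such a score would decide whether a generated image counts as a copy of the target; the threshold value is not given in this section.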