Understanding and Mitigating Copying in Diffusion Models

Gowthami Somepalli, Vasu Singla, Micah Goldblum, Jonas Geiping, Tom Goldstein

TL;DR

This work investigates copying in diffusion-based image synthesis, revealing that text conditioning, not just training data duplication, drives memorization. It analyzes LAION-derived data and controlled duplication experiments, showing that caption diversification and conditioning strategies can substantially mitigate replication with limited impact on image quality. The authors introduce training-time and inference-time mitigations (notably multiple captions per image) and provide a practical set of recommendations for safer, lower-copy diffusion systems. The findings bear on copyright and privacy risk management and offer actionable guidance for building and deploying safer diffusion models.

Abstract

Images generated by diffusion models like Stable Diffusion are increasingly widespread. Recent works and even lawsuits have shown that these models are prone to replicating their training data, unbeknownst to the user. In this paper, we first analyze this memorization problem in text-to-image diffusion models. While it is widely believed that duplicated images in the training set are responsible for content replication at inference time, we observe that the text conditioning of the model plays a similarly important role. In fact, we see in our experiments that data replication often does not happen for unconditional models, while it is common in the text-conditional case. Motivated by our findings, we then propose several techniques for reducing data replication at both training and inference time by randomizing and augmenting image captions in the training set.
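
As a rough illustration of what "randomizing and augmenting image captions" could look like at training time, the sketch below picks one of several captions for an image at each step and optionally swaps words for random ones. It is not the authors' implementation: the function names, the word-level replacement rule, the probability `p`, and the toy vocabulary are all assumptions made for this example.

```python
import random


def pick_caption(captions, rng):
    """Return one caption chosen uniformly from the alternatives for an image."""
    if not captions:
        raise ValueError("each image needs at least one caption")
    return rng.choice(captions)


def randomize_words(caption, vocab, p, rng):
    """Replace each whitespace-separated word with a random vocab word with probability p."""
    words = caption.split()
    return " ".join(rng.choice(vocab) if rng.random() < p else w for w in words)


def training_caption(captions, vocab, p, rng):
    """Hypothetical per-step caption: sample an alternative, then perturb its words."""
    return randomize_words(pick_caption(captions, rng), vocab, p, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data (assumed): several captions describing the same training image.
    captions = [
        "a photo of a golden retriever",
        "a dog sitting on grass",
        "golden retriever outdoors",
    ]
    vocab = ["tree", "blue", "small", "river"]
    for step in range(3):
        print(step, training_caption(captions, vocab, 0.1, rng))
```

The idea is that the model no longer sees one fixed string paired with one image on every pass, which is the kind of caption diversity the abstract points to as a mitigation.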

Paper Structure (40 sections, 16 figures, 5 tables)

This paper contains 40 sections, 16 figures, 5 tables.

Figures (16)

  • Figure 1: The first row shows images generated from real user prompts for Stable Diffusion v2.1. The second row shows images found in the LAION dataset.
  • Figure 2: Stable Diffusion v1.4 generates memorized images when either images or captions are duplicated. Highly replicated generations from Stable Diffusion v1.4 and duplicated images from LAION training data are labeled on the plot and shown on the left. Stable Diffusion v2.1 is trained on a de-duplicated dataset, so as expected we see the clusters with high visual image similarity vanish from the right side of the chart. Nonetheless, we still see a number of replicated generations from clusters with high caption similarity.
  • Figure 3: How does data duplication affect memorization? All models are trained with captions. On both datasets, dataset similarity increases proportionally to duplication in the training data. FID scores are unaffected by light duplication, but increase at higher levels as image diversity decreases.
  • Figure 4: Left: Diffusion models finetuned on Imagenette with different styles of conditioning. FID scores of finetuned models are as follows (in order) $40.6, 47.4, 17.74, 39.8$. Right: We show the effects of training the text encoder on similarity scores with different types of conditioning.
  • Figure 5: Models trained with different levels of duplication and duplication settings. Left: Dataset similarity between models trained with no duplication, with partial duplication, and with full duplication. Dashed lines show dataset similarity of each training distribution. Middle, Right: Dataset similarity and FID for full duplication vs partial duplication for different data duplication factors.
  • ...and 11 more figures