Nepotistically Trained Generative-AI Models Collapse

Matyas Bohacek, Hany Farid

TL;DR

Diffusion-based generative models trained on large corpora are susceptible to data poisoning when retrained on their own outputs. Using Stable Diffusion v2.1, the authors seed the model with real FFHQ faces, generate synthetic images across demographics via image-to-image synthesis, and iteratively retrain the U-Net on mixtures of self-generated and real images, evaluating with FID and CLIP and exploring healing with real data. They find that self-poisoning causes rapid degradation in image quality and diversity, with even small self-generated data fractions (e.g., 3.3%) sufficient to trigger collapse within five iterations; healing can partially recover metrics but artifacts persist and variability remains. The study highlights serious data-provenance and safety concerns for open diffusion models and suggests mitigations like detectors and watermarking, while outlining open questions about cross-engine generalization and resilience.
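The retraining loop described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the names `compose_retraining_set`, `iterate_retraining`, `generate`, and `retrain` are hypothetical, and the actual image generation and U-Net fine-tuning are left as caller-supplied functions.

```python
import random


def compose_retraining_set(real_items, generated_items, generated_fraction, total, seed=0):
    """Return `total` items, of which round(total * generated_fraction) are generated.

    Hypothetical helper: items can be file paths or any other handles.
    """
    if not 0.0 <= generated_fraction <= 1.0:
        raise ValueError("generated_fraction must lie in [0, 1]")
    n_generated = round(total * generated_fraction)
    n_real = total - n_generated
    if n_generated > len(generated_items) or n_real > len(real_items):
        raise ValueError("not enough items to build the requested mixture")
    rng = random.Random(seed)
    mix = rng.sample(generated_items, n_generated) + rng.sample(real_items, n_real)
    rng.shuffle(mix)
    return mix


def iterate_retraining(model, real_items, generate, retrain,
                       generated_fraction, total, iterations=5):
    """Repeatedly retrain `model` on a mixture that includes its own outputs.

    `generate(model, n)` and `retrain(model, items)` are assumed callables
    standing in for image synthesis and fine-tuning.
    """
    n_generated = round(total * generated_fraction)
    for step in range(iterations):
        generated = generate(model, n_generated)  # outputs of the current model
        batch = compose_retraining_set(real_items, generated,
                                       generated_fraction, total, seed=step)
        model = retrain(model, batch)  # the next iteration starts from this model
    return model
```

Setting `generated_fraction` to `0.0` gives the all-real control condition, and a "healing" phase corresponds to continuing the loop from a poisoned model with `generated_fraction=0.0`.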

Abstract

Trained on massive amounts of human-generated content, AI-generated image synthesis is capable of reproducing semantically coherent images that match the visual appearance of its training data. We show that when retrained on even small amounts of their own creation, these generative-AI models produce highly distorted images. We also show that this distortion extends beyond the text prompts used in retraining, and that once affected, the models struggle to fully heal even after retraining on only real images.

Paper Structure (9 sections, 6 figures)

Figures (6)

  • Figure 1: Examples of real images (top) used to seed image-to-image generation (bottom).
  • Figure 2: Examples of low-quality generated images that are replaced in the retraining control experiment.
  • Figure 3: Examples of images generated by the baseline version of Stable Diffusion (prompt: "older hispanic man").
  • Figure 4: Examples generated after iterative retraining for different compositions of the retraining dataset: $0\%$ SD-generated and $100\%$ real to $100\%$ SD-generated faces and $0\%$ real. Shown in the lower panel are examples generated with text prompts distinct from those used in the retraining.
  • Figure 5: Shown are the FID and CLIP scores as a function of the number of retraining iterations and the composition of the retraining dataset, ranging from $100\%$ SD-generated faces and $0\%$ real faces to $0\%$ SD-generated and $100\%$ real ("poisoning"). The diamond plot symbol corresponds to the $100\%$/$0\%$ condition in which the retraining dataset is color-matched to the real faces. The square plot symbol corresponds to the same condition in which the retraining dataset was curated on each iteration to remove low-quality faces. The trend is the same for both metrics: the presence of generated faces leads to a degradation in quality across iterations (a higher FID and a lower CLIP score correspond to lower image quality). Also shown are the FID and CLIP scores for the $25\%$ model retrained for an additional five iterations ($6$-$10$) on only real images ("healing").
  • ...and 1 more figure
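
For readers unfamiliar with the FID metric reported in Figure 5, its standard definition is given here for reference. The notation is assumed for illustration and is not taken from the paper: $\mu_r, \Sigma_r$ and $\mu_g, \Sigma_g$ denote the mean and covariance of Inception-network features computed over the real and generated images, respectively.

$$
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2 \left( \Sigma_r \Sigma_g \right)^{1/2} \right)
$$

The score is zero when the two feature distributions have identical means and covariances and grows as they diverge, which is why a higher FID indicates lower image quality.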