Nepotistically Trained Generative-AI Models Collapse
Matyas Bohacek, Hany Farid
TL;DR
Diffusion-based generative models trained on large corpora are susceptible to data poisoning when retrained on their own outputs. Using Stable Diffusion v2.1, the authors seed the model with real FFHQ faces and generate synthetic images across demographics via image-to-image synthesis. They then iteratively retrain the U-Net on mixtures of self-generated and real images, evaluate each iteration with FID and CLIP scores, and test whether retraining on real data alone can heal the model. They find that self-poisoning causes rapid degradation in image quality and diversity: even a small fraction of self-generated data (e.g., 3.3%) is sufficient to trigger collapse within five iterations. Healing partially recovers the metrics, but artifacts persist and variability remains. The study highlights serious data-provenance and safety concerns for open diffusion models, suggests mitigations such as detectors and watermarking, and outlines open questions about cross-engine generalization and resilience.
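The retraining loop summarized above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: `mixture_counts`, `retrain_iteratively`, and the `generate` and `fine_tune` callables are hypothetical names, and the actual Stable Diffusion U-Net fine-tuning and image-to-image generation steps are left to the caller.

```python
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

Image = TypeVar("Image")
Model = TypeVar("Model")


def mixture_counts(size: int, generated_fraction: float) -> Tuple[int, int]:
    """Split a retraining set of `size` items into (self-generated, real) counts."""
    if not 0.0 <= generated_fraction <= 1.0:
        raise ValueError("generated_fraction must lie in [0, 1]")
    n_generated = round(size * generated_fraction)
    return n_generated, size - n_generated


def retrain_iteratively(
    model: Model,
    real_images: Sequence[Image],
    generate: Callable[[Model, int], List[Image]],      # hypothetical: sample n images from the model
    fine_tune: Callable[[Model, List[Image]], Model],   # hypothetical: one round of retraining
    generated_fraction: float,
    size: int,
    iterations: int = 5,
    seed: int = 0,
) -> Model:
    """Repeatedly retrain `model` on a mix of its own outputs and real images."""
    rng = random.Random(seed)
    n_generated, n_real = mixture_counts(size, generated_fraction)
    for _ in range(iterations):
        # Images come from the *current* model, so any degradation feeds forward.
        generated = generate(model, n_generated)
        batch = list(generated) + rng.sample(list(real_images), n_real)
        rng.shuffle(batch)
        model = fine_tune(model, batch)
    return model
```

Setting `generated_fraction=0.0` gives an all-real retraining set, which is one way to express the healing experiment in the same loop; evaluation (FID, CLIP) would be run by the caller after each `fine_tune` call.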
Abstract
Trained on massive amounts of human-generated content, AI image synthesis is capable of reproducing semantically coherent images that match the visual appearance of its training data. We show that when retrained on even small amounts of their own creation, these generative-AI models produce highly distorted images. We also show that this distortion extends beyond the text prompts used in retraining, and that once affected, the models struggle to fully heal even after retraining on only real images.
