A Closer Look at Model Collapse: From a Generalization-to-Memorization Perspective
Lianghe Shi, Meng Wu, Huijie Zhang, Zekai Zhang, Molei Tao, Qing Qu
TL;DR
This work identifies a practical model-collapse mechanism in diffusion models trained recursively on self-generated data: a transition from generalization to memorization that is tightly linked to the declining entropy of the training set. The authors quantify entropy with the Kozachenko-Leonenko estimator and pair it with a generalization score that measures how novel generated samples are relative to the training data; they show that a drop in dataset entropy precedes memorization and correlates strongly with degraded generation. They propose entropy-based data-selection strategies, including Greedy Selection and Threshold Decay Filter, that construct high-entropy training subsets and thereby slow or prevent the collapse, improving image quality and diversity (lower FID) in both recursive-generation and classifier-free guidance (CFG) settings. The findings point to entropy as a key criterion for maintaining generalization in self-consuming training loops where data is curated iteratively.
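To make the entropy measurement concrete, here is a minimal sketch of the standard Kozachenko-Leonenko k-nearest-neighbor estimator, H ≈ ψ(N) − ψ(k) + log c_d + (d/N) Σ log ε_i, where ε_i is the Euclidean distance from sample i to its k-th nearest neighbor and c_d is the volume of the d-dimensional unit ball. The function names, the default k = 1, and the choice to run it on plain coordinate tuples are assumptions made for illustration; the summary does not say which feature space or settings the authors use.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def digamma_int(n):
    """Digamma function at a positive integer: psi(n) = -gamma + sum_{j<n} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def kl_entropy(points, k=1):
    """Kozachenko-Leonenko differential entropy estimate in nats.

    points: sequence of equal-length coordinate tuples (hypothetical input format).
    k: which nearest neighbor to use (assumed default of 1).
    Brute-force O(N^2 log N) search, so only suitable for small N.
    Exact duplicate points give a zero distance and make log() fail.
    """
    n = len(points)
    d = len(points[0])
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of points")
    # log-volume of the d-dimensional Euclidean unit ball
    log_unit_ball = (d / 2) * math.log(math.pi) - math.lgamma(d / 2 + 1)
    log_dist_sum = 0.0
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        log_dist_sum += math.log(dists[k - 1])  # distance to the k-th neighbor
    return digamma_int(n) - digamma_int(k) + log_unit_ball + d * log_dist_sum / n
```

Because the estimate depends on the data only through nearest-neighbor distances, a dataset whose samples crowd together gets a lower value, which is the sense in which shrinking entropy signals a less diverse training set. For production-sized data one would likely swap the brute-force search for a spatial index.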
Abstract
The widespread use of diffusion models has led to an abundance of AI-generated data, raising concerns about model collapse, a phenomenon in which recursive iterations of training on synthetic data lead to performance degradation. Prior work primarily characterizes this collapse via variance shrinkage or distribution shift, but these perspectives miss practical manifestations of model collapse. This paper identifies a transition from generalization to memorization during model collapse in diffusion models, where models increasingly replicate training data instead of generating novel content during iterative training on synthetic samples. This transition is directly driven by the declining entropy of the synthetic training data produced in each training cycle, which serves as a clear indicator of model degradation. Motivated by this insight, we propose an entropy-based data selection strategy to mitigate the transition from generalization to memorization and alleviate model collapse. Empirical results show that our approach significantly enhances visual quality and diversity in recursive generation, effectively preventing collapse.
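As a rough illustration of what entropy-oriented data selection could look like, the sketch below uses farthest-point selection: it repeatedly adds the candidate that lies farthest from everything already chosen, which keeps nearest-neighbor distances in the subset large. This is a stand-in heuristic chosen for illustration, not necessarily the authors' procedure; the abstract does not specify the algorithm, and the function name, the fixed starting index, and the tuple input format are all assumptions.

```python
import math


def greedy_diverse_subset(points, m, start=0):
    """Return indices of m points chosen by farthest-point selection.

    Illustrative heuristic only (an assumption, not the paper's stated method).
    points: sequence of equal-length coordinate tuples (hypothetical format).
    start: index of the first selected point (assumed default of 0).
    """
    n = len(points)
    if not 0 < m <= n:
        raise ValueError("m must satisfy 0 < m <= number of points")
    selected = [start]
    remaining = set(range(n)) - {start}
    # distance from every point to its nearest already-selected point
    nearest = [math.dist(p, points[start]) for p in points]
    while len(selected) < m:
        # pick the candidate farthest from the current subset (ties: lowest index)
        nxt = max(sorted(remaining), key=lambda i: nearest[i])
        selected.append(nxt)
        remaining.remove(nxt)
        for i in remaining:
            nearest[i] = min(nearest[i], math.dist(points[i], points[nxt]))
    return selected
```

On a toy set with a tight cluster near the origin and two distant points, `greedy_diverse_subset([(0, 0), (0.1, 0), (0.05, 0), (10, 0), (5, 0)], 3)` keeps one cluster member and both distant points, returning `[0, 3, 4]`. A subset built this way tends to score higher under nearest-neighbor entropy estimates than a random subset of the same size, which is the property an entropy-based filter is after.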
