The Entropic Signature of Class Speciation in Diffusion Models
Florian Handke, Dejan Stančević, Felix Koulischer, Thomas Demeester, Luca Ambrogioni
TL;DR
The paper studies how semantic structure emerges during diffusion-based generation, introducing the class-conditional entropy $H[Z|X_t]$ as an online diagnostic of semantic speciation. A theoretical analysis of high-dimensional Gaussian mixtures identifies a logarithmic speciation time scale for variance-preserving (VP) diffusion and shows that entropy production concentrates around this window, whereas VE/EDM formulations lack a sharp transition. The approach is validated on EDM2-XS and Stable Diffusion 1.5, where partitioned entropy isolates the noise regimes in which specific semantic features arise and reveals how guidance redistributes semantic information over time. These results bridge information-theoretic and dynamical perspectives on diffusion and offer a practical framework for time-localized, feature-specific control of generative models.
Abstract
Diffusion models do not recover semantic structure uniformly over time. Instead, samples transition from semantic ambiguity to class commitment within a narrow regime. Recent theoretical work attributes this transition to dynamical instabilities along class-separating directions, but practical methods to detect and exploit these windows in trained models are still limited. We show that tracking the class-conditional entropy of a latent semantic variable given the noisy state provides a reliable signature of these transition regimes. Restricting the entropy to semantic partitions further resolves semantic decisions at different levels of abstraction. We analyze this behavior in high-dimensional Gaussian mixture models and show that the entropy rate concentrates on the same logarithmic time scale as the speciation symmetry-breaking instability previously identified in variance-preserving diffusion. We validate our method on EDM2-XS and Stable Diffusion 1.5, where class-conditional entropy consistently isolates the noise regimes critical for semantic structure formation. Finally, we use our framework to quantify how guidance redistributes semantic information over time. Together, these results connect information-theoretic and statistical physics perspectives on diffusion and provide a principled basis for time-localized control.
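To make the quantity $H[Z|X_t]$ concrete, the following is a minimal sketch of a Monte Carlo estimate in a toy setting; it is not the authors' implementation. It assumes (none of this is specified in the abstract) a symmetric two-class Gaussian mixture with centers $\pm\mu$ and isotropic within-class variance `sigma0**2`, and a variance-preserving forward process of the form $x_t = e^{-t}x_0 + \sqrt{1-e^{-2t}}\,\varepsilon$. The function name `class_conditional_entropy` and all parameters are hypothetical.

```python
import numpy as np

def class_conditional_entropy(t, mu, sigma0=1.0, n_samples=20000, seed=0):
    """Monte Carlo estimate of H[Z | X_t] in nats for a toy two-class mixture.

    Assumed model (illustrative only): Z is uniform on {-1, +1},
    x_0 | Z ~ N(Z * mu, sigma0^2 I), and x_t = a * x_0 + sqrt(1 - a^2) * eps
    with a = exp(-t).
    """
    rng = np.random.default_rng(seed)
    d = mu.shape[0]
    a = np.exp(-t)
    # Per-coordinate variance of x_t given the class label.
    var_t = sigma0**2 * a**2 + (1.0 - a**2)

    # Sample labels and noisy states x_t from the forward marginal.
    z = rng.choice([-1.0, 1.0], size=n_samples)
    x_t = a * z[:, None] * mu[None, :]
    x_t = x_t + np.sqrt(var_t) * rng.standard_normal((n_samples, d))

    # Exact posterior log-odds log p(Z=+1 | x_t) / p(Z=-1 | x_t).
    logit = 2.0 * a * (x_t @ mu) / var_t

    # Binary entropy of sigmoid(logit), written in a numerically stable form:
    # h(l) = log(1 + e^{-|l|}) + |l| * e^{-|l|} / (1 + e^{-|l|}).
    abs_l = np.abs(logit)
    e = np.exp(-abs_l)
    h = np.log1p(e) + abs_l * e / (1.0 + e)
    return float(h.mean())

if __name__ == "__main__":
    d = 64
    mu = np.ones(d)  # |mu|^2 = d, an assumed scaling for illustration
    for t in [0.01, 1.0, 2.0, 3.0, 5.0]:
        print(f"t = {t:4.2f}  H[Z|X_t] ~ {class_conditional_entropy(t, mu):.4f}")
```

Under these assumptions, with `sigma0 = 1` and $|\mu|^2 = d$, the signal-to-noise ratio along $\mu$ is $e^{-2t}d$, which crosses one near $t = \tfrac{1}{2}\log d$. The estimate should therefore rise from near zero toward $\log 2$ around that time, which illustrates the kind of logarithmic time scale the abstract refers to.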
