Information-Theoretic Discrete Diffusion
Moongyu Jeon, Sangwoo Shin, Dongjae Jeon, Albert No
TL;DR
This work develops an information-theoretic framework for discrete diffusion models by deriving the I-MDSE and I-MDCE identities, which link mutual information decay to optimal score-based (DSE) and cross-entropy (DCE) losses. It shows that these losses provide exact log-likelihood decompositions along diffusion trajectories, enabling time-integrated and time-free estimators, conditional likelihoods for structured tasks, and principled likelihood-ratio estimation. The authors extend the theory to masked diffusion, establish equivalences between DSE and DCE, and provide practical variants including time-free formulations and coupled Monte Carlo estimators. Empirical results on synthetic and real data validate accuracy, variance reduction, and auditing applications, including OOD detection and analysis of open-source models, with code publicly available.
Abstract
We present an information-theoretic framework for discrete diffusion models that yields principled estimators of log-likelihood using score-matching losses. Inspired by the I-MMSE identity for the Gaussian setting, we derive analogous results for the discrete setting. Specifically, we introduce the Information-Minimum Denoising Score Entropy (I-MDSE) relation, which links the mutual information between data and its diffused version to the minimum denoising score entropy (DSE) loss. We extend this theory to masked diffusion and establish the Information-Minimum Denoising Cross-Entropy (I-MDCE) relation, connecting cross-entropy losses to mutual information in discrete masked processes. These results provide a time-integral decomposition of the log-likelihood of the data in terms of optimal score-based losses, showing that commonly used losses such as DSE and DCE are not merely variational bounds but tight and principled estimators of log-likelihood. The I-MDCE decomposition further enables practical extensions, including a time-free formula, conditional likelihood estimation in prompt-response tasks, and coupled Monte Carlo estimation of likelihood ratios. Experiments on synthetic and real-world data confirm the accuracy, variance stability, and utility of our estimators. The code is publicly available at https://github.com/Dongjae0324/infodis.
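For readers unfamiliar with the I-MMSE identity that the abstract names as its inspiration, the following minimal sketch checks it numerically in the one case where both sides have closed forms: a standard Gaussian input X observed as sqrt(snr) X + N with independent standard Gaussian noise N. In that case I(snr) = (1/2) ln(1 + snr) nats and mmse(snr) = 1 / (1 + snr), so dI/dsnr = mmse(snr) / 2. This illustrates only the classical Gaussian identity, not the discrete I-MDSE or I-MDCE relations, and it is not taken from the authors' repository; all function names are hypothetical.

```python
import math


def gaussian_mutual_information(snr):
    """I(X; sqrt(snr) X + N) in nats, assuming X and N are independent N(0, 1)."""
    return 0.5 * math.log1p(snr)


def gaussian_mmse(snr):
    """Minimum mean-squared error of estimating X from sqrt(snr) X + N (same assumption)."""
    return 1.0 / (1.0 + snr)


def numerical_derivative(f, x, h=1e-5):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def check_i_mmse(snr):
    """Return both sides of the I-MMSE identity: (dI/dsnr, mmse(snr) / 2)."""
    lhs = numerical_derivative(gaussian_mutual_information, snr)
    rhs = 0.5 * gaussian_mmse(snr)
    return lhs, rhs


if __name__ == "__main__":
    for snr in (0.1, 1.0, 10.0):
        lhs, rhs = check_i_mmse(snr)
        print(f"snr={snr}: dI/dsnr={lhs:.6f}  mmse/2={rhs:.6f}")
```

Integrating the right-hand side over snr recovers the mutual information, which is the Gaussian counterpart of the time-integral decomposition described in the abstract.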
