Information-Theoretic Discrete Diffusion

Moongyu Jeon, Sangwoo Shin, Dongjae Jeon, Albert No

TL;DR

This work develops an information-theoretic framework for discrete diffusion models by deriving the I-MDSE and I-MDCE identities, which link mutual information decay to the optimal denoising score entropy (DSE) and denoising cross-entropy (DCE) losses. It shows that these losses give exact log-likelihood decompositions along diffusion trajectories, enabling time-integrated and time-free estimators, conditional likelihoods for structured tasks, and principled likelihood-ratio estimation. The authors extend the theory to masked diffusion, establish equivalences between DSE and DCE, and provide practical variants, including time-free formulations and coupled Monte Carlo estimators. Empirical results on synthetic and real data validate estimation accuracy, variance reduction, and auditing applications such as OOD detection and the analysis of open-source models. Code is publicly available.

Abstract

We present an information-theoretic framework for discrete diffusion models that yields principled estimators of log-likelihood using score-matching losses. Inspired by the I-MMSE identity for the Gaussian setup, we derive analogous results for the discrete setting. Specifically, we introduce the Information-Minimum Denoising Score Entropy (I-MDSE) relation, which links the mutual information between data and its diffused version to the minimum denoising score entropy (DSE) loss. We extend this theory to masked diffusion and establish the Information-Minimum Denoising Cross-Entropy (I-MDCE) relation, connecting cross-entropy losses to mutual information in discrete masked processes. These results provide a time-integral decomposition of the log-likelihood of the data in terms of optimal score-based losses, showing that commonly used losses such as DSE and DCE are not merely variational bounds but tight and principled estimators of log-likelihood. The I-MDCE decomposition further enables practical extensions, including a time-free formula, conditional likelihood estimation in prompt-response tasks, and coupled Monte Carlo estimation of likelihood ratios. Experiments on synthetic and real-world data confirm the accuracy, variance stability, and utility of our estimators. The code is publicly available at https://github.com/Dongjae0324/infodis.
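The abstract mentions coupled Monte Carlo estimation of likelihood ratios. The sketch below shows one plausible reading, which is an assumption on our part and not necessarily the paper's construction: estimate $\log p(x) - \log q(x)$ for two models with a time-free masking estimator, and reuse the same random mask draw for both terms (common random numbers) instead of drawing masks independently. The models `P` and `Q` are hypothetical exact joint tables over length-4 sequences (`Q` is `P` mixed with a uniform distribution), so exact conditionals stand in for trained denoisers. Per draw, the number of masked tokens $l$ is uniform on $\{1,\dots,L\}$ and the term is $\tfrac{L}{l}\sum_{i\in M}-\log p(x_i\mid x_{\bar M})$. All function names are illustrative; enumerating every mask also gives the exact mean and single-draw variance of both estimators.

```python
import numpy as np
from itertools import combinations
from math import comb

def marginal(P, x, keep):
    """P(X_j = x_j for j in keep), summing out all other positions."""
    idx = tuple(x[j] if j in keep else slice(None) for j in range(P.ndim))
    return float(np.sum(P[idx]))

def masked_term(P, x, M):
    """One draw of the time-free estimator: (L/l) * sum_{i in M} -log p(x_i | x_unmasked)."""
    L = P.ndim
    visible = set(range(L)) - set(M)
    denom = marginal(P, x, visible)
    return L / len(M) * sum(-np.log(marginal(P, x, visible | {i}) / denom) for i in M)

def draw_mask(L, rng):
    """l ~ Uniform{1..L}, then l distinct positions chosen uniformly at random."""
    l = int(rng.integers(1, L + 1))
    return [int(i) for i in rng.choice(L, size=l, replace=False)]

def log_ratio_coupled(P, Q, x, n, rng):
    """Estimate log P(x) - log Q(x); both terms reuse the same mask draw (assumed coupling)."""
    draws = [draw_mask(P.ndim, rng) for _ in range(n)]
    return float(np.mean([masked_term(Q, x, M) - masked_term(P, x, M) for M in draws]))

def log_ratio_independent(P, Q, x, n, rng):
    """Same target, but P and Q each get their own independent mask draw."""
    L = P.ndim
    return float(np.mean([masked_term(Q, x, draw_mask(L, rng)) - masked_term(P, x, draw_mask(L, rng))
                          for _ in range(n)]))

def exact_moments(P, Q, x):
    """Exact mean of the coupled estimator and single-draw variances, by enumerating all masks."""
    L = P.ndim
    masks = [list(M) for l in range(1, L + 1) for M in combinations(range(L), l)]
    w = np.array([1.0 / (L * comb(L, len(M))) for M in masks])  # probability of each mask under draw_mask
    fp = np.array([masked_term(P, x, M) for M in masks])
    fq = np.array([masked_term(Q, x, M) for M in masks])

    def var(f):
        return float(w @ (f - w @ f) ** 2)

    return float(w @ (fq - fp)), var(fq - fp), var(fp) + var(fq)

rng = np.random.default_rng(0)
L, V = 4, 3
P = rng.random((V,) * L) + 0.1
P /= P.sum()                    # hypothetical model p
Q = 0.8 * P + 0.2 / P.size      # hypothetical model q: p mixed with a uniform distribution

x = (0, 1, 2, 1)
true = float(np.log(P[x]) - np.log(Q[x]))
mean_c, var_coupled, var_indep = exact_moments(P, Q, x)
est_c = log_ratio_coupled(P, Q, x, 2000, rng)
est_i = log_ratio_independent(P, Q, x, 2000, rng)
print(f"true {true:.4f}  coupled {est_c:.4f}  independent {est_i:.4f}")
print(f"single-draw variance: coupled {var_coupled:.4f}  independent {var_indep:.4f}")
```

Coupling leaves the estimator unbiased. Since Var(f_q - f_p) = Var(f_p) + Var(f_q) - 2 Cov(f_p, f_q), it lowers the variance exactly when the two per-draw terms are positively correlated, which is what one expects when the two models are close.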

Paper Structure

This paper contains 52 sections, 11 theorems, 57 equations, 7 figures, 1 table.

Key Result

Theorem 3.1

For a discrete diffusion model governed by the forward continuous-time Markov chain (CTMC) defined in the paper, the following pointwise I-MDSE relation holds: … Taking the expectation of both sides with respect to $x_0 \sim p_0$ yields the marginal I-MDSE form: …
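The sketch below illustrates, in the simplest masked case, the quantity that the I-MDSE relation tracks: how the mutual information $I(x_0; x_t)$ between data and its diffused version decays along the forward process. It is not the general CTMC statement of Theorem 3.1. It assumes a single token under masked (absorbing) diffusion with a hypothetical cosine keep-probability $\alpha(t)$. Because $x_t$ reveals $x_0$ with probability $\alpha(t)$ and is otherwise the mask token, $I(x_0; x_t) = \alpha(t)\,H(x_0)$, and integrating the decay rate $-\tfrac{d}{dt} I(x_0; x_t)$ over $[0, 1]$ recovers $H(x_0) = \mathbb{E}[-\log p_0(x_0)]$. The names `entropy`, `mutual_information`, `masked_joint`, and `alpha` are illustrative.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def mutual_information(joint):
    """I(X; Y) in nats from a joint probability table joint[x, y]."""
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())

def masked_joint(p0, alpha):
    """Joint law of (X0, Xt) when Xt = X0 w.p. alpha and Xt = MASK otherwise.
    Columns 0..V-1 hold clean tokens; column V holds the mask token."""
    V = len(p0)
    joint = np.zeros((V, V + 1))
    joint[np.arange(V), np.arange(V)] = alpha * p0
    joint[:, V] = (1.0 - alpha) * p0
    return joint

def alpha(t):
    """Hypothetical cosine keep-probability schedule: alpha(0) = 1, alpha(1) = 0."""
    return np.cos(0.5 * np.pi * t)

p0 = np.array([0.5, 0.25, 0.125, 0.125])   # toy data distribution over 4 tokens
ts = np.linspace(0.0, 1.0, 2001)
mi = np.array([mutual_information(masked_joint(p0, alpha(t))) for t in ts])

# Masking reveals X0 with probability alpha(t), so I(X0; Xt) = alpha(t) * H(X0).
assert np.allclose(mi, alpha(ts) * entropy(p0))

# Trapezoidal integral of the MI decay rate -dI/dt over [0, 1]; it should equal H(X0).
decay_rate = -np.gradient(mi, ts)
integral = float(np.sum(0.5 * (decay_rate[1:] + decay_rate[:-1]) * np.diff(ts)))
print(f"H(X0) = {entropy(p0):.6f}, integral of MI decay = {integral:.6f}")
```

For this toy $p_0$, $H(x_0) = 1.75 \ln 2$ nats. Any schedule with $\alpha(0) = 1$ and $\alpha(1) = 0$ gives the same integral; the schedule only changes where along $[0, 1]$ the information is lost.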

Figures (7)

  • Figure 1: Comparison of true and estimated NLLs on 64 sequences using our time-free estimators. Full results are provided in the appendix.
  • Figure 2: Estimated NLL for in-distribution (blue) and out-of-distribution (magenta) data. See the appendix for details.
  • Figure 3: Estimated conditional NLL on WikiText (blue) and LLaMA 3.1-generated text (peach). Precise settings are given in the appendix.
  • Figure 4: Results of unconditional NLL estimation on 128 DNA sequences. Estimated and true NLLs are closely aligned, supporting the effectiveness of estimation via the time-free unconditional formula.
  • Figure 5: Conditional NLL estimation on Markov DNA sequences. Estimated and true NLLs are closely aligned, supporting the effectiveness of the time-free conditional estimator.
  • ...and 2 more figures
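The time-free unconditional estimates compared with true NLLs in Figures 1 and 4 can be illustrated on a small exact model. The sketch below is our own construction, not the authors' code. It assumes a perfect masked denoiser, obtained by marginalizing a hypothetical random joint table over length-4 sequences with 3 tokens, and uses the random-masking identity $-\log p(x) = \sum_{l=1}^{L} \mathbb{E}_{|M|=l}\big[\tfrac{1}{l}\sum_{i\in M} -\log p(x_i \mid x_{\bar M})\big]$, which follows from averaging the autoregressive chain rule over uniformly random token orderings. Whether this matches the exact form of Theorem 4.1 is an assumption. The code checks the identity by exhaustive enumeration and with a Monte Carlo version that samples the number of masked tokens $l$ uniformly.

```python
import numpy as np
from itertools import combinations
from math import comb

def marginal(P, x, keep):
    """P(X_j = x_j for j in keep), summing out every other position."""
    idx = tuple(x[j] if j in keep else slice(None) for j in range(P.ndim))
    return float(np.sum(P[idx]))

def cond_prob(P, x, i, visible):
    """Exact p(x_i | x_visible): what a perfect masked denoiser would output."""
    return marginal(P, x, visible | {i}) / marginal(P, x, visible)

def nll_time_free_exact(P, x):
    """Sum over mask counts l of E_{|M|=l}[(1/l) * sum_{i in M} -log p(x_i | x_unmasked)]."""
    L = P.ndim
    total = 0.0
    for l in range(1, L + 1):
        for M in combinations(range(L), l):
            visible = set(range(L)) - set(M)
            total += sum(-np.log(cond_prob(P, x, i, visible)) for i in M) / (l * comb(L, l))
    return total

def nll_time_free_mc(P, x, n_samples, rng):
    """Monte Carlo version: l ~ Uniform{1..L}, then l positions masked uniformly at random."""
    L = P.ndim
    draws = []
    for _ in range(n_samples):
        l = int(rng.integers(1, L + 1))
        M = [int(i) for i in rng.choice(L, size=l, replace=False)]
        visible = set(range(L)) - set(M)
        draws.append(L / l * sum(-np.log(cond_prob(P, x, i, visible)) for i in M))
    return float(np.mean(draws))

rng = np.random.default_rng(0)
L, V = 4, 3                      # hypothetical toy: length-4 sequences over 3 tokens
P = rng.random((V,) * L) + 0.1
P /= P.sum()                     # exact joint distribution p(x)

results = []
for _ in range(5):
    k = rng.choice(P.size, p=P.ravel())          # sample a sequence from p
    x = tuple(int(v) for v in np.unravel_index(k, P.shape))
    true_nll = float(-np.log(P[x]))
    results.append((true_nll, nll_time_free_exact(P, x), nll_time_free_mc(P, x, 3000, rng)))
    print(x, "true %.4f  exact %.4f  MC %.4f" % results[-1])
```

With exact conditionals the enumerated sum equals the true NLL up to floating-point error; with a trained model the per-position cross-entropies come from the network instead, and the identity then holds only to the extent that the model's conditionals are accurate.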

Theorems & Definitions (12)

  • Theorem 3.1: Pointwise and Marginal I-MDSE Relations
  • Theorem 3.2: NLL Decomposition via I-MDSE
  • Lemma 3.3
  • Theorem 3.4: Training Loss Equivalence
  • Theorem 3.5: DCE Optimality
  • Corollary 3.6: Pointwise and Marginal I-MDCE Relations
  • Theorem 4.1: Time-Free Likelihood via I-MDCE
  • Theorem 4.2: Conditional Likelihood via I-MDCE
  • Corollary 4.3: Time-Free Conditional Likelihood via I-MDCE
  • Lemma C.1
  • ...and 2 more
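Theorem 4.2 and Corollary 4.3 concern conditional likelihoods such as $-\log p(y \mid c)$ for a prompt $c$ and a response $y$. The minimal sketch below is our own illustration, not the paper's code. It assumes a hypothetical random joint table over length-5 sequences (2 prompt tokens, 3 response tokens), with exact conditionals standing in for a perfect masked denoiser. Only response positions are ever masked and the prompt always stays visible. Averaging the chain rule over random orderings of the response, conditioned on $c$, gives $-\log p(y\mid c) = \sum_{l=1}^{|y|}\mathbb{E}_{M\subseteq y,\,|M|=l}\big[\tfrac{1}{l}\sum_{i\in M} -\log p(y_i\mid c, y_{\bar M})\big]$. Whether this is exactly the estimator in Corollary 4.3 is an assumption.

```python
import numpy as np
from itertools import combinations
from math import comb

def marginal(P, x, keep):
    """P(X_j = x_j for j in keep), summing out all other positions."""
    idx = tuple(x[j] if j in keep else slice(None) for j in range(P.ndim))
    return float(np.sum(P[idx]))

def cond_prob(P, x, i, visible):
    """Exact p(x_i | x_visible), standing in for a perfect masked denoiser."""
    return marginal(P, x, visible | {i}) / marginal(P, x, visible)

def cond_nll_exact(P, x, prompt_len):
    """Enumerate masks over response positions only; the prompt is never masked."""
    L = P.ndim
    n = L - prompt_len
    total = 0.0
    for l in range(1, n + 1):
        for M in combinations(range(prompt_len, L), l):
            visible = set(range(L)) - set(M)
            total += sum(-np.log(cond_prob(P, x, i, visible)) for i in M) / (l * comb(n, l))
    return total

def cond_nll_mc(P, x, prompt_len, n_samples, rng):
    """Monte Carlo version: l ~ Uniform{1..n}, then l response positions masked at random."""
    L = P.ndim
    n = L - prompt_len
    draws = []
    for _ in range(n_samples):
        l = int(rng.integers(1, n + 1))
        M = [prompt_len + int(j) for j in rng.choice(n, size=l, replace=False)]
        visible = set(range(L)) - set(M)
        draws.append(n / l * sum(-np.log(cond_prob(P, x, i, visible)) for i in M))
    return float(np.mean(draws))

rng = np.random.default_rng(1)
L, V, prompt_len = 5, 3, 2       # hypothetical: 2 prompt tokens, 3 response tokens
P = rng.random((V,) * L) + 0.1
P /= P.sum()

x = (2, 0, 1, 1, 0)              # prompt (2, 0), response (1, 1, 0)
true = float(-np.log(P[x]) + np.log(marginal(P, x, set(range(prompt_len)))))
exact = cond_nll_exact(P, x, prompt_len)
mc = cond_nll_mc(P, x, prompt_len, 3000, rng)
print(f"-log p(y|c): true {true:.4f}  exact decomposition {exact:.4f}  MC {mc:.4f}")
```

Here the reference value is computed directly as $-\log p(c, y) + \log p(c)$, so the check does not rely on the decomposition it is testing.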