Improved Sampling Schedules for Discrete Diffusion Models

Alberto Foresti, Mustapha Bounoua, Giulio Franzese, Luca Ambrogioni, Pietro Michiardi

TL;DR

This work extends thermodynamic and geometric perspectives to discrete diffusion, introducing entropy production as a principled measure of information generation during reverse diffusion and proving a Wasserstein-based speed limit for distributional transport. It derives a practical non-adiabatic entropy estimator using a neural score and proposes two intrinsically motivated sampling schedules, Entropic Discrete Schedule (EDS) and Wasserstein Discrete Schedule (WDS), that distribute timesteps uniformly in entropy or Wasserstein progress. The schedules require no additional training and improve generation quality across count data, music notation, vision, and language tasks at substantially lower compute budgets compared to baselines. Overall, the paper provides both a theoretical framework and a practical, modular approach to boosting the efficiency of discrete diffusion models by aligning sampling with the model’s intrinsic information and transport dynamics.

Abstract

Discrete diffusion models have emerged as a powerful paradigm for generative modeling on sequence data; however, the information-theoretic principles governing their reverse processes remain significantly less understood than those of their continuous counterparts. In this work, we bridge this gap by analyzing the reverse process dynamics through the lens of thermodynamic entropy production. We propose the entropy production rate as a rigorous proxy for quantifying information generation, deriving as a byproduct a bound on the Wasserstein distance between intermediate states and the data distribution. Leveraging these insights, we introduce two novel sampling schedules that are uniformly spaced with respect to their corresponding physics-inspired metrics: the Entropic Discrete Schedule (EDS), which is defined by maintaining a constant rate of information gain, and the Wasserstein Discrete Schedule (WDS), which is defined by taking equal steps in terms of the Wasserstein distance. We empirically demonstrate that our proposed schedules significantly outperform state-of-the-art strategies across diverse application domains, including synthetic data, music notation, vision and language modeling, consistently achieving superior performance at a lower computational budget.
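
The idea of spacing timesteps uniformly in a progress metric, rather than uniformly in time, can be illustrated with a short sketch. This is not the paper's algorithm: it assumes a non-negative rate curve (for example, an entropy production rate or a Wasserstein speed) has already been estimated on a dense, increasing time grid, and it simply inverts the cumulative integral of that curve. The function name `equal_progress_schedule` and its arguments are hypothetical.

```python
import numpy as np

def equal_progress_schedule(t_grid, rate, num_steps):
    """Return num_steps + 1 timesteps that split cumulative progress equally.

    t_grid:    increasing 1-D array of times (assumed dense enough).
    rate:      estimated progress rate at each time in t_grid (assumed input;
               how it is estimated is outside the scope of this sketch).
    num_steps: number of sampling steps to place.
    """
    t_grid = np.asarray(t_grid, dtype=float)
    rate = np.abs(np.asarray(rate, dtype=float))

    # Cumulative progress via the trapezoidal rule, starting at 0.
    increments = 0.5 * (rate[1:] + rate[:-1]) * np.diff(t_grid)
    cumulative = np.concatenate([[0.0], np.cumsum(increments)])

    # Equally spaced progress targets, mapped back to time by
    # linear interpolation of the inverse cumulative curve.
    targets = np.linspace(0.0, cumulative[-1], num_steps + 1)
    return np.interp(targets, cumulative, t_grid)

# Usage: a rate that grows linearly in t concentrates steps at large t.
t = np.linspace(0.0, 1.0, 1001)
schedule = equal_progress_schedule(t, 2.0 * t, num_steps=4)
```

With a constant rate the result reduces to the uniform schedule; where the rate is large, the steps become denser. For a reverse process that runs from high noise to data, the returned times would be traversed in the appropriate order for the sampler in use.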

Paper Structure (28 sections, 29 equations, 9 figures, 1 table, 1 algorithm)

Figures (9)

  • Figure 1: Comparison of entropy and Wasserstein dynamics over time. The left column displays results for Uniform diffusion, while the right column shows Absorb diffusion, both trained on a toy binomial distribution. Each plot presents the ground truth values alongside the model's estimates.
  • Figure 2: Performance comparisons on piano note generation across different metrics.
  • Figure 3: Performance comparison on the Countdown dataset.
  • Figure 4: Performance comparison on CIFAR-10 (MDLM).
  • Figure 5: CIFAR-10 generation results across different sampling schedules: Uniform (left), EDS (center), and WDS (right). Each row shows samples generated with different NFEs.
  • ...and 4 more figures