Entropic Time Schedulers for Generative Diffusion Models

Dejan Stancevic, Florian Handke, Luca Ambrogioni

TL;DR

The paper introduces entropic time schedulers for generative diffusion models by reparameterizing time according to the conditional entropy $H[x_0|x_t]$, yielding a constant information rate across sampling. It proves invariance of the entropic time to the initial time parameterization and provides a tractable method to estimate the entropy rate from the training loss, enabling practical deployment without extra overhead. A rescaled and a spectral variant extend the idea to emphasize optimality under different data structures, and empirical results on mixtures, CIFAR-10, FFHQ, and ImageNet show notable gains, particularly in few-step (low-NFE) regimes. These findings establish a principled link between information theory and diffusion-sampling schedules, with potential implications for discrete diffusion and training strategies across modalities.

Abstract

The practical performance of generative diffusion models depends on the appropriate choice of the noise scheduling function, which can equivalently be expressed as a time reparameterization. In this paper, we present a time scheduler that selects sampling points based on entropy rather than uniform time spacing, ensuring that each point contributes an equal amount of information to the final generation. We prove that this time reparameterization does not depend on the initial choice of time. Furthermore, we provide a tractable exact formula to estimate this entropic time for a trained model using the training loss without substantial overhead. Alongside the entropic time, inspired by the optimality results, we introduce a rescaled entropic time. In our experiments with mixtures of Gaussian distributions and ImageNet, we show that using the (rescaled) entropic times greatly improves the inference performance of trained models. In particular, we found that the image quality in pretrained EDM2 models, as evaluated by FID and FD-DINO scores, can be substantially increased by the rescaled entropic time reparameterization without increasing the number of function evaluations (NFEs), with greater improvements in the few-NFE regime. Code is available at https://github.com/DejanStancevic/Entropic-Time-Schedulers-for-Generative-Diffusion-Models.
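The idea of spacing sampling points by equal entropy increments can be illustrated with a minimal sketch. It is not the authors' implementation: the paper estimates the entropy from the training loss of a trained model, whereas this toy example assumes one-dimensional Gaussian data with a variance-exploding forward process $x_t = x_0 + \sigma \epsilon$, for which $H[x_0|x_t]$ has a closed form. The function names `entropic_schedule` and `gaussian_conditional_entropy`, the noise range, and the grid size are all hypothetical choices made for illustration.

```python
import numpy as np


def entropic_schedule(t_grid, entropy, n_steps):
    """Return n_steps + 1 times whose conditional entropies are equally spaced.

    t_grid  : increasing grid of times (or noise levels)
    entropy : H[x_0 | x_t] evaluated on t_grid, assumed strictly increasing
    """
    t_grid = np.asarray(t_grid, dtype=float)
    entropy = np.asarray(entropy, dtype=float)
    if np.any(np.diff(entropy) <= 0):
        raise ValueError("entropy must be strictly increasing on the grid")
    # Equal steps in entropy, mapped back to time by inverting the curve
    # with piecewise-linear interpolation.
    targets = np.linspace(entropy[0], entropy[-1], n_steps + 1)
    return np.interp(targets, entropy, t_grid)


def gaussian_conditional_entropy(sigma, data_std=1.0):
    """Toy assumption: x_0 ~ N(0, data_std^2) and x_t = x_0 + sigma * noise.

    The posterior p(x_0 | x_t) is Gaussian, so its differential entropy is
    0.5 * log(2 * pi * e * posterior_variance).
    """
    sigma = np.asarray(sigma, dtype=float)
    post_var = (data_std**2 * sigma**2) / (data_std**2 + sigma**2)
    return 0.5 * np.log(2.0 * np.pi * np.e * post_var)


# Dense grid of noise levels; the range is an illustrative assumption.
sigma_grid = np.geomspace(0.01, 80.0, 2000)
H = gaussian_conditional_entropy(sigma_grid)

# Nine noise levels (eight steps) with a constant entropy change per step.
sigmas = entropic_schedule(sigma_grid, H, n_steps=8)

# Sampling runs from high noise to low noise, so reverse the order.
print(sigmas[::-1])
```

Because the schedule is defined through the entropy values rather than through the time variable itself, running the same function on a reparameterized grid (for example `np.sqrt(sigma_grid)` with the same `H`) selects the same noise levels up to interpolation error, which mirrors the invariance property stated in the abstract.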

Paper Structure

This paper contains 26 sections, 5 theorems, 47 equations, 14 figures, 7 tables, 3 algorithms.

Key Result

Theorem 5.1

Given an SDE and an initial data distribution $p_0(\mathbf{x})$, both $\phi(t) = \mathbf{H}[\mathbf{x}_0|\mathbf{x}_t]$ and $\phi(t) = \int_0^t \sigma(\tau) \dot{\mathbf{H}}[\mathbf{x}_0|\mathbf{x}_{\tau}] \, d\tau$ are proper time changes.
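As an informal sketch of why monotonicity is plausible (this is not the paper's proof, and it assumes that a "proper" time change must at least be non-decreasing, that the entropies involved are finite, and that $\sigma(\tau) > 0$): for $s < t$ the forward process forms a Markov chain $\mathbf{x}_0 \to \mathbf{x}_s \to \mathbf{x}_t$, so the data processing inequality gives

$$
\mathbf{I}[\mathbf{x}_0; \mathbf{x}_t] \le \mathbf{I}[\mathbf{x}_0; \mathbf{x}_s]
\quad \Longrightarrow \quad
\mathbf{H}[\mathbf{x}_0|\mathbf{x}_t] = \mathbf{H}[\mathbf{x}_0] - \mathbf{I}[\mathbf{x}_0; \mathbf{x}_t] \ge \mathbf{H}[\mathbf{x}_0] - \mathbf{I}[\mathbf{x}_0; \mathbf{x}_s] = \mathbf{H}[\mathbf{x}_0|\mathbf{x}_s].
$$

Hence $\dot{\mathbf{H}}[\mathbf{x}_0|\mathbf{x}_{\tau}] \ge 0$ wherever the derivative exists, and with a positive weight $\sigma(\tau)$ the integrand of the second expression is non-negative, so that $\phi$ is non-decreasing as well.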

Figures (14)

  • Figure 1: An example of the same SDE and its conditional entropy in the standard and entropic time.
  • Figure 2: Normalized rescaled entropy as a function of radial frequency for the red channel in ImageNet-64, together with normalized rescaled entropy, spectral rescaled entropy, and EDM with 128 steps.
  • Figure 3: Kullback–Leibler divergence against the number of generative steps for different time parameterizations, for a mixture of $15$ data points (discrete) and a mixture of $15$ Gaussians (continuous).
  • Figure 4: Comparison of generated images using EDM and rescaled entropic schedules with the same random seed. Images were generated using deterministic DDIM with NFE = 8, 16, 32, and 64.
  • Figure 5: Images generated with the deterministic DDIM solver using the non-rescaled entropic schedule over 64 steps, with the EDM2-L model. These images show that rescaling is crucial in the continuous regime, probably because the differential entropy diverges as $t \rightarrow 0$.
  • ...and 9 more figures

Theorems & Definitions (10)

  • Definition 4.1
  • Definition 4.2
  • Theorem 5.1
  • Theorem 5.2
  • Theorem C.1
  • proof
  • Theorem D.1
  • proof
  • Theorem D.2
  • proof