EVODiff: Entropy-aware Variance Optimized Diffusion Inference

Shigui Li, Wei Chen, Delu Zeng

TL;DR

EVODiff introduces an information-theoretic framework for diffusion model inference, showing that effective denoising reduces conditional entropy in reverse transitions and that data-prediction parameterization coupled with conditional-variance optimization yields faster, higher-quality sampling. The method develops a reference-free, entropy-aware variance control (REsampling) framework with closed-form updates for interpolation and balance parameters, and provides convergence guarantees for a second-order, multi-step scheme. Experimental results across CIFAR-10, ImageNet-256, LSUN-Bedrooms, and text-to-image tasks demonstrate substantial improvements over state-of-the-art gradient-based solvers, including large FID gains at low NFEs and reduced sampling costs. Overall, EVODiff presents a practical, theory-grounded approach to diffusion inference that improves efficiency and sample fidelity without relying on reference trajectories, with potential applicability to a broad class of diffusion-based generative models.
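The link between conditional variance and conditional entropy can be illustrated with a minimal sketch. It assumes an isotropic Gaussian transition $\mathcal{N}(\boldsymbol{\mu}, \sigma^2 \boldsymbol{I})$, for which the differential entropy is $\frac{d}{2}\log(2\pi e \sigma^2)$. The function name `gaussian_entropy`, the dimension, and the variance values are illustrative choices, not taken from the paper, and this is not the paper's algorithm.

```python
import math


def gaussian_entropy(sigma2: float, dim: int) -> float:
    """Differential entropy (in nats) of N(mu, sigma2 * I) in `dim` dimensions."""
    if sigma2 <= 0:
        raise ValueError("variance must be positive")
    return 0.5 * dim * math.log(2 * math.pi * math.e * sigma2)


# Hypothetical setting: a CIFAR-10-sized sample (3 x 32 x 32 values).
dim = 3 * 32 * 32

# Two illustrative conditional variances for the same reverse transition.
h_wide = gaussian_entropy(0.5, dim)
h_narrow = gaussian_entropy(0.25, dim)

# Halving the variance lowers the entropy by (dim / 2) * ln 2 nats.
entropy_drop = h_wide - h_narrow
print(f"entropy drop: {entropy_drop:.2f} nats")
```

Under this Gaussian assumption the entropy is monotone in the variance, so any step that shrinks the conditional variance also shrinks the conditional entropy.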

Abstract

Diffusion models (DMs) excel in image generation, but suffer from slow inference and training-inference discrepancies. Although gradient-based solvers such as DPM-Solver accelerate denoising inference, they lack theoretical foundations in information transmission efficiency. In this work, we introduce an information-theoretic perspective on the inference processes of DMs, revealing that successful denoising fundamentally reduces conditional entropy in reverse transitions. This principle leads to two key insights into the inference processes: (1) data prediction parameterization outperforms its noise counterpart, and (2) optimizing conditional variance offers a reference-free way to minimize both transition and reconstruction errors. Based on these insights, we propose an entropy-aware variance optimized method for the generative process of DMs, called EVODiff, which systematically reduces uncertainty by optimizing conditional entropy during denoising. Extensive experiments on DMs validate our insights and demonstrate that our method significantly and consistently outperforms state-of-the-art (SOTA) gradient-based solvers. For example, compared to DPM-Solver++, EVODiff reduces the reconstruction error by up to 45.5% (FID improves from 5.10 to 2.78) at 10 function evaluations (NFE) on CIFAR-10, cuts the NFE cost by 25% (from 20 to 15 NFE) for high-quality samples on ImageNet-256, and improves text-to-image generation while reducing artifacts. Code is available at https://github.com/ShiguiLi/EVODiff.
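The two headline percentages follow directly from the quoted numbers. The short check below recomputes them as relative reductions; the helper name `relative_reduction` is illustrative and not part of the EVODiff code base.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fraction by which `after` is lower than `before`."""
    return (before - after) / before


# FID on CIFAR-10 at 10 NFE: 5.10 (DPM-Solver++) vs. 2.78 (EVODiff).
fid_gain = relative_reduction(5.10, 2.78)

# NFE needed on ImageNet-256: 20 vs. 15.
nfe_saving = relative_reduction(20, 15)

print(f"FID reduction: {fid_gain:.1%}")  # FID reduction: 45.5%
print(f"NFE saving: {nfe_saving:.1%}")   # NFE saving: 25.0%
```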

Paper Structure

This paper contains 43 sections, 8 theorems, 75 equations, 13 figures, 18 tables, 1 algorithm.

Key Result

Proposition 3.1

Note that $\boldsymbol{x}_{t_i} - \boldsymbol{x}_0=(\boldsymbol{x}_{t_i} - \boldsymbol{\mu}_{t_i|t_{i+1}}) + (\boldsymbol{\mu}_{t_i|t_{i+1}} - \boldsymbol{x}_0)$. From this split the paper derives a decomposition of the reconstruction error (the displayed equation is not preserved in this extract); details of the decomposition are provided in the paper's appendix (label: proofreconerror).
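As a reading aid, the algebraic identity that follows from the split in Proposition 3.1 can be written out. This is only the standard expansion of a squared norm, not the paper's statement. Whether the cross term vanishes in expectation depends on assumptions about $\boldsymbol{\mu}_{t_i|t_{i+1}}$ that are not visible in this extract.

```latex
\begin{align*}
\|\boldsymbol{x}_{t_i} - \boldsymbol{x}_0\|^2
  &= \|\boldsymbol{x}_{t_i} - \boldsymbol{\mu}_{t_i|t_{i+1}}\|^2
   + \|\boldsymbol{\mu}_{t_i|t_{i+1}} - \boldsymbol{x}_0\|^2 \\
  &\quad + 2\,\langle \boldsymbol{x}_{t_i} - \boldsymbol{\mu}_{t_i|t_{i+1}},\;
                      \boldsymbol{\mu}_{t_i|t_{i+1}} - \boldsymbol{x}_0 \rangle
\end{align*}
```

The first term measures the spread of $\boldsymbol{x}_{t_i}$ around $\boldsymbol{\mu}_{t_i|t_{i+1}}$, and the second measures how far $\boldsymbol{\mu}_{t_i|t_{i+1}}$ lies from the clean sample $\boldsymbol{x}_0$.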

Figures (13)

  • Figure 1: Illustration of conditional entropy reduction during diffusion model inference. Our EVODiff (blue) achieves lower conditional entropy in reverse transitions compared to DDIM (gray).
  • Figure 2: Quantitative FID $\downarrow$ results show that the efficient entropy reduction (RE) method consistently improves image quality over the FD-based method in Eq. (\ref{FDiterg}) across various ablation scenarios.
  • Figure 3: Sample comparison of our method vs. baseline using the pre-trained EDM on CIFAR-10.
  • Figure 4: FID $\downarrow$ scores for gradient-based inference methods on ImageNet-64 and FFHQ-64.
  • Figure 5: Random samples from the Stable-Diffusion-v1.5 model (Rombach et al., 2022) with a guidance scale of 7.5, using varying NFEs and the prompt "Giant caterpillar riding a bicycle". Even at a low 25 NFE, EVODiff produces high-fidelity, semantically correct images while competing methods fail with severe artifacts, demonstrating the superiority of our entropy-aware variance optimized method.
  • ...and 8 more figures

Theorems & Definitions (21)

  • Remark 2.1
  • Proposition 3.1
  • Proposition 3.2
  • Proposition 3.3
  • Theorem 3.4
  • Remark 4.1
  • Theorem 4.2
  • Remark 4.3
  • Lemma 4.4
  • Lemma 4.5
  • ...and 11 more