EVODiff: Entropy-aware Variance Optimized Diffusion Inference
Shigui Li, Wei Chen, Delu Zeng
TL;DR
EVODiff introduces an information-theoretic framework for diffusion model inference. It shows that effective denoising reduces conditional entropy in reverse transitions, and that data-prediction parameterization combined with conditional-variance optimization yields faster, higher-quality sampling. The method develops a reference-free, entropy-aware variance control (REsampling) framework with closed-form updates for its interpolation and balance parameters, and provides convergence guarantees for a second-order, multi-step scheme. Experiments on CIFAR-10, ImageNet-256, LSUN-Bedrooms, and text-to-image tasks show improvements over state-of-the-art gradient-based solvers: for example, relative to DPM-Solver++, FID on CIFAR-10 improves from 5.10 to 2.78 at 10 NFE, and the NFE needed for high-quality ImageNet-256 samples drops from 20 to 15. Because it does not rely on reference trajectories, the approach may apply to a broad class of diffusion-based generative models.
Abstract
Diffusion models (DMs) excel in image generation but suffer from slow inference and training-inference discrepancies. Although gradient-based solvers such as DPM-Solver accelerate denoising inference, they lack theoretical foundations in information transmission efficiency. In this work, we introduce an information-theoretic perspective on the inference processes of DMs, revealing that successful denoising fundamentally reduces conditional entropy in reverse transitions. This principle leads to two key insights into the inference process: (1) data-prediction parameterization outperforms its noise-prediction counterpart, and (2) optimizing conditional variance offers a reference-free way to minimize both transition and reconstruction errors. Based on these insights, we propose an entropy-aware variance-optimized method for the generative process of DMs, called EVODiff, which systematically reduces uncertainty by optimizing conditional entropy during denoising. Extensive experiments on DMs validate our insights and demonstrate that our method significantly and consistently outperforms state-of-the-art (SOTA) gradient-based solvers. For example, compared to DPM-Solver++, EVODiff reduces the reconstruction error by up to 45.5% (FID improves from 5.10 to 2.78) at 10 function evaluations (NFE) on CIFAR-10, cuts the NFE cost by 25% (from 20 to 15 NFE) for high-quality samples on ImageNet-256, and improves text-to-image generation while reducing artifacts. Code is available at https://github.com/ShiguiLi/EVODiff.
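The link between conditional variance and conditional entropy can be illustrated with a minimal sketch. It assumes, for illustration only, that a reverse transition is an isotropic Gaussian, for which the differential entropy is (d/2) log(2πeσ²) and so decreases monotonically with the variance σ². The function names below are hypothetical and are not taken from the EVODiff codebase. The sketch also rechecks the arithmetic behind the reported 45.5% and 25% figures.

```python
import math


def gaussian_entropy(variance: float, dim: int = 1) -> float:
    """Differential entropy (in nats) of an isotropic Gaussian.

    Assumption for illustration: the reverse transition is Gaussian
    with the same variance in each of its `dim` dimensions.
    """
    return 0.5 * dim * math.log(2.0 * math.pi * math.e * variance)


def relative_reduction(before: float, after: float) -> float:
    """Fractional decrease from `before` to `after`."""
    return (before - after) / before


# Halving the conditional variance lowers the entropy by 0.5 * log(2) nats per dimension.
entropy_drop = gaussian_entropy(1.0) - gaussian_entropy(0.5)

# Figures quoted in the abstract: FID 5.10 -> 2.78 on CIFAR-10, and 20 -> 15 NFE on ImageNet-256.
fid_gain = relative_reduction(5.10, 2.78)
nfe_saving = relative_reduction(20, 15)

print(f"entropy drop per dimension: {entropy_drop:.4f} nats")
print(f"FID reduction: {fid_gain:.1%}, NFE saving: {nfe_saving:.0%}")
```

Under the Gaussian assumption, any update that lowers the conditional variance also lowers the conditional entropy, which is the intuition behind treating variance as the quantity to optimize.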
