Solving Prior Distribution Mismatch in Diffusion Models via Optimal Transport
Zhanpeng Wang, Shenghao Li, Jiameng Che, Chen Wang, Shangling Jui, Na Lei, Zhongxuan Luo
TL;DR
The paper identifies a fundamental prior distribution mismatch in diffusion models: the forward terminal distribution $p_T$ often does not match the reverse initial distribution $q_T$, which leaves a non-zero terminal signal-to-noise ratio (SNR) and lets denoising errors accumulate, degrading sampling. It introduces an Optimal Transport (OT)-based prior error eliminator that constructs an OT map $\nabla u_T^{\gets}$ pushing the steady-state distribution $p_\infty$ forward onto $p_T$, and initializes the reverse process with $q_T = \nabla u_T^{\gets}(p_\infty)$, i.e., the pushforward of $p_\infty$ under this map, so that the two distributions align. The authors derive a Wasserstein-2 upper bound that ties the remaining error to both the score-matching objective $\mathcal{J}_{SM}$ and the approximation error of the OT map, and they establish asymptotic consistency between dynamic OT and the probability flow. Empirically, the method fully eliminates prior error in discrete settings and improves generation quality while accelerating sampling across multiple image datasets, supporting both the theoretical guarantees and the practical utility. Overall, the work offers a rigorous, universal framework for improving diffusion models by correcting distribution alignment via OT, with clear implications for faster and more faithful generative sampling.
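To make the mismatch and its correction concrete, the minimal sketch below works through a 1-D toy case under assumptions that are not taken from the paper: Gaussian data $\mathcal{N}(m_0, s_0^2)$, a variance-preserving forward process $x_T = \sqrt{\bar\alpha_T}\,x_0 + \sqrt{1-\bar\alpha_T}\,\varepsilon$ with a hypothetical schedule value $\bar\alpha_T = 0.01$, and $p_\infty = \mathcal{N}(0,1)$. In this setting $p_T$ is Gaussian in closed form, the terminal SNR $\bar\alpha_T/(1-\bar\alpha_T)$ is positive whenever $\bar\alpha_T > 0$, and the OT map from $p_\infty$ to $p_T$ is affine (the gradient of a convex quadratic potential), so both the prior error $W_2(p_T, p_\infty)$ and its removal can be computed exactly. All function names are illustrative, and this is not the authors' OT solver, which targets high-dimensional image distributions.

```python
import math

# Toy assumptions (not from the paper): 1-D Gaussian data N(m0, s0^2),
# VP forward process x_T = sqrt(alpha_bar_T) * x_0 + sqrt(1 - alpha_bar_T) * eps,
# and steady-state prior p_inf = N(0, 1).


def vp_terminal_gaussian(m0: float, s0: float, alpha_bar_T: float) -> tuple[float, float]:
    """Mean and standard deviation of the forward terminal distribution p_T."""
    mean = math.sqrt(alpha_bar_T) * m0
    std = math.sqrt(alpha_bar_T * s0 ** 2 + (1.0 - alpha_bar_T))
    return mean, std


def terminal_snr(alpha_bar_T: float) -> float:
    """SNR at time T; it vanishes only when alpha_bar_T == 0."""
    return alpha_bar_T / (1.0 - alpha_bar_T)


def w2_gaussian_1d(m1: float, s1: float, m2: float, s2: float) -> float:
    """Closed-form 2-Wasserstein distance between N(m1, s1^2) and N(m2, s2^2)."""
    return math.sqrt((m1 - m2) ** 2 + (s1 - s2) ** 2)


def affine_ot_map(shift: float, scale: float):
    """OT map z -> shift + scale * z from N(0, 1) to N(shift, scale^2), scale > 0.

    It is the gradient of the convex potential u(z) = shift * z + scale * z**2 / 2.
    """
    return lambda z: shift + scale * z


def pushforward_of_standard_normal(shift: float, scale: float) -> tuple[float, float]:
    """Mean and std of the image of N(0, 1) under z -> shift + scale * z."""
    return shift, abs(scale)


alpha_bar_T = 0.01  # hypothetical schedule value: terminal SNR is positive
m_T, s_T = vp_terminal_gaussian(m0=2.0, s0=0.5, alpha_bar_T=alpha_bar_T)
print("terminal SNR:", terminal_snr(alpha_bar_T))
print("prior error W2(p_T, p_inf):", w2_gaussian_1d(m_T, s_T, 0.0, 1.0))

# Correct the reverse initial distribution: q_T is the pushforward of p_inf by the OT map.
ot_map = affine_ot_map(m_T, s_T)
print("OT map sends the prior mean 0 to:", ot_map(0.0))
q_mean, q_std = pushforward_of_standard_normal(m_T, s_T)
print("prior error W2(p_T, q_T):", w2_gaussian_1d(m_T, s_T, q_mean, q_std))  # 0.0
```

In $d$ dimensions the Gaussian-to-Gaussian OT map is still affine, $x \mapsto m_2 + A(x - m_1)$ with $A = \Sigma_1^{-1/2}(\Sigma_1^{1/2}\Sigma_2\Sigma_1^{1/2})^{1/2}\Sigma_1^{-1/2}$; for image data $p_T$ is not Gaussian, so no such closed form is available.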
Abstract
Diffusion Models (DMs) have achieved remarkable progress in generative modeling. However, the mismatch between the forward terminal distribution and the reverse initial distribution introduces prior error, which causes sampling trajectories to deviate from the true distribution and severely limits model performance. This issue further triggers cascading problems, including a non-zero terminal Signal-to-Noise Ratio (SNR), accumulated denoising errors, degraded generation quality, and constrained sampling efficiency. To address this issue, this paper proposes a prior error elimination framework based on Optimal Transport (OT). Specifically, an OT map from the reverse initial distribution to the forward terminal distribution is constructed so that the two distributions match precisely. Meanwhile, the prior error is bounded from above in terms of the Wasserstein distance, proving that it can be effectively eliminated via the OT map. Additionally, by deriving the asymptotic consistency between dynamic OT and probability flow, the method is shown to be highly compatible with the intrinsic mechanism of the diffusion process. Theoretical analysis and experimental results demonstrate that the proposed method completely eliminates the prior error, providing a universal and rigorous solution for optimizing the performance of DMs.
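In a discrete (empirical) setting, the matching between reverse initial and forward terminal samples can be made exact. The sketch below illustrates this in 1-D under hypothetical toy assumptions: data drawn from $\mathcal{N}(2, 0.5^2)$, a variance-preserving forward process with $\bar\alpha_T = 0.01$, and $p_\infty = \mathcal{N}(0,1)$. It uses sorting-based empirical OT, which is optimal for quadratic cost when both sample sets have equal size and uniform weights; it is a swapped-in textbook technique, not the paper's OT construction. After the map is applied, the transported reverse initial samples coincide with the forward terminal samples as a set, so the empirical $W_2$ drops to zero.

```python
import numpy as np


def empirical_w2_1d(x: np.ndarray, y: np.ndarray) -> float:
    # For equal-size 1-D samples with uniform weights, the optimal coupling for
    # quadratic cost pairs the sorted values, so W2^2 is the mean squared gap.
    return float(np.sqrt(np.mean((np.sort(x) - np.sort(y)) ** 2)))


def discrete_ot_map_1d(source: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Send the i-th smallest source point to the i-th smallest target point.

    This monotone assignment is the optimal discrete transport plan in 1-D
    (quadratic cost, equal sample sizes, uniform weights).
    """
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    if source.ndim != 1 or source.shape != target.shape:
        raise ValueError("expected two 1-D arrays of equal length")
    mapped = np.empty_like(source)
    mapped[np.argsort(source)] = np.sort(target)
    return mapped


# Toy assumptions (hypothetical): data ~ N(2, 0.5^2), VP forward process, alpha_bar_T = 0.01.
rng = np.random.default_rng(0)
n, alpha_bar_T = 4096, 0.01
x0 = rng.normal(2.0, 0.5, size=n)
x_T = np.sqrt(alpha_bar_T) * x0 + np.sqrt(1.0 - alpha_bar_T) * rng.standard_normal(n)  # samples of p_T
z = rng.standard_normal(n)  # reverse initial samples from p_inf = N(0, 1)

print("prior error before:", empirical_w2_1d(z, x_T))  # positive: p_T is shifted away from N(0, 1)
z_mapped = discrete_ot_map_1d(z, x_T)  # OT-corrected reverse initial samples
print("prior error after:", empirical_w2_1d(z_mapped, x_T))  # 0.0
```

In higher dimensions the sorting trick no longer applies, and an exact discrete plan requires solving an assignment or linear program instead.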
