Rethinking Diffusion Model in High Dimension
Zhenxin Zheng, Zhenjie Zheng
TL;DR
This work challenges the view that diffusion models in high dimensions learn explicit statistical quantities such as posteriors, scores, or velocity fields. It shows that data sparsity causes the training objective to collapse toward predicting $X_0$ from $X_t$, undermining learning of the true distribution and motivating a non-statistical reinterpretation. The authors introduce the Natural Inference framework, unifying inference methods under an autoregressive, coefficient-matrix view and enabling Self Guidance to refine predictions without relying on traditional statistics. They demonstrate applications such as coefficient-matrix optimization, CFG adaptation, and sharpness control, offering practical paths to improved sampling speed and image quality in high-dimensional diffusion tasks.
Abstract
The curse of dimensionality is an unavoidable challenge for statistical probability models, yet diffusion models seem to overcome this limitation, achieving impressive results in high-dimensional data generation. Diffusion models are generally assumed to learn the statistical quantities of the underlying probability distribution, which enables sampling from that distribution to generate realistic samples. But is this really how they work? We argue not, based on the following observations: 1) In high-dimensional sparse scenarios, the fitting target of the diffusion model's objective function degrades from a weighted sum of multiple samples to a single sample, which we believe hinders the model's ability to effectively learn essential statistical quantities such as the posterior, score, or velocity field. 2) Most inference methods can be unified within a simple framework that involves no statistical concepts, aligns with the degraded objective function, and provides a novel and intuitive perspective on the inference process.
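Observation 1 can be illustrated with a small numerical sketch. It is not the authors' experiment, and every concrete choice in it is an assumption made for illustration: standard Gaussian toy data, a forward process of the form $X_t = \alpha X_0 + \sigma \epsilon$ with fixed hypothetical values of $\alpha$ and $\sigma$, and a finite training set treated as the data distribution. Under those assumptions, the ideal regression target $E[X_0 \mid X_t]$ is a weighted sum of the training samples, and the code measures how much of the total weight falls on a single sample as the dimension grows.

```python
import numpy as np


def posterior_weights(x_t, data, alpha, sigma):
    """Weights of each training sample in E[X_0 | X_t = x_t].

    Assumes x_t = alpha * x_0 + sigma * eps with eps ~ N(0, I) and a uniform
    prior over the rows of `data`, so w_i is proportional to
    exp(-||x_t - alpha * x_i||^2 / (2 * sigma^2)).
    """
    sq_dist = np.sum((x_t - alpha * data) ** 2, axis=1)
    logits = -sq_dist / (2.0 * sigma**2)
    logits = logits - logits.max()  # subtract the max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()


def mean_max_weight(dim, n_samples=256, alpha=0.8, sigma=0.6, trials=20, seed=0):
    """Average, over several random toy datasets, of the largest posterior weight."""
    rng = np.random.default_rng(seed)
    top = []
    for _ in range(trials):
        data = rng.standard_normal((n_samples, dim))  # hypothetical toy dataset
        x_t = alpha * data[0] + sigma * rng.standard_normal(dim)  # noised sample 0
        top.append(posterior_weights(x_t, data, alpha, sigma).max())
    return float(np.mean(top))


if __name__ == "__main__":
    for dim in (2, 32, 1024):
        print(f"dim={dim:5d}  mean largest weight={mean_max_weight(dim):.4f}")
```

In this toy setting the largest weight should be small in low dimensions, where many samples contribute to the target, and should approach 1 in high dimensions, where the target is effectively the single sample that produced $X_t$. The exact values depend on the assumed $\alpha$, $\sigma$, and sample count.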
