Optimizing Noise Schedules of Generative Models in High Dimensions
Santiago Aranguri, Giulio Biroli, Marc Mezard, Eric Vanden-Eijnden
TL;DR
This work analyzes noise schedules for high-dimensional diffusion-based generative models through the lens of stochastic interpolants, revealing a fundamental VP/VE dichotomy: VP tends to recover low-level per-mode structure while VE captures high-level inter-mode asymmetry. Time-dilated interpolation schedules are proposed to jointly recover both types of features, yielding a well-defined limiting probability-flow ODE in dimension $d$ that can be discretized with $\Theta_d(1)$ steps. The authors establish theory for Gaussian Mixtures and Curie-Weiss distributions, connect the interpolants to standard score-based diffusion models, and validate the approach with GM/CW simulations and CelebA experiments, showing improved feature recovery and discretization efficiency. Practically, the results provide dimension-robust noise schedules that enable efficient sampling in high-dimensional diffusion models while preserving both global structure and fine-grained details.
Abstract
Recent works have shown that diffusion models can undergo phase transitions, the resolution of which is needed for accurately generating samples. This has motivated the use of different noise schedules, the two most common choices being referred to as variance preserving (VP) and variance exploding (VE). Here we revisit these schedules within the framework of stochastic interpolants. Using the Gaussian Mixture (GM) and Curie-Weiss (CW) data distributions as test case models, we first investigate the effect of the variance of the initial noise distribution and show that VP recovers the low-level feature (the distribution of each mode) but misses the high-level feature (the asymmetry between modes), whereas VE does the opposite. We also show that this dichotomy, which arises when denoising by a constant amount in each step, can be avoided by using noise schedules specific to VP and VE that allow for the recovery of both high- and low-level features. Finally, we show that these schedules yield generative models for the GM and CW models whose probability flow ODE can be discretized using $\Theta_d(1)$ steps in dimension $d$ instead of the $\Theta_d(\sqrt{d})$ steps required by constant denoising.
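To make the two kinds of features concrete, the following is a minimal sketch, not the authors' construction. It assumes a two-mode GM with modes at $\pm m$, $|m| = \sqrt{d}$, and weights $p$ and $1-p$, together with a plain linear interpolant $x_t = (1-t)\,\sigma_0 z + t\,a$ between noise $z$ and data $a$. The names `sample_gm`, `interpolant`, and `sigma0`, and the linear schedule itself, are illustrative assumptions; the time-dilated schedules discussed in the abstract are not implemented here. The fraction of samples with positive overlap with $m$ is used as a proxy for the high-level feature (the asymmetry between modes).

```python
import numpy as np

def sample_gm(n, d, p, rng):
    """Draw n samples from an assumed two-mode Gaussian mixture in dimension d.

    Modes sit at +m and -m with weights p and 1 - p; each mode has identity covariance.
    """
    m = np.ones(d)  # mode center with |m| = sqrt(d) (illustrative choice)
    signs = np.where(rng.random(n) < p, 1.0, -1.0)
    return signs[:, None] * m + rng.standard_normal((n, d)), m

def interpolant(t, z, a, sigma0=1.0):
    """Assumed linear interpolant x_t = (1 - t) * sigma0 * z + t * a."""
    return (1.0 - t) * sigma0 * z + t * a

rng = np.random.default_rng(0)
n, d, p = 2000, 256, 0.7
a, m = sample_gm(n, d, p, rng)      # data samples
z = rng.standard_normal((n, d))     # initial noise, scaled by sigma0 inside interpolant

fractions = {}
for t in (0.0, 0.5, 1.0):
    x_t = interpolant(t, z, a)
    overlap = x_t @ m / d           # projection onto the mode direction
    fractions[t] = float(np.mean(overlap > 0))
    print(f"t={t:.1f}  fraction with positive overlap: {fractions[t]:.3f}")
```

At $t = 0$ the samples are pure noise, so the fraction should sit near one half; at $t = 1$ they are data, so it should sit near $p$. Varying `sigma0` is one way to experiment with the role of the initial noise variance that the abstract investigates.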
