Convergence for score-based generative modeling with polynomial complexity
Holden Lee, Jianfeng Lu, Yixin Tan
TL;DR
This work provides the first polynomial-time convergence guarantees for score-based generative modeling when the score estimator is accurate in $L^2(p)$, avoiding error that grows exponentially in time or suffers from the curse of dimensionality. It develops a general framework that converts $L^2$ score accuracy into high-probability total variation (TV) guarantees via a bad-set analysis, and applies the framework to Langevin dynamics and reverse SDEs, including annealed and predictor-corrector variants. A key contribution is showing that annealing and predictor-corrector strategies yield favorable convergence rates under mild smoothness and log-Sobolev assumptions, with bounds that scale polynomially in the dimension and the problem constants. The results provide theoretical grounding for practical SGM procedures and motivate future work on multimodal distributions, weaker score-error regimes, and learning guarantees for the score function.
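To make the phrase "accurate in $L^2(p)$" concrete, the following is a minimal sketch of the accuracy condition and of the elementary inequality that a bad-set argument of this kind can start from. The symbols $s$, $\varepsilon$, $\delta$, and $B_\delta$ are notation assumed here for illustration and may differ from the paper's.

```latex
% Assumed notation: s is the score estimate, \varepsilon its L^2(p) error level.
\mathbb{E}_{x \sim p}\,\bigl\| s(x) - \nabla \ln p(x) \bigr\|^2 \le \varepsilon^2 .

% Bad set at threshold \delta > 0: points where the estimate is off by more than \delta.
B_\delta = \bigl\{ x : \| s(x) - \nabla \ln p(x) \| > \delta \bigr\} .

% Markov's inequality applied to the squared error bounds the p-mass of the bad set.
p(B_\delta) \le \frac{\varepsilon^2}{\delta^2} .
```

This only bounds the mass of the bad set under $p$ itself. It says nothing about how often a sampler's iterates visit $B_\delta$, which is the harder part that a full analysis would have to control.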
Abstract
Score-based generative modeling (SGM) is a highly successful approach for learning a probability distribution from data and generating further samples. We prove the first polynomial convergence guarantees for the core mechanic behind SGM: drawing samples from a probability density $p$ given a score estimate (an estimate of $\nabla \ln p$) that is accurate in $L^2(p)$. Compared to previous works, we do not incur error that grows exponentially in time or that suffers from a curse of dimensionality. Our guarantee works for any smooth distribution and depends polynomially on its log-Sobolev constant. Using our guarantee, we give a theoretical analysis of score-based generative modeling, which transforms white-noise input into samples from a learned data distribution given score estimates at different noise scales. Our analysis gives theoretical grounding to the observation that an annealed procedure is required in practice to generate good samples, as our proof depends essentially on using annealing to obtain a warm start at each step. Moreover, we show that a predictor-corrector algorithm gives better convergence than using either portion alone.
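The annealing idea described above can be illustrated with a toy sketch: run Langevin steps with a score estimate at a sequence of decreasing noise scales, so that each level starts from the output of the previous one (a warm start). Everything here is an assumption made for illustration, not the paper's algorithm or parameter choices: the 1-D Gaussian target, the function names `make_score` and `annealed_langevin`, the bounded perturbation `eps * sin(x)` standing in for score-estimation error, and the step-size rule.

```python
import math
import random
import statistics


def make_score(mu, s0, sigma, eps):
    """Score of N(mu, s0^2 + sigma^2) plus a bounded perturbation.

    The exact score of the noised toy target is -(x - mu) / var; the
    eps * sin(x) term is a hypothetical stand-in for estimation error
    (its magnitude is at most eps everywhere, hence also in L^2).
    """
    var = s0 ** 2 + sigma ** 2

    def score(x):
        return -(x - mu) / var + eps * math.sin(x)

    return score


def annealed_langevin(n_samples, mu, s0, sigmas, steps_per_level,
                      rel_step, eps, seed=0):
    """Run unadjusted Langevin steps at each noise level, largest first."""
    rng = random.Random(seed)
    # Start from wide Gaussian noise, independent of the target mean.
    xs = [rng.gauss(0.0, sigmas[0]) for _ in range(n_samples)]
    for sigma in sigmas:
        score = make_score(mu, s0, sigma, eps)
        # Assumed rule: step size proportional to the variance at this level.
        h = rel_step * (s0 ** 2 + sigma ** 2)
        noise_scale = math.sqrt(2.0 * h)
        for _ in range(steps_per_level):
            # Langevin update: x <- x + h * score(x) + sqrt(2h) * N(0, 1)
            xs = [x + h * score(x) + noise_scale * rng.gauss(0.0, 1.0)
                  for x in xs]
        # The particles now serve as a warm start for the next, smaller sigma.
    return xs


# Geometric noise schedule from 10.0 down to 0.01 over 8 levels.
sigmas = [10.0 * (0.001 ** (k / 7)) for k in range(8)]
samples = annealed_langevin(n_samples=2000, mu=2.0, s0=0.5, sigmas=sigmas,
                            steps_per_level=80, rel_step=0.05, eps=0.05)
print(statistics.fmean(samples), statistics.pstdev(samples))
```

With these settings the sample mean and standard deviation should land near the target's 2.0 and 0.5, up to discretization bias, the injected score error, and Monte Carlo noise. A Gaussian target is far too easy to show why annealing matters; the sketch only shows the mechanics of chaining noise levels.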
