Dimension-free Score Matching and Time Bootstrapping for Diffusion Models
Syamantak Kumar, Dheeraj Nagaraj, Purnamrita Sarkar
TL;DR
This work addresses the dimension-dependent sample complexity of learning score functions for diffusion models. It introduces a joint-time learning approach that uses a single function class across all noise levels and, through a novel martingale-based error decomposition and sharp variance bounds, achieves generalization bounds that are dimension-free up to a $\log\log(d)$ factor (modulo the $\log(|\mathcal{H}|)$ term). A key contribution is Bootstrapped Score Matching (BSM), which reduces variance across timesteps by bootstrapping regression targets from scores already learned at lower noise levels. Collectively, these results explain the observed efficiency of diffusion-model training in high dimensions and offer a practical variance-reduction technique, with potential extensions to flow-matching frameworks.
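As a minimal illustration of joint-time score learning, the sketch below evaluates a denoising score matching loss in which one function `score_fn(x, t)` is shared across all noise levels. It assumes an Ornstein-Uhlenbeck (OU) forward process, noise levels drawn uniformly from a hypothetical range `[t_min, t_max]`, and an unweighted loss; the helper names are illustrative and are not taken from the paper.

```python
import numpy as np


def ou_forward(x0, t, rng):
    """Noise clean samples x0 to per-sample times t under an assumed OU process.

    Assumed forward SDE: dX = -X dt + sqrt(2) dB, so that
    X_t = e^{-t} X_0 + sqrt(1 - e^{-2t}) Z with Z ~ N(0, I).
    """
    a = np.exp(-t)[:, None]                            # signal scale e^{-t}
    sigma = np.sqrt(1.0 - np.exp(-2.0 * t))[:, None]   # noise standard deviation
    z = rng.standard_normal(x0.shape)
    return a * x0 + sigma * z, z, sigma


def joint_dsm_loss(score_fn, x0, rng, t_min=0.1, t_max=2.0):
    """Denoising score matching loss averaged over random noise levels.

    A single score_fn(x, t) is shared by every noise level t (joint-time learning).
    t_min and t_max are hypothetical choices for this sketch.
    """
    t = rng.uniform(t_min, t_max, size=x0.shape[0])    # one noise level per sample
    xt, z, sigma = ou_forward(x0, t, rng)
    target = -z / sigma                                # grad of log p(x_t | x_0)
    residual = score_fn(xt, t) - target
    return np.mean(np.sum(residual ** 2, axis=1))


# Toy check: for N(0, I) data the OU process is stationary, so the exact score is -x.
x0 = np.random.default_rng(0).standard_normal((20000, 10))
true_score = lambda x, t: -x
scaled_score = lambda x, t: -0.5 * x                   # deliberately misspecified
# Reusing the same seed gives both models identical noise levels and noise draws.
loss_true = joint_dsm_loss(true_score, x0, np.random.default_rng(1))
loss_bad = joint_dsm_loss(scaled_score, x0, np.random.default_rng(1))
print(loss_true < loss_bad)  # True: the exact score attains the lower loss
```

Because the denoising loss differs from the true score-matching error only by a model-independent constant, the expected gap `loss_bad - loss_true` here is $\mathbb{E}\|0.5\,x_t\|^2 = 0.25\,d = 2.5$.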
Abstract
Diffusion models generate samples by estimating the score function of the target distribution at various noise levels. The model is trained on samples from the target distribution that are progressively corrupted with noise. Previous sample complexity bounds have polynomial dependence on the dimension $d$, apart from a $\log(|\mathcal{H}|)$ term, where $\mathcal{H}$ is the hypothesis class. In this work, we establish the first (nearly) dimension-free sample complexity bounds, modulo the $\log(|\mathcal{H}|)$ dependence, for learning these score functions, achieving a double exponential improvement in the dimension over prior results. A key aspect of our analysis is the use of a single function approximator to jointly estimate scores across noise levels, a practical feature that enables generalization across time steps. We introduce a martingale-based error decomposition and sharp variance bounds, enabling efficient learning from dependent data generated by Markov processes, which may be of independent interest. Building on these insights, we propose Bootstrapped Score Matching (BSM), a variance reduction technique that leverages previously learned scores to improve accuracy at higher noise levels. These results provide insights into the efficiency and effectiveness of diffusion models for generative modeling.
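The exact form of the BSM targets is not given here. One natural construction, offered only as a hedged sketch under the same assumed OU forward process, uses the identity $\nabla \log p_t(x_t) = e^{t-s}\,\mathbb{E}[\nabla \log p_s(X_s) \mid X_t = x_t]$ for $s < t$: a score already learned at the less noisy time $s$, rescaled by $e^{t-s}$, becomes a regression target whose conditional mean is the time-$t$ score (when the earlier score is exact). The helper `bootstrapped_targets` and the times `s` and `t` are hypothetical, and the paper's BSM procedure may differ in its details.

```python
import numpy as np


def bootstrapped_targets(prev_score_fn, x0, s, t, rng):
    """Score-regression targets at time t built from a score at an earlier time s < t.

    Assumes the OU forward process, under which x_t = a x_s + b Z with
    a = e^{-(t-s)}, b = sqrt(1 - a^2), and
        grad log p_t(x_t) = e^{t-s} * E[grad log p_s(X_s) | X_t = x_t].
    """
    sig_s = np.sqrt(1.0 - np.exp(-2.0 * s))
    xs = np.exp(-s) * x0 + sig_s * rng.standard_normal(x0.shape)   # x_s given x_0
    a = np.exp(-(t - s))
    b = np.sqrt(1.0 - a ** 2)
    xt = a * xs + b * rng.standard_normal(x0.shape)                # x_t given x_s
    target = np.exp(t - s) * prev_score_fn(xs, s)                  # bootstrapped target
    return xt, target


# Toy check with N(0, I) data, where the exact score at every time is -x.
rng = np.random.default_rng(0)
x0 = rng.standard_normal((20000, 5))
earlier_score = lambda x, time: -x        # stands in for a score learned at time s
xt, target = bootstrapped_targets(earlier_score, x0, s=0.9, t=1.0, rng=rng)
# Least-squares fit of a linear score model s_t(x) = c * x to the bootstrapped targets.
c_boot = np.sum(xt * target) / np.sum(xt * xt)
print(c_boot)  # expected to be close to -1.0, the exact coefficient
```

This check only confirms that the bootstrapped targets point at the correct score; whether they lower variance relative to the denoising target depends on the data distribution and on the gap $t - s$, which this Gaussian toy does not probe.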
