Variance reduction of diffusion model's gradients with Taylor approximation-based control variate
Paul Jeha, Will Grathwohl, Michael Riis Andersen, Carl Henrik Ek, Jes Frellsen
TL;DR
This paper tackles the high variance of the denoising score matching objective used to train score-based diffusion models. It introduces a family of control variates (CVs) based on a Taylor expansion of order $k$, applicable to both the training objective and its gradients, and proves an equivalence between controlling the objective and controlling the gradients. Empirically, the CVs reduce variance on a low-dimensional toy task, and the experiments show how the order $k$, network irregularity, and optimizer choice affect their effectiveness. Results on MNIST show limited gains for complex models, suggesting that variance may not always be harmful. The work highlights the need to control the variance of the gradients rather than of the objective alone, and leaves open how $k$ relates to variance reduction across architectures and noise regimes.
Abstract
Score-based models, trained with denoising score matching, are remarkably effective at generating high-dimensional data. However, the high variance of their training objective hinders optimisation. We attempt to reduce it with a control variate, derived via a $k$-th order Taylor expansion of the training objective and its gradient. We prove an equivalence between the two, demonstrate empirically the effectiveness of our approach in a low-dimensional problem setting, and study its effect on larger problems.
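To make the idea of a Taylor-based control variate concrete, the following is a minimal scalar sketch, not the paper's estimator. It assumes a hypothetical smooth integrand `f` (standing in for a per-sample loss term), an expansion point `x0`, and a noise scale `sigma`, all chosen purely for illustration. The first-order Taylor term `sigma * eps * f'(x0)` has zero mean under Gaussian noise, so subtracting it leaves the expectation unchanged while removing most of the variance when `sigma` is small.

```python
import numpy as np

def f(x):
    # Hypothetical smooth integrand (an assumption for illustration only).
    return np.sin(x)

def df(x):
    # Analytic derivative of f, used for the first-order Taylor term.
    return np.cos(x)

def estimators(x0, sigma, n, rng):
    """Return per-sample values of the plain and controlled estimators
    of E[f(x0 + sigma * eps)] with eps ~ N(0, 1)."""
    eps = rng.standard_normal(n)
    plain = f(x0 + sigma * eps)
    # First-order (k = 1) Taylor control variate around x0.
    # Its expectation over eps is exactly zero, so no bias is introduced.
    control = sigma * eps * df(x0)
    controlled = plain - control
    return plain, controlled

rng = np.random.default_rng(0)
plain, controlled = estimators(x0=0.3, sigma=0.1, n=100_000, rng=rng)

print("mean (plain, controlled):", plain.mean(), controlled.mean())
print("variance (plain, controlled):", plain.var(), controlled.var())
```

In this toy setting both estimators target the same expectation, and the controlled one has a much smaller variance because only the second-order and higher Taylor remainder is left to fluctuate. The benefit shrinks as `sigma` grows or as `f` becomes less smooth, which is in line with the dependence on noise regime and network irregularity described above.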
