Learning general Gaussian mixtures with efficient score matching
Sitan Chen, Vasilis Kontonis, Kulin Shah
TL;DR
The paper addresses the computational challenge of learning a general Gaussian mixture with $k$ components in $d$ dimensions without assuming separation. It uses a diffusion-model-based reduction from distribution learning to score matching, giving end-to-end learning guarantees with sample complexity and runtime that scale as $d^{\mathrm{poly}(k/\varepsilon)}$, provided the covariances have bounded condition number and the means and covariances lie in a ball of bounded radius. The core technical contribution is a score-estimation algorithm that approximates the mixture's score by a piecewise-polynomial function on a carefully constructed crude partition, combined with PCA-based crude parameter estimates to enable efficient clustering. This approach avoids the exponential dependence on $d$ and the doubly exponential dependence on $k$ of prior approaches and, to the authors' knowledge, is the first example of diffusion models achieving a state-of-the-art theoretical guarantee for an unsupervised learning task. The framework also extends to degenerate covariances via diffusion-time halting and combines diffusion techniques with classical low-degree approximation and clustering arguments. Overall, the work offers a new theoretical pathway for learning richly parameterized mixture models using diffusion-based score matching.
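For concreteness, the score being estimated can be written in closed form. The display below is a sketch under two assumptions that this summary does not state: the forward noising process is taken to be the standard Ornstein-Uhlenbeck process, and the symbols $\lambda_i$, $\mu_i$, $Q_i$ for the mixing weights, means, and covariances are notation chosen here for illustration.

$$q = \sum_{i=1}^{k} \lambda_i \, \mathcal{N}(\mu_i, Q_i), \qquad x_t = e^{-t} x_0 + \sqrt{1 - e^{-2t}} \, z, \quad x_0 \sim q, \; z \sim \mathcal{N}(0, I_d)$$

$$q_t = \sum_{i=1}^{k} \lambda_i \, \mathcal{N}(\mu_{i,t}, Q_{i,t}), \qquad \mu_{i,t} = e^{-t} \mu_i, \quad Q_{i,t} = e^{-2t} Q_i + (1 - e^{-2t}) I_d$$

$$\nabla \log q_t(x) = -\sum_{i=1}^{k} w_{i,t}(x) \, Q_{i,t}^{-1} (x - \mu_{i,t}), \qquad w_{i,t}(x) = \frac{\lambda_i \, \mathcal{N}(x; \mu_{i,t}, Q_{i,t})}{\sum_{j=1}^{k} \lambda_j \, \mathcal{N}(x; \mu_{j,t}, Q_{j,t})}$$

Each summand is affine in $x$, so all of the nonlinearity sits in the posterior weights $w_{i,t}(x)$. Under this choice of noising process, $Q_{i,t}$ is positive definite for every $t > 0$ even when $Q_i$ is singular, which is consistent with the remark above that degenerate covariances can be handled by stopping the diffusion at a positive time.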
Abstract
We study the problem of learning mixtures of $k$ Gaussians in $d$ dimensions. We make no separation assumptions on the underlying mixture components: we only require that the covariance matrices have bounded condition number and that the means and covariances lie in a ball of bounded radius. We give an algorithm that draws $d^{\mathrm{poly}(k/\varepsilon)}$ samples from the target mixture, runs in sample-polynomial time, and constructs a sampler whose output distribution is $\varepsilon$-close to the unknown mixture in total variation distance. Prior works for this problem either (i) required exponential runtime in the dimension $d$, (ii) placed strong assumptions on the instance (e.g., spherical covariances or clusterability), or (iii) had doubly exponential dependence on the number of components $k$. Our approach departs from commonly used techniques for this problem such as the method of moments. Instead, we leverage a recently developed reduction, based on diffusion models, from distribution learning to a supervised learning task called score matching. We give an algorithm for the latter by proving a structural result showing that the score function of a Gaussian mixture can be approximated by a piecewise-polynomial function, and that there is an efficient algorithm for finding it. To our knowledge, this is the first example of diffusion models achieving a state-of-the-art theoretical guarantee for an unsupervised learning task.
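To make "the score function of a Gaussian mixture" concrete, here is a minimal NumPy sketch that evaluates the exact score (the gradient of the log density) of a known mixture and checks it against a finite-difference gradient. It is not the paper's algorithm, which has to estimate the score from samples alone. The function names and the two-component example mixture are hypothetical and chosen only for illustration.

```python
import numpy as np


def component_log_terms(x, weights, means, covs):
    """Return log(weight_i * N(x; mean_i, cov_i)) for every component i."""
    d = x.shape[0]
    terms = []
    for w, mu, cov in zip(weights, means, covs):
        diff = x - mu
        _, logdet = np.linalg.slogdet(cov)
        quad = diff @ np.linalg.solve(cov, diff)
        terms.append(np.log(w) - 0.5 * (d * np.log(2.0 * np.pi) + logdet + quad))
    return np.array(terms)


def gmm_log_density(x, weights, means, covs):
    """Log density of the mixture at x, computed with a stable log-sum-exp."""
    terms = component_log_terms(x, weights, means, covs)
    m = terms.max()
    return m + np.log(np.exp(terms - m).sum())


def gmm_score(x, weights, means, covs):
    """Exact score: posterior-weighted average of -cov_i^{-1} (x - mean_i)."""
    terms = component_log_terms(x, weights, means, covs)
    post = np.exp(terms - terms.max())
    post /= post.sum()  # posterior probability of each component given x
    pulls = np.array([-np.linalg.solve(cov, x - mu) for mu, cov in zip(means, covs)])
    return post @ pulls


def finite_difference_score(x, weights, means, covs, h=1e-5):
    """Central-difference gradient of the log density, used only as a check."""
    grad = np.zeros_like(x)
    for j in range(x.shape[0]):
        step = np.zeros_like(x)
        step[j] = h
        up = gmm_log_density(x + step, weights, means, covs)
        down = gmm_log_density(x - step, weights, means, covs)
        grad[j] = (up - down) / (2.0 * h)
    return grad


# Hypothetical example: two overlapping (non-separated) components in 2D.
weights = np.array([0.3, 0.7])
means = np.array([[0.0, 0.0], [1.0, 0.5]])
covs = np.array([[[1.0, 0.2], [0.2, 0.5]], [[0.8, -0.1], [-0.1, 1.5]]])
x = np.array([0.4, -0.3])

print("closed-form score:      ", gmm_score(x, weights, means, covs))
print("finite-difference score:", finite_difference_score(x, weights, means, covs))
```

The two printed vectors should agree to several decimal places. The posterior weights `post` are the only non-polynomial part of the score, which gives some intuition for why a piecewise-polynomial approximation is a natural target.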
