
Learning general Gaussian mixtures with efficient score matching

Sitan Chen, Vasilis Kontonis, Kulin Shah

TL;DR

The paper addresses the computational challenge of learning a general mixture of k Gaussians in d dimensions without assuming separation. It uses a diffusion-model-based reduction to score matching, giving end-to-end learning guarantees with sample complexity and runtime that scale as d^{poly(k/ε)} under mild assumptions on the components (bounded condition numbers and bounded parameters). The core technical contribution is a score-estimation algorithm that approximates the mixture’s score by a piecewise-polynomial function on a carefully constructed crude partition, combined with PCA-based crude parameter estimates that enable efficient clustering. This approach avoids both the exponential dependence on d and the doubly exponential dependence on k incurred by prior methods, and, to the authors' knowledge, it is the first example of diffusion models achieving a state-of-the-art theoretical guarantee for an unsupervised learning task. The framework also extends to degenerate covariances via diffusion-time halting, and it combines diffusion techniques with classical low-degree approximation and clustering arguments. Overall, the work offers a new theoretical pathway for learning richly parameterized mixture models using diffusion-based score matching.
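The score that the algorithm estimates has a standard closed form for a Gaussian mixture: $\nabla_x \log p(x) = \sum_i r_i(x)\,\Sigma_i^{-1}(\mu_i - x)$, where $r_i(x)$ is the posterior probability that $x$ came from component $i$. The minimal NumPy sketch below evaluates this formula, which can be useful for checking a learned score against ground truth on synthetic data. The function name gmm_score is hypothetical, and this is not the paper's estimator, which has no access to the true parameters.

```python
import numpy as np


def gmm_score(x, weights, means, covs):
    """Score grad_x log p(x) of p = sum_i w_i N(mu_i, Sigma_i) (hypothetical helper).

    Uses grad log p(x) = sum_i r_i(x) Sigma_i^{-1} (mu_i - x), where r_i(x)
    is the posterior probability that x came from component i.
    """
    x = np.asarray(x, dtype=float)
    log_terms = []
    grads = []
    for w, mu, cov in zip(weights, means, covs):
        diff = x - np.asarray(mu, dtype=float)
        prec = np.linalg.inv(cov)
        _, logdet = np.linalg.slogdet(cov)
        # log of w_i * N(x; mu_i, Sigma_i), dropping the shared (2*pi)^{-d/2} factor
        log_terms.append(np.log(w) - 0.5 * logdet - 0.5 * diff @ prec @ diff)
        # score of component i on its own
        grads.append(-prec @ diff)
    log_terms = np.array(log_terms)
    # posterior responsibilities r_i(x), computed stably via the log-sum-exp trick
    resp = np.exp(log_terms - log_terms.max())
    resp /= resp.sum()
    return resp @ np.array(grads)
```

For a single standard Gaussian the function reduces to $-x$, and for two equally weighted components at $\pm 1$ it returns $0$ at the origin by symmetry.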

Abstract

We study the problem of learning mixtures of $k$ Gaussians in $d$ dimensions. We make no separation assumptions on the underlying mixture components: we only require that the covariance matrices have bounded condition number and that the means and covariances lie in a ball of bounded radius. We give an algorithm that draws $d^{\mathrm{poly}(k/\varepsilon)}$ samples from the target mixture, runs in sample-polynomial time, and constructs a sampler whose output distribution is within total variation distance $\varepsilon$ of the unknown mixture. Prior works for this problem either (i) required exponential runtime in the dimension $d$, (ii) placed strong assumptions on the instance (e.g., spherical covariances or clusterability), or (iii) had doubly exponential dependence on the number of components $k$. Our approach departs from commonly used techniques for this problem, such as the method of moments. Instead, we leverage a recently developed reduction, based on diffusion models, from distribution learning to a supervised learning task called score matching. We give an algorithm for the latter by proving a structural result showing that the score function of a Gaussian mixture can be approximated by a piecewise-polynomial function, and that there is an efficient algorithm for finding it. To our knowledge, this is the first example of diffusion models achieving a state-of-the-art theoretical guarantee for an unsupervised learning task.
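The reduction described above rests on two standard facts, sketched below under the assumption of the Ornstein-Uhlenbeck forward process $x_t = e^{-t}x_0 + \sqrt{1-e^{-2t}}\,z$ with $z \sim N(0, I)$; the paper's exact noise schedule and loss normalization may differ. First, noising a Gaussian mixture yields another Gaussian mixture with the same weights, means $e^{-t}\mu_i$, and covariances $e^{-2t}\Sigma_i + (1-e^{-2t})I$, so the score at every diffusion time is again a mixture score. Second, the denoising score matching loss is minimized by the true score of $x_t$, which turns score estimation into regression on samples. The helper names noised_mixture and dsm_loss are hypothetical.

```python
import numpy as np


def noised_mixture(weights, means, covs, t):
    """Parameters of the law of x_t = e^{-t} x_0 + sqrt(1 - e^{-2t}) z,
    where x_0 ~ sum_i w_i N(mu_i, Sigma_i) and z ~ N(0, I) (assumed OU process).
    The result is again a Gaussian mixture with the same weights."""
    a = np.exp(-t)
    d = len(means[0])
    new_means = [a * np.asarray(mu, dtype=float) for mu in means]
    new_covs = [a**2 * np.asarray(c, dtype=float) + (1 - a**2) * np.eye(d) for c in covs]
    return list(weights), new_means, new_covs


def dsm_loss(score_fn, x0, t, rng):
    """Monte Carlo denoising score matching loss at diffusion time t.

    x0 has shape (n, d). Over all functions, the loss is minimized by the
    true score of x_t, so a candidate score can be fit by regression."""
    a = np.exp(-t)
    sigma = np.sqrt(1 - a**2)
    z = rng.standard_normal(x0.shape)
    xt = a * x0 + sigma * z
    # regression target: grad log p(x_t | x_0) = -z / sigma
    residual = np.array([score_fn(x) for x in xt]) + z / sigma
    return float(np.mean(np.sum(residual**2, axis=1)))
```

For example, when $x_0 \sim N(0, 1)$ the noised law stays $N(0, 1)$, and the true score $-x$ attains a much smaller loss than the zero function.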

Paper Structure (44 sections, 40 theorems, 228 equations, 1 figure)


Key Result

Theorem 1.2

Let $\mathcal{M}$ be a $\tau$-well-conditioned mixture of $k$ Gaussians in $d$ dimensions, and suppose $\lambda_{\rm min} \ge 1/\mathrm{poly}(k)$. There exists an algorithm that draws $N = d^{\mathrm{poly}(k\tau/\varepsilon)}$ samples from $\mathcal{M}$, runs in sample-polynomial time, and constructs …
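Definition 1.1 is not reproduced in this summary. Reading it from the abstract, a minimal sketch of checking the hypotheses of Theorem 1.2 might look as follows, ASSUMING that "$\tau$-well-conditioned" means every covariance has condition number at most $\tau$ and every mean and covariance has norm at most $\tau$ (Euclidean norm for means, operator norm for covariances). The paper's constants and norms may differ; is_well_conditioned and lam_min_floor are hypothetical names, with lam_min_floor standing in for the $1/\mathrm{poly}(k)$ lower bound on $\lambda_{\rm min}$.

```python
import numpy as np


def is_well_conditioned(means, covs, tau, lam_min_floor):
    """Hypothetical check of the hypotheses of Theorem 1.2.

    ASSUMED reading of Definition 1.1: each covariance has condition number
    <= tau, each mean has Euclidean norm <= tau, and each covariance has
    operator norm <= tau. lam_min_floor (> 0) plays the role of 1/poly(k).
    """
    for mu, cov in zip(means, covs):
        # eigenvalues of a symmetric matrix, in ascending order
        eig = np.linalg.eigvalsh(cov)
        if eig[0] < lam_min_floor:
            return False  # smallest eigenvalue lambda_min is too small
        if eig[-1] / eig[0] > tau:
            return False  # condition number exceeds tau
        if np.linalg.norm(mu) > tau or eig[-1] > tau:
            return False  # parameters fall outside the radius-tau ball
    return True
```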

Figures (1)

  • Figure 1: When approximation is hard, clustering is easy. In the left panel, we plot the density (gold) and score function (blue) of a mixture of two standard Gaussians with well-separated means (at distance $R$). In this case the score function is (almost) a piecewise linear function, with a large slope, roughly $R$, close to the origin. In the right panel, we have a mixture of $5$ Gaussians with different means and variances that can be split into two clusters: a group of $2$ on the left and $3$ on the right. Again, the region where the derivative of the score function (blue) is large falls between the two clusters, where the mixture density is exponentially small. In both cases, a piecewise polynomial approximation achieves a degree that scales with $(\log R)/\varepsilon$ instead of $R/\varepsilon$. Moreover, we expect that it is easy to cluster the points into the corresponding sub-mixtures, which have much smaller effective support than the original mixture.
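The left panel of Figure 1 can be reproduced in closed form: for $\tfrac12 N(-R/2, 1) + \tfrac12 N(R/2, 1)$ we have $\log p(x) = -x^2/2 + \log\cosh(Rx/2) + \text{const}$, so the score is $s(x) = -x + \tfrac{R}{2}\tanh(Rx/2)$. The sketch below (function names hypothetical) checks numerically that, away from the origin, $s$ agrees with the linear score of the nearer component, and that $s$ rises by roughly $R$ across a short window around the origin, which is the steep region the caption describes.

```python
import numpy as np


def two_gaussian_score(x, R):
    """Score of 0.5 * N(-R/2, 1) + 0.5 * N(R/2, 1) on the real line:
    s(x) = -x + (R/2) * tanh(R x / 2)."""
    mu = R / 2
    return -x + mu * np.tanh(mu * x)


def nearest_component_score(x, R):
    """Linear score of the component on the same side of the origin as x:
    -(x - R/2) for x > 0 and -(x + R/2) for x < 0."""
    mu = R / 2
    return -(x - np.sign(x) * mu)


R = 10.0
xs = np.array([-6.0, -3.0, 3.0, 6.0])
# Away from the origin the mixture score is essentially piecewise linear.
print(np.max(np.abs(two_gaussian_score(xs, R) - nearest_component_score(xs, R))))
# Across the short window [-0.5, 0.5] the score rises by roughly R.
print(two_gaussian_score(0.5, R) - two_gaussian_score(-0.5, R))
```

The two linear pieces are what a piecewise approximation can fit with low degree; a single global polynomial would instead have to track the steep transition at the origin.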

Theorems & Definitions (80)

  • Definition 1.1: Well-Conditioned Gaussian Mixture
  • Theorem 1.2: Learning Gaussian Mixtures (informal; see thm:generate-sample)
  • Proposition 2.1: Efficiently Learning the Score (informal; see thm:learning-score-guarantee-fixed-t)
  • Remark 2.2: Learning mixtures of low-dimensional (degenerate) Gaussians
  • Proposition 2.3: Efficient Piecewise Polynomial Approximation (informal; see prop:piecewise-poly-approx-score)
  • Definition 2.4: $(\Delta_{\rm in}, \Delta_{\rm out})$-separated partition
  • Proposition 2.5: Score Simplification (informal; see prop:score-simplification)
  • Lemma 2.6: Informal statement (see lem:poly-approx-score-bounded-interval)
  • Lemma 2.7: Recovering Crude Estimates of the Parameters (informal; see lem:crude_param)
  • Definition 3.1
  • ...and 70 more
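The TL;DR attributes the crude parameter estimates (Lemma 2.7) to PCA. The details of that lemma are not shown here, so the following is only a generic spectral-projection sketch, not the paper's procedure: project the samples onto the top-$k$ eigenvectors of the empirical second-moment matrix. For a mixture with identity covariances, the population second moment is $I + \sum_i w_i \mu_i \mu_i^\top$, so this subspace approximately contains the component means. The names top_k_subspace and project are hypothetical.

```python
import numpy as np


def top_k_subspace(samples, k):
    """Orthonormal d x k basis for the top-k eigenvectors of the empirical
    second-moment matrix (1/n) * sum_j x_j x_j^T; samples has shape (n, d)."""
    n = samples.shape[0]
    second_moment = samples.T @ samples / n
    # eigh returns eigenvalues in ascending order, so reverse the columns
    _, eigvecs = np.linalg.eigh(second_moment)
    return eigvecs[:, ::-1][:, :k]


def project(samples, basis):
    """Coordinates of each sample in the k-dimensional subspace."""
    return samples @ basis
```

For example, with two components at $\pm 4e_1$ and identity covariances in $d = 20$ dimensions, the top direction returned should be close to $\pm e_1$, and clustering can then proceed in one dimension instead of twenty.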