Learning Mixtures of Gaussians Using Diffusion Models

Khashayar Gatmiry; Jonathan Kelner; Holden Lee

TL;DR

The paper develops an end-to-end diffusion-model framework for learning generalized mixtures of Gaussians from samples, including continuous mixtures whose mixing distribution is supported on a union of balls or on a low-dimensional manifold. It achieves TV error ε with quasi-polynomial time and sample complexity under a minimum-weight assumption. The approach combines higher-order Gaussian noise sensitivity bounds for score functions, piecewise polynomial regression on Voronoi cells, and a warm-start strategy that maintains the cluster structure across diffusion steps. Key insights include representing the score through a posterior mean (via Tweedie’s formula), proving that the score is well approximated by a piecewise low-degree polynomial, and using known diffusion-model convergence guarantees to translate score-estimation accuracy into sampling guarantees. The results give some of the first end-to-end theoretical guarantees that diffusion models can efficiently learn nontrivial families of distributions, including Gaussian convolutions of distributions on manifolds, and connect diffusion-based learning to classical Gaussian-mixture learning in a broader nonparametric context.
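
For reference, the posterior-mean representation of the score can be sketched as follows. This assumes a variance-exploding forward process $X_t = X_0 + \sqrt{t}\,Z$ with $X_0 \sim P_0$ and $Z \sim N(0, I_n)$ independent of $X_0$; the symbols $X_t$, $Z$, $p_t$, and $\varphi_t$ are introduced here for illustration and may not match the paper's notation. Writing $\varphi_t$ for the density of $N(0, tI_n)$, the density of $X_t$ is $p_t(x) = \int \varphi_t(x-y)\,dP_0(y)$, and differentiating under the integral gives

$$\nabla p_t(x) = \int \frac{y - x}{t}\,\varphi_t(x-y)\,dP_0(y), \qquad \nabla \log p_t(x) = \frac{\nabla p_t(x)}{p_t(x)} = \frac{\mathbb{E}[X_0 \mid X_t = x] - x}{t}.$$

Under this assumption, estimating the score at noise level $t$ amounts to estimating the conditional mean of the clean sample given the noisy one.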

Abstract

We give a new algorithm for learning mixtures of $k$ Gaussians (with identity covariance in $\mathbb{R}^n$) to TV error $\varepsilon$, with quasi-polynomial ($O\left(n^{\operatorname{poly\,log}\left(\frac{n+k}{\varepsilon}\right)}\right)$) time and sample complexity, under a minimum weight assumption. Our results extend to continuous mixtures of Gaussians where the mixing distribution is supported on a union of $k$ balls of constant radius. In particular, this applies to the case of Gaussian convolutions of distributions on low-dimensional manifolds, or more generally sets with small covering number, for which no sub-exponential algorithm was previously known. Unlike previous approaches, most of which are algebraic in nature, our approach is analytic and relies on the framework of diffusion models. Diffusion models are a modern paradigm for generative modeling, which typically relies on learning the score function (gradient log-pdf) along a process transforming a pure noise distribution, in our case a Gaussian, to the data distribution. Despite their dazzling performance in tasks such as image generation, there are few end-to-end theoretical guarantees that they can efficiently learn nontrivial families of distributions; we give some of the first such guarantees. We proceed by deriving higher-order Gaussian noise sensitivity bounds for the score functions of a Gaussian mixture to show that they can be inductively learned using piecewise polynomial regression (up to poly-logarithmic degree), and combine this with known convergence results for diffusion models.
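
To make the notion of a score function concrete, the following is a minimal NumPy sketch, not the paper's algorithm. It evaluates the exact score (gradient of the log-density) of a finite mixture of isotropic Gaussians with a common variance and checks it against finite differences. The function names, the toy means and weights, and the choice of variance `1 + t` (identity covariance plus added noise of variance `t`) are assumptions made for illustration.

```python
import numpy as np


def log_density(x, means, weights, var):
    """Log-pdf at x of the mixture sum_i weights[i] * N(means[i], var * I)."""
    n = x.shape[0]
    sq_dist = np.sum((x - means) ** 2, axis=1)
    log_terms = (np.log(weights) - sq_dist / (2.0 * var)
                 - 0.5 * n * np.log(2.0 * np.pi * var))
    top = log_terms.max()  # log-sum-exp shift for numerical stability
    return top + np.log(np.sum(np.exp(log_terms - top)))


def score(x, means, weights, var):
    """Exact score: (posterior mean of the component centre - x) / var."""
    sq_dist = np.sum((x - means) ** 2, axis=1)
    logits = np.log(weights) - sq_dist / (2.0 * var)
    posterior = np.exp(logits - logits.max())
    posterior /= posterior.sum()      # posterior weight of each component
    posterior_mean = posterior @ means
    return (posterior_mean - x) / var


def finite_difference_score(x, means, weights, var, h=1e-5):
    """Central-difference gradient of log_density, used only as a check."""
    grad = np.zeros_like(x)
    for j in range(x.shape[0]):
        step = np.zeros_like(x)
        step[j] = h
        grad[j] = (log_density(x + step, means, weights, var)
                   - log_density(x - step, means, weights, var)) / (2.0 * h)
    return grad


# Hypothetical toy instance: k = 4 centres in R^3 with equal weights.
rng = np.random.default_rng(0)
n, k = 3, 4
means = rng.normal(scale=3.0, size=(k, n))
weights = np.full(k, 1.0 / k)
x = rng.normal(size=n)
var = 1.0 + 0.5  # identity covariance plus assumed added noise t = 0.5

exact = score(x, means, weights, var)
approx = finite_difference_score(x, means, weights, var)
print("max abs difference:", np.max(np.abs(exact - approx)))
```

Here the mixture parameters are known, so the score has a closed form. In the learning problem only samples are available, which is why the paper resorts to regression to estimate the score.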

Paper Structure (33 sections, 28 theorems, 188 equations, 2 algorithms)

Introduction and main results
Main results
Related work
Learning mixtures of Gaussians.
Concurrent work.
Diffusion models.
Analytic conditions for learning functions.
Notation
Proof overview
Learning with diffusion models
Learning the score for a single cluster
Learning the score for multiple clusters
Diffusion models
Convergence guarantees for diffusion models
...and 18 more sections

Key Result

Theorem 1.2

Given $\varepsilon > 0$ with $\varepsilon \le \min\left\{ \frac{1}{2}, \frac{\sigma_0}{R_0}, \frac{1}{D}, \frac{1}{n}, \alpha_{\min} \right\}$, and under the mixture-of-Gaussians assumption (label `a:mog`), the main algorithm (label `a:main`) learns, with probability $\ge 1-\delta$, a distribution that is $\varepsilon$-close in TV distance to $P_0$, with time and sample complexity […].

Theorems & Definitions (57)

Theorem 1.2
Corollary 1.3
Theorem 3.1: Reverse KL guarantee for variance-exploding diffusion models
Lemma 4.1: Noise stability implies low-degree approximability
proof
Lemma 4.2: Probabilistic interpretation of $\mathscr{L}^mf$
proof
Definition 4.3
Lemma 4.4: Control on iterates of the OU operator
proof
...and 47 more
