
Learning Mixtures of Gaussians Using Diffusion Models

Khashayar Gatmiry, Jonathan Kelner, Holden Lee

TL;DR

The paper develops a principled, end-to-end diffusion-model framework for learning generalized mixtures of Gaussians from samples. This includes continuous mixtures whose mixing distribution is supported on a union of balls, such as Gaussian convolutions of distributions on low-dimensional manifolds. It achieves ε-accuracy in TV distance with quasi-polynomial time and sample complexity under a minimum-weight assumption. It introduces a novel combination of higher-order Gaussian-noise sensitivity bounds for score functions, piecewise polynomial regression on Voronoi cells, and a warm-start strategy that maintains cluster structure across diffusion steps, enabling efficient end-to-end learning and generation. Key insights include representing the score as a posterior mean (via Tweedie’s formula), showing that the score is well approximated by a piecewise low-degree polynomial, and leveraging diffusion-model convergence guarantees to translate score-estimation accuracy into sampling guarantees. The results provide some of the first end-to-end theoretical guarantees for learning nontrivial distribution families with diffusion models beyond simple parametric settings, including manifold-convolved distributions, and connect diffusion-based learning to classical Gaussian-mixture learning in a broader nonparametric context.
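The score-as-posterior-mean identity can be checked numerically in the simplest instance of the setting. The minimal Python sketch below assumes a finite mixture $\sum_i w_i \mathcal{N}(\mu_i, I)$ noised as $X_t = X_0 + \sqrt{t}\,Z$, so that $p_t = \sum_i w_i \mathcal{N}(\mu_i, (1+t)I)$. Then $\nabla \log p_t(x) = (\mathbb{E}[\mu \mid X_t = x] - x)/(1+t)$, a posterior mean over the component centers, and Tweedie's formula gives the denoiser $\mathbb{E}[X_0 \mid X_t = x] = x + t\,\nabla \log p_t(x)$. The function names (`log_density`, `mixture_score`, `tweedie_denoiser`) are hypothetical and not taken from the paper.

```python
# Illustrative sketch; function names are hypothetical, not from the paper.
import numpy as np


def log_density(x, means, weights, t):
    """log p_t(x) for p_t = sum_i w_i N(mu_i, (1 + t) I); x: (n,), means: (k, n)."""
    var = 1.0 + t
    n = means.shape[1]
    sq = np.sum((means - x) ** 2, axis=1)
    log_terms = np.log(weights) - sq / (2 * var) - 0.5 * n * np.log(2 * np.pi * var)
    top = log_terms.max()
    return top + np.log(np.exp(log_terms - top).sum())  # stable log-sum-exp


def mixture_score(x, means, weights, t):
    """grad log p_t(x) = (E[mu | x] - x) / (1 + t), E[mu | x] = posterior mean of the center."""
    var = 1.0 + t
    diffs = means - x                                   # rows: mu_i - x
    log_post = np.log(weights) - np.sum(diffs ** 2, axis=1) / (2 * var)
    post = np.exp(log_post - log_post.max())
    post /= post.sum()                                  # P(component i | X_t = x)
    return post @ diffs / var


def tweedie_denoiser(x, means, weights, t):
    """Tweedie's formula: E[X_0 | X_t = x] = x + t * grad log p_t(x)."""
    return x + t * mixture_score(x, means, weights, t)


# Usage: compare the closed-form score with a central finite difference of log p_t.
means = np.array([[0.0, 0.0], [3.0, -1.0], [-2.0, 2.0]])
weights = np.array([0.2, 0.5, 0.3])
x, t, h = np.array([0.7, -0.4]), 0.8, 1e-5
fd = np.array([(log_density(x + h * e, means, weights, t)
                - log_density(x - h * e, means, weights, t)) / (2 * h) for e in np.eye(2)])
print(np.max(np.abs(fd - mixture_score(x, means, weights, t))))  # only finite-difference error
```

For a single component the denoiser reduces to $(x + t\mu)/(1+t)$, which is a quick sanity check on the formula.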

Abstract

We give a new algorithm for learning mixtures of $k$ Gaussians (with identity covariance in $\mathbb{R}^n$) to TV error $\varepsilon$, with quasi-polynomial ($O(n^{\text{poly\,log}\left(\frac{n+k}{\varepsilon}\right)})$) time and sample complexity, under a minimum weight assumption. Our results extend to continuous mixtures of Gaussians where the mixing distribution is supported on a union of $k$ balls of constant radius. In particular, this applies to the case of Gaussian convolutions of distributions on low-dimensional manifolds, or more generally sets with small covering number, for which no sub-exponential algorithm was previously known. Unlike previous approaches, most of which are algebraic in nature, our approach is analytic and relies on the framework of diffusion models. Diffusion models are a modern paradigm for generative modeling, which typically rely on learning the score function (gradient log-pdf) along a process transforming a pure noise distribution, in our case a Gaussian, to the data distribution. Despite their dazzling performance in tasks such as image generation, there are few end-to-end theoretical guarantees that they can efficiently learn nontrivial families of distributions; we give some of the first such guarantees. We proceed by deriving higher-order Gaussian noise sensitivity bounds for the score functions for a Gaussian mixture to show that they can be inductively learned using piecewise polynomial regression (up to poly-logarithmic degree), and combine this with known convergence results for diffusion models.
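The regression primitive named in the abstract, piecewise polynomial regression over Voronoi cells, can be illustrated in isolation. The sketch below assumes the cell centers are given as hypothetical inputs, assigns each point to its nearest center, and fits a separate least-squares polynomial of total degree at most `degree` in each cell using a plain monomial basis and `numpy.linalg.lstsq`. It does not reproduce the paper's estimator, its degree choice (up to poly-logarithmic), or its training targets; all function names are hypothetical.

```python
# Illustrative sketch of per-cell polynomial regression; names are hypothetical.
import itertools

import numpy as np


def monomial_features(X, degree):
    """All monomials of total degree <= degree in the columns of X, shape (m, n)."""
    m, n = X.shape
    cols = [np.ones(m)]
    for d in range(1, degree + 1):
        for idx in itertools.combinations_with_replacement(range(n), d):
            cols.append(np.prod(X[:, list(idx)], axis=1))
    return np.stack(cols, axis=1)


def voronoi_cells(X, centers):
    """Index of the nearest center (Voronoi cell) for each row of X."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def fit_piecewise_poly(X, Y, centers, degree):
    """Per-cell least squares; Y has shape (m, d_out). Returns {cell index: coefficients}."""
    cells = voronoi_cells(X, centers)
    coefs = {}
    for j in range(len(centers)):
        mask = cells == j
        if mask.any():  # a cell with no samples gets no model (predicted as 0 below)
            coefs[j] = np.linalg.lstsq(monomial_features(X[mask], degree), Y[mask], rcond=None)[0]
    return coefs


def predict_piecewise_poly(X, centers, coefs, degree, d_out):
    cells = voronoi_cells(X, centers)
    out = np.zeros((X.shape[0], d_out))
    for j, beta in coefs.items():
        mask = cells == j
        if mask.any():
            out[mask] = monomial_features(X[mask], degree) @ beta
    return out


# Usage: recover a function that is a different quadratic in each of two cells.
rng = np.random.default_rng(0)
centers = np.array([[-3.0, 0.0], [3.0, 0.0]])


def target(X):
    left = voronoi_cells(X, centers) == 0
    return np.where(left, X[:, 0] ** 2 + X[:, 1], 2 * X[:, 0] * X[:, 1] - 1.0)[:, None]


X = 2.0 * rng.normal(size=(400, 2))
coefs = fit_piecewise_poly(X, target(X), centers, degree=2)
X_test = 2.0 * rng.normal(size=(50, 2))
print(np.max(np.abs(predict_piecewise_poly(X_test, centers, coefs, 2, 1) - target(X_test))))
```

In a score-learning setting one would instead regress a noisy target (for example the clean sample $X_0$ on $X_t$), whose conditional mean is tied to the score by Tweedie's formula; fitting exact piecewise quadratics here simply makes the per-cell fit easy to verify.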

Paper Structure

This paper contains 33 sections, 28 theorems, 188 equations, and 2 algorithms.

Key Result

Theorem 1.2

Given $\varepsilon > 0$ with $\varepsilon \le \min\left\{ \frac{1}{2}, \frac{\sigma_0}{R_0}, \frac{1}{D}, \frac{1}{n}, \alpha_{\min} \right\}$, and under Assumption a:mog, the main algorithm (a:main) learns, with probability $\ge 1-\delta$, a distribution that is $\varepsilon$-close in TV distance to $P_0$, using quasi-polynomial time and sample complexity.
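The learning guarantee relies on a diffusion sampler that turns score estimates into samples (the abstract's appeal to known convergence results; compare Theorem 3.1 on variance-exploding diffusions). As an illustration only, the sketch below runs an Euler-Maruyama discretization of the reverse of the variance-exploding process $dX_t = dW_t$, whose reverse-time drift is the score $\nabla \log p_t$, using the exact score of a one-dimensional two-component mixture in place of a learned one. The $\mathcal{N}(0, 1+T)$ initialization, the geometric time grid, the parameter values (`T`, `t_min`, `steps`), and the function names are assumptions for this example, not the paper's schedule.

```python
# Illustrative sketch; sampler settings and names are assumptions, not the paper's.
import numpy as np


def mixture_score_1d(y, means, weights, t):
    """Score of p_t = sum_i w_i N(mu_i, 1 + t), evaluated at every entry of y (shape (m,))."""
    var = 1.0 + t
    diffs = means[None, :] - y[:, None]                   # (m, k): mu_i - y
    log_post = np.log(weights)[None, :] - diffs ** 2 / (2 * var)
    post = np.exp(log_post - log_post.max(axis=1, keepdims=True))
    post /= post.sum(axis=1, keepdims=True)               # component responsibilities
    return (post * diffs).sum(axis=1) / var


def ve_reverse_sampler(score, m, T=1000.0, t_min=1e-3, steps=500, seed=0):
    """Euler-Maruyama for the reverse-time SDE (drift = score), from t = T down to t_min."""
    rng = np.random.default_rng(seed)
    ts = np.geomspace(T, t_min, steps + 1)               # finer steps at small noise levels
    y = rng.normal(scale=np.sqrt(1.0 + T), size=m)       # N(0, 1 + T) as a stand-in for p_T
    for t, t_next in zip(ts[:-1], ts[1:]):
        dt = t - t_next
        y = y + dt * score(y, t) + np.sqrt(dt) * rng.normal(size=m)
    return y


# Usage: sample a 0.3 / 0.7 mixture of N(-4, 1) and N(4, 1).
means, weights = np.array([-4.0, 4.0]), np.array([0.3, 0.7])
samples = ve_reverse_sampler(lambda y, t: mixture_score_1d(y, means, weights, t), m=4000)
print((samples > 0).mean(), samples[samples > 0].mean())  # near 0.7 and near 4
```

With an estimated score in place of `mixture_score_1d`, the output distribution degrades with the score error, which is the quantity that convergence results for diffusion models control.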

Theorems & Definitions (57)

  • Theorem 1.2
  • Corollary 1.3
  • Theorem 3.1: Reverse KL guarantee for variance-exploding diffusion models
  • Lemma 4.1: Noise stability implies low-degree approximability
  • proof
  • Lemma 4.2: Probabilistic interpretation of $\mathscr{L}^mf$
  • proof
  • Definition 4.3
  • Lemma 4.4: Control on iterates of the OU operator
  • proof
  • ...and 47 more
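Lemmas 4.1 and 4.4 concern noise stability and the Ornstein-Uhlenbeck (OU) operator. The standard Gaussian-space argument behind a statement like "noise stability implies low-degree approximability" can be sketched in one dimension: if $f = \sum_k c_k h_k$ in the orthonormal Hermite basis $h_k = \mathrm{He}_k/\sqrt{k!}$, the OU semigroup acts as $P_s h_k = e^{-ks} h_k$, so $\|f\|^2 - \langle f, P_s f\rangle = \sum_k (1-e^{-ks}) c_k^2 \ge (1-e^{-(m+1)s}) \sum_{k>m} c_k^2$. A function that changes little under noise therefore has little Hermite mass above degree $m$. The code below is a minimal sketch of that textbook inequality, not the paper's Lemma 4.1 (whose norms and constants may differ). It estimates coefficients with Gauss-Hermite quadrature (`numpy.polynomial.hermite_e.hermegauss`) and checks them on $f(x) = e^{ax - a^2/2}$, whose coefficients $c_k = a^k/\sqrt{k!}$ follow from the Hermite generating function. Function names are hypothetical.

```python
# Illustrative sketch of the standard Hermite/OU argument; names are hypothetical.
import math

import numpy as np
from numpy.polynomial import hermite_e as H


def hermite_coeffs(f, max_deg, quad_deg=100):
    """c_k = E[f(G) He_k(G)] / sqrt(k!) for G ~ N(0, 1), via Gauss-HermiteE quadrature."""
    x, w = H.hermegauss(quad_deg)          # weight function exp(-x^2 / 2)
    w = w / math.sqrt(2 * math.pi)          # weighted sums now approximate N(0, 1) expectations
    fx = f(x)
    c = np.empty(max_deg + 1)
    for k in range(max_deg + 1):
        e_k = np.zeros(k + 1)
        e_k[k] = 1.0                        # coefficient vector selecting He_k
        c[k] = np.sum(w * fx * H.hermeval(x, e_k)) / math.sqrt(math.factorial(k))
    return c


def noise_stability(c, s):
    """<f, P_s f> = sum_k exp(-k s) c_k^2 for the OU semigroup P_s."""
    k = np.arange(len(c))
    return np.sum(np.exp(-k * s) * c ** 2)


def high_degree_mass_bound(c, s, m):
    """Upper bound on sum_{k > m} c_k^2 implied by noise stability at noise level s."""
    return (np.sum(c ** 2) - noise_stability(c, s)) / (1.0 - math.exp(-(m + 1) * s))


# Usage: f(x) = exp(a x - a^2 / 2) has c_k = a^k / sqrt(k!) and <f, P_s f> = exp(a^2 e^{-s}).
a, s, m = 1.5, 0.5, 5
c = hermite_coeffs(lambda x: np.exp(a * x - a ** 2 / 2), max_deg=20)
tail = np.sum(c[m + 1:] ** 2)
print(tail, high_degree_mass_bound(c, s, m))   # the first number is at most the second
```

The bound is loose for small $s$ but needs only the noise-stability gap, which is why sensitivity bounds under Gaussian noise are a natural route to low-degree approximation.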