Learning Mixtures of Gaussians Using the DDPM Objective

Kulin Shah; Sitan Chen; Adam Klivans

TL;DR

This work gives provable, efficient guarantees for learning Gaussian mixtures by running gradient descent (GD) on the DDPM objective, and connects score-based methods to the EM algorithm and spectral methods. A two-stage GD procedure from random initialization recovers the centers of a mixture of two spherical Gaussians, first under constant separation and, with an extension, under $1/\text{poly}(d)$ separation. From a warm start, GD also learns mixtures of $K$ spherical Gaussians whose centers are $\Omega(\sqrt{\log(\min(K,d))})$-separated. The analysis identifies two regimes: at large noise levels the GD updates behave like power iteration and align the estimate with the leading direction of the true mixture; at small noise levels they behave like EM steps and contract the estimation error. According to the authors, these are the first provably efficient results of this kind for Gaussian mixture models. The extensions to small separation and to general $K$ rely on projection and EM-style contraction arguments.
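To make the two regimes easier to picture, the following worked form writes out the objects involved. It is a sketch under stated assumptions, not a quotation from the paper: it assumes an equal-weight symmetric mixture with centers $\pm\mu$ and identity covariance, an Ornstein-Uhlenbeck forward process, and a student score $s_\theta$ of the same functional form as the true score. The symbols $q_t$, $\mu_t$, $\theta_t$ and $L_t$ are notation chosen here for illustration.

```latex
\begin{align*}
q_0 &= \tfrac12\,\mathcal{N}(\mu, I) + \tfrac12\,\mathcal{N}(-\mu, I),
\qquad
X_t = e^{-t} X_0 + \sqrt{1 - e^{-2t}}\; Z, \quad Z \sim \mathcal{N}(0, I), \\
q_t &= \tfrac12\,\mathcal{N}(\mu_t, I) + \tfrac12\,\mathcal{N}(-\mu_t, I),
\qquad \mu_t = e^{-t}\mu, \\
\nabla_x \log q_t(x) &= \tanh(\mu_t^\top x)\,\mu_t - x, \\
s_\theta(x, t) &= \tanh(\theta_t^\top x)\,\theta_t - x,
\qquad \theta_t = e^{-t}\theta, \\
L_t(\theta) &= \mathbb{E}\,\Bigl\| s_\theta(X_t, t) + \frac{Z}{\sqrt{1 - e^{-2t}}} \Bigr\|^2 .
\end{align*}
```

As a heuristic only: when $t$ is large, $\|\theta_t\|$ is small and $\tanh(u) \approx u$, so $s_\theta(x,t) \approx (\theta_t\theta_t^\top - I)\,x$ depends on $\theta_t$ through a rank-one matrix, which is consistent with the spectral (power-iteration) flavor of the large-noise regime. When $t$ is small, the $\tanh$ term acts like a soft cluster assignment, which is consistent with the EM flavor of the small-noise regime.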

Abstract

Recent works have shown that diffusion models can learn essentially any distribution provided one can perform score estimation. Yet it remains poorly understood under what settings score estimation is possible, let alone when practical gradient-based algorithms for this task can provably succeed. In this work, we give the first provably efficient results along these lines for one of the most fundamental distribution families, Gaussian mixture models. We prove that gradient descent on the denoising diffusion probabilistic model (DDPM) objective can efficiently recover the ground truth parameters of the mixture model in the following two settings: 1) We show gradient descent with random initialization learns mixtures of two spherical Gaussians in $d$ dimensions with $1/\text{poly}(d)$-separated centers. 2) We show gradient descent with a warm start learns mixtures of $K$ spherical Gaussians with $\Omega(\sqrt{\log(\min(K,d))})$-separated centers. A key ingredient in our proofs is a new connection between score-based methods and two other approaches to distribution learning, the EM algorithm and spectral methods.
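One way to see why score-based methods and EM are related is that the score of a spherical Gaussian mixture is built from the same posterior weights that the E-step computes. The sketch below illustrates this, assuming equal mixing weights and identity covariances; the function names (`responsibilities`, `mixture_score`, `em_step`, `log_density`) are hypothetical and chosen for illustration, and the code is not taken from the paper.

```python
import numpy as np

def responsibilities(x, centers):
    """Posterior P(component i | x) for an equal-weight mixture of N(mu_i, I)."""
    # log N(x; mu_i, I) = x.mu_i - |mu_i|^2 / 2 + (terms shared by all i)
    logits = x @ centers.T - 0.5 * np.sum(centers**2, axis=1)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)

def mixture_score(x, centers):
    """Score grad_x log q(x) = sum_i w_i(x) mu_i - x (assumes equal weights)."""
    return responsibilities(x, centers) @ centers - x

def em_step(x, centers):
    """One EM update of the centers; it reuses the same posterior weights."""
    w = responsibilities(x, centers)
    return (w.T @ x) / w.sum(axis=0)[:, None]

def log_density(x, centers):
    """Log density of the mixture, used only to sanity-check the score."""
    d = x.shape[1]
    half_sq = 0.5 * np.sum((x[:, None, :] - centers[None, :, :])**2, axis=2)
    m = (-half_sq).max(axis=1, keepdims=True)
    log_mean = m[:, 0] + np.log(np.mean(np.exp(-half_sq - m), axis=1))
    return log_mean - 0.5 * d * np.log(2.0 * np.pi)

rng = np.random.default_rng(0)
centers = 2.0 * rng.standard_normal((3, 4))   # K = 3 centers in d = 4
x = rng.standard_normal((6, 4))               # a few query points
score = mixture_score(x, centers)
updated_centers = em_step(x, centers)
```

The score and the EM update differ only in how the weights are aggregated: the score averages centers per point, while EM averages points per center.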

Paper Structure

This paper contains 46 sections, 40 theorems, 185 equations, and 1 algorithm.

Introduction
Related work
Theory for diffusion models.
Provable score estimation.
Learning mixtures of Gaussians.
Technical overview
Loss function, architecture of the score function and student network.
Learning mixtures of two Gaussians.
Large noise level: connection to power iteration.
Low noise level: connection to the EM algorithm.
Extending to small separation.
Extending to general $K$.
Preliminaries
Diffusion models.
Mixtures of Gaussians.
...and 31 more sections

Key Result

Theorem 1

Gradient descent on the DDPM objective with random initialization efficiently learns the parameters of an unknown mixture of two spherical Gaussians with $1/\text{poly}(d)$-separated centers.
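The following is a minimal numerical sketch of gradient descent on a DDPM-style denoising objective for a mixture of two spherical Gaussians. It is a simplification, not the procedure analyzed in the paper: it assumes an equal-weight mixture with centers $\pm\mu$, an Ornstein-Uhlenbeck forward process, and a student score $\tanh(\theta_t^\top x)\,\theta_t - x$ with $\theta_t = e^{-t}\theta$, and it uses a single fixed noise level, a warm start, and a fixed sample set. All function names, the step size, and the sample size are illustrative choices.

```python
import numpy as np

def sample_forward(mu, t, n, rng):
    """Draw noised samples x_t and the Gaussian noise z that produced them."""
    d = mu.shape[0]
    signs = rng.choice([-1.0, 1.0], size=(n, 1))
    x0 = signs * mu + rng.standard_normal((n, d))  # 0.5 N(mu, I) + 0.5 N(-mu, I)
    z = rng.standard_normal((n, d))
    sigma = np.sqrt(1.0 - np.exp(-2.0 * t))
    xt = np.exp(-t) * x0 + sigma * z
    return xt, z, sigma

def ddpm_loss_and_grad(theta, xt, z, sigma, t):
    """Empirical denoising loss mean ||s_theta(x_t) + z / sigma||^2 and its gradient."""
    theta_t = np.exp(-t) * theta
    u = xt @ theta_t
    th = np.tanh(u)
    resid = th[:, None] * theta_t - xt + z / sigma
    loss = np.mean(np.sum(resid**2, axis=1))
    sech2 = 1.0 - th**2
    proj = resid @ theta_t
    # Gradient w.r.t. theta_t: 2 * mean[tanh(u) r + sech^2(u) (theta_t . r) x]
    grad_t = 2.0 * np.mean(th[:, None] * resid + (sech2 * proj)[:, None] * xt, axis=0)
    return loss, np.exp(-t) * grad_t  # chain rule through theta_t = e^{-t} theta

def gradient_descent(theta0, xt, z, sigma, t, lr=0.02, steps=500):
    """Plain full-batch gradient descent on the empirical loss."""
    theta = theta0.copy()
    for _ in range(steps):
        _, grad = ddpm_loss_and_grad(theta, xt, z, sigma, t)
        theta -= lr * grad
    return theta

def center_error(theta, mu):
    """Distance to the true center, up to the unavoidable sign ambiguity."""
    return min(np.linalg.norm(theta - mu), np.linalg.norm(theta + mu))

rng = np.random.default_rng(0)
d, t = 5, 0.5
mu = np.zeros(d)
mu[0] = 2.0
xt, z, sigma = sample_forward(mu, t, 50_000, rng)
theta0 = mu + 0.8 * np.ones(d) / np.sqrt(d)       # warm start (an assumption)
theta_hat = gradient_descent(theta0, xt, z, sigma, t)
err0, err = center_error(theta0, mu), center_error(theta_hat, mu)
print(f"initial error {err0:.3f}, final error {err:.3f}")
```

Because the mixture is symmetric, $\theta$ and $-\theta$ define the same score, so the error is measured up to sign. Replacing the warm start with a random initialization and adding a large-noise stage before the small-noise stage would be closer in spirit to the two-stage procedure the summary describes, but that variant is not attempted here.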

Theorems & Definitions (66)

Theorem 1: Informal, see Theorems \ref{thm:mo2g-const-sep} and \ref{thm:2-mog-small-sep}
Theorem 2: Informal, see Theorem \ref{thm:mog-k-main}
Lemma 3
Lemma 4
Theorem 7
Lemma 8: See Lemma \ref{lemma:2-mog-high-noise-grad-equivalence} for more details
Lemma 9: Informal, see Lemma \ref{lemma:projection-angle-decrease} for more details
Lemma 10: Informal, see Lemma \ref{lemma:2-mog-const-sep-inner-product-const} for more details
Lemma 11: Informal, see Lemma \ref{lemma:multi-dimension-G-contraction} for more details
Lemma 12: Informal
...and 56 more
