Learning Mixtures of Gaussians Using the DDPM Objective

Kulin Shah, Sitan Chen, Adam Klivans

TL;DR

This work gives provable, polynomial-time guarantees for learning Gaussian mixtures by gradient descent (GD) on the DDPM objective, connecting score-based methods to EM and spectral approaches. A two-stage GD procedure started from random initialization recovers the centers of a mixture of two spherical Gaussians when the centers have constant separation, and the analysis extends to separation as small as $1/\text{poly}(d)$. For $K$ spherical Gaussians, GD from a warm start recovers the centers when they are $\Omega(\sqrt{\log(\min(K,d))})$-separated. The analysis identifies two regimes: at large noise levels, the GD updates emulate power iteration and align the estimate with the leading true direction; at small noise levels, they match the EM update and contract the estimation error. The small-separation and multi-component extensions rely on a projection step and on EM-style contraction. According to the authors, these are the first provably efficient guarantees for gradient-based score estimation on Gaussian mixtures.
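The two classical procedures named in the summary can be sketched as follows. This is an illustration only, not the paper's gradient-descent procedure: it assumes a symmetric mixture $\frac{1}{2}N(\mu, I) + \frac{1}{2}N(-\mu, I)$, runs textbook power iteration on the empirical second-moment matrix (whose expectation is $I + \mu\mu^\top$), and then runs the standard EM update $\theta \leftarrow \frac{1}{n}\sum_i \tanh(\theta^\top x_i)\, x_i$. All function names, the sample size, and the iteration counts are hypothetical choices made for this sketch.

```python
import math
import random


def sample_mixture(mu, n, rng):
    """Draw n points from 0.5*N(mu, I) + 0.5*N(-mu, I)."""
    points = []
    for _ in range(n):
        sign = 1.0 if rng.random() < 0.5 else -1.0
        points.append([sign * m + rng.gauss(0.0, 1.0) for m in mu])
    return points


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def power_iteration_step(v, points):
    """One power-iteration step on the empirical second-moment matrix."""
    new = [0.0] * len(v)
    for x in points:
        c = dot(x, v)
        for j in range(len(v)):
            new[j] += c * x[j]
    norm = math.sqrt(dot(new, new))
    return [c / norm for c in new]


def em_step(theta, points):
    """One EM step for the symmetric two-component mixture (identity covariance)."""
    new = [0.0] * len(theta)
    for x in points:
        w = math.tanh(dot(theta, x))  # posterior-weighted sign of the component
        for j in range(len(theta)):
            new[j] += w * x[j]
    return [c / len(points) for c in new]


def run_demo(seed=0):
    rng = random.Random(seed)
    mu = [2.0, 0.0, 0.0]  # assumed ground-truth center for the demo
    points = sample_mixture(mu, 4000, rng)

    # Stage 1 (spectral): find the direction of mu up to sign.
    v = [rng.gauss(0.0, 1.0) for _ in mu]
    for _ in range(20):
        v = power_iteration_step(v, points)

    # Stage 2 (EM): refine the estimate starting along that direction.
    theta = [0.5 * c for c in v]
    for _ in range(20):
        theta = em_step(theta, points)
    return mu, v, theta


if __name__ == "__main__":
    mu, v, theta = run_demo()
    print("direction estimate:", v)
    print("center estimate (up to sign):", theta)
```

Because the mixture is symmetric, both stages can only identify $\mu$ up to sign, so any comparison with the true center should take the smaller of the distances to $\mu$ and $-\mu$.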

Abstract

Recent works have shown that diffusion models can learn essentially any distribution provided one can perform score estimation. Yet it remains poorly understood under what settings score estimation is possible, let alone when practical gradient-based algorithms for this task can provably succeed. In this work, we give the first provably efficient results along these lines for one of the most fundamental distribution families, Gaussian mixture models. We prove that gradient descent on the denoising diffusion probabilistic model (DDPM) objective can efficiently recover the ground truth parameters of the mixture model in the following two settings: 1) We show gradient descent with random initialization learns mixtures of two spherical Gaussians in $d$ dimensions with $1/\text{poly}(d)$-separated centers. 2) We show gradient descent with a warm start learns mixtures of $K$ spherical Gaussians with $\Omega(\sqrt{\log(\min(K,d))})$-separated centers. A key ingredient in our proofs is a new connection between score-based methods and two other approaches to distribution learning, the EM algorithm and spectral methods.
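
For orientation, one common way to write the objective the abstract refers to is given below. The notation is an assumption of this note rather than something stated on this page: it uses the Ornstein-Uhlenbeck forward process, a noise level indexed by $t > 0$, and, for the mixture example, identity covariances with equal weights.

$$
X_t = e^{-t} X_0 + \sqrt{1 - e^{-2t}}\, Z_t, \qquad Z_t \sim N(0, I),
$$

$$
L_t(s) = \mathbb{E}\left[ \left\| s(X_t) + \frac{Z_t}{\sqrt{1 - e^{-2t}}} \right\|^2 \right].
$$

Under these assumptions, if $X_0 \sim \frac{1}{2}N(\mu, I) + \frac{1}{2}N(-\mu, I)$, then $X_t \sim \frac{1}{2}N(\mu_t, I) + \frac{1}{2}N(-\mu_t, I)$ with $\mu_t = e^{-t}\mu$, and the score that minimizes $L_t$ is

$$
\nabla \log q_t(x) = \tanh(\mu_t^\top x)\, \mu_t - x .
$$

The $\tanh$ weight here is the same posterior weight that appears in the EM update for this mixture, which gives some intuition for the connection to EM mentioned in the abstract.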

Paper Structure

This paper contains 46 sections, 40 theorems, 185 equations, 1 algorithm.

Key Result

Theorem 1

Gradient descent on the DDPM objective with random initialization efficiently learns the parameters of an unknown mixture of two spherical Gaussians with $1/\text{poly}(d)$-separated centers.
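
A toy version of this statement can be sketched in one dimension. The code below is a minimal sketch under strong simplifying assumptions, not the paper's algorithm: it fixes a single noise level `t`, uses a scalar parameter `theta` with the score model $s_\theta(x) = \tanh(\theta_t x)\,\theta_t - x$ where $\theta_t = e^{-t}\theta$, starts from a fixed positive initial value instead of a random high-dimensional one, and runs plain full-batch gradient descent on the empirical denoising loss. Function names, the step size, and the sample size are hypothetical.

```python
import math
import random


def make_noisy_samples(mu, t, n, rng):
    """Return (x_t, z) pairs from the OU forward process applied to
    the 1-D mixture 0.5*N(mu, 1) + 0.5*N(-mu, 1)."""
    decay = math.exp(-t)
    sigma = math.sqrt(1.0 - math.exp(-2.0 * t))
    pairs = []
    for _ in range(n):
        sign = 1.0 if rng.random() < 0.5 else -1.0
        x0 = sign * mu + rng.gauss(0.0, 1.0)
        z = rng.gauss(0.0, 1.0)
        pairs.append((decay * x0 + sigma * z, z))
    return pairs, decay, sigma


def ddpm_loss_and_grad(theta, pairs, decay, sigma):
    """Empirical denoising loss mean((s_theta(x_t) + z/sigma)^2) and its
    derivative with respect to theta."""
    a = theta * decay  # theta_t = exp(-t) * theta
    loss = 0.0
    grad = 0.0
    for x, z in pairs:
        th = math.tanh(a * x)
        resid = (th * a - x) + z / sigma
        # d/dtheta of tanh(a*x)*a, using da/dtheta = decay
        dscore = decay * (th + a * x * (1.0 - th * th))
        loss += resid * resid
        grad += 2.0 * resid * dscore
    n = len(pairs)
    return loss / n, grad / n


def gradient_descent(theta0, pairs, decay, sigma, lr=0.5, steps=100):
    """Plain full-batch gradient descent on the empirical loss."""
    theta = theta0
    for _ in range(steps):
        _, g = ddpm_loss_and_grad(theta, pairs, decay, sigma)
        theta -= lr * g
    return theta


if __name__ == "__main__":
    rng = random.Random(0)
    pairs, decay, sigma = make_noisy_samples(mu=2.0, t=0.5, n=5000, rng=rng)
    theta_hat = gradient_descent(0.5, pairs, decay, sigma)
    print("estimated center:", theta_hat)
```

The loss is symmetric in `theta`, so a negative starting value would converge toward $-\mu$ instead; in one dimension the sign of the initial value plays the role that the random initial direction plays in $d$ dimensions.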

Theorems & Definitions (66)

  • Theorem 1: Informal, see Theorems \ref{thm:mo2g-const-sep} and \ref{thm:2-mog-small-sep}
  • Theorem 2: Informal, see Theorem \ref{thm:mog-k-main}
  • Lemma 3
  • Lemma 4
  • Theorem 7
  • Lemma 8: See Lemma \ref{lemma:2-mog-high-noise-grad-equivalence} for more details
  • Lemma 9: Informal, see Lemma \ref{lemma:projection-angle-decrease} for more details
  • Lemma 10: Informal, see Lemma \ref{lemma:2-mog-const-sep-inner-product-const} for more details
  • Lemma 11: Informal, see Lemma \ref{lemma:multi-dimension-G-contraction} for more details
  • Lemma 12: Informal
  • ...and 56 more