Sampling via Gaussian Mixture Approximations

Yongchao Huang

TL;DR

Gaussian Mixture Approximation (GMA) recasts sampling from an unnormalised target as fitting a finite Gaussian mixture to that target through a sample-based reverse KL objective, followed by stratified resampling to generate samples faithful to the target. The paper introduces three variants, W-GMA (weights-only), LMA (Laplace-based initialisation), and EM-GMA (EM refinements), together with stabilisation tools (tempering, entropy regularisation, momentum, Polyak averaging) for coverage and robustness. The framework is evaluated on synthetic multimodal densities and real-world problems (Bayesian regression, LSTM mortality forecasting, quantum-language tasks, LV/SIR models, BOED, and SNL), where it shows favourable speed-accuracy trade-offs relative to MCMC and VI baselines. The theoretical arguments cover universal approximation of sufficiently smooth densities by finite GMMs, consistency of the two-stage sampling scheme, and error bounds that decompose the total error into approximation, optimisation, and sampling terms. Overall, GMA is positioned as a scalable, flexible bridge between MCMC and VI for posterior inference, model comparison, and uncertainty quantification in high-dimensional and multimodal problems.

Abstract

We present a family of Gaussian Mixture Approximation (GMA) samplers for sampling unnormalised target densities, encompassing weights-only GMA (W-GMA), Laplace Mixture Approximation (LMA), expectation-maximization GMA (EM-GMA), and further variants. GMA adopts a simple two-stage paradigm: (i) initialise a finite set of Gaussian components and draw samples from a proposal mixture; (ii) fit the mixture to the target by optimising either only the component weights or also the means and variances, via a sample-based KL divergence objective that requires only evaluations of the unnormalised density, followed by stratified resampling. The method needs no gradients of the target density and is computationally efficient: it leverages the ease of sampling from Gaussians, efficient optimisation methods (projected gradient descent, mirror descent, and EM), and the robustness of stratified resampling to produce samples faithful to the target. We show that this optimisation-resampling scheme yields consistent approximations under mild conditions, and we validate the methodology with empirical results demonstrating accuracy and speed across diverse densities.
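
The two-stage, weights-only recipe described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the 1D bimodal target, the fixed grid of component means, the shared standard deviation, the step size, and the function names `fit_weights` and `stratified_resample` are all hypothetical choices made here. The weight update is a plain exponentiated-gradient (mirror descent) step on a sample-based reverse KL estimate, and the resampler stratifies by component with proportional allocation, which may differ from the resampler used in the paper.

```python
import numpy as np

def log_target(z, s=0.5):
    # Hypothetical unnormalised target: 5 * [0.3 N(-2, s^2) + 0.7 N(2, s^2)],
    # with the Gaussian normalising constant deliberately left out.
    a = np.log(0.3) - 0.5 * ((z + 2.0) / s) ** 2
    b = np.log(0.7) - 0.5 * ((z - 2.0) / s) ** 2
    return np.log(5.0) + np.logaddexp(a, b)

def logsumexp(a, axis):
    # Numerically stable log(sum(exp(a))) along one axis.
    m = a.max(axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.exp(a - m).sum(axis=axis))

def fit_weights(log_p, mu, sigma, M=200, iters=500, eta=0.2, seed=0):
    """Weights-only fit: means and sigma stay fixed, only the weights move."""
    rng = np.random.default_rng(seed)
    K = mu.size
    # Stage (i): fixed sample bank, bank[i, j] ~ N(mu_i, sigma^2).
    bank = mu[:, None] + sigma * rng.standard_normal((K, M))
    # comp[k, i, j] = log N(bank[i, j]; mu_k, sigma^2)
    comp = (-0.5 * ((bank[None, :, :] - mu[:, None, None]) / sigma) ** 2
            - np.log(sigma * np.sqrt(2.0 * np.pi)))
    lp = log_p(bank)                      # only density evaluations are needed
    logw = np.full(K, -np.log(K))         # start from uniform weights
    for _ in range(iters):
        log_q = logsumexp(logw[:, None, None] + comp, axis=0)   # shape (K, M)
        # Per-component estimate of E_{N_i}[log q_w - log p~]; constant terms
        # shared by all components cancel when the weights are renormalised.
        g = (log_q - lp).mean(axis=1)
        logw = logw - eta * g             # exponentiated-gradient step
        logw -= logsumexp(logw, axis=0)   # project back onto the simplex
    return np.exp(logw), bank

def stratified_resample(w, bank, N, seed=1):
    """Stage (ii): draw about N * w_k points from component k's bank."""
    rng = np.random.default_rng(seed)
    quota = N * w
    n = np.floor(quota).astype(int)
    short = N - n.sum()
    if short > 0:
        # Give the leftover draws to the largest fractional parts.
        n[np.argsort(quota - n)[::-1][:short]] += 1
    return np.concatenate(
        [rng.choice(bank[k], size=n[k], replace=True) for k in range(w.size)]
    )

mu = np.linspace(-4.0, 4.0, 17)           # assumed component means on a grid
w, bank = fit_weights(log_target, mu, sigma=0.5)
samples = stratified_resample(w, bank, N=2000)
print("weight on components with mean > 0:", w[mu > 0].sum())
print("fraction of resampled points > 0:  ", (samples > 0).mean())
```

In this toy setting the target places 70% of its mass on the right-hand mode, so the fitted weight on the right-hand components gives a quick sanity check. The target's gradient is never evaluated; the "gradient" in the update is taken with respect to the mixture weights only.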

Paper Structure

This paper contains 217 sections, 12 theorems, 260 equations, 62 figures, 34 tables, 8 algorithms.

Key Result

Theorem 1

Any sufficiently smooth probability density function $p(\mathbf{z})$ on $\mathbb{R}^d$ can be approximated arbitrarily closely in $L^1$ distance by a Gaussian Mixture Model (GMM) with a finite, sufficiently large number of components covering the whole support.
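
One way to write this statement formally is given below. The symbols $K$, $w_k$, $\boldsymbol{\mu}_k$, $\boldsymbol{\Sigma}_k$ and $\varepsilon$ are notation assumed here for illustration and may not match the paper's own.

$$
\forall\, \varepsilon > 0 \;\; \exists\, K \in \mathbb{N},\ \{(w_k, \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)\}_{k=1}^{K} \ \text{with}\ w_k \ge 0,\ \sum_{k=1}^{K} w_k = 1 : \quad
\int_{\mathbb{R}^d} \Big| \, p(\mathbf{z}) - \sum_{k=1}^{K} w_k \, \mathcal{N}(\mathbf{z};\, \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k) \Big| \, d\mathbf{z} \;<\; \varepsilon .
$$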

Figures (62)

  • Figure 1: Ground truth (black) vs traditional data-based EM (green) vs EM-GMA (red). Scatter shows a subset of the generated data (context only; not used by EM-GMA). Curves are $2\sigma$ covariance ellipses; dots mark component means.
  • Figure 2: GMA samples and weight trajectories.
  • Figure 3: Density inference comparison for all methods.
  • Figure 4: Performance comparison for all methods (MH samples as reference).
  • Figure 5: GMA samples and weight trajectories.
  • ...and 57 more figures

Theorems & Definitions (29)

  • Theorem 1: Universal approximation property of GMMs
  • proof
  • Theorem 2: Consistency of GMA two-stage sampling
  • proof
  • Theorem 3: Variance of simple random sampler (Rice, 2007)
  • proof
  • Theorem 4: Unbiasedness of stratified sampler (Rice, 2007)
  • proof
  • Theorem 5: Variance of stratified sampler (Rice, 2007)
  • proof
  • ...and 19 more
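
For orientation, the sampling results named in Theorems 3 to 5 have standard textbook forms, sketched here for sampling with replacement. The notation is assumed for illustration: strata $l = 1, \dots, L$ with weights $W_l$, means $\mu_l$, variances $\sigma_l^2$, and $n_l$ draws per stratum, $n = \sum_l n_l$. The paper's exact statements may differ, for example by including finite-population corrections.

$$
\bar{X}_s = \sum_{l=1}^{L} W_l \bar{X}_l, \qquad
\mathbb{E}[\bar{X}_s] = \sum_{l=1}^{L} W_l \mu_l = \mu, \qquad
\operatorname{Var}(\bar{X}_s) = \sum_{l=1}^{L} \frac{W_l^2 \sigma_l^2}{n_l} .
$$

Under proportional allocation, $n_l = n W_l$, this can be compared directly with a simple random sample of size $n$:

$$
\operatorname{Var}(\bar{X}_s) = \frac{1}{n} \sum_{l=1}^{L} W_l \sigma_l^2
\;\le\;
\frac{1}{n} \Big[ \sum_{l=1}^{L} W_l \sigma_l^2 + \sum_{l=1}^{L} W_l (\mu_l - \mu)^2 \Big]
= \frac{\sigma^2}{n} = \operatorname{Var}(\bar{X}) .
$$

The gap is the between-stratum variance, so stratifying by mixture component helps most when the component means are far apart.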