Parsimonious Gaussian mixture models with piecewise-constant eigenvalue profiles

Tom Szwagier, Pierre-Alexandre Mattei, Charles Bouveyron, Xavier Pennec

TL;DR

This work introduces mixtures of principal subspace analyzers (MPSA), a family of parsimonious Gaussian mixture models with piecewise-constant covariance eigenvalue profiles that extend PSA to multimodal densities. It provides an EM algorithm for learning mixture parameters with fixed eigenvalue multiplicities and a componentwise penalized EM (CPEM) that jointly learns multiplicities while guaranteeing monotonic improvement of a penalized objective. Across density estimation, clustering, and single-image denoising, MPSA demonstrates superior likelihood-parsimony tradeoffs relative to full and spherical GMMs, particularly in high-dimensional, small-sample settings, and supports automatic intrinsic-dimension learning through eigenvalue multiplicities. The CPEM framework also offers a principled, hyperparameter-free pathway to integrated parameter estimation and model selection in complex mixture models, with broad potential for extension to other parsimonious GMMs and downstream tasks.

Abstract

Gaussian mixture models (GMMs) are ubiquitous in statistical learning, particularly for unsupervised problems. While full GMMs suffer from the overparameterization of their covariance matrices in high-dimensional spaces, spherical GMMs (with isotropic covariance matrices) lack the flexibility to fit anisotropic distributions. Connecting these two extremes, we introduce a new family of parsimonious GMMs with piecewise-constant covariance eigenvalue profiles. These extend several low-rank models, such as the celebrated mixtures of probabilistic principal component analyzers (MPPCA), by allowing any sequence of eigenvalue multiplicities. If the multiplicities are prespecified, we can naturally derive an expectation-maximization (EM) algorithm to learn the mixture parameters. Otherwise, to address the notoriously challenging issue of jointly learning the mixture parameters and hyperparameters, we propose a componentwise penalized EM algorithm, whose monotonicity is proven. We show the superior likelihood-parsimony tradeoffs achieved by our models on a variety of unsupervised experiments: density fitting, clustering and single-image denoising.
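To make the notion of a piecewise-constant eigenvalue profile concrete, the following is a minimal sketch, not the authors' implementation. It builds one covariance matrix whose eigenvalues are constant within blocks given by a sequence of multiplicities. The function name `piecewise_constant_covariance`, the eigenvalue levels `4.0` and `1.0`, and the random orthogonal basis are illustrative assumptions; the multiplicities `(5, 10)` match one of the components described in Figure 4.

```python
import numpy as np


def piecewise_constant_covariance(multiplicities, eigenvalues, rng):
    """Hypothetical helper: covariance with a piecewise-constant eigenvalue profile.

    multiplicities: block sizes, e.g. (5, 10) for dimension 15.
    eigenvalues: one distinct eigenvalue per block, assumed decreasing.
    """
    if len(multiplicities) != len(eigenvalues):
        raise ValueError("need exactly one eigenvalue per multiplicity block")
    d = sum(multiplicities)
    # Repeat each distinct eigenvalue according to its multiplicity.
    profile = np.repeat(np.asarray(eigenvalues, dtype=float), multiplicities)
    # Draw a random orthogonal basis from the QR decomposition of a Gaussian matrix.
    q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    # Assemble the covariance from its eigendecomposition.
    return q @ np.diag(profile) @ q.T, profile


rng = np.random.default_rng(0)
cov, profile = piecewise_constant_covariance((5, 10), (4.0, 1.0), rng)

# The spectrum of the assembled matrix recovers the two-level profile.
recovered = np.sort(np.linalg.eigvalsh(cov))[::-1]
```

In this sketch, a single multiplicity block `(15,)` gives a spherical (isotropic) covariance, while fifteen blocks of size one give a full covariance with distinct eigenvalues, which are the two extremes the abstract refers to.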

Paper Structure

This paper contains 35 sections, 5 theorems, 42 equations, 12 figures, 3 tables, 2 algorithms.

Key Result

Proposition 1

The number of parameters for the MPSA model of types $\gamma$ is

Figures (12)

  • Figure 1: Two-dimensional density fitting with the MPSA-H. Top: evolution of the mixture parameters over the CPEM iterations. The dots represent the data points, the crosses represent the evolution of the centers $\mu_c$ and the ellipses (with increasing opacity) represent the evolution of the covariance matrices. Bottom: evolution of the penalized log-likelihood (left) and the number of parameters (right) over the CPEM iterations.
  • Figure 2: Evolution of the penalized log-likelihood and the number of parameters over the iterations of the CPEM algorithm for the four model selection strategies (MPSA-H/R/U/D). Each transparent curve refers to one repetition while the thick opaque curve represents the average over the 10 independent repetitions.
  • Figure 3: Density estimation with several GMMs. Several metrics are reported: the covariance errors, the type errors, the number of parameters and the (averaged) log-likelihood on an independent test set.
  • Figure 4: Covariance eigenvalue estimation with several GMMs for $n=1000$ (top) and $n=200$ (bottom). The three mixture components have, respectively, eigenvalue multiplicities $(15)$ (left), $(5, 10)$ (middle) and $(5, 5, 5)$ (right), plotted in black. The four MPSA strategies cannot be visually distinguished on the first five plots, as they yield relatively similar results. On the sixth plot, MPSA-H/U/D can be distinguished, while MPSA-R cannot be distinguished from GMM-S.
  • Figure 5: Running time benchmark for increasing dimension.
  • ...and 7 more figures

Theorems & Definitions (23)

  • Definition 1: MPSA density
  • Remark 1: Generalization of parsimonious GMMs
  • Proposition 1: MPSA complexity
  • Remark 2: Supervised setting
  • Theorem 2: E-step
  • Theorem 3: M-step
  • Theorem 4: CPEM
  • Remark 3: Simplification of componentwise inequalities
  • Remark 4: Impact on low-rank mixture models
  • Remark 5: Novelty of CPEM
  • ...and 13 more