Parsimonious Gaussian mixture models with piecewise-constant eigenvalue profiles
Tom Szwagier, Pierre-Alexandre Mattei, Charles Bouveyron, Xavier Pennec
TL;DR
This work introduces mixtures of principal subspace analyzers (MPSA), a family of parsimonious Gaussian mixture models with piecewise-constant covariance eigenvalue profiles that extend principal subspace analysis (PSA) to multimodal densities. It provides an EM algorithm for learning the mixture parameters when the eigenvalue multiplicities are fixed, and a componentwise penalized EM (CPEM) algorithm that jointly learns the multiplicities while guaranteeing monotonic improvement of a penalized objective. Across density estimation, clustering, and single-image denoising, MPSA achieves better likelihood-parsimony tradeoffs than full and spherical GMMs, particularly in high-dimensional, small-sample settings, and supports automatic intrinsic-dimension learning through the eigenvalue multiplicities. The CPEM framework also offers a principled, hyperparameter-free pathway to integrated parameter estimation and model selection in complex mixture models, with potential for extension to other parsimonious GMMs and downstream tasks.
Abstract
Gaussian mixture models (GMMs) are ubiquitous in statistical learning, particularly for unsupervised problems. While full GMMs suffer from the overparameterization of their covariance matrices in high-dimensional spaces, spherical GMMs (with isotropic covariance matrices) lack the flexibility to fit certain anisotropic distributions. Connecting these two extremes, we introduce a new family of parsimonious GMMs with piecewise-constant covariance eigenvalue profiles. These extend several low-rank models, such as the celebrated mixtures of probabilistic principal component analyzers (MPPCA), by enabling any possible sequence of eigenvalue multiplicities. If the multiplicities are prespecified, then we can naturally derive an expectation-maximization (EM) algorithm to learn the mixture parameters. Otherwise, to address the notoriously challenging issue of jointly learning the mixture parameters and hyperparameters, we propose a componentwise penalized EM algorithm, whose monotonicity is proven. We show the superior likelihood-parsimony tradeoffs achieved by our models on a variety of unsupervised experiments: density fitting, clustering and single-image denoising.
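To make the notion of a piecewise-constant eigenvalue profile concrete, the following minimal sketch builds a single covariance matrix whose spectrum is constant within blocks of prescribed multiplicities, and counts its free parameters. It is an illustration only, not the authors' implementation. The function names, the random orthogonal eigenvector matrix, and the example values are all hypothetical. The parameter count is an assumption not stated in the text above: it uses the usual decomposition into one eigenvalue per block plus the eigenspaces, which form a flag of subspaces.

```python
import numpy as np


def piecewise_constant_covariance(eigenvalues, multiplicities, seed=0):
    """Return a covariance whose k-th eigenvalue is repeated multiplicities[k] times.

    Hypothetical helper: the eigenvectors are drawn at random purely for illustration.
    """
    rng = np.random.default_rng(seed)
    p = sum(multiplicities)
    # Random orthogonal matrix from the QR decomposition of a Gaussian matrix.
    q_mat, _ = np.linalg.qr(rng.standard_normal((p, p)))
    # Piecewise-constant spectrum, e.g. [9, 4, 1] with (1, 1, 3) -> [9, 4, 1, 1, 1].
    spectrum = np.repeat(np.asarray(eigenvalues, dtype=float), multiplicities)
    return q_mat @ np.diag(spectrum) @ q_mat.T


def covariance_free_parameters(multiplicities):
    """Count covariance parameters: one eigenvalue per block plus the eigenspaces.

    Assumed formula: d + p(p-1)/2 - sum_k g_k(g_k-1)/2, with d blocks in dimension p.
    """
    p = sum(multiplicities)
    d = len(multiplicities)
    return d + p * (p - 1) // 2 - sum(g * (g - 1) // 2 for g in multiplicities)


sigma = piecewise_constant_covariance([9.0, 4.0, 1.0], (1, 1, 3))
full_count = covariance_free_parameters((1, 1, 1, 1, 1))  # all eigenvalues distinct
ppca_like_count = covariance_free_parameters((1, 1, 3))   # two leading directions plus noise
spherical_count = covariance_free_parameters((5,))        # a single repeated eigenvalue
```

Under these assumptions, the multiplicities (1, ..., 1) recover the full covariance count, a single block recovers the spherical case, and a profile such as (1, 1, 3) has the shape of a PPCA-type model. Intermediate profiles fall between the two extremes in parameter count.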
