Whitening Spherical Gaussian Mixtures in the Large-Dimensional Regime

Mohammed Racim Moussa Boudjemaa, Alper Kalle, Xiaoyi Mai, José Henrique de Morais Goulart, Cédric Févotte

TL;DR

A corrected whitening matrix is constructed that restores asymptotic orthogonality in a spherical Gaussian mixture model, allowing for performance gains in spherical GMM estimation in the large-dimensional regime (LDR).

Abstract

Whitening is a classical technique in unsupervised learning that can facilitate estimation tasks by standardizing data. An important application is the estimation of latent variable models via the decomposition of tensors built from high-order moments. In particular, whitening orthogonalizes the means of a spherical Gaussian mixture model (GMM), thereby making the corresponding moment tensor orthogonally decomposable, hence easier to decompose. However, in the large-dimensional regime (LDR) where data are high-dimensional and scarce, the standard whitening matrix built from the sample covariance becomes ineffective because the latter is spectrally distorted. Consequently, whitened means of a spherical GMM are no longer orthogonal. Using random matrix theory, we derive exact limits for their dot products, which are generally nonzero in the LDR. As our main contribution, we then construct a corrected whitening matrix that restores asymptotic orthogonality, allowing for performance gains in spherical GMM estimation.
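The orthogonalization step described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' code. It assumes the usual spherical-GMM construction in which the low-rank part of the uncentered covariance is $\sum_k \pi_k \boldsymbol{\mu}_k \boldsymbol{\mu}_k^\top$ and the noise variance $\sigma^2$ is known. The names `whitening_matrix` and `whitened_gram`, the weights, the means, and the sizes `P`, `K`, `N` are all hypothetical choices made for illustration.

```python
import numpy as np

def whitening_matrix(m2, k):
    """Return W (P x k) such that W.T @ m2 @ W = I_k, from the top-k eigenpairs of m2."""
    eigvals, eigvecs = np.linalg.eigh(m2)      # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]        # indices of the k largest eigenvalues
    return eigvecs[:, top] / np.sqrt(eigvals[top])

def whitened_gram(W, means, weights):
    """Gram matrix of the vectors sqrt(pi_k) * W.T @ mu_k (identity if they are orthonormal)."""
    B = (W.T @ means) * np.sqrt(weights)
    return B.T @ B

rng = np.random.default_rng(0)
P, K, N, sigma2 = 50, 3, 100, 1.0              # hypothetical sizes; N is comparable to P
weights = np.array([0.5, 0.3, 0.2])            # hypothetical mixing proportions
means = 3.0 * rng.standard_normal((P, K)) / np.sqrt(P)   # columns are hypothetical means

# Population case: whitening built from the exact low-rank moment.
m2 = (means * weights) @ means.T               # sum_k pi_k mu_k mu_k^T
gram_pop = whitened_gram(whitening_matrix(m2, K), means, weights)

# Sample case: the same recipe applied to the empirical uncentered covariance.
labels = rng.choice(K, size=N, p=weights)
X = means[:, labels] + np.sqrt(sigma2) * rng.standard_normal((P, N))
sigma_hat = X @ X.T / N
W_hat = whitening_matrix(sigma_hat - sigma2 * np.eye(P), K)
gram_hat = whitened_gram(W_hat, means, weights)

print("population deviation from identity:", np.abs(gram_pop - np.eye(K)).max())
print("sample deviation from identity:    ", np.abs(gram_hat - np.eye(K)).max())
```

In the population case the Gram matrix equals the identity up to rounding error, which is the orthogonality that makes the moment tensor orthogonally decomposable. With the sample covariance and `N` comparable to `P`, the deviation is visibly larger. This is the effect the abstract attributes to spectral distortion; the sketch does not implement the paper's corrected whitening matrix.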

Paper Structure

This paper contains 13 sections, 4 theorems, 22 equations, 2 figures.

Key Result

Theorem 1

Consider the Gaussian mixture model (GMM) and the uncentered covariance $\mathbf{\Sigma}$ defined in (cov-GMM), with eigenpairs $\{(\lambda_k,{\mathbf{u}}_k)\}_{k=1}^{P}$, together with the eigenpairs $\{(\hat{\lambda}_k,\hat{{\mathbf{u}}}_k)\}_{k=1}^{P}$ of its empirical version $\widehat{\mathbf{\Sigma}}$, as specified [...] where $\beta_k=1+\frac{c}{\ell_k}$ and $\psi_k=1-\frac{\beta_k-1}{\beta_k}\frac{1+\ell_k}{\ell_k}$.
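The two quantities defined at the end of the theorem can be evaluated directly. The sketch below does so and checks them against the algebraically equivalent form $\psi_k = \frac{1 - c/\ell_k^2}{1 + c/\ell_k}$, which follows from substituting $\beta_k$ into the definition of $\psi_k$. The excerpt does not define $c$ or $\ell_k$; the code treats them as positive scalars, and the function names are hypothetical.

```python
def beta(ell, c):
    """beta_k = 1 + c / ell_k, as defined in Theorem 1."""
    return 1.0 + c / ell

def psi(ell, c):
    """psi_k = 1 - ((beta_k - 1) / beta_k) * ((1 + ell_k) / ell_k), as defined in Theorem 1."""
    b = beta(ell, c)
    return 1.0 - (b - 1.0) / b * (1.0 + ell) / ell

def psi_closed_form(ell, c):
    """Equivalent form obtained by substituting beta_k: (1 - c / ell^2) / (1 + c / ell)."""
    return (1.0 - c / ell**2) / (1.0 + c / ell)

# Example values (assumptions for illustration only): ell = 2, c = 0.5
print(beta(2.0, 0.5))             # 1.25
print(psi(2.0, 0.5))              # approximately 0.7
print(psi_closed_form(2.0, 0.5))  # approximately 0.7
```

The equivalent form makes one property easy to read off: $\psi_k$ is positive exactly when $\ell_k^2 > c$, and it tends to 1 as $c \to 0$.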

Figures (2)

  • Figure 1: Residual and eigenvector alignments in the LDR ($K=2$).
  • Figure 2: GMM means estimation with and without corrected whitening.

Theorems & Definitions (12)

  • Theorem 1: Spiked–covariance limits in the LDR; adapted from baik2006eigenvalues, couillet2022random
  • Remark 1
  • Remark 2
  • Lemma 1: Limit of residual alignment
  • Proof
  • Remark 3: Classical regime
  • Example 1
  • Theorem 2: Asymptotic dot products under corrected whitening
  • Theorem 3: Consistency of the corrected whitened third moment
  • Remark 4: Estimation Pseudocode
  • ...and 2 more