Exponential Convergence of CAVI for Bayesian PCA

Arghya Datta, Philippe Gagnon, Florian Maire

TL;DR

A precise exponential convergence result is proved for the case where the model uses a single principal component (PC); the result indicates that traditional PCA is retrieved as point estimates of the BPCA parameters.

Abstract

Probabilistic principal component analysis (PCA) and its Bayesian variant (BPCA) are widely used for dimension reduction in machine learning and statistics. The main advantage of probabilistic PCA over the traditional formulation is that it allows uncertainty quantification. The parameters of BPCA are typically learned using mean-field variational inference, and in particular, the coordinate ascent variational inference (CAVI) algorithm. So far, the convergence speed of CAVI for BPCA has not been characterized. In our paper, we fill this gap in the literature. Firstly, we prove a precise exponential convergence result in the case where the model uses a single principal component (PC). Interestingly, this result is established through a connection with the classical power iteration algorithm, and it indicates that traditional PCA is retrieved as point estimates of the BPCA parameters. Secondly, we leverage recent tools to prove exponential convergence of CAVI for the model with any number of PCs, thus leading to a more general result, but one that is of a slightly different flavor. To prove the latter result, we additionally needed to introduce a novel lower bound for the symmetric Kullback--Leibler divergence between two multivariate normal distributions, which, we believe, is of independent interest in information theory.
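For readers unfamiliar with the power iteration algorithm mentioned in the abstract, the following is a minimal sketch of the textbook method applied to a sample covariance matrix. It is not the paper's CAVI update; the function name `power_iteration`, the synthetic data, and all parameter values are illustrative assumptions. The sketch only shows why a geometric (exponential) rate is natural here: each step shrinks the error by roughly the ratio of the second to the first eigenvalue.

```python
import numpy as np


def power_iteration(S, num_iters=200, seed=0):
    """Textbook power iteration for the leading eigenvector of a
    symmetric positive semi-definite matrix S (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(S.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(num_iters):
        v = S @ v                 # multiply by the matrix
        v /= np.linalg.norm(v)    # renormalize to unit length
    return v


# Synthetic data with one dominant direction (an assumption for illustration).
rng = np.random.default_rng(1)
n, d = 500, 5
scales = np.array([5.0, 1.0, 0.8, 0.5, 0.3])
X = rng.standard_normal((n, d)) * scales
X -= X.mean(axis=0)               # center the columns
S = X.T @ X / n                   # sample covariance matrix

v = power_iteration(S)

# Reference: leading eigenvector from a full eigendecomposition.
eigvals, eigvecs = np.linalg.eigh(S)   # eigenvalues in ascending order
u = eigvecs[:, -1]

# Eigenvectors are defined only up to sign, so compare against both signs.
err = min(np.linalg.norm(v - u), np.linalg.norm(v + u))
print(f"distance to leading principal direction: {err:.2e}")
```

The comparison takes the minimum over both signs because a principal direction is identified only up to sign.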

Paper Structure

This paper contains 23 sections, 21 theorems, 162 equations, 3 figures, 1 algorithm.

Key Result

Proposition 1

Let $\pi$ be the posterior distribution for BPCA, as described in Section 1.1, and let $M=\inf_{q\in\mathcal{Q}}\text{KL}\left(q\| \pi\right)$. Then there exists $q^*\in\mathcal{Q}$ such that

Figures (3)

  • Figure 1: $\|\mu^{(t)}_{\textbf{Z}}/\|\mu^{(t)}_{\textbf{Z}}\|-\text{sgn}(c_1)\mu_1\|$ (left panel) and $\|\mu^{(t)}_{\textbf{W}}/\|\mu^{(t)}_{\textbf{W}}\|-\text{sgn}(c_1)\textbf{X}'\mu_1/\|\textbf{X}'\mu_1\|\|$ (right panel) as the algorithm progresses, together with the upper bounds provided in \ref{convergence_of_mu_z_fixed_update}, on the log scale.
  • Figure 2: $|a^{(t)} - a^*|$ (left panel) and $|b^{(t)} - b^*|$ (right panel) as the algorithm progresses, on the log scale.
  • Figure 3: Visualization of $\Psi$ near $q^*$.

Theorems & Definitions (43)

  • Proposition 1
  • Theorem 1
  • Proposition 2
  • Theorem 2: Local contraction of KL
  • Corollary 1: Convergence of parameters
  • Proof of \ref{convergence_of_mu_z_fixed_update}
  • Proof of \ref{fixed_pts_possibility}
  • Proof of \ref{main_theorem_bpca}
  • Remark 1
  • Proof of \ref{parameter}
  • ...and 33 more