Empirical Bayes Covariance Decomposition, and a Solution to the Multiple Tuning Problem in Sparse PCA

Joonsuk Kang; Matthew Stephens

TL;DR

This work presents a principled and efficient solution to the multiple tuning problem (MTP) in sparse PCA using Empirical Bayes methods. The approach extends immediately to other structural assumptions (e.g. non-negative PCA).

Abstract

Sparse Principal Components Analysis (PCA) has been proposed as a way to improve both interpretability and reliability of PCA. However, use of sparse PCA in practice is hindered by the difficulty of tuning the multiple hyperparameters that control the sparsity of different PCs (the "multiple tuning problem", MTP). Here we present a solution to the MTP using Empirical Bayes methods. We first introduce a general formulation for penalized PCA of a data matrix $\mathbf{X}$, which includes some existing sparse PCA methods as special cases. We show that this formulation also leads to a penalized decomposition of the covariance (or Gram) matrix, $\mathbf{X}^T\mathbf{X}$. We introduce empirical Bayes versions of these penalized problems, in which the penalties are determined by prior distributions that are estimated from the data by maximum likelihood rather than cross-validation. The resulting "Empirical Bayes Covariance Decomposition" provides a principled and efficient solution to the MTP in sparse PCA, and one that can be immediately extended to incorporate other structural assumptions (e.g. non-negative PCA). We illustrate the effectiveness of this approach on both simulated and real data examples.
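To make the multiple tuning problem concrete, here is a minimal sketch of a penalized PCA fit by block coordinate descent. It is not the authors' algorithm: it assumes an L1 penalty and the criterion $\frac{1}{2}\|\mathbf{X}-\mathbf{Z}\mathbf{L}^T\|_F^2+\sum_k \lambda_k\|\mathbf{l}_k\|_1$ subject to $\mathbf{Z}^T\mathbf{Z}=\mathbf{I}_K$, and the names `penalized_pca_l1` and `lams` are hypothetical. Note that every component needs its own $\lambda_k$, which is what makes cross-validated tuning expensive.

```python
import numpy as np

def soft_threshold(a, lam):
    """Elementwise soft-thresholding: the proximal operator of lam * |.|."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def polar_factor(m):
    """Orthonormal-column factor U V^T from the thin SVD m = U S V^T."""
    u, _, vt = np.linalg.svd(m, full_matrices=False)
    return u @ vt

def objective(x, z, l, lams):
    """Assumed criterion: 0.5 * ||X - Z L^T||_F^2 + sum_k lam_k * ||l_k||_1."""
    fit = 0.5 * np.linalg.norm(x - z @ l.T) ** 2
    penalty = np.sum(lams * np.abs(l).sum(axis=0))
    return fit + penalty

def penalized_pca_l1(x, lams, n_iter=50):
    """Alternate exact block updates of L and Z (illustrative sketch only)."""
    lams = np.asarray(lams, dtype=float)
    k = len(lams)
    u, _, _ = np.linalg.svd(x, full_matrices=False)
    z = u[:, :k]                      # initialize Z from ordinary PCA
    history = []
    for _ in range(n_iter):
        # With Z^T Z = I the L-update separates over entries of X^T Z.
        l = soft_threshold(x.T @ z, lams)
        # The Z-update maximizes tr(Z^T X L) over orthonormal-column Z.
        z = polar_factor(x @ l)
        history.append(objective(x, z, l, lams))
    return z, l, history
```

A call such as `penalized_pca_l1(x, lams=[1.0, 2.0])` requires one penalty level per component. The empirical Bayes approach described in the abstract instead estimates the corresponding prior for each component from the data by maximum likelihood.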

Paper Structure (32 sections, 7 theorems, 41 equations, 4 figures, 1 table, 1 algorithm)

Introduction
A Penalized PCA Criterion, and its corresponding Penalized Covariance Decomposition criterion
A Penalized PCA Criterion
A Penalized Covariance Decomposition Criterion
Uniting Previous Sparse PCA Methods
BISPCA, a "Block" Algorithm for Penalized PCA
Connections with Other Algorithms
An Empirical Bayes Solution to the MTP
The EBCD Model
Fitting the EBCD Model
A unified optimization approach: ELBO maximization
Preliminary: EBNM problems
ELBO maximization with EBNM solvers
Connecting EBCD and Penalized Criteria
Variations and Extensions
...and 17 more sections

Key Result

Theorem 1

Let $(\hat{\mathbf{Z}}, \hat{\mathbf{L}})$ denote a solution to the penalized PCA criterion. Then $\hat{\mathbf{L}}$ also solves the corresponding penalized covariance decomposition criterion, in which $d_*$ denotes the Bures-Wasserstein distance between two symmetric positive semi-definite (PSD) matrices (Bhatia, Jain, et al., 2019).
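The link between the two criteria in Theorem 1 can be checked numerically. The sketch below uses the standard closed form $d_*(\mathbf{A},\mathbf{B})^2=\operatorname{tr}\mathbf{A}+\operatorname{tr}\mathbf{B}-2\operatorname{tr}\big[(\mathbf{A}^{1/2}\mathbf{B}\mathbf{A}^{1/2})^{1/2}\big]$ and assumes that the fit term of the penalized PCA criterion is $\|\mathbf{X}-\mathbf{Z}\mathbf{L}^T\|_F^2$ with $\mathbf{Z}^T\mathbf{Z}=\mathbf{I}_K$ (the criterion itself is not reproduced in this summary). Under that assumption, minimizing over $\mathbf{Z}$ gives $d_*(\mathbf{X}^T\mathbf{X},\mathbf{L}\mathbf{L}^T)^2$, with the minimizer being the polar factor of $\mathbf{X}\mathbf{L}$. The function names are illustrative.

```python
import numpy as np

def psd_sqrt(a):
    """Symmetric square root of a PSD matrix via its eigendecomposition."""
    a = 0.5 * (a + a.T)                      # remove round-off asymmetry
    w, v = np.linalg.eigh(a)
    w = np.clip(w, 0.0, None)                # guard tiny negative eigenvalues
    return (v * np.sqrt(w)) @ v.T

def bures_wasserstein_sq(a, b):
    """Squared Bures-Wasserstein distance between PSD matrices a and b."""
    root_a = psd_sqrt(a)
    cross = psd_sqrt(root_a @ b @ root_a)
    return np.trace(a) + np.trace(b) - 2.0 * np.trace(cross)

def polar_factor(m):
    """Orthonormal-column factor U V^T from the thin SVD m = U S V^T."""
    u, _, vt = np.linalg.svd(m, full_matrices=False)
    return u @ vt

def min_fit_over_z(x, l):
    """min over Z with Z^T Z = I of ||X - Z L^T||_F^2, attained at polar(X L)."""
    z = polar_factor(x @ l)
    return np.linalg.norm(x - z @ l.T) ** 2

rng = np.random.default_rng(1)
x = rng.standard_normal((30, 6))             # N = 30 samples, P = 6 variables
l = rng.standard_normal((6, 3))              # K = 3 loading vectors
gap = abs(bures_wasserstein_sq(x.T @ x, l @ l.T) - min_fit_over_z(x, l))
print(f"absolute gap between the two quantities: {gap:.2e}")
```

The printed gap should sit at the level of floating-point round-off, which illustrates why a penalty on $\mathbf{L}$ carries over unchanged from the data-matrix problem to the covariance problem.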

Figures (4)

Figure 1: Examples of posterior mean shrinkage operator $S(\mathbf{x}, s^2=1, g=g(\cdot; \pi, b))$ induced by Laplace slab priors $g(x; \pi,b)=(1-\pi)\delta_{0}(x)+\pi\text{Laplace}(x; 0, b)$. Note how $\pi$ controls shrinkage near 0 (small $\pi$ yielding more shrinkage), while the scale parameter controls shrinkage further away from 0.
Figure 2: Simulation results comparing the performance of different methods in terms of three measures: angle between true and estimated principal components (PCs), difference between population covariance matrix and estimated covariance matrix, and distance with optimal rotation.
Figure 3: Comparison of PCA loadings with posterior mean loadings from EBCD-pl (after post-processing to have unit norm).
Figure 4: Sectors projected on the SMB-HML plane. Each sector is positioned according to its loadings on the Fama-French SMB and HML factors, and is colored based on its loadings on the second and third principal components (PCs) from the EBCD-pl method (or PCA).
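The shrinkage operator described in the Figure 1 caption can be approximated by direct numerical integration. The following is a minimal sketch, not the solver used in the paper: it assumes the observation model $x \mid \theta \sim N(\theta, s^2)$, assumes that $b$ is the Laplace scale (density $\exp(-|\theta|/b)/(2b)$), and uses a plain Riemann sum on a fixed grid. The function name and its `half_width` and `n_grid` arguments are hypothetical.

```python
import numpy as np

def posterior_mean_point_laplace(x, s2=1.0, pi_slab=0.5, b=1.0,
                                 half_width=30.0, n_grid=60001):
    """Posterior mean of theta given scalar x under the prior
    (1 - pi_slab) * delta_0 + pi_slab * Laplace(0, b), with x | theta ~ N(theta, s2)."""
    theta = np.linspace(-half_width, half_width, n_grid)
    d_theta = theta[1] - theta[0]
    # Gaussian likelihood of x at each grid value of theta.
    lik = np.exp(-0.5 * (x - theta) ** 2 / s2) / np.sqrt(2.0 * np.pi * s2)
    # Laplace slab density with scale b (an assumption about the parametrization).
    slab = np.exp(-np.abs(theta) / b) / (2.0 * b)
    slab_evidence = np.sum(lik * slab) * d_theta
    slab_first_moment = np.sum(theta * lik * slab) * d_theta
    # The spike at zero contributes to the evidence but not to the mean.
    spike_evidence = np.exp(-0.5 * x ** 2 / s2) / np.sqrt(2.0 * np.pi * s2)
    evidence = (1.0 - pi_slab) * spike_evidence + pi_slab * slab_evidence
    return pi_slab * slab_first_moment / evidence

# Compare shrinkage of the same observation under a sparse and a dense prior.
for pi_slab in (0.1, 0.9):
    shrunk = posterior_mean_point_laplace(1.0, pi_slab=pi_slab)
    print(f"pi = {pi_slab}: S(1.0) = {shrunk:.3f}")
```

Evaluating the function over a grid of `x` values for several `(pi_slab, b)` pairs gives curves of the kind the caption describes, with smaller `pi_slab` pulling observations near zero more strongly toward zero.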

Theorems & Definitions (17)

Theorem 1
Definition 1: $U$ factor of Polar decomposition
Definition 2
Proposition 1
Proposition 2
Lemma 1
proof
Lemma 2
proof
proof
...and 7 more
