Empirical Bayes Covariance Decomposition, and a Solution to the Multiple Tuning Problem in Sparse PCA

Joonsuk Kang, Matthew Stephens

TL;DR

This work uses Empirical Bayes methods to give a principled and efficient solution to the multiple tuning problem (MTP) in sparse PCA. The solution extends immediately to other structural assumptions (e.g. non-negative PCA).

Abstract

Sparse Principal Components Analysis (PCA) has been proposed as a way to improve both interpretability and reliability of PCA. However, use of sparse PCA in practice is hindered by the difficulty of tuning the multiple hyperparameters that control the sparsity of different PCs (the "multiple tuning problem", MTP). Here we present a solution to the MTP using Empirical Bayes methods. We first introduce a general formulation for penalized PCA of a data matrix $\mathbf{X}$, which includes some existing sparse PCA methods as special cases. We show that this formulation also leads to a penalized decomposition of the covariance (or Gram) matrix, $\mathbf{X}^T\mathbf{X}$. We introduce empirical Bayes versions of these penalized problems, in which the penalties are determined by prior distributions that are estimated from the data by maximum likelihood rather than cross-validation. The resulting "Empirical Bayes Covariance Decomposition" provides a principled and efficient solution to the MTP in sparse PCA, and one that can be immediately extended to incorporate other structural assumptions (e.g. non-negative PCA). We illustrate the effectiveness of this approach on both simulated and real data examples.

Paper Structure

This paper contains 32 sections, 7 theorems, 41 equations, 4 figures, 1 table, 1 algorithm.

Key Result

Theorem 1

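Let $(\hat{\mathbf{Z}}, \hat{\mathbf{L}})$ denote a solution to the penalized PCA problem (ebcd_eq:penpca). Then $\hat{\mathbf{L}}$ also solves a penalized decomposition problem for the covariance (or Gram) matrix $\mathbf{X}^T\mathbf{X}$, whose objective (not preserved in this extract) is stated in terms of $d_*$, the Bures-Wasserstein distance between two symmetric positive semi-definite (PSD) matrices (Bhatia, Jain, et al., 2019).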
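
To make the distance $d_*$ in Theorem 1 concrete, the following is a minimal sketch of the Bures-Wasserstein distance using its standard closed form, $d_*(\mathbf{A}, \mathbf{B})^2 = \mathrm{tr}\,\mathbf{A} + \mathrm{tr}\,\mathbf{B} - 2\,\mathrm{tr}\,(\mathbf{A}^{1/2}\mathbf{B}\mathbf{A}^{1/2})^{1/2}$. The function names `psd_sqrt` and `bures_wasserstein` are illustrative and are not taken from the paper or its software. The sketch shows only the distance itself, not the penalized objective of the theorem.

```python
import numpy as np


def psd_sqrt(a):
    """Symmetric PSD square root via an eigendecomposition."""
    # Symmetrize to guard against small asymmetries from floating point.
    w, v = np.linalg.eigh((a + a.T) / 2.0)
    # Clip tiny negative eigenvalues that arise from rounding.
    w = np.clip(w, 0.0, None)
    return (v * np.sqrt(w)) @ v.T


def bures_wasserstein(a, b):
    """Bures-Wasserstein distance between two symmetric PSD matrices."""
    root_a = psd_sqrt(a)
    cross = psd_sqrt(root_a @ b @ root_a)
    d2 = np.trace(a) + np.trace(b) - 2.0 * np.trace(cross)
    # d2 can be slightly negative numerically when a and b coincide.
    return np.sqrt(max(d2, 0.0))


# Usage: compare a Gram matrix X^T X with a low-rank approximation L L^T.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
L = rng.normal(size=(5, 2))
gram_gap = bures_wasserstein(X.T @ X, L @ L.T)
```

For diagonal (commuting) matrices the squared distance reduces to the sum of squared differences between the square roots of the diagonal entries, which gives a quick sanity check.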

Figures (4)

  • Figure 1: Examples of posterior mean shrinkage operator $S(\mathbf{x}, s^2=1, g=g(\cdot; \pi, b))$ induced by Laplace slab priors $g(x; \pi,b)=(1-\pi)\delta_{0}(x)+\pi\text{Laplace}(x; 0, b)$. Note how $\pi$ controls shrinkage near 0 (small $\pi$ yielding more shrinkage), while the scale parameter controls shrinkage further away from 0.
  • Figure 2: Simulation results comparing the performance of different methods in terms of three measures: angle between true and estimated principal components (PCs), difference between population covariance matrix and estimated covariance matrix, and distance with optimal rotation.
  • Figure 3: Comparison of PCA loadings with posterior mean loadings from EBCD-pl (after post-processing to have unit norm).
  • Figure 4: Sectors projected on the SMB-HML plane. Each sector is positioned according to its loadings on the Fama-French SMB and HML factors, and is colored based on its loadings on the second and third principal components (PCs) from the EBCD-pl method (or PCA).
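
The posterior mean shrinkage operator described in the Figure 1 caption can be sketched numerically. This is an illustration under stated assumptions, not the authors' implementation: it assumes the observation model $x \mid \theta \sim N(\theta, s^2)$, takes $b$ to be the Laplace scale parameter (density $e^{-|\theta|/b}/(2b)$), and evaluates the integrals with a simple Riemann sum on a grid. The function name `posterior_mean` and the grid parameters are hypothetical.

```python
import numpy as np


def posterior_mean(x, s2=1.0, pi=0.5, b=1.0, half_width=30.0, n_grid=20001):
    """Posterior mean of theta given scalar x ~ N(theta, s2), under the prior
    g = (1 - pi) * delta_0 + pi * Laplace(0, b), with b assumed to be a scale."""
    theta = np.linspace(-half_width, half_width, n_grid)
    dtheta = theta[1] - theta[0]
    # Laplace slab density and Gaussian likelihood evaluated on the grid.
    slab = np.exp(-np.abs(theta) / b) / (2.0 * b)
    lik = np.exp(-(x - theta) ** 2 / (2.0 * s2)) / np.sqrt(2.0 * np.pi * s2)
    # Marginal likelihood and first moment contributed by the slab.
    slab_evidence = np.sum(lik * slab) * dtheta
    slab_moment = np.sum(theta * lik * slab) * dtheta
    # Marginal likelihood contributed by the point mass at zero.
    spike_evidence = np.exp(-x ** 2 / (2.0 * s2)) / np.sqrt(2.0 * np.pi * s2)
    return pi * slab_moment / ((1.0 - pi) * spike_evidence + pi * slab_evidence)


# Usage: a smaller pi puts more prior mass at zero and shrinks harder near 0.
strong_shrink = posterior_mean(1.0, pi=0.1)
weak_shrink = posterior_mean(1.0, pi=0.9)
```

Evaluating the function over a range of `x` values for several `(pi, b)` pairs would trace out curves of the kind the caption describes, with `pi` governing behavior near zero and `b` governing behavior further out.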

Theorems & Definitions (17)

  • Theorem 1
  • Definition 1: $U$ factor of Polar decomposition
  • Definition 2
  • Proposition 1
  • Proposition 2
  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • proof
  • ...and 7 more