Combinatorial Sparse PCA Beyond the Spiked Identity Model

Syamantak Kumar; Purnamrita Sarkar; Kevin Tian; Peiyuan Zhang

TL;DR

The first combinatorial method for sparse PCA that provably succeeds for general $\Sigma$, using $s^2 \cdot \mathrm{polylog}(d)$ samples and $d^2 \cdot \mathrm{poly}(s, \log(d))$ time, obtained by proving a global convergence guarantee for a variant of the truncated power method of Yuan and Zhang (2013).

Abstract

Sparse PCA is one of the most well-studied problems in high-dimensional statistics. In this problem, we are given samples from a distribution with covariance $\Sigma$, whose top eigenvector $v \in \mathbb{R}^d$ is $s$-sparse. Existing sparse PCA algorithms can be broadly categorized into (1) combinatorial algorithms (e.g., diagonal or elementwise covariance thresholding) and (2) SDP-based algorithms. While combinatorial algorithms are much simpler, they are typically only analyzed under the spiked identity model (where $\Sigma = I_d + \gamma vv^\top$ for some $\gamma > 0$), whereas SDP-based algorithms require no additional assumptions on $\Sigma$. We demonstrate explicit counterexample covariances $\Sigma$ against the success of standard combinatorial algorithms for sparse PCA, when moving beyond the spiked identity model. In light of this discrepancy, we give the first combinatorial method for sparse PCA that provably succeeds for general $\Sigma$ using $s^2 \cdot \mathrm{polylog}(d)$ samples and $d^2 \cdot \mathrm{poly}(s, \log(d))$ time, by providing a global convergence guarantee on a variant of the truncated power method of Yuan and Zhang (2013). We provide a natural generalization of our method to recovering a vector in a sparse leading eigenspace. Finally, we evaluate our method on synthetic and real-world sparse PCA datasets.
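As a rough illustration of the kind of iteration the abstract refers to, the following is a minimal sketch of the basic truncated power method (multiply, keep the largest-magnitude entries, renormalize). It is not the variant analyzed in the paper. The names `truncate` and `truncated_power_method`, the problem sizes, the use of the population covariance in place of samples, and the on-support initialization `x0` are all assumptions made for this example.

```python
import numpy as np

def truncate(x, k):
    """Keep the k largest-magnitude entries of x, zero the rest, renormalize."""
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]  # indices of the k largest |x_i|
    out[idx] = x[idx]
    return out / np.linalg.norm(out)

def truncated_power_method(A, k, x0, num_iters=50):
    """Basic truncated power iteration on a symmetric matrix A (illustrative)."""
    x = truncate(x0, k)
    for _ in range(num_iters):
        x = truncate(A @ x, k)  # power step followed by hard truncation
    return x

# Spiked identity model from the abstract: Sigma = I_d + gamma * v v^T.
# Assumption: we use the population covariance rather than a sample estimate.
d, s, gamma = 50, 5, 2.0
v = np.zeros(d)
v[:s] = 1.0 / np.sqrt(s)  # s-sparse unit vector
Sigma = np.eye(d) + gamma * np.outer(v, v)

# Assumption: the initial vector already overlaps the support of v.
x0 = np.zeros(d)
x0[0] = 1.0

u = truncated_power_method(Sigma, s, x0)
corr = float(np.dot(u, v) ** 2)  # squared correlation with the planted vector
```

In this idealized setting the iterate stays on the support of $v$ after the first step, so the method reduces to ordinary power iteration on an $s \times s$ block. The paper's contribution concerns the much harder case of general $\Sigma$ and finite samples, which this sketch does not address.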

Paper Structure (16 sections, 22 theorems, 125 equations, 6 figures)

This paper contains 16 sections, 22 theorems, 125 equations, 6 figures.

Introduction
Contributions
Related work
Preliminaries
Counterexamples
Diagonal thresholding
Covariance thresholding
Greedy correlation
Main Results
Sparse subspace estimation
Barrier for sparse PCA deflation methods
Experiments
Deferred proofs
Case 1: $t\ge \frac{1}{1+\sqrt{\kappa}}$ (so $\min\{t,(1+\sqrt\kappa)t^2\}=t$).
Case 2: $t< \frac{1}{1+\sqrt{\kappa}}$ (so $\min\{t,(1+\sqrt\kappa)t^2\}=(1+\sqrt\kappa)t^2$).
...and 1 more sections

Key Result

Theorem 1

Let $\delta \in (0,1)$, and under Model \ref{model:general}, assume that for appropriate constants. Then, in time $O(nd^2)$, Algorithm \ref{alg:tpm} returns an $r$-sparse unit vector $\mathbf{u}$ such that with probability at least $1-\delta$, $\langle\mathbf{v}, \mathbf{u}\rangle^2 \ge \frac{9}{10}$.
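To make the success criterion in the theorem concrete, here is a small sketch that checks whether a candidate output is an $r$-sparse unit vector and computes its squared correlation $\langle\mathbf{v}, \mathbf{u}\rangle^2$ with the target. The helper names `squared_correlation` and `is_r_sparse_unit`, the tolerance, and the toy vectors are hypothetical and chosen only for illustration.

```python
import numpy as np

def squared_correlation(v, u):
    """Return <v, u>^2 for two vectors of the same dimension."""
    return float(np.dot(v, u) ** 2)

def is_r_sparse_unit(u, r, tol=1e-8):
    """Check that u has at most r nonzero entries and unit Euclidean norm."""
    sparse_enough = np.count_nonzero(u) <= r
    unit_norm = abs(np.linalg.norm(u) - 1.0) <= tol
    return bool(sparse_enough and unit_norm)

# Toy target: a 4-sparse unit vector in dimension 10 (illustrative values).
d = 10
v = np.zeros(d)
v[:4] = 0.5  # 4 * 0.5^2 = 1, so v has unit norm

# Toy candidate: a 3-sparse unit vector supported inside the support of v.
u = np.zeros(d)
u[:3] = 1.0 / np.sqrt(3)

corr = squared_correlation(v, u)    # (3 * 0.5 / sqrt(3))^2 = 0.75
meets_guarantee = corr >= 9 / 10    # the 9/10 threshold from the theorem
```

Here the candidate misses one support coordinate, so its squared correlation is 0.75 and it falls short of the $\frac{9}{10}$ threshold, even though it is a valid $3$-sparse unit vector.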

Figures (6)

Figure 1: Performance of full-batch $\mathsf{RTPM}$, which uses all samples together, compared to Algorithm \ref{alg:tpm}, for varying numbers of iterations $T$.
Figure 2: Runtime versus accuracy for $\mathsf{RTPM}$, heuristic, and SDP-based methods. First row: spiked identity in Model \ref{model:spiked_id} with $d = 1000$ and $s = 8$. Second row: counterexample against Greedy Correlation presented in Lemma \ref{lem:friends_lower_bound} with $d = 1000$, $s = 8$, and $\lambda_1(\boldsymbol{\Sigma}) = 1.2$, $\lambda_2(\boldsymbol{\Sigma}) = 0.8$. $\mathsf{RTPM}$ runs with $r = s$, and the relaxation coefficient for the SDP-based method is set as suggested in VuCLR13.
Figure 3: Performance on counterexamples. In each subplot, we vary the sample size $n$ under fixed $d, k$, and compare the output correlation achieved by $\mathsf{RTPM}$, the targeted heuristics, and the SDP-based method. The dataset parameters other than $(n, d, s)$ are set as follows: the left plot uses $\lambda_1 = 1.0$, $\lambda_2 = 0.5$, $\lambda_2/\lambda_3 = 2.1$ and $\lambda_2/\lambda_4 = 2.2$; the middle plot uses $u = 25$, $r = 6$, $\theta = 1$ and $c = 0.25$; the right plot uses parameters as in Lemma \ref{lem:friends_lower_bound}. $\mathsf{RTPM}$ is run for 40 iterations, and the relaxation coefficient for the SDP-based method is set as suggested in VuCLR13.
Figure 4: Scaling-law experiments for $\mathsf{RTPM}$. Left column: spiked covariance model. Right column: Lemma \ref{lem:friends_lower_bound} counterexample. Rows correspond to varying $s$, $\gamma$, and $\Delta$, respectively.
Figure 5: Top 4 components restricted to the union support, with each row sorted by rank of entries.
...and 1 more figures

Theorems & Definitions (42)

Theorem 1: Informal, see Theorem \ref{thm:cpca_guarantee}
Corollary 1
proof
proof
Lemma 1: Proposition 3.4, KumarS24
Lemma 2
proof
Lemma 3
proof
Lemma 4
...and 32 more
