Combinatorial Sparse PCA Beyond the Spiked Identity Model

Syamantak Kumar, Purnamrita Sarkar, Kevin Tian, Peiyuan Zhang

TL;DR

The first combinatorial method for sparse PCA that provably succeeds for general $\Sigma$ is given, using $s^2 \cdot \mathrm{polylog}(d)$ samples and $d^2 \cdot \mathrm{poly}(s, \log(d))$ time, by providing a global convergence guarantee on a variant of the truncated power method of Yuan and Zhang (2013).

Abstract

Sparse PCA is one of the most well-studied problems in high-dimensional statistics. In this problem, we are given samples from a distribution with covariance $\Sigma$, whose top eigenvector $v \in \mathbb{R}^d$ is $s$-sparse. Existing sparse PCA algorithms can be broadly categorized into (1) combinatorial algorithms (e.g., diagonal or elementwise covariance thresholding) and (2) SDP-based algorithms. While combinatorial algorithms are much simpler, they are typically only analyzed under the spiked identity model (where $\Sigma = I_d + \gamma vv^\top$ for some $\gamma > 0$), whereas SDP-based algorithms require no additional assumptions on $\Sigma$. We demonstrate explicit counterexample covariances $\Sigma$ against the success of standard combinatorial algorithms for sparse PCA, when moving beyond the spiked identity model. In light of this discrepancy, we give the first combinatorial method for sparse PCA that provably succeeds for general $\Sigma$ using $s^2 \cdot \mathrm{polylog}(d)$ samples and $d^2 \cdot \mathrm{poly}(s, \log(d))$ time, by providing a global convergence guarantee on a variant of the truncated power method of Yuan and Zhang (2013). We provide a natural generalization of our method to recovering a vector in a sparse leading eigenspace. Finally, we evaluate our method on synthetic and real-world sparse PCA datasets.
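
As a rough illustration of the kind of iteration the abstract refers to, the following is a minimal sketch of the classical truncated power method as it is commonly described (multiply by the covariance, keep the $r$ largest-magnitude entries, renormalize). It is not the variant analyzed in this paper, whose details are not given here. The function names, the use of the exact population covariance $I_d + \gamma vv^\top$ in place of a sample covariance, and the initialization at a basis vector inside the support are all assumptions made for the example.

```python
import math


def truncate(x, r):
    """Keep the r largest-magnitude entries of x, zero the rest, renormalize."""
    keep = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)[:r]
    y = [0.0] * len(x)
    for i in keep:
        y[i] = x[i]
    norm = math.sqrt(sum(t * t for t in y))
    return [t / norm for t in y]


def matvec(a, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(p * q for p, q in zip(row, x)) for row in a]


def truncated_power_method(cov, x0, r, iters=50):
    """Classical truncated power iteration (illustrative, not the paper's variant)."""
    x = truncate(x0, r)
    for _ in range(iters):
        x = truncate(matvec(cov, x), r)
    return x


def spiked_identity(v, gamma):
    """Population covariance I_d + gamma * v v^T (hypothetical helper)."""
    d = len(v)
    return [[(1.0 if i == j else 0.0) + gamma * v[i] * v[j] for j in range(d)]
            for i in range(d)]


# Toy instance (assumed parameters): d = 20, s = 3, gamma = 2.
d, s, gamma = 20, 3, 2.0
v_true = [1.0 / math.sqrt(s)] * s + [0.0] * (d - s)
cov = spiked_identity(v_true, gamma)

x0 = [1.0] + [0.0] * (d - 1)  # assumed start: a basis vector in the support
u_hat = truncated_power_method(cov, x0, r=s)

corr_sq = sum(a * b for a, b in zip(v_true, u_hat)) ** 2
print("squared correlation:", corr_sq)
```

On this noiseless toy instance the iterate stays on the true support, so the method reduces to ordinary power iteration there. With a sample covariance, a general $\Sigma$, or an arbitrary start, the behavior can differ, which is the regime the paper's guarantee addresses.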

Paper Structure

This paper contains 16 sections, 22 theorems, 125 equations, and 6 figures.

Key Result

Theorem 1

Let $\delta \in (0,1)$, and under Model \ref{model:general}, assume that for appropriate constants. Then, in time $O(nd^2)$, Algorithm \ref{alg:tpm} returns an $r$-sparse unit vector $\mathbf{u}$ such that with probability at least $1-\delta$, $\langle\mathbf{v}, \mathbf{u}\rangle^2 \ge \frac{9}{10}$.
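
To make the success criterion in the theorem concrete, here is a small sketch that checks whether a candidate output $\mathbf{u}$ is an $r$-sparse unit vector with $\langle\mathbf{v}, \mathbf{u}\rangle^2 \ge \frac{9}{10}$. The function name `meets_guarantee`, the tolerance, and the toy vectors are assumptions for illustration. The check applies to a single output and says nothing about the probability $1-\delta$ over the samples.

```python
import math


def meets_guarantee(v, u, r, threshold=0.9, tol=1e-9):
    """Check the three properties stated in the theorem for one output u.

    v: true top eigenvector (unit norm), u: candidate output, r: sparsity budget.
    """
    is_unit = abs(math.sqrt(sum(t * t for t in u)) - 1.0) <= tol
    is_sparse = sum(1 for t in u if t != 0.0) <= r
    corr_sq = sum(a * b for a, b in zip(v, u)) ** 2
    return is_unit and is_sparse and corr_sq >= threshold


# Toy example (assumed vectors): v is 2-sparse in dimension 4.
v = [1.0 / math.sqrt(2), 1.0 / math.sqrt(2), 0.0, 0.0]

good = [0.8, 0.6, 0.0, 0.0]   # squared correlation 0.98
bad = [1.0, 0.0, 0.0, 0.0]    # squared correlation 0.5

print(meets_guarantee(v, good, r=2))
print(meets_guarantee(v, bad, r=2))
```

Because the criterion uses the squared inner product, it is invariant to the sign of $\mathbf{u}$, which matches the fact that an eigenvector is only determined up to sign.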

Figures (6)

  • Figure 1: Performance of full-batch $\mathsf{RTPM}$, which uses all samples together, compared to Algorithm \ref{alg:tpm}, for varying numbers of iterations $T$.
  • Figure 2: Runtime versus accuracy for $\mathsf{RTPM}$, heuristic, and SDP-based methods. First row: spiked identity in Model \ref{model:spiked_id} with $d = 1000$ and $s = 8$. Second row: counterexample against Greedy Correlation presented in Lemma \ref{lem:friends_lower_bound} with $d = 1000$, $s = 8$, and $\lambda_1(\boldsymbol{\Sigma}) = 1.2$, $\lambda_2(\boldsymbol{\Sigma}) = 0.8$. $\mathsf{RTPM}$ runs with $r = s$, and the relaxation coefficient for the SDP-based method is set as suggested in VuCLR13.
  • Figure 3: Performance on counterexamples. In each subplot, we vary the sample size $n$ under fixed $d, k$, and compare the output correlation achieved by $\mathsf{RTPM}$, the targeted heuristics, and the SDP-based method. The dataset parameters other than $(n, d, s)$ are set as follows: the left plot uses $\lambda_1 = 1.0$, $\lambda_2 = 0.5$, $\lambda_2/\lambda_3 = 2.1$, and $\lambda_2/\lambda_4 = 2.2$; the middle plot uses $u = 25$, $r = 6$, $\theta = 1$, and $c = 0.25$; the right plot uses parameters as in Lemma \ref{lem:friends_lower_bound}. $\mathsf{RTPM}$ is run for 40 iterations, and the relaxation coefficient for the SDP-based method is set as suggested in VuCLR13.
  • Figure 4: Scaling-law experiments for $\mathsf{RTPM}$. Left column: spiked covariance model. Right column: Lemma \ref{lem:friends_lower_bound} counterexample. Rows correspond to varying $s$, $\gamma$, and $\Delta$, respectively.
  • Figure 5: Top 4 components restricted to the union support, with each row sorted by rank of entries.
  • ...and 1 more figure

Theorems & Definitions (42)

  • Theorem 1: Informal, see Theorem \ref{thm:cpca_guarantee}
  • Corollary 1
  • proof
  • proof
  • Lemma 1: Proposition 3.4, KumarS24
  • Lemma 2
  • proof
  • Lemma 3
  • proof
  • Lemma 4
  • ...and 32 more