Oja's Algorithm for Streaming Sparse PCA

Syamantak Kumar, Purnamrita Sarkar

TL;DR

This work analyzes Oja's streaming PCA algorithm in the high-dimensional sparse regime where the leading eigenvector $v_1$ is $s$-sparse. It introduces a simple one-pass thresholded Oja method combined with a support-recovery step and a data-splitting scheme, yielding minimax-optimal sparse PCA guarantees in $O(d)$ space and $O(nd)$ time under milder regularity than prior work. A novel entrywise analysis of the unnormalized Oja vector, together with a two-by-two linear-recursion framework, underpins the support recovery and sparse-PCA guarantees, while probabilistic boosting converts constant-probability results into high-probability outcomes. The results demonstrate that, in streaming settings with subgaussian data and a general covariance, one can achieve global, single-pass sparse PCA with strong statistical guarantees and practical computational efficiency.

Abstract

Oja's algorithm for streaming Principal Component Analysis (PCA) on $n$ data points in a $d$-dimensional space achieves the same sin-squared error $O(r_{\mathsf{eff}}/n)$ as the offline algorithm, using $O(d)$ space, $O(nd)$ time, and a single pass through the data. Here $r_{\mathsf{eff}}$ is the effective rank (the ratio of the trace to the principal eigenvalue of the population covariance matrix $\Sigma$). Under this computational budget, we consider the problem of sparse PCA, where the principal eigenvector of $\Sigma$ is $s$-sparse and $r_{\mathsf{eff}}$ can be large. In this setting, to our knowledge, there are no known single-pass algorithms that achieve the minimax error bound in $O(d)$ space and $O(nd)$ time without either requiring strong initialization conditions or assuming further structure (e.g., spiked) of the covariance matrix. We show that a simple single-pass procedure that thresholds the output of Oja's algorithm (the Oja vector) can achieve the minimax error bound under some regularity conditions in $O(d)$ space and $O(nd)$ time. We present a nontrivial and novel analysis of the entries of the unnormalized Oja vector, which involves the projection of a product of independent random matrices on a random initial vector. This is completely different from previous analyses of Oja's algorithm and matrix products, which have been carried out only in the regime where $r_{\mathsf{eff}}$ is bounded.
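To make the "threshold the Oja vector" idea concrete, here is a minimal sketch using only the Python standard library. It is not the paper's algorithm: it omits the support-recovery and data-splitting steps, assumes the sparsity $s$ is known, draws data from a spiked covariance $\Sigma = \lambda v_1 v_1^\top + I$ purely for illustration, and picks the learning rate from the true eigengap, which a real procedure would not have access to. The helper names (`oja_vector`, `truncate_top_k`, `spiked_stream`) are hypothetical. The vector is renormalized at every step; this changes only its scale, not its direction, relative to the unnormalized Oja vector.

```python
import math
import random


def oja_vector(stream, d, eta, rng):
    """One pass of Oja's update w <- w + eta * x * (x . w), renormalized each step."""
    w = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(c * c for c in w))
    w = [c / norm for c in w]
    for x in stream:
        proj = sum(xi * wi for xi, wi in zip(x, w))
        w = [wi + eta * proj * xi for wi, xi in zip(w, x)]
        norm = math.sqrt(sum(c * c for c in w))
        w = [c / norm for c in w]
    return w


def truncate_top_k(w, k):
    """Keep the k largest-magnitude entries, zero the rest, renormalize."""
    order = sorted(range(len(w)), key=lambda i: abs(w[i]), reverse=True)
    keep = set(order[:k])
    out = [w[i] if i in keep else 0.0 for i in range(len(w))]
    norm = math.sqrt(sum(c * c for c in out))
    return [c / norm for c in out]


def sin_squared_error(v_hat, v):
    """sin^2 of the angle between two unit vectors: 1 - (v_hat . v)^2."""
    dot = sum(a * b for a, b in zip(v_hat, v))
    return 1.0 - dot * dot


def spiked_stream(n, v1, lam, rng):
    """Yield x = sqrt(lam) * g * v1 + z, so Cov(x) = lam * v1 v1^T + I (illustrative model)."""
    for _ in range(n):
        g = rng.gauss(0.0, 1.0)
        yield [math.sqrt(lam) * g * vi + rng.gauss(0.0, 1.0) for vi in v1]


rng = random.Random(0)
d, s, n, lam = 50, 5, 4000, 4.0          # lambda_1 = lam + 1, lambda_2 = 1
v1 = [1.0 / math.sqrt(s)] * s + [0.0] * (d - s)
eta = math.log(n) / (n * lam)            # assumption: uses the true eigengap lam

w = oja_vector(spiked_stream(n, v1, lam, rng), d, eta, rng)
v_trunc = truncate_top_k(w, s)

err_dense = sin_squared_error(w, v1)
err_trunc = sin_squared_error(v_trunc, v1)
print(f"sin^2 error, plain Oja vector:     {err_dense:.4f}")
print(f"sin^2 error, truncated Oja vector: {err_trunc:.4f}")
```

Each update touches only the current sample and the length-$d$ iterate, which is where the $O(d)$ space and $O(nd)$ time budget comes from. The top-$k$ truncation is the simplest form of thresholding and relies on knowing $s$.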

Paper Structure (20 sections, 33 theorems, 184 equations, 2 figures, 1 table, 4 algorithms)


Key Result

Theorem 1.1

For a suitable range of the effective rank $r_{\mathsf{eff}}$ and the ratio $\lambda_{1}/\lambda_{2}$, there exists a single-pass algorithm $\mathcal{A}$ that recovers the support of $v_1$ using Oja's algorithm, operates in $O(d)$ space and $O(nd)$ time, and returns $\hat{v}$ with the minimax optimal

Figures (2)

  • Figure 1: Comparison of sparse PCA algorithms for identifying the leading eigenvector $v_{1}$, operating in $O(d)$ space and $O(nd)$ time, with the population covariance matrix specified in [qiu2019gradientbased, Section 5.1]. Panel (a) plots [doi:10.1198/jasa.2009.0121] (purple), [pmlr-v37-yangd15] (black), [wang2016online] (orange), and our proposed Algorithm \ref{alg:sparse_pca_with_support_trunc_vec} (blue) for $n = d = 1000$, with error bars over 100 random runs. Panel (b) shows an image of the covariance matrix with $n=d=100$.
  • Figure 2: We use the $\Sigma$ from [qiu2019gradientbased, Section 5.1]. (a) Variation of $\log\left(|e_{i}^{\top}B_{n}u_{0}|\right)$ for $i \in S$ and $i \notin S$ ($y$-axis) with $n$ ($x$-axis) for a fixed unit vector $u_{0}$. $\eta$ is set as in Theorem \ref{theorem:convergence_truncate_vec}, and $n$ grows from $1$ to $1000$. The lines labelled “sample” plot $\log(|e_i^\top B_n u_0|)$, whereas the “population” curves plot $\log(|\mathbb{E}\left[e_i^\top B_n u_0\right]|)$. (b) Variation of $\log\left(\left\lVert B_{n}B_{n}^{\top}\right\rVert\right)$ and $\log\left(v_{1}^{\top}B_{n}B_{n}^{\top}v_{1}\right)$ ($y$-axis) with $n\in [300]$ ($x$-axis). For comparison, we also plot the $\log$ of the bound on $\left\lVert B_{n}B_{n}^{\top}\right\rVert$ from [jain2016streaming] and $2n\log\left(1+\eta\lambda_{1}\right)$.
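The entrywise separation described in Figure 2(a) can be illustrated with a small simulation. The sketch below rests on several assumptions not stated in this section: it takes $B_n = \prod_{t=1}^{n} (I + \eta X_t X_t^\top)$ (the product form suggested by the abstract's description of the unnormalized Oja vector), uses a spiked covariance $\lambda v_1 v_1^\top + I$ instead of the $\Sigma$ from the cited reference, and sets $\eta$ from the true eigengap instead of from the theorem referenced in the caption. The function name `log_abs_entries` is hypothetical. To avoid overflow, it tracks the norm of $B_n u_0$ on a log scale.

```python
import math
import random


def log_abs_entries(n, v1, lam, eta, u0, rng):
    """Return log|e_i^T B_n u0| for every coordinate i.

    Assumes B_n = prod_t (I + eta * x_t x_t^T) with x_t drawn from a spiked
    model Cov(x) = lam * v1 v1^T + I (an illustrative choice). u0 must be a
    unit vector so that the accumulated log-norms equal log||B_n u0||.
    """
    w = list(u0)
    log_scale = 0.0  # running log of ||B_t u0||
    for _ in range(n):
        g = rng.gauss(0.0, 1.0)
        x = [math.sqrt(lam) * g * vi + rng.gauss(0.0, 1.0) for vi in v1]
        proj = sum(xi * wi for xi, wi in zip(x, w))
        w = [wi + eta * proj * xi for wi, xi in zip(w, x)]
        norm = math.sqrt(sum(c * c for c in w))
        w = [c / norm for c in w]
        log_scale += math.log(norm)
    # Guard against log(0) for an entry that is exactly zero.
    return [log_scale + math.log(max(abs(c), 1e-300)) for c in w]


rng = random.Random(1)
d, s, n, lam = 40, 4, 3000, 4.0
lambda_1 = lam + 1.0                     # top eigenvalue of lam * v1 v1^T + I
eta = math.log(n) / (n * lam)            # assumption: uses the true eigengap
v1 = [1.0 / math.sqrt(s)] * s + [0.0] * (d - s)
u0 = [1.0 / math.sqrt(d)] * d            # fixed unit initial vector

logs = log_abs_entries(n, v1, lam, eta, u0, rng)
mean_on = sum(logs[:s]) / s              # coordinates i in the support S
mean_off = sum(logs[s:]) / (d - s)       # coordinates i outside S
reference = n * math.log(1.0 + eta * lambda_1)

print(f"mean log|entry|, i in S:     {mean_on:.2f}")
print(f"mean log|entry|, i not in S: {mean_off:.2f}")
print(f"n * log(1 + eta * lambda_1): {reference:.2f}")
```

In this toy setting the on-support entries sit well above the off-support ones on the log scale, which is the gap that a thresholding step exploits. The printed reference value is half of the $2n\log(1+\eta\lambda_1)$ curve in panel (b), since that panel concerns $B_n B_n^\top$.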

Theorems & Definitions (62)

  • Theorem 1.1: Informal
  • Definition 2.1
  • Remark 2.2
  • Lemma 1: $s$-Agnostic Recovery
  • Theorem 3.1: High probability support recovery
  • Remark 3.2
  • Proposition 3.2: Lower bound for diagonal thresholding
  • Theorem 3.3: Vector Truncation
  • Remark 3.4: Limitation
  • Theorem 3.5: Data Truncation
  • ...and 52 more