Oja's Algorithm for Streaming Sparse PCA

Syamantak Kumar; Purnamrita Sarkar

TL;DR

This work analyzes Oja's streaming PCA algorithm in the high-dimensional sparse regime where the leading eigenvector $v_1$ is $s$-sparse. It introduces a simple one-pass thresholded Oja method combined with a support-recovery step and a data-splitting scheme, yielding minimax-optimal sparse PCA guarantees in $O(d)$ space and $O(nd)$ time under milder regularity than prior work. A novel entrywise analysis of the unnormalized Oja vector, together with a two-by-two linear-recursion framework, underpins the support recovery and sparse-PCA guarantees, while probabilistic boosting converts constant-probability results into high-probability outcomes. The results demonstrate that, in streaming settings with subgaussian data and a general covariance, one can achieve global, single-pass sparse PCA with strong statistical guarantees and practical computational efficiency.
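The boosting idea mentioned above can be illustrated with a generic majority vote. This is only a sketch of the standard technique, not the paper's procedure: the function name `majority_vote_support` and the "more than half of the runs" rule are assumptions made for illustration. If each independent run returns the correct support with some constant probability above 1/2, an index in the true support appears in most runs, and a standard Chernoff argument makes the vote fail with probability that decays exponentially in the number of runs.

```python
from collections import Counter


def majority_vote_support(supports):
    """Combine support estimates from independent runs (hypothetical helper).

    An index is kept if it appears in strictly more than half of the runs.
    """
    # Count, for every index, in how many runs it was reported.
    counts = Counter(i for sup in supports for i in set(sup))
    threshold = len(supports) / 2
    return sorted(i for i, c in counts.items() if c > threshold)


# Five hypothetical runs; two of them are partly or fully wrong.
runs = [[0, 1, 2], [0, 1, 2], [0, 1, 7], [3, 4, 5], [0, 1, 2]]
boosted = majority_vote_support(runs)
```

Each run would operate on its own disjoint chunk of the stream, so the single-pass budget is kept at the cost of a multiplicative factor in space equal to the number of runs.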

Abstract

Oja's algorithm for streaming Principal Component Analysis (PCA) on $n$ data points in a $d$-dimensional space achieves the same sin-squared error $O(r_{\mathsf{eff}}/n)$ as the offline algorithm, using $O(d)$ space, $O(nd)$ time, and a single pass through the data points. Here $r_{\mathsf{eff}}$ is the effective rank (the ratio of the trace to the principal eigenvalue of the population covariance matrix $\Sigma$). Under this computational budget, we consider the problem of sparse PCA, where the principal eigenvector of $\Sigma$ is $s$-sparse and $r_{\mathsf{eff}}$ can be large. In this setting, to our knowledge, there are no known single-pass algorithms that achieve the minimax error bound in $O(d)$ space and $O(nd)$ time without either requiring strong initialization conditions or assuming further structure (e.g., spiked) of the covariance matrix. We show that a simple single-pass procedure that thresholds the output of Oja's algorithm (the Oja vector) can achieve the minimax error bound under some regularity conditions in $O(d)$ space and $O(nd)$ time. We present a nontrivial and novel analysis of the entries of the unnormalized Oja vector, which involves the projection of a product of independent random matrices on a random initial vector. This is completely different from previous analyses of Oja's algorithm and matrix products, which have been carried out for bounded $r_{\mathsf{eff}}$.
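The thresholding idea described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's algorithm: it assumes the sparsity $s$ is known, keeps the $s$ largest-magnitude entries of the Oja vector, omits the data-splitting and support-recovery steps, and uses a toy spiked covariance $I + \theta v_1 v_1^\top$ although the paper targets general covariances. The helper names and the learning rate $\eta = \log n / (n(\lambda_1 - \lambda_2))$ are illustrative choices. The vector is renormalized at every step purely for numerical stability, which does not change its direction.

```python
import numpy as np


def make_spiked_data(n, d, s, theta, rng):
    """Toy stream with covariance I + theta * v1 v1^T and s-sparse v1 (assumed model)."""
    v1 = np.zeros(d)
    v1[:s] = 1.0 / np.sqrt(s)
    X = rng.standard_normal((n, d)) + np.sqrt(theta) * rng.standard_normal((n, 1)) * v1
    return X, v1


def oja_vector(X, eta, rng):
    """One pass of Oja's update w <- (I + eta x x^T) w, using O(d) memory."""
    w = rng.standard_normal(X.shape[1])
    w /= np.linalg.norm(w)
    for x in X:
        w = w + eta * x * (x @ w)   # rank-one update, O(d) time per sample
        w /= np.linalg.norm(w)      # rescale only; the direction is unchanged
    return w


def truncate_top_s(w, s):
    """Keep the s largest-magnitude entries of w and renormalize."""
    idx = np.argpartition(np.abs(w), -s)[-s:]
    out = np.zeros_like(w)
    out[idx] = w[idx]
    return out / np.linalg.norm(out), np.sort(idx)


def sin2_error(v_hat, v):
    """Sin-squared error between two unit vectors."""
    return 1.0 - float(v_hat @ v) ** 2


rng = np.random.default_rng(0)
n, d, s, theta = 5000, 50, 5, 10.0
X, v1 = make_spiked_data(n, d, s, theta, rng)
eta = np.log(n) / (n * theta)       # here lambda_1 - lambda_2 = theta
v_hat, support = truncate_top_s(oja_vector(X, eta, rng), s)
```

In this toy model the effective rank is $(d + \theta)/(1 + \theta)$, so it grows with $d$ while the truncated estimate depends only on the $s$ retained coordinates.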

Paper Structure (20 sections, 33 theorems, 184 equations, 2 figures, 1 table, 4 algorithms)

Introduction
Problem setup and preliminaries
Main results
Support recovery
Comparison with other support recovery algorithms
Sparse PCA
Probabilistic boosting
Entrywise deviation of the Oja vector
Proof technique
Solving a linear system of recursions
Conclusion
Appendix
Further details on related work
Useful results
Proofs of entrywise deviation of Oja's vector
...and 5 more sections

Key Result

Theorem 1.1

For a suitable range of the effective rank $r_{\mathsf{eff}}$ and the ratio $\lambda_{1}/\lambda_{2}$, there exists a single-pass algorithm $\mathcal{A}$ that recovers the support of $v_1$ using Oja's algorithm, operates in $O(d)$ space and $O(nd)$ time, and returns $\hat{v}$ with the minimax optimal

Figures (2)

Figure 1: Comparison of sparse PCA algorithms for identifying the leading eigenvector $v_{1}$, operating in $O(d)$ space and $O(nd)$ time, with the population covariance matrix specified in [qiu2019gradientbased], Section 5.1. Panel (a) plots [doi:10.1198/jasa.2009.0121] (purple), [pmlr-v37-yangd15] (black), [wang2016online] (orange), and our proposed Algorithm \ref{alg:sparse_pca_with_support_trunc_vec} (blue) for $n = d = 1000$, with error bars over 100 random runs. Panel (b) shows an image of the covariance matrix with $n=d=100$.
Figure 2: We use the $\Sigma$ from [qiu2019gradientbased], Section 5.1. (a) Variation of $\log\left(|e_{i}^{\top}B_{n}u_{0}|\right)$ for $i \in S$ and $i \notin S$ ($y$-axis) with $n$ ($x$-axis) for a fixed unit vector $u_{0}$. $\eta$ is set as in Theorem \ref{theorem:convergence_truncate_vec} and $n$ grows from $1$ to $1000$. The lines labelled "sample" plot $\log\left(|e_{i}^{\top}B_{n}u_{0}|\right)$, whereas the "population" curves plot $\log\left(|\mathbb{E}\left[e_{i}^{\top}B_{n}u_{0}\right]|\right)$. (b) Variation of $\log\left(\left\lVert B_{n}B_{n}^{\top}\right\rVert\right)$ and $\log\left(v_{1}^{\top}B_{n}B_{n}^{\top}v_{1}\right)$ ($y$-axis) with $n\in [300]$ ($x$-axis). We also plot the $\log$ of the bound on $\left\lVert B_{n}B_{n}^{\top}\right\rVert$ from [jain2016streaming] and $2n\log\left(1+\eta\lambda_{1}\right)$ for comparison.
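The entrywise quantity plotted in Figure 2(a) can be computed on toy data as follows. This is a sketch under assumptions not stated in this summary: it takes $B_n = (I + \eta X_n X_n^\top)\cdots(I + \eta X_1 X_1^\top)$, the usual unnormalized Oja product, and it uses a toy spiked covariance rather than the $\Sigma$ of [qiu2019gradientbased]. The helper names are hypothetical. Because the entries of $B_n u_0$ grow geometrically, the code tracks the norm on a log scale instead of forming $B_n$.

```python
import numpy as np


def toy_stream(n, d, s, theta, rng):
    """Samples with covariance I + theta * v1 v1^T, v1 supported on the first s coordinates."""
    v1 = np.zeros(d)
    v1[:s] = 1.0 / np.sqrt(s)
    X = rng.standard_normal((n, d)) + np.sqrt(theta) * rng.standard_normal((n, 1)) * v1
    return X, v1


def log_abs_entries(X, eta, u0):
    """Return log|e_i^T B_n u0| for all i, with B_n the assumed product of (I + eta x x^T)."""
    log_scale = np.log(np.linalg.norm(u0))
    w = u0 / np.linalg.norm(u0)
    for x in X:
        w = w + eta * x * (x @ w)      # apply one factor (I + eta x x^T)
        nrm = np.linalg.norm(w)
        log_scale += np.log(nrm)       # accumulate the growth in log space
        w /= nrm
    # B_n u0 = exp(log_scale) * w, so take logs entrywise.
    return log_scale + np.log(np.abs(w))


rng = np.random.default_rng(1)
n, d, s, theta = 3000, 40, 4, 10.0
X, _ = toy_stream(n, d, s, theta, rng)
eta = np.log(n) / (n * theta)          # illustrative learning rate
logs = log_abs_entries(X, eta, rng.standard_normal(d))
on_support, off_support = logs[:s].mean(), logs[s:].mean()
```

On this toy stream the coordinates in the support end up with a visibly larger log-magnitude than the rest, which is the separation a thresholding step relies on.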

Theorems & Definitions (62)

Theorem 1.1: Informal
Definition 2.1
Remark 2.2
Lemma 1: $s$-Agnostic Recovery
Theorem 3.1: High probability support recovery
Remark 3.2
Proposition 3.2: Lower bound for diagonal thresholding
Theorem 3.3: Vector Truncation
Remark 3.4: Limitation
Theorem 3.5: Data Truncation
...and 52 more
