Spectral Guarantees for Adversarial Streaming PCA

Eric Price; Zhiyang Xun

TL;DR

This work studies streaming PCA when the data arrive in adversarial order, asking how large the spectral ratio $R = \lambda_1/\lambda_2$ must be for the top eigenvector to be estimated in near-linear space. It shows that Oja's algorithm with a fixed learning rate, adapted to adversarial streams, attains $o(1)$ error in the insertion-only model for $R = O(\log n\log d)$, using near-linear space; it also introduces a practical variant that can declare failure when the vector norms are unfavorable. The authors prove lower bounds: any mergeable-summary approach requires $\Omega(d^2/R^2)$ space for a 0.1-approximation; for $\varepsilon$-approximation there is a phase transition, with the space requirement rising to $\Omega(d^2/R^3)$ for sufficiently large $d$; and constant $R$ forces $\Omega(d^2)$ space. Overall, the paper gives the first spectral-tail analysis of Oja's method in adversarial streaming, clarifies when near-linear space is achievable, and exhibits a separation between the mergeable-summary and insertion-only models.
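As a rough consistency check on the mergeable-summary bound, one can compare $\Omega(d^2/R^2)$ against a near-linear space budget. The sketch below suppresses the polylogarithmic factors hidden in the tilde notation, and the symbol $S$ for the space budget is introduced here only for illustration:

$$
\frac{d^2}{R^2} \lesssim S = \widetilde{O}(d)
\;\Longrightarrow\;
R^2 \gtrsim \frac{d^2}{\widetilde{O}(d)}
\;\Longrightarrow\;
R = \widetilde{\Omega}(\sqrt{d}).
$$

Read this way, a mergeable summary cannot fit in near-linear space unless $R$ is roughly $\sqrt{d}$ or larger, while the insertion-only result needs only $R = O(\log n \log d)$.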

Abstract

In streaming PCA, we see a stream of vectors $x_1, \dotsc, x_n \in \mathbb{R}^d$ and want to estimate the top eigenvector of their covariance matrix. This is easier if the spectral ratio $R = \lambda_1 / \lambda_2$ is large. We ask: how large does $R$ need to be to solve streaming PCA in $\widetilde{O}(d)$ space? Existing algorithms require $R = \widetilde{\Omega}(d)$. We show: (1) For all mergeable summaries, $R = \widetilde{\Omega}(\sqrt{d})$ is necessary. (2) In the insertion-only model, a variant of Oja's algorithm gets $o(1)$ error for $R = O(\log n \log d)$. (3) No algorithm with $o(d^2)$ space gets $o(1)$ error for $R = O(1)$. Our analysis is the first application of Oja's algorithm to adversarial streams. It is also the first algorithm for adversarial streaming PCA that is designed for a spectral, rather than Frobenius, bound on the tail; and the bound it needs is exponentially better than is possible by adapting a Frobenius guarantee.
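For readers unfamiliar with Oja's algorithm, the following is a minimal sketch of the textbook update with a fixed learning rate: keep one $d$-dimensional vector $v$, add $\eta\, x_i (x_i^\top v)$ for each arriving $x_i$, and renormalize. The function name, the random initialization, and the per-step normalization are assumptions made for illustration; the variant analyzed in the paper may differ in these details.

```python
import math
import random


def oja_top_eigenvector(stream, d, eta, seed=0):
    """Textbook Oja update with a fixed learning rate (illustrative sketch).

    stream: iterable of length-d sequences of floats, read once in order.
    eta:    fixed learning rate (hypothetical parameter name).
    Only a single d-dimensional vector is kept, so the state is O(d) numbers.
    """
    rng = random.Random(seed)
    # Assumed initialization: a random unit vector.
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(c * c for c in v))
    v = [c / norm for c in v]

    for x in stream:
        # <x, v>: how strongly the new sample aligns with the current estimate.
        proj = sum(xi * vi for xi, vi in zip(x, v))
        # v <- v + eta * x * <x, v>, i.e. multiply v by (I + eta * x x^T).
        v = [vi + eta * proj * xi for vi, xi in zip(v, x)]
        # Renormalize so the estimate stays a unit vector.
        norm = math.sqrt(sum(c * c for c in v))
        v = [c / norm for c in v]
    return v
```

On a stream whose samples are dominated by a single direction, the returned vector should align with that direction up to sign. The sign ambiguity is inherent, since $v$ and $-v$ describe the same eigenvector.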

Paper Structure

This paper contains 22 sections, 39 theorems, 139 equations, 3 figures, 1 table, 2 algorithms.

Introduction
Our results
Lower bound for mergeable summaries.
Dependence on Accuracy.
Related Work
Proof Overview
Upper Bound
Lower bound for mergeable summaries
Lower bound for high accuracy in insertion-only streams
Proof of Upper Bound
Setup.
Initial Lemmas
Results on Sequences
Proof of Growth
Proof of Theorem \ref{thm:upper}
...and 7 more sections

Key Result

Theorem 1.1

For any sufficiently large universal constant $C$, suppose $\eta$ is such that $\eta n\lambda_1 > C\log d$ and $\eta n\lambda_2 < \frac{1}{C\log n}$. If $\eta \left\lVert x_i\right\rVert^2 \leq 1$ for every $i$, then Oja's algorithm with learning rate $\eta$ returns $\widehat{v}$ satisfying …
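The two learning-rate conditions can be combined to see what spectral ratio the theorem implicitly requires. A short derivation, assuming $\lambda_2 > 0$ and writing $R = \lambda_1/\lambda_2$ as in the abstract:

$$
R = \frac{\eta n \lambda_1}{\eta n \lambda_2} > (C\log d)\,(C\log n) = C^2 \log n \log d.
$$

Conversely, whenever $R > C^2 \log n \log d$, the interval

$$
\frac{C\log d}{n\lambda_1} < \eta < \frac{1}{C\, n\lambda_2 \log n}
$$

is nonempty, so some learning rate satisfies both conditions. The norm condition $\eta \left\lVert x_i\right\rVert^2 \leq 1$ is a separate requirement on the stream and is not implied by this calculation.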

Figures (3)

Figure 1: Suppose $\eta = 1$. Then even after convergence to $v^*$ exactly, a single final sample can skew the result by $\Theta(\sqrt{\sigma_2})$. For smaller $\eta$, the same can happen with $\frac{1}{\eta}$ final samples.
Figure 2: Lemma \ref{lem:matsamplesimple} states that, if the sum of squared distances across any subsequence of vectors $A_i$ is at most $B$, then the vector selecting the maximum value in each coordinate has squared norm $B \log^2 n$.
Figure 3: High-accuracy lower bound approach: Alice inserts a sequence of random bits (all but the last row). Bob knows the left side and wants to approximate the right side. To estimate the blue bits on the right, he adds $O(1)$ vectors using the corresponding red bits on the left and random bits on the right. With high probability, the principal component has constant correlation with the blue bits.

Theorems & Definitions (70)

Theorem 1.1: Performance of Oja's method in adversarial streams
Theorem 1.2: Full algorithm
Theorem 1.3: Mergeable Lower Bound
Theorem 1.4: Accuracy Lower Bound
Theorem 1.5
Lemma 2.0: Growth implies correctness
Lemma 2.0
Lemma 2.1: Simplified version of Lemma \ref{lem:matsample}
Remark 2.2
Lemma 2.2
...and 60 more
