Spectral Guarantees for Adversarial Streaming PCA
Eric Price, Zhiyang Xun
TL;DR
This work studies streaming PCA under adversarial data order and asks how large the spectral gap $R = \lambda_1 / \lambda_2$ must be to estimate the top eigenvector in near-linear space. It shows that a variant of Oja's algorithm with a fixed learning rate attains $o(1)$ error on adversarial insertion-only streams for $R = O(\log n \log d)$ while using near-linear space; it also introduces a practical variant that can declare failure when norms are unfavorable. The authors also prove lower bounds: any mergeable-summaries approach requires $\Omega(d^2/R^2)$ space for 0.1-approximation; there is a phase transition under which $\varepsilon$-approximation requires $\Omega(d^2/R^3)$ space for sufficiently large $d$; and constant $R$ forces $\Omega(d^2)$ space. Overall, the paper gives the first spectral-tail analysis of Oja's method in adversarial streaming. It clarifies when near-linear space is achievable and exhibits a separation between the mergeable-summaries and insertion-only models, with implications for designing space-efficient streaming PCA.
Abstract
In streaming PCA, we see a stream of vectors $x_1, \dotsc, x_n \in \mathbb{R}^d$ and want to estimate the top eigenvector of their covariance matrix. This is easier if the spectral ratio $R = \lambda_1 / \lambda_2$ is large. We ask: how large does $R$ need to be to solve streaming PCA in $\widetilde{O}(d)$ space? Existing algorithms require $R = \widetilde{\Omega}(d)$. We show: (1) For all mergeable summaries, $R = \widetilde{\Omega}(\sqrt{d})$ is necessary. (2) In the insertion-only model, a variant of Oja's algorithm gets $o(1)$ error for $R = O(\log n \log d)$. (3) No algorithm with $o(d^2)$ space gets $o(1)$ error for $R = O(1)$. Our analysis is the first application of Oja's algorithm to adversarial streams. It is also the first algorithm for adversarial streaming PCA that is designed for a spectral, rather than Frobenius, bound on the tail; and the bound it needs is exponentially better than is possible by adapting a Frobenius guarantee.
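For orientation, the textbook form of Oja's update with a fixed learning rate can be sketched as below. This is only the standard rule $v \leftarrow (v + \eta\, x \langle x, v \rangle) / \lVert \cdot \rVert$, not the specific variant analyzed in the paper, whose details are not given in this abstract. The function names, the learning rate `eta = 0.1`, and the synthetic stream are all illustrative assumptions.

```python
import math
import random


def oja_top_eigenvector(stream, d, eta, seed=0):
    """Textbook Oja update with a fixed learning rate eta (an assumption).

    Only one d-dimensional vector is kept as state, so memory is O(d) numbers.
    """
    rng = random.Random(seed)
    # Start from a random unit vector.
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(c * c for c in v))
    v = [c / norm for c in v]

    for x in stream:
        # v <- v + eta * x * <x, v>, followed by renormalization.
        dot = sum(xi * vi for xi, vi in zip(x, v))
        v = [vi + eta * dot * xi for xi, vi in zip(x, v)]
        norm = math.sqrt(sum(c * c for c in v))
        v = [c / norm for c in v]
    return v


def synthetic_stream(n, d, seed=1):
    """Hypothetical test data with most of its variance along coordinate 0."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [0.1 * rng.gauss(0.0, 1.0) for _ in range(d)]
        x[0] += 3.0 * rng.gauss(0.0, 1.0)
        yield x


v_hat = oja_top_eigenvector(synthetic_stream(n=2000, d=5), d=5, eta=0.1)
alignment = abs(v_hat[0])  # |<v_hat, e_1>|, where e_1 is the planted top direction
```

The synthetic stream here is random with a very large spectral ratio, so it says nothing about adversarial orderings; it only illustrates that the state kept by the update is a single vector in $\mathbb{R}^d$.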
