Approximating the Top Eigenvector in Random Order Streams

Praneeth Kacham; David P. Woodruff

TL;DR

This work studies memory-efficient, one-pass streaming algorithms for approximating the top eigenvector of $A^{\mathsf{T}}A$ when the rows of $A$ arrive in uniformly random order. It combines a row-norm sampling scheme with a block power method to output a unit vector with correlation $1 - O(1/\sqrt{R})$ with the top eigenvector, using $O(h\,d\,\mathrm{polylog}(d))$ bits of space, where $h$ is the number of heavy rows and $R = \sigma_1(A)^2/\sigma_2(A)^2 = \Omega(1)$ is the gap parameter. The paper also proves a lower bound: any algorithm using $O(h\,d/R)$ bits of space achieves correlation at most $1 - \Omega(1/R^2)$, so the dependence on $h$ is necessary for high-accuracy solutions. In addition, it relaxes the gap requirement in the analysis of Price and Xun from $R=\Omega(\log n \cdot \log d)$ to $R=\Omega(\log^2 d)$ for arbitrary-order streams and $R=\Omega(\log d)$ for random-order streams. Finally, it gives a hard instance with $R = \Omega(\log d/\log\log d)$ on which Oja's algorithm with any fixed learning rate fails to approximate the top eigenvector, showing that the random-order requirement is nearly tight for that analysis.

Abstract

When rows of an $n \times d$ matrix $A$ are given in a stream, we study algorithms for approximating the top eigenvector of the matrix $A^{\mathsf{T}}A$ (equivalently, the top right singular vector of $A$). We consider worst-case inputs $A$ but assume that the rows are presented to the streaming algorithm in a uniformly random order. We show that when the gap parameter $R = \sigma_1(A)^2/\sigma_2(A)^2 = \Omega(1)$, there is a randomized algorithm that uses $O(h \cdot d \cdot \operatorname{polylog}(d))$ bits of space and outputs a unit vector $v$ that has a correlation $1 - O(1/\sqrt{R})$ with the top eigenvector $v_1$. Here $h$ denotes the number of "heavy rows" in the matrix, defined as the rows with Euclidean norm at least $\|A\|_F/\sqrt{d \cdot \operatorname{polylog}(d)}$. We also provide a lower bound showing that any algorithm using $O(hd/R)$ bits of space can obtain at most $1 - \Omega(1/R^2)$ correlation with the top eigenvector. Thus, parameterizing the space complexity in terms of the number of heavy rows is necessary for high-accuracy solutions. Our results improve upon the $R = \Omega(\log n \cdot \log d)$ requirement in a recent work of Price and Xun (FOCS 2024). We note that the algorithm of Price and Xun works for arbitrary-order streams, whereas our algorithm requires the stronger assumption that the rows are presented in a uniformly random order. We additionally show that the gap requirements in their analysis can be brought down to $R = \Omega(\log^2 d)$ for arbitrary-order streams and $R = \Omega(\log d)$ for random-order streams. The requirement of $R = \Omega(\log d)$ for random-order streams is nearly tight for their analysis, as we obtain a simple instance with $R = \Omega(\log d/\log\log d)$ for which their algorithm, with any fixed learning rate, cannot output a vector approximating the top eigenvector $v_1$.
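To make the quantities in the abstract concrete, here is a minimal Python sketch under stated assumptions. It counts heavy rows using the threshold $\|A\|_F/\sqrt{d \cdot \operatorname{polylog}(d)}$, measures correlation between two vectors, and runs a generic one-pass Oja-style update with a fixed learning rate over a shuffled stream. This is not the paper's sampling and block power method algorithm, and it is only a textbook form of the update, not necessarily the exact variant analyzed by Price and Xun. The function names, the `polylog` argument (the abstract does not fix the polylog factor), the synthetic `planted_rows` instance, and the learning rate are all hypothetical choices made for illustration.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale v to unit Euclidean norm."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def count_heavy_rows(rows, polylog):
    """Count rows with norm >= ||A||_F / sqrt(d * polylog).

    `polylog` stands in for the unspecified polylog(d) factor (an assumption).
    """
    d = len(rows[0])
    frob_sq = sum(dot(a, a) for a in rows)
    threshold_sq = frob_sq / (d * polylog)
    return sum(1 for a in rows if dot(a, a) >= threshold_sq)


def correlation(u, v):
    """Absolute cosine similarity |<u, v>| after normalizing both vectors."""
    return abs(dot(normalize(u), normalize(v)))


def planted_rows(n, d, noise=0.1, seed=0):
    """Hypothetical test instance: strong signal on coordinate 0, weak noise elsewhere.

    In expectation the gap is R = 1 / noise**2 and the top eigenvector is e_1.
    Rows are shuffled to mimic a uniformly random arrival order.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        row = [rng.gauss(0.0, 1.0)]
        row += [noise * rng.gauss(0.0, 1.0) for _ in range(d - 1)]
        rows.append(row)
    rng.shuffle(rows)
    return rows


def oja_one_pass(rows, eta, seed=0):
    """One pass of v <- normalize(v + eta * <a, v> * a), storing only d numbers."""
    rng = random.Random(seed)
    d = len(rows[0])
    v = normalize([rng.gauss(0.0, 1.0) for _ in range(d)])
    for a in rows:
        c = eta * dot(a, v)
        v = normalize([vi + c * ai for vi, ai in zip(v, a)])
    return v


if __name__ == "__main__":
    rows = planted_rows(n=2000, d=20, seed=1)
    v = oja_one_pass(rows, eta=0.01, seed=2)
    e1 = [1.0] + [0.0] * 19  # planted direction, close to the top eigenvector here
    print("correlation with planted direction:", correlation(v, e1))
    print("heavy rows:", count_heavy_rows(rows, polylog=math.log(20) ** 2))
```

The synthetic instance has a large gap (about 100 in expectation), so a fixed learning rate works well on it. The hard instance described in the abstract concerns the low-gap regime $R = \Omega(\log d/\log\log d)$, which this sketch does not attempt to reproduce.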
