Consistent Low-Rank Approximation

David P. Woodruff; Samson Zhou

Abstract

We introduce and study the problem of consistent low-rank approximation, in which rows of an input matrix $\mathbf{A}\in\mathbb{R}^{n\times d}$ arrive sequentially and the goal is to provide a sequence of subspaces that well-approximate the optimal rank-$k$ approximation to the submatrix $\mathbf{A}^{(t)}$ that has arrived at each time $t$, while minimizing the recourse, i.e., the overall change in the sequence of solutions. We first show that when the goal is to achieve a low-rank cost within an additive $\varepsilon\cdot\|\mathbf{A}^{(t)}\|_F^2$ factor of the optimal cost, roughly $\mathcal{O}\left(\frac{k}{\varepsilon}\log(nd)\right)$ recourse is feasible. For the more challenging goal of achieving a relative $(1+\varepsilon)$-multiplicative approximation of the optimal rank-$k$ cost, we show that a simple upper bound in this setting is $\frac{k^2}{\varepsilon^2}\cdot\text{poly}\log(nd)$ recourse, which we further improve to $\frac{k^{3/2}}{\varepsilon^2}\cdot\text{poly}\log(nd)$ for integer-bounded matrices and $\frac{k}{\varepsilon^2}\cdot\text{poly}\log(nd)$ for data streams with polynomial online condition number. We also show that $\Omega\left(\frac{k}{\varepsilon}\log\frac{n}{k}\right)$ recourse is necessary for any algorithm that maintains a multiplicative $(1+\varepsilon)$-approximation to the optimal low-rank cost, even if the full input is known in advance. Finally, we perform a number of empirical evaluations to complement our theoretical guarantees, demonstrating the efficacy of our algorithms in practice.
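To make the problem setup concrete, the following is a minimal sketch of a naive "lazy" baseline, not the algorithms analyzed in the paper. It keeps the current rank-$k$ subspace until its cost on the prefix $\mathbf{A}^{(t)}$ exceeds $(1+\varepsilon)$ times the optimal rank-$k$ cost, and only then switches to the optimal subspace. The function names are hypothetical, and the measure of change used here (squared Frobenius distance between orthogonal projectors) is an assumption made for illustration; the abstract only describes recourse as the overall change in the sequence of solutions.

```python
import numpy as np

def top_k_rowspace(A, k):
    """Orthonormal basis (k x d) for the top-k right singular subspace of A."""
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    return Vt[:k]

def projection_cost(A, V):
    """Squared Frobenius error after projecting the rows of A onto rowspan(V)."""
    return np.linalg.norm(A - A @ V.T @ V, "fro") ** 2

def subspace_change(V_old, V_new):
    """ASSUMED recourse measure: squared Frobenius distance between projectors."""
    return np.linalg.norm(V_old.T @ V_old - V_new.T @ V_new, "fro") ** 2

def lazy_consistent_lra(A, k, eps):
    """Naive baseline: recompute the subspace only when the (1+eps) guarantee fails.

    Returns the final basis, the total (assumed) recourse, and a list of
    (current_cost, optimal_cost) pairs, one per processed prefix.
    """
    n, _ = A.shape
    V = top_k_rowspace(A[:k], k)            # initialize from the first k rows
    total_recourse = 0.0
    costs = []
    for t in range(k + 1, n + 1):
        At = A[:t]                          # prefix that has arrived by time t
        V_opt = top_k_rowspace(At, k)       # optimal by Eckart-Young-Mirsky
        opt = projection_cost(At, V_opt)
        tol = 1e-9 * np.linalg.norm(At, "fro") ** 2   # guard against round-off
        if projection_cost(At, V) > (1 + eps) * opt + tol:
            total_recourse += subspace_change(V, V_opt)
            V = V_opt
        costs.append((projection_cost(At, V), opt))
    return V, total_recourse, costs

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((200, 10))
    for eps in (0.1, 1.0, 10.0):
        _, rec, _ = lazy_consistent_lra(A, k=3, eps=eps)
        print(f"eps={eps}: total recourse {rec:.3f}")
```

This sketch recomputes a full SVD at every step, so it only illustrates the trade-off between approximation quality and recourse on small inputs; it makes no attempt to match the recourse bounds stated in the abstract.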

Paper Structure (38 sections, 27 theorems, 30 equations, 5 figures, 2 tables, 4 algorithms)

Introduction
Data streams.
Consistency.
Our Contributions
Formal model.
Theoretical results.
Empirical evaluations.
Organization of the paper.
Related Work
Related Work: Consistency
Consistent clustering.
Related Work: Strawman Approaches
Frequent directions and online ridge leverage score sampling.
Singular value decomposition.
Technical Overview
...and 23 more sections

Key Result

Theorem 1.1

Suppose $\mathbf{A}\in\mathbb{Z}^{n\times d}$ is an integer matrix with rank $r>k$ and entries bounded in magnitude by $M$, and let $\mathbf{A}^{(t)}$ denote the first $t$ rows of $\mathbf{A}$, for any $t\in[n]$. There exists an algorithm that achieves $\varepsilon\cdot\|{\

Figures (5)

Figure 1: Recourse comparisons for $k=25$, $c=(1+\varepsilon)\in\{1.1,2.5,5,10,100\}$
Figure 2: Runtime and approximations on landmark dataset, for $k=25$, $c=(1+\varepsilon)\in\{1.1,2.5,5,10,100\}$
Figure 3: Runtime and approximations on SKIN dataset. The runtime panel considers $k=1$ and $c=1.1$, while the other two panels consider $k=1$, $c=(1+\varepsilon)\in\{1.1,2.5,5,10,100\}$ and $k=2$, $c=(1+\varepsilon)\in\{1.1,1.5,2.5,10\}$, respectively.
Figure 4: Runtime and approximations on RICE dataset. The runtime panel considers $k=1$, $c=10$, while the approximation and recourse panels consider $k=1$, $c=(1+\varepsilon)\in\{1.1,2.5,5,10,100\}$
Figure 5: Runtime and approximations on random dataset. The runtime panel considers $k=1$, $c=10$, while the approximation panel considers $k=1$, $c=(1+\varepsilon)\in\{1.1,2.5,5,10,100\}$

Theorems & Definitions (45)

Theorem 1.1
Theorem 1.2
Theorem 1.3
Theorem 1.4
Theorem 1.5: Eckart-Young-Mirsky theorem
Corollary 1.6
Lemma 1.7
proof
Theorem 1.8: Min-max theorem
Theorem 1.9: Cauchy interlacing theorem
...and 35 more
