Consistent Low-Rank Approximation

David P. Woodruff, Samson Zhou

Abstract

We introduce and study the problem of consistent low-rank approximation, in which rows of an input matrix $\mathbf{A}\in\mathbb{R}^{n\times d}$ arrive sequentially and the goal is to provide a sequence of subspaces that well-approximate the optimal rank-$k$ approximation to the submatrix $\mathbf{A}^{(t)}$ that has arrived at each time $t$, while minimizing the recourse, i.e., the overall change in the sequence of solutions. We first show that when the goal is to achieve a low-rank cost within an additive $\varepsilon\cdot\|\mathbf{A}^{(t)}\|_F^2$ factor of the optimal cost, roughly $\mathcal{O}\left(\frac{k}{\varepsilon}\log(nd)\right)$ recourse is feasible. For the more challenging goal of achieving a relative $(1+\varepsilon)$-multiplicative approximation of the optimal rank-$k$ cost, we show that a simple upper bound in this setting is $\frac{k^2}{\varepsilon^2}\cdot\text{poly}\log(nd)$ recourse, which we further improve to $\frac{k^{3/2}}{\varepsilon^2}\cdot\text{poly}\log(nd)$ for integer-bounded matrices and $\frac{k}{\varepsilon^2}\cdot\text{poly}\log(nd)$ for data streams with polynomial online condition number. We also show that $\Omega\left(\frac{k}{\varepsilon}\log\frac{n}{k}\right)$ recourse is necessary for any algorithm that maintains a multiplicative $(1+\varepsilon)$-approximation to the optimal low-rank cost, even if the full input is known in advance. Finally, we perform a number of empirical evaluations to complement our theoretical guarantees, demonstrating the efficacy of our algorithms in practice.
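To make the additive-error setting concrete, the following is a minimal sketch of a naive "lazy" baseline, not the algorithm from the paper. It keeps the current subspace until its cost on the prefix $\mathbf{A}^{(t)}$ exceeds the optimal rank-$k$ cost plus $\varepsilon\cdot\|\mathbf{A}^{(t)}\|_F^2$, and only then recomputes the top-$k$ right singular vectors. The function names (`rank_k_cost`, `optimal_cost`, `top_k_subspace`, `lazy_consistent_lra`) are hypothetical, and counting the number of subspace changes is only a crude stand-in for recourse; the paper's formal recourse measure may differ.

```python
import numpy as np


def rank_k_cost(A, V):
    """Squared Frobenius error of projecting the rows of A onto span(V).

    V has orthonormal rows (shape r x d, with r <= k).
    """
    residual = A - (A @ V.T) @ V
    return float(np.sum(residual ** 2))


def optimal_cost(A, k):
    """Optimal rank-k cost: the sum of squared singular values past the k-th."""
    s = np.linalg.svd(A, compute_uv=False)
    return float(np.sum(s[k:] ** 2))


def top_k_subspace(A, k):
    """Top-k right singular vectors of A, returned as orthonormal rows."""
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    return Vt[:k]


def lazy_consistent_lra(A, k, eps):
    """Illustrative baseline (an assumption, not the paper's method).

    Processes the rows of A in order and recomputes the subspace only when
    the current one violates the additive-error guarantee on the prefix.
    Returns the subspace used at each time step and the number of changes.
    """
    n, _ = A.shape
    V = None
    subspaces = []
    updates = 0
    for t in range(1, n + 1):
        At = A[:t]  # rows that have arrived so far
        budget = optimal_cost(At, k) + eps * float(np.sum(At ** 2))
        if V is None or rank_k_cost(At, V) > budget:
            V = top_k_subspace(At, k)
            updates += 1
        subspaces.append(V)
    return subspaces, updates
```

Calling `lazy_consistent_lra(A, k=2, eps=0.1)` on a small random matrix returns one subspace per arrived row together with the number of times the subspace was replaced. Recomputing a full SVD at every step is wasteful and is done here only for clarity.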

Paper Structure

This paper contains 38 sections, 27 theorems, 30 equations, 5 figures, 2 tables, and 4 algorithms.

Key Result

Theorem 1.1

Suppose $\mathbf{A}\in\mathbb{Z}^{n\times d}$ is an integer matrix with rank $r>k$ and entries bounded in magnitude by $M$ and let $\mathbf{A}^{(t)}$ denote the first $t$ rows of $\mathbf{A}$, for any $t\in[n]$. There exists an algorithm that achieves $\varepsilon\cdot\|{\

Figures (5)

  • Figure 1: Recourse comparisons for $k=25$, $c=(1+\varepsilon)\in\{1.1,2.5,5,10,100\}$
  • Figure 2: Runtime and approximations on landmark dataset, for $k=25$, $c=(1+\varepsilon)\in\{1.1,2.5,5,10,100\}$
  • Figure 3: Runtime and approximations on SKIN dataset. The runtime panel considers $k=1$ and $c=1.1$, while the remaining two panels consider $k=1$, $c=(1+\varepsilon)\in\{1.1,2.5,5,10,100\}$ and $k=2$, $c=(1+\varepsilon)\in\{1.1,1.5,2.5,10\}$, respectively
  • Figure 4: Runtime and approximations on RICE dataset. The runtime panel considers $k=1$, $c=10$, while the approximation and recourse panels consider $k=1$, $c=(1+\varepsilon)\in\{1.1,2.5,5,10,100\}$
  • Figure 5: Runtime and approximations on random dataset. The runtime panel considers $k=1$, $c=10$, while the approximation panel considers $k=1$, $c=(1+\varepsilon)\in\{1.1,2.5,5,10,100\}$

Theorems & Definitions (45)

  • Theorem 1.1
  • Theorem 1.2
  • Theorem 1.3
  • Theorem 1.4
  • Theorem 1.5: Eckart-Young-Mirsky theorem
  • Corollary 1.6
  • Lemma 1.7
  • proof
  • Theorem 1.8: Min-max theorem
  • Theorem 1.9: Cauchy interlacing theorem
  • ...and 35 more