Optimal Matrix Sketching over Sliding Windows

Hanyan Yin, Dongxie Wen, Jiajun Li, Zhewei Wei, Xiao Zhang, Zengfeng Huang, Feifei Li

TL;DR

The paper introduces DS-FD, an algorithm that achieves the optimal $O\left(\frac{d}{\varepsilon}\right)$ space bound for matrix sketching over row-normalized, sequence-based sliding windows. Experiments on both synthetic and real-world datasets support the theoretical claims.

Abstract

Matrix sketching, aimed at approximating a matrix $\boldsymbol{A} \in \mathbb{R}^{N\times d}$ consisting of vector streams of length $N$ with a smaller sketching matrix $\boldsymbol{B} \in \mathbb{R}^{\ell\times d}, \ell \ll N$, has garnered increasing attention in fields such as large-scale data analytics and machine learning. A well-known deterministic matrix sketching method is the Frequent Directions algorithm, which achieves the optimal $O\left(\frac{d}{\varepsilon}\right)$ space bound and provides a covariance error guarantee of $\varepsilon = \lVert \boldsymbol{A}^\top \boldsymbol{A} - \boldsymbol{B}^\top \boldsymbol{B} \rVert_2/\lVert \boldsymbol{A} \rVert_F^2$. The matrix sketching problem becomes particularly interesting in the context of sliding windows, where the goal is to approximate the matrix $\boldsymbol{A}_W$, formed by input vectors over the most recent $N$ time units. However, despite recent efforts, whether the optimal $O\left(\frac{d}{\varepsilon}\right)$ space bound can be achieved on sliding windows has remained an open question. In this paper, we introduce the DS-FD algorithm, which achieves the optimal $O\left(\frac{d}{\varepsilon}\right)$ space bound for matrix sketching over row-normalized, sequence-based sliding windows. We also present matching upper and lower space bounds for time-based and unnormalized sliding windows, demonstrating the generality and optimality of DS-FD across various sliding window models. This conclusively answers the open question regarding the optimal space bound for matrix sketching over sliding windows. Furthermore, we conduct extensive experiments with both synthetic and real-world datasets, which validate our theoretical claims and confirm the correctness and effectiveness of our algorithm.
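To make the covariance error guarantee concrete, the following is a minimal NumPy sketch of a simplified Frequent Directions variant that shrinks after every insertion. It is an illustration and not the authors' code. The function names `frequent_directions` and `covariance_error`, the random test matrix, and the restriction $\ell \le d$ are assumptions made for this example. For this variant the covariance error is at most $1/\ell$.

```python
import numpy as np

def frequent_directions(A, ell):
    """Sketch A (n x d) into B (ell x d); simplified variant, assumes ell <= d."""
    n, d = A.shape
    assert ell <= d, "this illustration assumes ell <= d"
    B = np.zeros((ell, d))
    for a in A:
        # The last row of B is zero at this point, so it can hold the new vector.
        B[-1] = a
        _, s, Vt = np.linalg.svd(B, full_matrices=False)
        # Subtract the smallest squared singular value from all of them,
        # which zeroes the last row again.
        delta = s[-1] ** 2
        s_shrunk = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
        B = s_shrunk[:, None] * Vt
    return B

def covariance_error(A, B):
    """||A^T A - B^T B||_2 / ||A||_F^2, the error measure used in the abstract."""
    return np.linalg.norm(A.T @ A - B.T @ B, 2) / np.linalg.norm(A, "fro") ** 2

rng = np.random.default_rng(0)
A = rng.standard_normal((500, 20))   # hypothetical input stream, one row per step
ell = 10
B = frequent_directions(A, ell)
err = covariance_error(A, B)
print(f"sketch shape: {B.shape}, covariance error: {err:.4f}, bound 1/ell: {1 / ell:.4f}")
```

Each shrink removes $\ell\delta$ from the squared Frobenius norm of the sketch while adding at most $\delta$ to the spectral error, which is where the $1/\ell$ bound for this variant comes from.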

Paper Structure (23 sections, 7 theorems, 10 equations, 10 figures, 4 tables, 7 algorithms)

Key Result

Lemma 1

Let $\bm{D}=\bm{U\Sigma V}^\top$ and $\bm{D}^\prime = \bm{D}-\bm{Dv}_j\bm{v}_j^\top$, where $\bm{v}_j$ is the $j$-th right singular vector, i.e., the $j$-th row of $\bm{V}^\top$ written as a column. Then $\bm{D}^\prime=\bm{U\Sigma V}^\top(\bm{I}-\bm{v}_j\bm{v}_j^\top)$, which is equivalent to zeroing out (removing) the $j$-th row of $\bm{\Sigma V}^\top$.
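The lemma can be checked numerically. The snippet below is a small sanity check under assumed inputs (a random $6 \times 4$ matrix and the index `j = 1` are arbitrary choices for illustration): it forms $\bm{D}^\prime$ by projecting out $\bm{v}_j$ and compares it with the product obtained after zeroing the $j$-th row of $\bm{\Sigma V}^\top$.

```python
import numpy as np

rng = np.random.default_rng(0)
D = rng.standard_normal((6, 4))            # arbitrary test matrix
U, s, Vt = np.linalg.svd(D, full_matrices=False)

j = 1                                      # arbitrary index of the direction to remove
v_j = Vt[j]                                # j-th right singular vector
D_prime = D - np.outer(D @ v_j, v_j)       # D' = D - D v_j v_j^T

SVt = s[:, None] * Vt                      # Sigma V^T
SVt_removed = SVt.copy()
SVt_removed[j] = 0.0                       # zero out the j-th row

# D' should equal U times (Sigma V^T with its j-th row zeroed).
lemma_holds = np.allclose(D_prime, U @ SVt_removed)
print("Lemma 1 holds numerically:", lemma_holds)
```

The identity follows because $\bm{V}^\top\bm{v}_j$ is the $j$-th standard basis vector, so subtracting $\bm{\Sigma V}^\top\bm{v}_j\bm{v}_j^\top$ only touches the $j$-th row.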

Figures (10)

  • Figure 1: The data structures and update steps of DS-FD. Each update performs an SVD $\bm{U\Sigma V^\top} = \texttt{svd}([\hat{\bm{C}}_{T-1}, \bm{a}_T])$. Following the decomposition, singular values and their corresponding right singular vectors are evaluated against the error bound $\varepsilon N$. Those exceeding the bound are "dumped," i.e., removed from the current sketch and stored as snapshots in a queue $\mathcal{S}$, accompanied by the current timestamp.
  • Figure 2: Sequence-based DS-FD. We maintain $L=\lceil\log R \rceil$ layers of DS-FD structures in parallel, each with a different error bound and dump threshold $\theta=2^j\varepsilon N$ for the $j$-th level. In the visualization, snapshots with larger norm are drawn in darker blue. For each level, we retain only the most recent $O\left({1\over \varepsilon}\right)$ snapshots saved in the queue and discard the older ones to limit the total memory usage to $O\left({1\over \varepsilon} \log R\right)$.
  • Figure 3: A constructive hard instance to establish a space lower bound for the sequence-based model. We initiate the sliding window's state by partitioning it into $\log R + 1$ blocks, each exponentially decreasing in size, and proceed to append one-hot vectors to the window over time. As each block expires, the algorithm is required to expend $\Omega\left( d \ell \right)$ bits to accurately estimate the expired block, according to Lemma 2. Consequently, by considering the number of blocks $\log R + 1$, we derive the lower bound $\Omega\left(d \ell \log R\right)$. The rigorous proof is provided in the text for Theorem 3.
  • Figure 4: Error vs. sketch size on SYNTHETIC dataset.
  • Figure 5: Error vs. sketch size on BIBD dataset.
  • ...and 5 more figures
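
The dump step described in the Figure 1 caption can be sketched as follows. This is a hedged illustration, not the authors' implementation: the function name `dump_update`, the comparison of squared singular values against the threshold `theta`, and the numerical tolerance are assumptions, and the Frequent Directions shrinkage and the expiry of old snapshots are deliberately left out. Without shrinkage, the sketch and the snapshots together reproduce $\bm{A}^\top\bm{A}$ exactly, which the example verifies.

```python
from collections import deque
import numpy as np

def dump_update(C, a, theta, t, snapshots):
    """Append row a to sketch C, then dump directions whose squared
    singular value reaches theta (assumed criterion). Returns the new sketch."""
    stacked = np.vstack([C, a])
    _, s, Vt = np.linalg.svd(stacked, full_matrices=False)
    dump = s ** 2 >= theta
    # Store each dumped direction as a snapshot with the current timestamp.
    for sigma, v in zip(s[dump], Vt[dump]):
        snapshots.append((t, sigma * v))
    # By Lemma 1, removing a direction is the same as dropping its row of Sigma V^T.
    keep = ~dump & (s > 1e-12)             # also drop numerically zero rows
    return s[keep][:, None] * Vt[keep]

rng = np.random.default_rng(0)
d, n, theta = 4, 40, 2.5                   # hypothetical sizes and threshold
A = rng.standard_normal((n, d))
A /= np.linalg.norm(A, axis=1, keepdims=True)   # row-normalized input

C = np.zeros((0, d))
snapshots = deque()
for t, a in enumerate(A, start=1):
    C = dump_update(C, a, theta, t, snapshots)

# Sketch plus snapshots account for the full covariance when nothing expires.
reconstructed = C.T @ C + sum(np.outer(v, v) for _, v in snapshots)
print("snapshots stored:", len(snapshots))
print("covariance preserved:", np.allclose(A.T @ A, reconstructed))
```

In a sliding-window setting, snapshots whose timestamps fall outside the window would be discarded from the front of the queue, which is why each snapshot carries its timestamp.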

Theorems & Definitions (7)

  • Lemma 1
  • Theorem 1
  • Theorem 2
  • Corollary 1
  • Lemma 2
  • Theorem 3: Seq-based Lower Bound
  • Theorem 4: Time-based Lower Bound