
Harmonic Decomposition in Data Sketches

Dingyu Wang

TL;DR

This paper proposes a new universal sketching scheme that is almost “dual” to sampling-based methods, and conjectures that SymmetricPoissonTower is the universal sketch that can estimate every tractable function f.

Abstract

In the turnstile streaming model, a dynamic vector $\mathbf{x}=(\mathbf{x}_1,\ldots,\mathbf{x}_n)\in \mathbb{Z}^n$ is updated by a stream of entry-wise increments/decrements. Let $f\colon\mathbb{Z}\to \mathbb{R}_+$ be a symmetric function with $f(0)=0$. The \emph{$f$-moment} of $\mathbf{x}$ is defined to be $f(\mathbf{x}) := \sum_{v\in[n]}f(\mathbf{x}_v)$. We revisit the problem of constructing a \emph{universal sketch} that can estimate many different $f$-moments. Previous constructions of universal sketches rely on the technique of sampling with respect to the $L_0$-mass (uniform samples) or $L_2$-mass ($L_2$-heavy-hitters), whose universality comes from being able to evaluate the function $f$ over the samples. In this work we take a new approach to constructing a universal sketch that does not use \emph{any} explicit samples but relies on the \emph{harmonic structure} of the target function $f$. The new sketch ($\textsf{SymmetricPoissonTower}$) \emph{embraces} hash collisions instead of avoiding them, which saves multiple $\log n$ factors in space, e.g., when estimating all $L_p$-moments ($f(z) = |z|^p,p\in[0,2]$). For many nearly periodic functions, the new sketch is \emph{exponentially} more efficient than sampling-based methods. We conjecture that the $\textsf{SymmetricPoissonTower}$ sketch is \emph{the} universal sketch that can estimate every tractable function $f$.
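To make the definition of the $f$-moment concrete, the following is a minimal reference sketch in Python. It stores the whole vector exactly, so it is a baseline for checking estimates and not a small-space sketch. The function name `f_moment` and the sample stream are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict

def f_moment(updates, f):
    """Exact f-moment of the vector defined by a turnstile stream.

    updates: iterable of (index, delta) pairs with integer delta
             (increments and decrements are both allowed).
    f:       symmetric function with f(0) == 0.
    """
    x = defaultdict(int)          # the dynamic vector, stored explicitly
    for v, delta in updates:
        x[v] += delta
    # Entries that returned to zero contribute nothing because f(0) = 0.
    return sum(f(z) for z in x.values())

# Hypothetical stream: the final vector is x_0 = 2, x_1 = 0, x_2 = 5.
stream = [(0, 3), (1, -2), (0, -1), (2, 5), (1, 2)]

l0 = f_moment(stream, lambda z: 1 if z != 0 else 0)  # number of nonzero entries
l1 = f_moment(stream, lambda z: abs(z))              # L_1-moment
l2 = f_moment(stream, lambda z: z * z)               # L_2-moment
```

The three calls correspond to $f(z)=|z|^p$ for $p=0,1,2$ (with the convention $0^0=0$). A universal sketch aims to answer all such queries from one small summary instead of the full vector.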

Paper Structure (28 sections, 23 theorems, 92 equations, 4 figures, 1 table)


Key Result

Lemma 1

For any $\gamma >0$,

Figures (4)

  • Figure 1: Diagram of the $f$-moment estimation process. Decompose $f$ as $f_1+\ldots+f_k$, where the $f_j$ are homomorphisms. For $j=1,\ldots,k$, compute $\tilde{f}_j$, an estimate of the $f_j$-moment of $\mathbf{x}$. Then $\tilde{f}=\tilde{f}_1+\ldots+\tilde{f}_k$ is the estimator for the $f$-moment.
  • Figure 2: Diagram of harmonic decomposition. Different function moments can be estimated by combining estimates of harmonic moments with different weights. Only a subset of the harmonic components is shown, so they sum to an approximation of $f$ rather than to $f$ itself. As more harmonic components are used, the sum converges uniformly to $f$ over $[-M,M]$.
  • Figure 3: Summary of tractability and universality. Pivotal functions are shown to demonstrate the estimation power of different universal sketching schemes. See \cite{braverman2010zero,braverman2016streaming} for $L_2$-heavy-hitter-based methods ($h$ satisfies $f(y)/f(x)\in[1/h(y),(y/x)^2 h(y)]$ for any $0<x<y\leq M$) and \cite{braverman2015universal,chestnut2015stream} for $L_0$-sampling-based methods ($h^*(n)\geq \max_{j\in[M]}f(j)/\min_{j\in [M]}f(j)$). The functions $h$ and $h^*$ describe the growing overhead to approximate harder and harder $f$-moments ("hard" with respect to the corresponding scheme). The harmonic approach has a different "boundary behavior" with overhead $\tilde{h}$, which is described in \ref{rem:boundary}. The function $g_{np}$ (defined in \ref{eq:gnp}) is constructed in \cite{braverman2016streaming} to show the existence of tractable functions with occasional polynomial dropping ($g_{np}(z)=1/z$ if $z$ is a power of $2$). The function $g_{gold}$ is newly constructed in \ref{lem:gold}; it occasionally drops quadratically ($g_{gold}(z)=O(1/z^2)$ for infinitely many $z$). It is proved in \cite{bar2004information} that $L_p$ is not tractable if $p>2$.
  • Figure 4: Gamma function $\Gamma(x)$ over $(-1,0)\cup (0,1)$
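The decomposition idea in Figures 1 and 2 can be illustrated with a small Python sketch. It rests on several assumptions that are not confirmed by the text above: the harmonic moment at frequency $\gamma$ is taken to be $\sum_v (1-\cos(\gamma \mathbf{x}_v))$, the weights come from a finite discrete cosine expansion of $f$ on the integers in $[-M,M]$ (standing in for the paper's own decomposition), and the harmonic moments are computed exactly rather than estimated from a sketch. All function names are hypothetical.

```python
import math

def harmonic_weights(f, M):
    """Return (gamma_k, w_k) pairs such that, for every integer |z| <= M,
    f(z) = sum_k w_k * (1 - cos(gamma_k * z)).
    Assumes f is symmetric with f(0) == 0."""
    N = 2 * M
    def g(r):                      # f viewed as an even function on Z_N
        return f(r if r <= M else r - N)
    pairs = []
    for k in range(1, N):          # k = 0 contributes 1 - cos(0) = 0
        c_k = sum(g(r) * math.cos(2 * math.pi * k * r / N) for r in range(N)) / N
        pairs.append((2 * math.pi * k / N, -c_k))
    return pairs

def harmonic_moment(x, gamma):
    """Assumed form of a symmetric harmonic moment, computed exactly here."""
    return sum(1.0 - math.cos(gamma * z) for z in x)

def f_moment_via_harmonics(x, f, M):
    """Combine harmonic moments with the weights of f (linearity in f)."""
    return sum(w * harmonic_moment(x, gamma) for gamma, w in harmonic_weights(f, M))

x = [2, 0, -5, 7, -1]              # hypothetical vector with entries in [-M, M]
M = 8
f = lambda z: abs(z) ** 0.5        # f(z) = |z|^p with p = 1/2

direct = sum(f(z) for z in x)
via_harmonics = f_moment_via_harmonics(x, f, M)
```

Because the $f$-moment is linear in $f$, replacing each exact `harmonic_moment` with an estimate gives an estimate of the $f$-moment with the same weights, and a different target $f$ changes only the weights. The frequencies do not depend on $f$.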

Theorems & Definitions (54)

  • Definition 1: (symmetric) harmonic moments
  • Definition 2: symmetric Poisson distribution
  • Lemma 1
  • proof
  • Lemma 2
  • Remark 1
  • proof
  • Theorem 1: Universal Harmonic Sketch
  • Remark 2
  • Remark 3
  • ...and 44 more