Harmonic Decomposition in Data Sketches

Dingyu Wang

TL;DR

This paper proposes a new universal sketching scheme that is almost “dual” to sampling-based methods, and conjectures that the SymmetricPoissonTower is the universal sketch that can estimate every tractable function $f$.

Abstract

In the turnstile streaming model, a dynamic vector $\mathbf{x}=(\mathbf{x}_1,\ldots,\mathbf{x}_n)\in \mathbb{Z}^n$ is updated by a stream of entry-wise increments/decrements. Let $f\colon\mathbb{Z}\to \mathbb{R}_+$ be a symmetric function with $f(0)=0$. The \emph{$f$-moment} of $\mathbf{x}$ is defined to be $f(\mathbf{x}) := \sum_{v\in[n]}f(\mathbf{x}_v)$. We revisit the problem of constructing a \emph{universal sketch} that can estimate many different $f$-moments. Previous constructions of universal sketches rely on the technique of sampling with respect to the $L_0$-mass (uniform samples) or $L_2$-mass ($L_2$-heavy-hitters), whose universality comes from being able to evaluate the function $f$ over the samples. In this work we take a new approach to constructing a universal sketch that does not use \emph{any} explicit samples but relies on the \emph{harmonic structure} of the target function $f$. The new sketch ($\textsf{SymmetricPoissonTower}$) \emph{embraces} hash collisions instead of avoiding them, which saves multiple $\log n$ factors in space, e.g., when estimating all $L_p$-moments ($f(z) = |z|^p,p\in[0,2]$). For many nearly periodic functions, the new sketch is \emph{exponentially} more efficient than sampling-based methods. We conjecture that the $\textsf{SymmetricPoissonTower}$ sketch is \emph{the} universal sketch that can estimate every tractable function $f$.
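To make the turnstile model and the $f$-moment concrete, here is a minimal exact baseline, not a sketch. It stores $\mathbf{x}$ explicitly, which takes linear space, and evaluates $f(\mathbf{x})=\sum_{v}f(\mathbf{x}_v)$ directly. The class name `TurnstileVector` and the use of a dictionary keyed by coordinate are illustrative assumptions, not anything from the paper.

```python
from collections import defaultdict


class TurnstileVector:
    """Hypothetical exact baseline for the turnstile model (stores x explicitly)."""

    def __init__(self):
        self.x = defaultdict(int)  # coordinate v -> current value x_v (absent means 0)

    def update(self, v, delta):
        # One stream item: increment (delta > 0) or decrement (delta < 0) entry v.
        self.x[v] += delta

    def f_moment(self, f):
        # f(0) = 0, so absent coordinates and entries cancelled back to 0 add nothing.
        return sum(f(xv) for xv in self.x.values())


if __name__ == "__main__":
    vec = TurnstileVector()
    for v, d in [(1, 3), (2, -2), (1, -1), (3, 5), (3, -5)]:
        vec.update(v, d)
    print(vec.f_moment(lambda z: int(z != 0)))  # L_0 moment: 2
    print(vec.f_moment(lambda z: z * z))        # L_2 moment: 8
```

A universal sketch aims to answer the same `f_moment` queries for many $f$ at once while using far less space than this explicit vector.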

Paper Structure (28 sections, 23 theorems, 92 equations, 4 figures, 1 table)

Introduction
$L_0$-sampling.
$L_2$-heavy hitters.
Our Contribution: A New Harmonic Approach for Universal Sketching
Insight: How to Estimate $f$-Moments under Hash Collisions?
The key insight.
Organization
Technical Introduction & Related Work
(Symmetric) Harmonic Moments
Combination of Harmonic Moments
Formal Statement of New Results
Related Work
Implicit Level Selection and Smoothed Subsampling
The SymmetricPoissonTower Sketch
$(-1/3)$-Aggregation
...and 13 more sections

Key Result

Lemma 1

For any $\gamma >0$,

Figures (4)

Figure 1: Diagram of the $f$-moment estimation process. Decompose $f$ as $f_1+\ldots+f_k$ where $f_j$s are homomorphisms. For $j=1,\ldots,k$, compute $\tilde{f}_j$ that estimates the $f_j$-moment of $\mathbf{x}$. Then $\tilde{f}=\tilde{f}_1+\ldots+\tilde{f}_k$ is the estimator for the $f$-moment.
Figure 2: Diagram of harmonic decomposition: different function moments can be estimated by combining estimates of harmonic moments with different weights. Only a subset of the harmonic components is shown, so they sum only to an approximation of $f$. As more harmonic components are used, the sum converges uniformly to $f$ over $[-M,M]$.
Figure 3: Summary of tractability and universality. Pivotal functions are shown to demonstrate the estimation power of different universal sketching schemes. See braverman2010zero, braverman2016streaming for $L_2$-heavy-hitter-based methods ($h$ satisfies $f(y)/f(x)\in[1/h(y),(y/x)^2 h(y)]$ for any $0<x<y\leq M$) and braverman2015universal, chestnut2015stream for $L_0$-sampling-based methods ($h^*(n)\geq \max_{j\in[M]}f(j)/\min_{j\in [M]}f(j)$). The functions $h$ and $h^*$ describe the growing overhead of approximating harder and harder $f$-moments ("hard" with respect to the corresponding scheme). The harmonic approach has a different "boundary behavior", with overhead $\tilde{h}$ described in Remark \ref{rem:boundary}. The function $g_{np}$ (defined in Eq. \ref{eq:gnp}) is constructed in braverman2016streaming to show the existence of tractable functions with occasional polynomial drops ($g_{np}(z)=1/z$ if $z$ is a power of $2$). The function $g_{gold}$, newly constructed in Lemma \ref{lem:gold}, occasionally drops quadratically ($g_{gold}(z)=O(1/z^2)$ for infinitely many $z$). It is proved in bar2004information that $L_p$ is not tractable if $p>2$.
Figure 4: Gamma function $\Gamma(x)$ over $(-1,0)\cup (0,1)$
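Figures 1 and 2 describe estimating an $f$-moment as a weighted sum of harmonic-moment estimates. The sketch below illustrates only the decomposition step, under a loudly stated assumption: the harmonic components are taken to be $1-\cos(\omega z)$ with frequencies $\omega_k=\pi k/M$, a discrete cosine expansion chosen for illustration that is not necessarily the paper's Definition 1. For a symmetric $f$ with $f(0)=0$, this expansion is exact at every integer $z\in[-M,M]$, and because the $f$-moment is linear in $f$, it equals the same weighted sum of harmonic moments. The helpers `harmonic_weights`, `harmonic_moment`, and `f_moment_via_harmonics` are hypothetical names; they compute moments exactly from $\mathbf{x}$ and do not implement the SymmetricPoissonTower sketch.

```python
import math


def harmonic_weights(f, M):
    """Hypothetical helper: weights a_1..a_M such that
    f(z) = sum_k a_k * (1 - cos(pi*k*z/M)) for every integer z in [-M, M].
    Assumes f is symmetric (f(-z) == f(z)) and f(0) == 0."""
    N = 2 * M  # treat f on {-M, ..., M-1} as one period of a 2M-periodic sequence
    weights = []
    for k in range(1, M + 1):
        # Real DFT coefficient (the imaginary part vanishes because f is symmetric).
        G = sum(f(z) * math.cos(math.pi * k * z / M) for z in range(-M, M))
        c_k = G / N if k == M else 2 * G / N  # cosine-series coefficient
        # f(0) = 0 forces c_0 = -(c_1 + ... + c_M), so f(z) = sum_k -c_k * (1 - cos(...)).
        weights.append(-c_k)
    return weights


def harmonic_moment(x, omega):
    """Hypothetical harmonic moment: sum over coordinates of 1 - cos(omega * x_v)."""
    return sum(1.0 - math.cos(omega * xv) for xv in x)


def f_moment_via_harmonics(x, f, M):
    """Combine harmonic moments using the weights of f (exact when max |x_v| <= M)."""
    a = harmonic_weights(f, M)
    return sum(a[k - 1] * harmonic_moment(x, math.pi * k / M) for k in range(1, M + 1))


if __name__ == "__main__":
    f = lambda z: abs(z) ** 0.5  # an L_p-type function with p = 0.5
    x = [3, -5, 0, 8, -1]
    print(f_moment_via_harmonics(x, f, M=8), sum(f(v) for v in x))
```

The two printed values agree up to floating-point error. Using all $M$ components gives an exact match on the integer grid; truncating to fewer components gives the approximation pictured in Figure 2.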

Theorems & Definitions (54)

Definition 1: (symmetric) harmonic moments
Definition 2: symmetric Poisson distribution
Lemma 1
proof
Lemma 2
Remark 1
proof
Theorem 1: Universal Harmonic Sketch
Remark 2
Remark 3
...and 44 more
