Sketching, Moment Estimation, and the Lévy-Khintchine Representation Theorem

Seth Pettie, Dingyu Wang

Abstract

In the $d$-dimensional turnstile streaming model, a frequency vector $\mathbf{x}=(\mathbf{x}(1),\ldots,\mathbf{x}(n))\in (\mathbb{R}^d)^n$ is updated entrywise over a stream. We consider the problem of \emph{$f$-moment estimation}, in which one wants to estimate $$f(\mathbf{x})=\sum_{v\in[n]}f(\mathbf{x}(v))$$ with a small-space sketch. In this work we present a simple and generic scheme to construct sketches with the novel idea of hashing indices to \emph{Lévy processes}, from which one can estimate the $f$-moment $f(\mathbf{x})$ where $f$ is the \emph{characteristic exponent} of the Lévy process. The fundamental \emph{Lévy-Khintchine representation theorem} completely characterizes the space of all possible characteristic exponents, which in turn characterizes the set of $f$-moments that can be estimated by this generic scheme. The new scheme has strong explanatory power. It unifies the construction of many existing sketches ($F_0$, $L_0$, $L_2$, $L_\alpha$, $L_{p,q}$, etc.) and it implies the tractability of many nearly periodic functions that were previously unclassified. Furthermore, the scheme can be conveniently generalized to multidimensional cases ($d\geq 2$) by considering multidimensional Lévy processes and can be further generalized to estimate \emph{heterogeneous moments} by projecting different indices with different Lévy processes. We conjecture that the set of tractable functions can be characterized using the Lévy-Khintchine representation theorem via what we call the \emph{Fourier-Hahn-Lévy} method.
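The core idea of hashing indices to Lévy processes can be illustrated with a small sketch for $d=1$. It rests on the identity $\mathbb{E}\,e^{i z X_t}=e^{-t f(z)}$ for a Lévy process $X$ with characteristic exponent $f$: if each index $v$ gets an independent copy $X^{(v)}$ and a cell stores $\sum_v \mathbf{x}(v)X^{(v)}_t$, then $\mathbb{E}\,e^{i\cdot\text{cell}}=e^{-t f(\mathbf{x})}$. The code below is a toy illustration under stated assumptions, not the paper's construction: the class name `LevySketch`, the single fixed time scale `t`, the seeded generator standing in for a hash function, and the plain averaging estimator are all choices made here for brevity.

```python
import cmath
import math
import random


def brownian(rng, t):
    # Brownian motion at time t is N(0, t); its exponent is f(z) = z^2 / 2.
    return rng.gauss(0.0, math.sqrt(t))


def cauchy(rng, t):
    # A standard Cauchy process at time t has scale t; its exponent is f(z) = |z|.
    return t * math.tan(math.pi * (rng.random() - 0.5))


class LevySketch:
    """Toy linear sketch (hypothetical name): cell j holds sum_v x(v) * X_t^{(j,v)}."""

    def __init__(self, sampler, t, copies, seed=0):
        self.sampler, self.t, self.copies, self.seed = sampler, t, copies, seed
        self.cells = [0.0] * copies

    def _process_value(self, j, v):
        # Stand-in for a hash function: reseeding makes X_t^{(j,v)} reproducible,
        # so every update to index v sees the same random variable.
        rng = random.Random(f"{self.seed}:{j}:{v}")
        return self.sampler(rng, self.t)

    def update(self, v, delta):
        # Turnstile update x(v) += delta; the sketch is linear in x.
        for j in range(self.copies):
            self.cells[j] += delta * self._process_value(j, v)

    def estimate(self):
        # E[exp(i * cell)] = exp(-t * f(x)); invert the empirical mean.
        phi = sum(cmath.exp(1j * c) for c in self.cells) / self.copies
        return (-cmath.log(phi) / self.t).real


def run(sampler, updates, t=0.2, copies=4000):
    sketch = LevySketch(sampler, t, copies)
    for v, delta in updates:
        sketch.update(v, delta)
    return sketch.estimate()


# Final frequency vector: x(0) = 2, x(1) = -2, x(5) = 1.5, all other entries 0.
updates = [(0, 3.0), (1, -2.0), (0, -1.0), (5, 1.5)]
l2_half = run(brownian, updates)  # target: (4 + 4 + 2.25) / 2 = 5.125
l1 = run(cauchy, updates)         # target: 2 + 2 + 1.5 = 5.5
```

Swapping the sampler changes which $f$-moment is estimated, while the update and estimation code stays the same. The fixed `t = 0.2` works here only because $t\,f(\mathbf{x})$ happens to be close to 1; a practical scheme has to cope with an unknown scale of $f(\mathbf{x})$.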

Paper Structure

This paper contains 25 sections, 14 theorems, 51 equations, 3 figures, 4 tables.

Key Result

Theorem 1

Let $f:\mathbb{R}^d\to \mathbb{C}$ be any function of the form where $A$ is a covariance matrix, $\gamma\in\mathbb{R}^d$, and $\nu$ is a positive measure such that $\int_{\mathbb{R}^d}\min(|x|^2,1)\,\nu(dx)<\infty$. There exists a mergeable sketch of $O(\epsilon^{-2}\log n)$ words such that for any input stream $\mathbf{x}\in(\mathbb{R}^d)^n$ with $|f(\mathbf{x
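For reference, the triple $(A,\gamma,\nu)$ with the integrability condition $\int\min(|x|^2,1)\,\nu(dx)<\infty$ is the parametrization used in the Lévy-Khintchine representation. In the convention of ken1999levy, and with the sign chosen so that $\mathbb{E}\,e^{i\langle z,X_t\rangle}=e^{-t f(z)}$, the characteristic exponent is usually written as below. Both the sign convention and the truncation set $\{|x|\le 1\}$ are assumptions here, chosen to agree with the compound Poisson exponent $\sum_j c_j(1-\cos(\cdot))$ in Figure 2.

$$f(z)=\tfrac12\langle z,Az\rangle-i\langle\gamma,z\rangle+\int_{\mathbb{R}^d}\Bigl(1-e^{i\langle z,x\rangle}+i\langle z,x\rangle\,\mathbb{1}[|x|\le 1]\Bigr)\,\nu(dx)$$

Two special cases for $d=1$ under this convention: $A=1$, $\gamma=0$, $\nu=0$ gives $f(z)=z^2/2$ (Brownian motion), and $A=0$, $\gamma=1$, $\nu=\delta_1$ gives $f(z)=1-e^{iz}$ (Poisson process with rate 1).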

Figures (3)

  • Figure 1: Lévy-Tower with $m=3$ and $\mathbf{x}=(1,0,\ldots,0)$. From left to right: linear drift, Cauchy process, Brownian motion, Poisson process with rate 1, Poisson process with rate 2. Different Lévy processes are "sensitive" to different function moments. For example, linear drift is sensitive only to the sum of the vector and insensitive to how the values are distributed. The Cauchy process is sensitive only to the $L_1$-moment, while Brownian motion is sensitive only to the $L_2$-moment. Poisson processes are sensitive to the support size of $\mathbf{x}$ while also leaking information about other $f$-moments.
  • Figure 2: Left: $f_{L_0,32}(x)=\sum_{j=1}^{31}\frac{1}{32} (1-\cos(2\pi j x/32)).$ The black dots mark the values at integer values of $x$. Right: The jump distribution of the compound Poisson process $X$ with characteristic exponent $f_{L_0,32}$. The subsampling and uniform random projection tricks used in kane2010optimal are recovered by computing the corresponding Lévy process.
  • Figure 3: Left: $g_{np,5}(x)=\sum_{j=1}^{31}\frac{2^{2\tau(j)+1}+1}{1536} (1-\cos(2\pi j x/32)).$ The black dots mark the values at integer values of $x$. Right: The jump distribution of the compound Poisson process $X$ with characteristic exponent $g_{np,5}$. Such nearly periodic functions do not fit in the $L_2$-heavy-hitter based framework of braverman2016streaming. Nevertheless, one may compute the corresponding Lévy process and apply the Lévy-Tower.
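The function $f_{L_0,32}$ from the Figure 2 caption can be evaluated directly to see how it behaves on integers. Since $\sum_{j=0}^{31}\cos(2\pi j x/32)=0$ for any integer $x$ not divisible by 32, the function equals 1 at such $x$ and 0 at multiples of 32, so it acts as a periodic version of the indicator $\mathbb{1}[x\neq 0]$. The helper name `f_l0` and the parameter `p` are illustrative choices made here, not notation from the paper.

```python
import math


def f_l0(x, p=32):
    # f_{L_0,p}(x) = sum_{j=1}^{p-1} (1/p) * (1 - cos(2*pi*j*x/p)),
    # written with a general period p; the caption uses p = 32.
    return sum((1.0 - math.cos(2.0 * math.pi * j * x / p)) / p for j in range(1, p))


# Evaluate at a few integers: zero at multiples of 32, one elsewhere
# (up to floating-point rounding).
values = {x: round(f_l0(x), 9) for x in (0, 1, 5, 31, 32, 64, -3)}
```

Between integers the function is not 0/1 valued; for example, at $x=1/2$ the cosine terms cancel in pairs and the value is $31/32$.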

Theorems & Definitions (36)

  • Definition 1: $\mathbb{R}^d$-turnstile model
  • Theorem 1: generic Lévy-Tower, \ref{sec:levy_tower}
  • Remark 1
  • Theorem 2: Lévy-Stable, \ref{sec:alpha-levy-stable}
  • Definition 2: Lévy processes ken1999levy
  • Remark 2
  • Theorem 3: Lévy-Khintchine representation ken1999levy
  • Remark 3
  • Lemma 1
  • Definition 3: $(f,m)$-Lévy-Tower
  • ...and 26 more