Multi-dimensional Approximate Counting

Dingyu Wang

Abstract

The celebrated Morris counter uses $\log_2\log_2 n + O(\log_2 σ^{-1})$ bits to count up to $n$ with relative error $σ$: if $\hat{λ}$ is the estimate of the current count $λ$, then $\mathbb{E}|\hat{λ}-λ|^2 < σ^2λ^2$. A natural generalization is \emph{multi-dimensional} approximate counting. Let $d\geq 1$ be the dimension. The count vector $x\in \mathbb{N}^d$ is incremented entry-wise over a stream of coordinates $(w_1,\ldots,w_n)\in [d]^n$: upon receiving $w_k\in[d]$, the update is $x_{w_k}\gets x_{w_k}+1$. A \emph{$d$-dimensional approximate counter} is required to count all $d$ coordinates simultaneously and return an estimate $\hat{x}$ of the count vector $x$. Aden-Ali, Han, Nelson, and Yu \cite{aden2022amortized} showed that the trivial solution of using $d$ Morris counters that track the $d$ coordinates separately is already optimal in space \emph{if each entry only allows error relative to itself}, i.e., $\mathbb{E}|\hat{x}_j-x_j|^2 < σ^2|x_j|^2$ for each $j\in [d]$. However, for another natural error metric -- the \emph{Euclidean mean squared error} $\mathbb{E} |\hat{x}-x|^2$ -- we show that using $d$ separate Morris counters is sub-optimal. In this work, we present a simple and optimal $d$-dimensional counter with Euclidean relative error $σ$, i.e., $\mathbb{E} |\hat{x}-x|^2 < σ^2|x|^2$ where $|x|=\sqrt{\sum_{j=1}^d x_j^2}$, together with a matching lower bound. Both bounds are proved with strikingly simple ideas: the upper bound is constructed from a certain variable-length integer encoding, and the lower bound is derived from a straightforward volumetric estimate for sphere covering.
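For orientation, the baseline that the abstract calls sub-optimal can be sketched in a few lines of Python. This is only the textbook Morris counter, replicated once per coordinate. It is not the paper's optimal construction, which relies on a variable-length integer encoding that the abstract does not spell out. The class names, the base parameter `a` (the counter works in base $1+a$), the 0-indexed coordinates, and the helper `euclidean_relative_sq_error` are all assumptions made for illustration.

```python
import random


class MorrisCounter:
    """Textbook Morris counter in base (1 + a); stores only the exponent x."""

    def __init__(self, a=1.0, rng=None):
        self.a = a          # hypothetical accuracy knob: smaller a -> lower variance
        self.x = 0          # the only state kept between updates
        self.rng = rng if rng is not None else random.Random()

    def increment(self):
        # Bump the exponent with probability (1 + a)^(-x).
        if self.rng.random() < (1.0 + self.a) ** (-self.x):
            self.x += 1

    def estimate(self):
        # Unbiased estimate of the number of increments seen so far.
        return ((1.0 + self.a) ** self.x - 1.0) / self.a


class SeparateMorrisCounters:
    """Baseline d-dimensional counter: one independent Morris counter per coordinate."""

    def __init__(self, d, a=1.0, seed=None):
        rng = random.Random(seed)
        self.counters = [MorrisCounter(a, rng) for _ in range(d)]

    def update(self, w):
        # w is a 0-indexed coordinate (the abstract uses w in [d], 1-indexed).
        self.counters[w].increment()

    def estimate(self):
        return [c.estimate() for c in self.counters]


def euclidean_relative_sq_error(estimate, truth):
    """|x_hat - x|^2 / |x|^2 for a single run; assumes truth is not all zeros."""
    num = sum((e - t) ** 2 for e, t in zip(estimate, truth))
    den = sum(t ** 2 for t in truth)
    return num / den


if __name__ == "__main__":
    d, stream_len = 4, 10_000
    stream_rng = random.Random(1)
    counter = SeparateMorrisCounters(d, a=0.01, seed=0)
    truth = [0] * d
    for _ in range(stream_len):
        w = stream_rng.randrange(d)
        truth[w] += 1
        counter.update(w)
    print("true counts:", truth)
    print("estimates:  ", [round(v, 1) for v in counter.estimate()])
    print("relative squared Euclidean error:",
          euclidean_relative_sq_error(counter.estimate(), truth))
```

The printed error is one sample of the quantity whose expectation the Euclidean guarantee $\mathbb{E}|\hat{x}-x|^2 < σ^2|x|^2$ bounds; averaging it over many independent runs would approximate that expectation for this baseline.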

Paper Structure

This paper contains 9 sections, 10 theorems, 41 equations.

Key Result

Theorem 1

The following statements are true.

Theorems & Definitions (24)

  • Definition 1: (Euclidean) $d$-dimensional counting
  • Theorem 1
  • Definition 2: multiplicative space covering
  • Definition 3: variable-length integer encoding
  • Remark 1
  • Lemma 1
  • Remark 2
  • proof
  • Theorem 2
  • proof
  • ...and 14 more