Elastic Sketch under Random Stationary Streams: Limiting Behavior and Near-Optimal Configuration

Younes Ben Mazziane, Vinay Kumar B. R., Othmane Marfoq

Abstract

\texttt{Elastic-Sketch} is a hash-based data structure for counting item occurrences in a data stream, and it has been empirically shown to achieve a better memory-accuracy trade-off than classical methods. The algorithm combines a \textit{heavy block}, which aims to maintain exact counts for a small set of dynamically \textit{elected} items, with a \textit{light block} that implements \texttt{Count-Min} \texttt{Sketch} (\texttt{CM}) to summarize the remaining traffic. The heavy block dynamics are governed by a hash function~$\beta$ that maps items into~$m_1$ buckets, and by an \textit{eviction threshold}~$\lambda$, which controls how easily an elected item can be replaced. We show that the performance of \texttt{Elastic-Sketch} depends strongly on the stream characteristics and on the choice of~$\lambda$. Since optimal parameter choices depend on unknown stream properties, we analyze \texttt{Elastic-Sketch} under a \textit{stationary random stream} model, a common assumption that captures the statistical regularities observed in real workloads. Formally, as the stream length goes to infinity, we derive closed-form expressions for the limiting distribution of the counters and for the resulting expected counting error. These expressions are efficiently computable, enabling practical grid-based tuning of both the memory split between the heavy and \texttt{CM} blocks (via~$m_1$) and the eviction threshold~$\lambda$. We further characterize the structure of the optimal eviction threshold, which substantially reduces the search space and shows how this threshold depends on the arrival distribution. Extensive numerical simulations validate our asymptotic results on finite streams drawn from the Zipf distribution.
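
To make the roles of $\beta$, $m_1$, and $\lambda$ concrete, the following is a minimal Python sketch of the two-block structure described above. It is an illustration under stated assumptions, not the authors' implementation: the vote-based eviction rule (evict when negative votes reach $\lambda$ times the elected item's count), the per-bucket flag, the reset of votes after an eviction, and the names `ElasticSketch`, `CountMin`, and `_hash` are all hypothetical choices made for this example, and the exact update rule analyzed in the paper may differ.

```python
import hashlib


def _hash(item, seed, modulus):
    """Deterministic hash of `item` into range(modulus), salted by `seed`."""
    digest = hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % modulus


class CountMin:
    """Light block: a Count-Min Sketch with `depth` rows of `width` counters."""

    def __init__(self, width, depth=1):
        self.width, self.depth = width, depth
        self.rows = [[0] * width for _ in range(depth)]

    def add(self, item, count=1):
        for r in range(self.depth):
            self.rows[r][_hash(item, r + 1, self.width)] += count

    def query(self, item):
        return min(self.rows[r][_hash(item, r + 1, self.width)]
                   for r in range(self.depth))


class ElasticSketch:
    """Heavy block of m1 buckets plus a Count-Min light block (assumed rules)."""

    def __init__(self, m1, lam, cm_width, cm_depth=1):
        self.m1, self.lam = m1, lam
        # Each bucket is None or [elected item, positive votes, negative votes, flag].
        # flag=True means part of the elected item's count may sit in the light block.
        self.heavy = [None] * m1
        self.light = CountMin(cm_width, cm_depth)

    def add(self, item):
        b = _hash(item, 0, self.m1)  # plays the role of the hash function beta
        bucket = self.heavy[b]
        if bucket is None:
            self.heavy[b] = [item, 1, 0, False]  # first arrival is elected
        elif bucket[0] == item:
            bucket[1] += 1  # elected item: count it in the heavy block
        else:
            bucket[2] += 1  # competing item: one negative vote
            if bucket[2] >= self.lam * bucket[1]:
                # Assumed eviction rule: move the old count to the light block
                # and elect the newcomer.
                self.light.add(bucket[0], bucket[1])
                self.heavy[b] = [item, 1, 1, True]
            else:
                self.light.add(item, 1)

    def query(self, item):
        bucket = self.heavy[_hash(item, 0, self.m1)]
        if bucket is not None and bucket[0] == item:
            return bucket[1] + (self.light.query(item) if bucket[3] else 0)
        return self.light.query(item)
```

For instance, `ElasticSketch(m1=50, lam=4.0, cm_width=200)` mirrors the heavy-block size and CM width quoted for Figure 1, with the value of `lam` chosen arbitrarily here. Under these assumed rules, a small `lam` makes elected items easy to replace, while a large `lam` tends to keep the first item that reaches a bucket; this is the trade-off that the choice of $\lambda$ controls.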

Paper Structure (22 sections, 74 equations, 3 figures, 2 algorithms)

Figures (3)

  • Figure 1: Average Relative Error (ARE) of Elastic-Sketch as a function of the eviction threshold $\lambda$, shown as box plots (boxes span Q1--Q3 and the center line indicates the median) over 100 runs, where each run is generated from an independent Zipf stream with skew parameter $\alpha=1.2$; $m_1=50$, $n_{\mathcal{I}}=2\times 10^{5}$ items, stream length $\tau=5\times 10^{5}$, CM width $200$.
  • Figure 2: Estimation of $\mathbb{E}\left[\overline{V_{\mathcal{B}}}(\tau)\right]$ via $g_{\beta}(\lambda)$ for Zipf request distributions with different skew parameters $\alpha$, $n_{\mathcal{I}}=10^{4}$, $\tau=5\times 10^{5}$, $n_{\text{runs}}=100$, $m_1=200$.
  • Figure 3: Illustration of the Markov chain $M_b$ when $n_b=2$.

Theorems & Definitions (12)

  • proof : Sketch of the proof
  • proof
  • proof
  • proof
  • proof
  • proof : Proof of Lemma \ref{lem:Relation_S_M}
  • proof : Proof of Claim \ref{cor:Return_to_0_Proba_G}
  • proof : Proof of Claim \ref{lem:Transience_M}
  • proof : Proof of Lemma \ref{lem:Proba_S_infinity}
  • proof : Proof of Lemma \ref{lem:Count_ElasticSketch_Finite}
  • ...and 2 more