Approaching 100% Confidence in Stream Summary through ReliableSketch

Yuhan Wu, Hanbo Wu, Xilai Liu, Yikai Zhao, Tong Yang, Kaicheng Yang, Sha Wang, Lihua Miao, Gaogang Xie

TL;DR

This work tackles the problem of achieving high-confidence stream summaries for all keys simultaneously. It closes the gap left by existing sketches, which offer strong per-key confidence but cannot bound errors collectively across a large key set. It introduces ReliableSketch, a multi-layer, hash-based sketch built from Error-Sensible Buckets and a Double Exponential Control mechanism that guarantees $\Pr[\forall e, |\hat f(e)-f(e)|\le Λ] \ge 1-Δ$ with space $O(\frac{N}{Λ}+\ln(\frac{1}{Δ}))$ and amortized time $O(1+Δ\ln\ln(\frac{N}{Λ}))$ while remaining hardware-friendly. The work includes formal guarantees and mathematical analysis, FPGA and programmable-switch implementations, and extensive experiments. These experiments show zero outliers under small memory budgets and strong throughput, whereas competitors produce thousands of estimates with uncontrolled error. The work offers practical insights for real-world networks and data centers, and the authors release their source code publicly to aid adoption and reproducibility.
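The all-keys guarantee can be checked empirically by counting outliers, i.e. keys whose absolute error exceeds Λ (the kind of count reported in Figure 4). A minimal Python helper, assuming the ground-truth frequency of every key in the stream is available; `count_outliers` and the toy data are hypothetical illustrations, not part of the authors' code.

```python
from typing import Callable, Dict


def count_outliers(estimate: Callable[[str], int], truth: Dict[str, int], lam: int) -> int:
    """Number of keys whose absolute estimation error exceeds lam (the bound Λ)."""
    return sum(1 for key, f in truth.items() if abs(estimate(key) - f) > lam)


truth = {"a": 10, "b": 3, "c": 7}      # exact per-key sums
estimates = {"a": 12, "b": 3, "c": 1}  # some sketch's answers
print(count_outliers(estimates.get, truth, lam=5))  # -> 1 (only "c": |1 - 7| = 6 > 5)
```

Under the guarantee above, this count is zero for all keys at once with probability at least $1-Δ$, rather than each key individually meeting the bound with high probability.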

Abstract

To approximate sums of values in key-value data streams, sketches are widely used in databases and networking systems. They offer high-confidence approximations for any given key while ensuring low time and space overhead. While existing sketches are proficient at estimating individual keys, they struggle to maintain this high confidence across all keys collectively, an objective that is critically important in both algorithmic theory and practical applications. We propose ReliableSketch, the first sketch to control the error of all keys to at most $Λ$ with a small failure probability $Δ$, requiring only $O(1 + Δ\ln\ln(\frac{N}{Λ}))$ amortized time and $O(\frac{N}{Λ} + \ln(\frac{1}{Δ}))$ space. Furthermore, its simplicity makes it hardware-friendly, and we implement it on CPU servers, FPGAs, and programmable switches. Our experiments show that under the same small space, ReliableSketch not only keeps all keys' errors within $Λ$ but also achieves near-optimal throughput, outperforming competitors, which produce thousands of estimates with uncontrolled error. We have made our source code publicly available.
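Why per-key confidence does not carry over to all keys can be seen with a standard union-bound argument. This is a generic illustration, not the paper's analysis: it uses Count-Min as the per-key sketch, takes $N$ as the total sum of (non-negative) values, and introduces $n$ for the number of distinct keys.

```latex
% Per-key guarantee of a conventional sketch: for each fixed key e,
\Pr\big[\,|\hat f(e) - f(e)| > \Lambda\,\big] \le \delta .
% Union bound over the n distinct keys:
\Pr\big[\,\exists e:\ |\hat f(e) - f(e)| > \Lambda\,\big]
  \le \sum_{e} \Pr\big[\,|\hat f(e) - f(e)| > \Lambda\,\big] \le n\,\delta .
% Reaching the all-keys target 1 - \Delta this way forces \delta = \Delta / n.
% Count-Min with width \lceil eN/\Lambda \rceil and depth \lceil \ln(1/\delta) \rceil then needs
O\!\left(\frac{N}{\Lambda}\,\ln\frac{n}{\Delta}\right)
\quad\text{space, versus}\quad
O\!\left(\frac{N}{\Lambda} + \ln\frac{1}{\Delta}\right)
\quad\text{for ReliableSketch.}
```

The per-key route multiplies the $\frac{N}{\Lambda}$ term by a logarithm that grows with the number of keys, while the bound claimed for ReliableSketch only adds a $\ln\frac{1}{\Delta}$ term.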

Paper Structure

This paper contains 41 sections, 13 theorems, 28 equations, 20 figures, 4 tables, 2 algorithms.

Key Result

Theorem 1

Let $X_i$ and $Y_i$ denote the per-layer bounds defined in the paper. The total frequency of the mice keys leaving the $i$-th layer does not exceed $X_i$, and the number of distinct elephant keys leaving the $i$-th layer does not exceed $Y_i$.
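Theorem 1 reasons about keys that leave each layer. The Python sketch below is a toy multi-layer structure that makes that notion concrete. Its internals are assumptions, because this summary does not describe the Error-Sensible Bucket or Double Exponential Control in detail:

- Each bucket holds a candidate key `id` and counters `yes`/`no`, updated by a majority vote.
- A bucket in layer $i$ counts as locked once `no` reaches a hypothetical threshold $λ_i = \lfloor Λ/2^{i+1}\rfloor$. After that, mismatching items move on to layer $i+1$.
- Layer widths halve from one layer to the next.
- An exact `overflow` dict stands in for however the paper handles items that pass every layer.

All names (`ToyLayeredSketch`, `Bucket`, `bucket_index`, `left`, `overflow`) are hypothetical. `left[i]` records how many items of each key left layer $i$. Splitting those keys into mice and elephants by their true frequency gives the two quantities Theorem 1 bounds.

```python
import hashlib


def bucket_index(key: str, layer: int, width: int) -> int:
    """Deterministic per-layer hash (blake2b keyed by the layer number via a prefix)."""
    digest = hashlib.blake2b(f"{layer}:{key}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little") % width


class Bucket:
    """Majority-vote bucket (assumed design): candidate key `id`, counters `yes`, `no`.

    Invariants for unit insertions: no <= yes; the count of `id` absorbed here lies in
    [yes - no, yes]; any other key has at most `no` items absorbed here.
    """

    def __init__(self) -> None:
        self.id = None
        self.yes = 0
        self.no = 0


class ToyLayeredSketch:
    def __init__(self, lam_total: int, width0: int, max_layers: int = 8) -> None:
        self.lams, self.rows = [], []
        for i in range(max_layers):
            lam = lam_total >> (i + 1)  # hypothetical λ_i = floor(Λ / 2^(i+1)); their sum is < Λ
            if lam == 0:
                break
            self.lams.append(lam)
            self.rows.append([Bucket() for _ in range(max(1, width0 >> i))])
        self.left = [dict() for _ in self.rows]  # left[i][key] = items of key that left layer i
        self.overflow = {}  # exact stand-in for items that pass every layer

    def insert(self, key: str) -> None:
        """Insert one occurrence (value 1) of `key`."""
        for i, (lam, row) in enumerate(zip(self.lams, self.rows)):
            b = row[bucket_index(key, i, len(row))]
            if b.id == key:  # the candidate key is always absorbed
                b.yes += 1
                return
            if b.no < lam:  # unlocked: vote against the candidate
                b.no += 1
                if b.no > b.yes:  # challenger wins: swap key and counters
                    b.id, b.yes, b.no = key, b.no, b.yes
                return
            # locked (no == λ_i, which is permanent): the item leaves layer i
            self.left[i][key] = self.left[i].get(key, 0) + 1
        self.overflow[key] = self.overflow.get(key, 0) + 1

    def query(self, key: str) -> tuple:
        """Return (estimate, max_error); the true count lies in [estimate - max_error, estimate]."""
        est = err = 0
        for i, (lam, row) in enumerate(zip(self.lams, self.rows)):
            b = row[bucket_index(key, i, len(row))]
            if b.id == key:  # a candidate key never had items passed on from this bucket
                return est + b.yes, err + b.no
            est += b.no
            err += b.no
            if b.no < lam:  # bucket never locked, so none of key's items went deeper
                return est, err
        return est + self.overflow.get(key, 0), err
```

Because the thresholds sum to less than $Λ$, every query's `max_error` stays below $Λ$. The exact overflow dict makes this toy meet the bound deterministically but with unbounded memory. The paper's contribution is meeting the bound in $O(\frac{N}{Λ}+\ln(\frac{1}{Δ}))$ space with failure probability at most $Δ$.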

Figures (20)

  • Figure 1: The workflow of the Error-Sensible Bucket, including insertion and querying processes.
  • Figure 2: An example of how an Error-Sensible Bucket works: starting from empty, sequentially inserting three items followed by two queries.
  • Figure 3: An Overview of ReliableSketch.
  • Figure 4: Number of Outliers under Different $\Lambda$.
  • Figure 5: Memory Consumption under Zero Outliers.
  • ...and 15 more figures

Theorems & Definitions (13)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Theorem 5
  • Lemma 1
  • Theorem A.1
  • Theorem A.2
  • Theorem A.3
  • Theorem A.4
  • ...and 3 more