Approaching 100% Confidence in Stream Summary through ReliableSketch
Yuhan Wu, Hanbo Wu, Xilai Liu, Yikai Zhao, Tong Yang, Kaicheng Yang, Sha Wang, Lihua Miao, Gaogang Xie
TL;DR
This work addresses the problem of obtaining high-confidence stream summaries for all keys simultaneously: existing sketches offer strong per-key confidence but do not bound errors collectively across a large key set. It introduces ReliableSketch, a multi-layer, hash-based sketch built from Error-Sensible Buckets and a Double Exponential Control mechanism, which ensures $\Pr[\forall e, |\hat f(e)-f(e)|\le Λ] \ge 1-Δ$ using $O(\frac{N}{Λ}+\ln(\frac{1}{Δ}))$ space and $O(1+Δ\ln\ln(\frac{N}{Λ}))$ amortized time while remaining hardware-friendly. The work includes formal guarantees with mathematical analysis, FPGA and programmable-switch implementations, and extensive experiments showing zero outliers under small memory budgets and near-optimal throughput, whereas competing sketches leave thousands of estimates with uncontrolled error. The results are relevant to real-world networks and data centers, and the authors release public source code to support adoption and reproducibility.
Abstract
To approximate sums of values in key-value data streams, sketches are widely used in databases and networking systems. They offer high-confidence approximations for any given key while ensuring low time and space overhead. While existing sketches are proficient in estimating individual keys, they struggle to maintain this high confidence across all keys collectively, an objective that is critically important both in algorithm theory and in practical applications. We propose ReliableSketch, the first sketch to control the error of all keys to less than $Λ$ with a small failure probability $Δ$, requiring only $O(1 + Δ\ln\ln(\frac{N}{Λ}))$ amortized time and $O(\frac{N}{Λ} + \ln(\frac{1}{Δ}))$ space. Furthermore, its simplicity makes it hardware-friendly, and we implement it on CPU servers, FPGAs, and programmable switches. Our experiments show that, under the same small space, ReliableSketch not only keeps all keys' errors below $Λ$ but also achieves near-optimal throughput, whereas competitors produce thousands of estimates with uncontrolled error. We have made our source code publicly available.
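To make the all-key objective concrete, the following minimal Python sketch counts outliers, i.e. keys whose absolute error exceeds $Λ$, for a given sketch and ground truth. It is an illustration only: the sketch used here is a standard Count-Min sketch standing in as a baseline, not ReliableSketch, and the names `CountMin`, `count_outliers`, and `all_keys_within` are hypothetical helpers introduced for this example. The event bounded by the guarantee above corresponds to `all_keys_within(...)` returning `True`.

```python
import random
from collections import Counter


class CountMin:
    """Standard Count-Min sketch, used only as a baseline (not ReliableSketch)."""

    def __init__(self, width, depth, seed=0):
        rng = random.Random(seed)
        self.width = width
        # One random salt per row stands in for an independent hash function.
        self.salts = [rng.getrandbits(64) for _ in range(depth)]
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, salt, key):
        # Assumes integer keys, so Python's built-in hash is deterministic.
        return hash((salt, key)) % self.width

    def insert(self, key, value=1):
        for salt, row in zip(self.salts, self.rows):
            row[self._index(salt, key)] += value

    def query(self, key):
        # The minimum over rows never underestimates the true sum.
        return min(
            row[self._index(salt, key)]
            for salt, row in zip(self.salts, self.rows)
        )


def count_outliers(sketch, truth, lam):
    """Number of keys e with |estimate(e) - f(e)| > lam."""
    return sum(1 for key, f in truth.items() if abs(sketch.query(key) - f) > lam)


def all_keys_within(sketch, truth, lam):
    """True when every key's error is at most lam (zero outliers)."""
    return count_outliers(sketch, truth, lam) == 0


if __name__ == "__main__":
    # Synthetic skewed stream of integer keys (an assumption for illustration).
    rng = random.Random(1)
    stream = [int(rng.paretovariate(1.2)) for _ in range(100_000)]
    truth = Counter(stream)

    sketch = CountMin(width=512, depth=3)
    for key in stream:
        sketch.insert(key)

    lam = 25
    print("distinct keys:", len(truth))
    print("outliers:", count_outliers(sketch, truth, lam))
    print("all keys within lam:", all_keys_within(sketch, truth, lam))
```

A per-key guarantee only limits the chance that any one chosen key is an outlier, so over many keys some outliers can still be expected; the all-key objective asks that `count_outliers` be zero except with probability at most $Δ$.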
