MagnifierSketch: Quantile Estimation Centered at One Point
Jiarui Guo, Qiushi Lyu, Yuhan Wu, Haoyu Li, Zhaoqian Yao, Yuqi Dong, Xiaolin Wang, Bin Cui, Tong Yang
TL;DR
This work tackles per-key point-quantile estimation in data streams by introducing MagnifierSketch, a two-stage sketch combining Tower Sketch for infrequent-item filtering with a per-key Value Sketch that uses Value Focus, Distribution Calibration, and Double Filtration. The authors prove unbiasedness under distribution calibration, derive single-key and per-key error bounds, and analyze time and space complexity. Extensive experiments across real and synthetic data, plus RocksDB integration, show that MagnifierSketch outperforms state-of-the-art baselines in average error and throughput in both single-key and per-key settings while remaining efficient for high-speed streams. The approach offers practical, cache-friendly quantile estimation with real-world applicability in databases and network monitoring.
Abstract
In this paper, we consider quantile estimation in data stream models, where every item in the data stream is a key-value pair. Researchers sometimes aim to estimate per-key quantiles (i.e., a quantile estimate for every distinct key), and some popular use cases, such as tail latency measurement, rely on a predefined single quantile (e.g., the 0.95- or 0.99-quantile) rather than demanding arbitrary quantile estimation. However, existing algorithms are not specifically designed for per-key estimation centered at one point. They cannot achieve high accuracy in our problem setting, and their throughput is not sufficient to handle high-speed items in data streams. To solve this problem, we propose MagnifierSketch for point-quantile estimation. MagnifierSketch supports both single-key and per-key quantile estimation, and its key techniques are named Value Focus, Distribution Calibration, and Double Filtration. We provide rigorous mathematical derivations to prove the unbiasedness of MagnifierSketch and show its space and time complexity. Our experimental results show that the Average Error (AE) of MagnifierSketch is significantly lower than the state-of-the-art in both single-key and per-key situations. We also implement MagnifierSketch on RocksDB to reduce quantile query latency in real databases. All related code for MagnifierSketch is open-sourced and available on GitHub.
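To make the problem setting concrete, the following minimal Python sketch computes exact per-key point quantiles over a stream of key-value pairs. It is not MagnifierSketch: it stores every value, so it illustrates only the quantity being estimated, not the sketching techniques. The function name `exact_per_key_quantile`, the sample stream, and the nearest-rank quantile convention are assumptions made for illustration; the abstract does not specify any of them.

```python
import math
from collections import defaultdict


def exact_per_key_quantile(stream, q):
    """Return {key: q-quantile of that key's values}.

    Assumption: the q-quantile is the value at 1-based rank ceil(q * n)
    in sorted order (nearest-rank rule). This is an exact baseline that
    keeps all values in memory, not a sketch.
    """
    values_by_key = defaultdict(list)
    for key, value in stream:
        values_by_key[key].append(value)

    result = {}
    for key, values in values_by_key.items():
        values.sort()
        rank = max(1, math.ceil(q * len(values)))  # 1-based nearest rank
        result[key] = values[rank - 1]
    return result


# Hypothetical stream of (key, value) pairs, e.g. (service, latency).
stream = [("a", 10), ("b", 5), ("a", 30), ("a", 20), ("b", 7), ("a", 40)]
median_by_key = exact_per_key_quantile(stream, 0.5)
tail_by_key = exact_per_key_quantile(stream, 0.95)
```

Memory in this baseline grows with the number of items in the stream, which is the cost a sketch for a single predefined quantile aims to avoid.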
