Relative Error Streaming Quantiles with Seamless Mergeability via Adaptive Compactors
Tomáš Domes, Pavel Veselý
TL;DR
The paper addresses the challenge of producing fully mergeable, relative-error quantile summaries in small space. It introduces adaptive compactors as a refinement of ReqSketch, enabling a simpler mergeability proof while preserving the space bound, update time, and relative-error guarantees; the authors also derive near-optimal space bounds in special cases such as balanced merging and reverse-sorted inputs. The analysis uses a potential function to bound space and a concentration-based argument to bound the error, and it shows improved space behavior under structured merge trees. The result is a mergeable, near-optimal, and analyzable sketch for distributed quantile estimation, suitable for sensor networks and multi-machine systems.
Abstract
Quantile summaries provide a scalable way to estimate the distribution of individual attributes in large datasets that are often distributed across multiple machines or generated by sensor networks. ReqSketch (arXiv:2004.01668) is currently the most space-efficient summary with two key properties: relative error guarantees, offering increasingly higher accuracy towards the distribution's tails, and mergeability, allowing distributed or parallel processing of datasets. Due to these features and its simple algorithm design, ReqSketch has been adopted in practice, via implementation in the Apache DataSketches library. However, the proof of mergeability in ReqSketch is overly complicated, requiring an intricate charging argument and complex variance analysis. In this paper, we provide a refined version of ReqSketch, by developing so-called adaptive compactors. This enables a significantly simplified proof of relative error guarantees in the most general mergeability setting, while retaining the original space bound, update time, and algorithmic simplicity. Moreover, the adaptivity of our sketch, together with the proof technique, yields near-optimal space bounds in specific scenarios, particularly when merging sketches of comparable size.
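To make the two properties above concrete, here is a minimal sketch of the generic compaction idea that compactor-based quantile summaries build on. It is not the adaptive compactor of this paper, nor the ReqSketch algorithm: the function names `compact`, `merge`, and `estimated_rank`, and the layout "level h holds items of weight 2^h", are assumptions made only for illustration. A compaction sorts a buffer and keeps every other item from a random offset, so each survivor stands for two inputs; merging simply concatenates the buffers level by level.

```python
import random
from itertools import zip_longest


def compact(buffer, rng):
    """One generic compaction step (illustrative, not the paper's scheme).

    Sort the buffer and keep every other item, starting at a random offset.
    Each surviving item represents two inputs, so its weight doubles.
    """
    items = sorted(buffer)
    offset = rng.randrange(2)  # 0 or 1, chosen uniformly at random
    return items[offset::2]


def merge(levels_a, levels_b):
    """Merge two summaries by concatenating their buffers level by level."""
    return [list(a) + list(b)
            for a, b in zip_longest(levels_a, levels_b, fillvalue=[])]


def estimated_rank(levels, y):
    """Estimate the number of inputs <= y; items at level h weigh 2**h."""
    return sum((2 ** h) * sum(1 for x in items if x <= y)
               for h, items in enumerate(levels))


rng = random.Random(0)
survivors = compact(list(range(8)), rng)  # 8 inputs become 4 items of weight 2
summary = [[], survivors]                 # level 0 is empty, level 1 holds survivors
print(estimated_rank(summary, 5))         # the true rank of 5 is 6
```

Compacting an even-sized buffer changes the estimated rank of any query by at most one unit of the input weight, and the random offset makes that error zero in expectation. Relative-error sketches control where compactions happen so that this error stays small compared with the rank itself near the tail; that scheduling is the part this toy example leaves out.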
