Relative Error Streaming Quantiles with Seamless Mergeability via Adaptive Compactors

Tomáš Domes, Pavel Veselý

TL;DR

The paper addresses the challenge of producing fully mergeable, relative-error quantile summaries with tight space. It introduces adaptive compactors as a refinement of ReqSketch, enabling a simpler mergeability proof while preserving space, update time, and relative-error guarantees; the authors also derive near-optimal space bounds in special cases such as balanced merging and reverse-sorted inputs. The analysis leverages a potential function to bound space and a concentration-based argument to guarantee error, and it demonstrates improved space behavior under structured merge trees. The work advances practical distributed quantile estimation by delivering a mergeable, near-optimal, and analyzable sketch suitable for sensor networks and multi-machine systems.

Abstract

Quantile summaries provide a scalable way to estimate the distribution of individual attributes in large datasets that are often distributed across multiple machines or generated by sensor networks. ReqSketch (arXiv:2004.01668) is currently the most space-efficient summary with two key properties: relative error guarantees, offering increasingly higher accuracy towards the distribution's tails, and mergeability, allowing distributed or parallel processing of datasets. Due to these features and its simple algorithm design, ReqSketch has been adopted in practice, via implementation in the Apache DataSketches library. However, the proof of mergeability in ReqSketch is overly complicated, requiring an intricate charging argument and complex variance analysis. In this paper, we provide a refined version of ReqSketch, by developing so-called adaptive compactors. This enables a significantly simplified proof of relative error guarantees in the most general mergeability setting, while retaining the original space bound, update time, and algorithmic simplicity. Moreover, the adaptivity of our sketch, together with the proof technique, yields near-optimal space bounds in specific scenarios - particularly when merging sketches of comparable size.

Paper Structure

This paper contains 22 sections, 6 theorems, 41 equations, 3 figures, and 1 algorithm.

Key Result

Theorem 1

For any parameters $0 < \delta < 1/8$ and $0 < \varepsilon < 1$ there is a randomized, comparison-based, fully mergeable streaming algorithm that, when processing an input consisting of $N$ items from a totally-ordered universe $\mathcal{U}$, produces a summary $S$ satisfying the following property. where the probability is over the internal randomness of the algorithm. The size of $S$ is

Figures (3)

  • Figure 1: The compaction operation of relative and adaptive compactors for a compactor that overflows its capacity (dashed items). Items are first sorted from the largest to the smallest. The compaction evicts the crossed items from the memory and "promotes" the items with arrows to the next level, while the gray ones are not involved in the compaction, so they remain in the buffer. The size $T$ of the compacted part is, however, computed differently in relative and adaptive compactors.
  • Figure 2: The compaction operation in a relative compactor with capacity $C = 40$, where $P$ is the number of already performed compactions (written in binary), and $Z$ equals the number of trailing ones of $P$.
  • Figure 3: Sections of a buffer with $K = 4$ and $C = 32$
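
The quantities named in the captions of Figures 1 and 2 can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the function names `trailing_ones` and `compact` are hypothetical, the size `T` of the compacted part is passed in as a parameter because the captions give no formula for it, and two details are assumptions borrowed from common compactor designs rather than from the captions: that the compacted part consists of the `T` largest items, and that every other item of that part is promoted, starting from a random offset.

```python
import random


def trailing_ones(p: int) -> int:
    """Return Z, the number of trailing 1 bits in the binary form of p."""
    z = 0
    while p & 1:
        z += 1
        p >>= 1
    return z


def compact(buffer, t, rng=random):
    """Compact t items of the buffer; return (kept, promoted).

    Assumption: the compacted part is the t largest items, and every
    other item of that part is promoted, starting at a random offset.
    The remaining items of the compacted part are evicted.
    """
    ordered = sorted(buffer, reverse=True)      # largest to smallest
    compacted, kept = ordered[:t], ordered[t:]  # kept items stay in the buffer
    offset = rng.randrange(2)                   # 0 or 1, chosen at random
    promoted = compacted[offset::2]             # goes to the next level
    return kept, promoted


# P (number of compactions performed so far), P in binary, and Z.
for p in range(8):
    print(p, format(p, "b"), trailing_ones(p))

kept, promoted = compact(list(range(10)), t=4)
print(kept, promoted)
```

With an even `t`, exactly half of the compacted items are promoted and the other half are evicted, so each promoted item stands in for two original items at the next level.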

Theorems & Definitions (21)

  • Definition 1: Marking
  • Definition 2: Canonical marking
  • proof
  • proof
  • Remark: Time complexity
  • Theorem 1
  • Remark
  • Lemma 1: Lower bound on $K$
  • proof
  • Lemma 2: The space bound
  • ...and 11 more