SQUID: Faster Analytics via Sampled Quantile Estimation

Ran Ben-Basat; Gil Einziger; Wenchen Han; Bilal Tayh

TL;DR

SQUID introduces a novel quantile-sampling approach to accelerate the $q$-MAX problem in streaming analytics, enabling faster maintenance by using approximate quantiles drawn from small samples rather than exact ones. The framework extends to weighted heavy hitters with SQUID-HH, leveraging a water-level implicit deletion scheme and Cuckoo hashing to achieve higher throughput and accuracy than prior art. It also demonstrates practical applicability to in-network caching on programmable switches (P4), including a hardware prototype that enables data-plane-based cache admission and eviction. Across software and hardware evaluations, SQUID achieves substantial speedups (up to 6.6x in some workloads) and better cache hit ratios compared to state-of-the-art baselines, while maintaining strong accuracy guarantees. The work offers open-source implementations and showcases the potential for bridging fast in-switch analytics with CPU-backed post-processing for diverse streaming tasks.

Abstract

Streaming algorithms are fundamental in the analysis of large and online datasets. A key component of many such analytic tasks is $q$-MAX, which finds the largest $q$ values in a number stream. Modern approaches attain a constant runtime by removing small items in bulk and retaining the largest $q$ items at all times. Yet, these approaches are bottlenecked by an expensive quantile calculation. This work introduces a quantile-sampling approach called SQUID and shows its benefits in multiple analytic tasks. Using this approach, we design a novel weighted heavy hitters data structure that is faster and more accurate than the existing alternatives. We also show SQUID's practicality for improving network-assisted caching systems with a hardware-based cache prototype that uses SQUID to implement the cache policy. The challenge here is that the switch's dataplane does not allow the general computation required to implement many cache policies, while its CPU is orders of magnitude slower. We overcome this issue by passing just SQUID's samples to the CPU, thus bridging this gap. In software implementations, we show that our method is up to 6.6x faster than the state-of-the-art alternatives when using real workloads. For switch-based caching, SQUID enables a wide spectrum of data-plane-based caching policies and achieves higher hit ratios than the state-of-the-art P4LRU.
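The bulk-pruning idea described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `SampledQMax`, the parameters `gamma` and `sample_size`, the choice of sample quantile, and the retry limit of 8 are all hypothetical. The buffer holds up to $q(1+\gamma)$ values. When it fills, a pivot is read off a small random sample and is accepted only after checking that at least $q$ values survive, so the result stays exact while the pivot itself is cheap to find.

```python
import random


class SampledQMax:
    """Tracks the q largest values of a stream, pruning small values in bulk."""

    def __init__(self, q, gamma=1.0, sample_size=32, seed=0):
        self.q = q
        # Buffer size q(1 + gamma); at least q + 1 so a prune can remove something.
        self.capacity = max(q + 1, int(q * (1 + gamma)))
        self.sample_size = sample_size      # assumed value, not from the paper
        self.rng = random.Random(seed)
        self.items = []
        self.threshold = float("-inf")      # values below this can never be top-q

    def add(self, value):
        if value < self.threshold:
            return                          # at least q stored values are larger
        self.items.append(value)
        if len(self.items) >= self.capacity:
            self._prune()

    def _prune(self):
        removable = len(self.items) - self.q
        # Aim below the exact quantile so the sampled pivot is usually safe.
        target = 0.5 * removable / len(self.items)
        for _ in range(8):                  # retry limit is an assumption
            sample = sorted(self.rng.choices(self.items, k=self.sample_size))
            pivot = sample[int(target * self.sample_size)]
            kept = [v for v in self.items if v >= pivot]
            # Accept only if q values survive and the buffer actually shrank.
            if self.q <= len(kept) < len(self.items):
                self.items, self.threshold = kept, pivot
                return
        # Fallback: exact selection by sorting (e.g. with many duplicate values).
        self.items = sorted(self.items, reverse=True)[: self.q]
        self.threshold = self.items[-1]

    def top(self):
        return sorted(self.items, reverse=True)[: self.q]
```

Because every pivot is verified before use, a poor sample only costs a retry and never an incorrect answer, which is the sense in which an approximate quantile is sufficient here.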

Paper Structure (28 sections, 1 theorem, 5 equations, 11 figures, 5 tables)

Introduction
Background and formulation
Problem Formulations
Background - $q$-MAX
Background - Heavy Hitters
Algorithms
SQUID: Speeding up $q$-MAX
A Las Vegas Algorithm
Parameter Tuning
Deamortization
Faster Heavy Hitters with SQUID-HH
A Monte-Carlo HH Algorithm
Score-based Caching on a P4 Switch
Evaluation
SQUID Evaluation
...and 13 more sections

Key Result

Theorem 1

Let $\alpha\in(1/2,1),\delta,\gamma\in(0,1)$ and denote $\eta = 1/4 \cdot \left((\alpha+1)^2 +(\alpha-1)\sqrt{\alpha^2 + 14 \alpha + 1}\right)$. Consider a set of numbers $S$ with $|S|=q(1+\gamma)$ elements and denote $k=\frac{2\alpha\ln(2/\delta)}{(1-\alpha)^2}$ and $Z=k\cdot \frac{1+\gamma}{\gamma
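As a worked example, the two quantities that are fully defined above, $\eta$ and $k$, can be evaluated numerically. The following is a small sketch; the function name `theorem_parameters` is hypothetical, and $Z$ is left out because its definition is not complete in the statement as shown.

```python
import math


def theorem_parameters(alpha, delta):
    """Evaluate eta and k from Theorem 1 for alpha in (1/2, 1), delta in (0, 1)."""
    if not (0.5 < alpha < 1.0):
        raise ValueError("alpha must lie in (1/2, 1)")
    if not (0.0 < delta < 1.0):
        raise ValueError("delta must lie in (0, 1)")
    # eta = 1/4 * ((alpha + 1)^2 + (alpha - 1) * sqrt(alpha^2 + 14*alpha + 1))
    eta = 0.25 * ((alpha + 1) ** 2
                  + (alpha - 1) * math.sqrt(alpha ** 2 + 14 * alpha + 1))
    # k = 2 * alpha * ln(2 / delta) / (1 - alpha)^2
    k = 2 * alpha * math.log(2 / delta) / (1 - alpha) ** 2
    return eta, k


eta, k = theorem_parameters(alpha=0.75, delta=0.05)
print(f"eta = {eta:.4f}, k = {k:.2f}")
```

Note that $k$ grows only logarithmically in $1/\delta$ but quadratically in $1/(1-\alpha)$, so values of $\alpha$ close to 1 are the costly direction.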

Figures (11)

Figure 1: A qualitative comparison of SQUID and $q$-MAX: $q$-MAX searches for the exact quantile, which is time-consuming, while SQUID settles for any reasonable quantile (in the yellow region). Such quantiles are much easier to obtain, resulting in shorter and more frequent maintenance operations.
Figure 2: An illustration of SQUID-HH. We use the water level technique to keep the load factor of the Cuckoo table within acceptable margins. Here, the entries below the water level are pink and can be replaced. However, notice that all the table's entries (even those logically deleted) contain valuable information. In this example, we report another occurrence of flow A with weight 3. According to SQUID-HH, if the flow does not have a counter, it is added with a frequency of $\mathcal{W}+3 = 17+3 = 20$. Here we are lucky to find a logically deleted entry of A with a frequency of $11$, which we increase to $14$. Thus, we obtain a smaller approximation error for A's frequency.
Figure 3: Throughput of various applications when implemented using $q$-MAX and SQUID.
Figure 4: (a)-(c) The Normalized Root Mean Square Error and (d)-(f) the throughput of SMED, SMED-random, ElasticSketch, SQUID-HH, DIM-SUM, and IM-SUM, varying the number of counters on packet traces.
Figure 5: Hit ratios of our P4 SQUID for in-network caching, compared against vanilla LRFU, LRU, P4LRU, and Practical Packet Deflection (PPD). Both SQUID + Original LRFU and SQUID + P4 (approximated) LRFU deploy an array-of-buckets structure, while Vanilla LRFU maintains a heap to evict global LRFU items.
...and 6 more figures
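The update rule described in the Figure 2 caption can be sketched as follows. This is a simplified illustration, assuming a plain Python dictionary in place of the Cuckoo table; the class `WaterLevelCounters` and its method names are hypothetical, and how the water level is raised over time is not modeled.

```python
class WaterLevelCounters:
    """Weighted counters with a water level W, following the Figure 2 caption."""

    def __init__(self, water_level=0):
        self.water_level = water_level   # W in the caption
        self.counts = {}                 # stand-in for the Cuckoo table (assumption)

    def update(self, key, weight):
        if key in self.counts:
            # An existing entry is reused even if it is logically deleted.
            self.counts[key] += weight
        else:
            # A flow without a counter starts at W + weight.
            self.counts[key] = self.water_level + weight
        return self.counts[key]

    def is_below_water(self, key):
        # Entries below the water level are logically deleted and replaceable.
        return key in self.counts and self.counts[key] < self.water_level


table = WaterLevelCounters(water_level=17)
table.counts["A"] = 11                   # logically deleted entry from the caption
print(table.update("A", 3))              # 14: the stale entry is reused
print(table.update("B", 3))              # 20: no counter, so W + 3
```

Reusing the stale entry for A yields 14 rather than 20, which matches the caption's point that logically deleted entries still carry information that tightens the estimate.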

Theorems & Definitions (1)

Theorem 1
