SQUID: Faster Analytics via Sampled Quantile Estimation
Ran Ben-Basat, Gil Einziger, Wenchen Han, Bilal Tayh
TL;DR
SQUID speeds up $q$-MAX, the task of maintaining the largest $q$ values in a stream, by replacing the exact quantile computation with an approximate quantile estimated from a small sample. The same idea extends to weighted heavy hitters: SQUID-HH combines a water-level implicit deletion scheme with Cuckoo hashing to achieve higher throughput and accuracy than prior art. The paper also applies SQUID to in-network caching on programmable (P4) switches, including a hardware prototype that performs cache admission and eviction in the data plane. Across software and hardware evaluations, SQUID achieves speedups of up to 6.6x on some workloads and better cache hit ratios than state-of-the-art baselines, while maintaining strong accuracy guarantees. Open-source implementations are available, and the design shows how fast in-switch analytics can be bridged with CPU-backed post-processing for diverse streaming tasks.
Abstract
Streaming algorithms are fundamental in the analysis of large and online datasets. A key component of many such analytic tasks is $q$-MAX, which finds the largest $q$ values in a number stream. Modern approaches attain a constant runtime by removing small items in bulk and retaining the largest $q$ items at all times. Yet, these approaches are bottlenecked by an expensive quantile calculation. This work introduces a quantile-sampling approach called SQUID and shows its benefits in multiple analytic tasks. Using this approach, we design a novel weighted heavy hitters data structure that is faster and more accurate than the existing alternatives. We also show SQUID's practicality for improving network-assisted caching systems with a hardware-based cache prototype that uses SQUID to implement the cache policy. The challenge here is that the switch's dataplane does not allow the general computation required to implement many cache policies, while its CPU is orders of magnitude slower. We overcome this issue by passing just SQUID's samples to the CPU, thus bridging this gap. In software implementations, we show that our method is up to 6.6x faster than the state-of-the-art alternatives when using real workloads. For switch-based caching, SQUID enables a wide spectrum of data-plane-based caching policies and achieves higher hit ratios than the state-of-the-art P4LRU.
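The bulk-removal idea with a sampled pivot can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class name `SampledQMax`, the slack parameter `gamma`, the `sample_size` default, and the conservative rank rule are all assumptions made for illustration. The sketch buffers roughly $q(1+\gamma)$ values, and when the buffer fills it picks a pivot from a small random sample instead of computing an exact quantile. If the sampled pivot would discard too much or nothing at all, it falls back to an exact selection, so the result is always correct.

```python
import heapq
import random


class SampledQMax:
    """Illustrative q-MAX buffer that prunes with a sampled pivot (hypothetical sketch)."""

    def __init__(self, q, gamma=1.0, sample_size=32, seed=None):
        self.q = q
        # Buffer holds up to about q * (1 + gamma) values, and always more than q.
        self.capacity = max(q + 1, int(q * (1 + gamma)))
        self.sample_size = sample_size
        self.rng = random.Random(seed)
        self.buffer = []
        # Invariant: once pruning has happened, at least q buffered values are >= threshold.
        self.threshold = float("-inf")

    def add(self, value):
        # Values below the threshold cannot be among the q largest.
        if value < self.threshold:
            return
        self.buffer.append(value)
        if len(self.buffer) >= self.capacity:
            self._compact()

    def _compact(self):
        n = len(self.buffer)
        k = min(self.sample_size, n)
        sample = sorted(self.rng.sample(self.buffer, k))
        # Assumed rule: aim for half of the removable fraction, so the
        # sampled pivot rarely cuts into the q largest values.
        removable = 1 - self.q / n
        pivot = sample[int(0.5 * removable * k)]
        kept = [v for v in self.buffer if v >= pivot]
        if len(kept) < self.q or len(kept) == n:
            # Bad pivot (too aggressive, or no progress): exact fallback.
            kept = heapq.nlargest(self.q, self.buffer)
            pivot = kept[-1]
        self.buffer = kept
        self.threshold = pivot

    def largest(self):
        """Return the q largest values seen so far, in descending order."""
        return heapq.nlargest(self.q, self.buffer)
```

A short usage example: feed a stream into `SampledQMax(q=50)` with `add`, then call `largest()` to read the current top 50 values. The exact fallback keeps this sketch simple; a tuned implementation would instead choose the sample size and rank so that a bad pivot is rare.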
