Learning-Based Heavy Hitters and Flow Frequency Estimation in Streams
Rana Shahout, Michael Mitzenmacher
TL;DR
The paper tackles the problem of identifying heavy hitters and estimating flow frequencies in streams under tight memory constraints. It introduces Learned Space Saving (LSS), the first learned competing-counter-based approach, which augments the Space Saving algorithm with two predictors: one for low-frequency items (LSS-LF) and one for heavy hitters (LSS-HH). A higher-throughput variant, LSS+, is also presented. LSS uses a Counting Bloom Filter and a split between fixed and mutable counters to remain robust against prediction errors, and it yields substantial gains in top-k precision, heavy-hitter recall, and frequency-estimation RMSE on synthetic, CAIDA IP, and AOL web data. Theoretical robustness guarantees and extensive experiments support the framework and show that LSS can outperform Space Saving under realistic conditions and configurations. The work also offers practical guidance for deploying learning-augmented frequency-estimation schemes in high-speed networks and similar streaming settings, with implications for memory-efficient measurement and detection tasks.
Abstract
Identifying heavy hitters and estimating the frequencies of flows are fundamental tasks in various network domains. Existing approaches to this challenge can broadly be categorized into two groups: hashing-based and competing-counter-based. The Count-Min sketch is a standard example of a hashing-based algorithm, and the Space Saving algorithm is an example of a competing-counter algorithm. Recent works have explored the use of machine learning to enhance algorithms for frequency estimation problems, under the algorithms-with-predictions framework. However, these works have focused solely on the hashing-based approach, which may not be best for identifying heavy hitters. In this paper, we present the first learned competing-counter-based algorithm, called LSS, for identifying heavy hitters, top-k items, and flow frequency estimation, which builds on the well-known Space Saving algorithm. We provide theoretical insights into how and to what extent our approach can improve upon Space Saving, backed by experimental results on both synthetic and real-world datasets. Our evaluation demonstrates that LSS can enhance the accuracy and efficiency of Space Saving in identifying heavy hitters and top-k items and in estimating flow frequencies.
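
For readers unfamiliar with the competing-counter approach, the following is a minimal sketch of the baseline Space Saving algorithm that LSS builds on. It is not the LSS algorithm itself and is not taken from the paper; the function name `space_saving` and the parameter `num_counters` are illustrative assumptions, and the plain dictionary with a linear scan for the minimum stands in for the more efficient structures a production implementation would use.

```python
def space_saving(stream, num_counters):
    """Baseline Space Saving sketch (illustrative, not the paper's LSS).

    Keeps at most `num_counters` (item, count) pairs. Each stored count
    is an upper bound on the item's true frequency in the stream.
    """
    counts = {}
    for item in stream:
        if item in counts:
            # Item is already monitored: increment its counter.
            counts[item] += 1
        elif len(counts) < num_counters:
            # A free counter exists: start monitoring the item.
            counts[item] = 1
        else:
            # All counters are taken: evict the item with the smallest
            # count and let the newcomer inherit that count plus one.
            victim = min(counts, key=counts.get)
            counts[item] = counts.pop(victim) + 1
    return counts


# Small example: "a" is a heavy hitter, "c" and "d" are rare.
stream = ["a"] * 5 + ["b"] * 3 + ["c", "d", "a"]
estimates = space_saving(stream, num_counters=2)
print(estimates)
```

In this example the rare item "d" ends up with an estimate well above its true count of 1, because it inherits the counter of the item it evicted. This kind of overestimation caused by low-frequency items is the sort of error that the predictors described above aim to reduce.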
