Learning-Augmented Frequency Estimation in Sliding Windows
Rana Shahout, Ibrahim Sabek, Michael Mitzenmacher
TL;DR
This work tackles approximate frequency estimation over sliding windows with a learning-augmented algorithm, LWCSS, which extends Window Compact Space Saving (WCSS) with a next-arrival predictor that filters out items occurring only once within a frame. The predictor is treated as a black box (in the paper, a binary classifier built from an RNN with IP embeddings), and a Bloom filter guards against mispredictions. This yields a robustness guarantee: if the underlying algorithm is $(W,\varepsilon-\frac{2}{W})$-accurate, then LWCSS is $(W,\varepsilon)$-accurate, with at most two undercounts per window. The paper provides both theoretical robustness results and an extensive empirical evaluation on real CAIDA traces and synthetic Zipf data, showing improved memory-accuracy tradeoffs over the baseline WCSS. It also outlines future directions, including frame-based frequency prediction with transfer learning and a broader learned sliding-window framework for other query types, illustrating how predictions can improve sliding-window algorithms in practice.
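One way to read the robustness guarantee, as a sketch rather than the paper's own proof: assume the usual convention (not restated in this summary) that a $(W,\varepsilon)$-accurate algorithm returns an estimate $\hat{f}$ of an item's true window frequency $f$ with additive error at most $\varepsilon W$. The symbols $f$ and $\hat{f}$ are introduced here for illustration only. The underlying algorithm then contributes error at most $(\varepsilon-\frac{2}{W})W$, and filtering adds at most two missed occurrences, so

$$|\hat{f} - f| \;\le\; \left(\varepsilon - \frac{2}{W}\right) W + 2 \;=\; \varepsilon W - 2 + 2 \;=\; \varepsilon W.$$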
Abstract
We show how machine learning can improve sliding-window algorithms for approximate frequency estimation, under the "algorithms with predictions" framework. In this dynamic setting, previous learning-augmented algorithms are less effective, since properties at sliding-window resolution can differ significantly from those of the entire stream. We focus on predicting and filtering out items with large next arrival times (that is, items with a large gap until their next appearance), which we show significantly improves memory-accuracy tradeoffs. We present theorems that give insight into how, and by how much, our technique can improve a sliding-window algorithm, and we report experimental results on real-world data sets. Our work demonstrates that predictors can be useful in the challenging sliding-window setting.
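The filtering idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: an exact sliding-window counter stands in for WCSS, the predictor is any caller-supplied function, and the names `BloomFilter`, `FilteredWindowCounter`, and `predicts_far_next_arrival` are hypothetical. The Bloom filter sizes and the choice to reset the filter every `window` arrivals are also assumptions made for illustration.

```python
import hashlib
from collections import Counter, deque


class BloomFilter:
    """Tiny Bloom filter; bit and hash counts are illustrative assumptions."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)

    def _positions(self, item):
        # Derive several bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


class FilteredWindowCounter:
    """Exact window counter (stand-in for WCSS) behind a learned filter."""

    def __init__(self, window, predicts_far_next_arrival):
        self.window = window
        self.predict = predicts_far_next_arrival  # black-box predictor
        self.recent = deque()   # one slot per arrival; None if filtered out
        self.counts = Counter()
        self.filtered = BloomFilter()
        self.since_reset = 0

    def update(self, item):
        # Assumption: start a fresh Bloom filter every `window` arrivals.
        if self.since_reset == self.window:
            self.filtered = BloomFilter()
            self.since_reset = 0
        self.since_reset += 1

        admitted = item
        # Skip an item only the first time it is predicted to be far away.
        # If it shows up again, the Bloom filter exposes the misprediction
        # and the item is counted from then on.
        if item not in self.filtered and self.predict(item):
            self.filtered.add(item)
            admitted = None

        self.recent.append(admitted)
        if admitted is not None:
            self.counts[admitted] += 1
        # Expire the arrival that just left the window.
        if len(self.recent) > self.window:
            old = self.recent.popleft()
            if old is not None:
                self.counts[old] -= 1

    def query(self, item):
        return self.counts[item]


# Worst case for the predictor: it claims every item is far away.
pessimist = FilteredWindowCounter(window=10, predicts_far_next_arrival=lambda x: True)
for _ in range(5):
    pessimist.update("a")
print(pessimist.query("a"))  # 4: only the first arrival of "a" is skipped
```

In this sketch a wrong prediction costs a single missed occurrence before the Bloom filter catches the repeat, which is the kind of bounded undercount the robustness guarantee relies on. Items that truly appear once never reach the counter, so they take no space in the underlying structure.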
