Learning-Augmented Frequency Estimation in Sliding Windows

Rana Shahout, Ibrahim Sabek, Michael Mitzenmacher

TL;DR

This work tackles approximate frequency estimation over sliding windows with a learning-augmented approach, LWCSS, which augments Window Compact Space Saving (WCSS) with a next-arrival predictor that filters out items occurring only once within a frame. The predictor is treated as a black box (implemented as a binary classifier: an RNN over IP embeddings), and a Bloom filter guards against mispredictions. This yields a robustness guarantee: if the underlying algorithm is $(W,\varepsilon-\frac{2}{W})$-accurate, then LWCSS achieves $(W,\varepsilon)$-accuracy, with at most two undercounts per window. The paper provides both theoretical robustness results and an extensive empirical evaluation on real CAIDA traces and synthetic Zipf data, showing improved memory-accuracy tradeoffs over the baseline WCSS. It also outlines future directions, including frame-based frequency prediction with transfer learning and a broader learned sliding-window framework for other query types, illustrating the potential of predictions to enhance sliding-window algorithms in practice.
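The filtering idea above can be illustrated with a small Python sketch. It is a hypothetical simplification, not the authors' LWCSS: WCSS is replaced by an exact sliding-window counter (`ExactWindowCounter`), the learned predictor is any black-box function `predict_single`, and the rule that a predicted-single item is skipped only on its first arrival in a frame (tracked with a Bloom filter cleared every W arrivals) is an assumption about how the guard works. All class, method, and parameter names are illustrative.

```python
import hashlib
from collections import Counter, deque


class BloomFilter:
    """Minimal Bloom filter (hypothetical helper): k hashed positions in an m-slot array."""

    def __init__(self, m_bits: int = 1024, k: int = 3):
        self.m, self.k = m_bits, k
        self.bits = bytearray(m_bits)  # one byte per slot, for simplicity

    def _positions(self, item):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def __contains__(self, item):
        # May return a false positive, never a false negative.
        return all(self.bits[p] for p in self._positions(item))

    def clear(self):
        self.bits = bytearray(self.m)


class ExactWindowCounter:
    """Stand-in for WCSS (assumption): exact counts of inserted items in the last W positions."""

    def __init__(self, W: int):
        self.W = W
        self.inserted = deque()  # (position, item) pairs, oldest first
        self.counts = Counter()

    def expire(self, t: int):
        # Drop insertions that fall outside the window of positions (t - W, t].
        while self.inserted and self.inserted[0][0] <= t - self.W:
            _, old = self.inserted.popleft()
            self.counts[old] -= 1

    def insert(self, t: int, item):
        self.inserted.append((t, item))
        self.counts[item] += 1

    def query(self, item) -> int:
        return self.counts[item]


class FilteredWindowCounter:
    """Hypothetical LWCSS-style wrapper around the stand-in counter."""

    def __init__(self, W: int, predict_single, bloom_bits: int = 1024):
        self.W = W
        self.predict_single = predict_single  # black-box predictor: item -> bool
        self.base = ExactWindowCounter(W)
        self.guard = BloomFilter(bloom_bits)
        self.t = 0  # 1-based position of the latest arrival

    def update(self, item):
        self.t += 1
        if (self.t - 1) % self.W == 0:
            self.guard.clear()  # a new frame of W arrivals starts here
        self.base.expire(self.t)
        if self.predict_single(item) and item not in self.guard:
            # Skip this arrival, but remember it so later arrivals in the
            # same frame are counted even if the predictor keeps saying "single".
            self.guard.add(item)
            return
        self.base.insert(self.t, item)

    def query(self, item) -> int:
        return self.base.query(item)
```

Under these assumptions, a window of W arrivals overlaps at most two frames, so even an adversarial predictor makes each item lose at most two occurrences per window. A Bloom-filter false positive only causes an arrival to be counted rather than skipped, so it never adds error.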

Abstract

We show how to use machine learning to improve sliding window algorithms for approximate frequency estimation, under the "algorithms with predictions" framework. In this dynamic environment, previous learning-augmented algorithms are less effective, since properties at sliding-window resolution can differ significantly from those of the entire stream. We focus on the benefits of predicting items with large next arrival times (that is, a large gap until their next appearance) and filtering them out of the stream, which we show significantly improves the memory-accuracy tradeoff. We provide theorems that show how, and by how much, our technique can improve the sliding window algorithm, as well as experimental results on real-world data sets. Our work demonstrates that predictors can be useful in the challenging sliding window setting.
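The abstract's notion of next arrival time can be made concrete with a short Python sketch (an illustration under stated assumptions, not the paper's labeling code). `next_arrival_gaps` computes, for each arrival, the distance to the same item's next appearance; `single_in_frame_labels` marks an arrival as a single-occurrence item when it is the item's first arrival in its frame and its next arrival falls outside that frame. Both function names, and the use of non-overlapping frames of `frame_size` arrivals starting at position 0, are hypothetical.

```python
def next_arrival_gaps(stream):
    """For each position i, the distance to the next occurrence of stream[i] (None if none)."""
    gaps = [None] * len(stream)
    next_index = {}  # item -> index of its nearest occurrence after i
    for i in range(len(stream) - 1, -1, -1):
        item = stream[i]
        if item in next_index:
            gaps[i] = next_index[item] - i
        next_index[item] = i
    return gaps


def single_in_frame_labels(stream, frame_size):
    """True at position i if stream[i] occurs exactly once in its frame."""
    gaps = next_arrival_gaps(stream)
    labels = []
    seen_in_frame = set()
    for i, item in enumerate(stream):
        if i % frame_size == 0:
            seen_in_frame.clear()  # a new frame starts
        frame_end = (i // frame_size + 1) * frame_size  # first index of the next frame
        leaves_frame = gaps[i] is None or i + gaps[i] >= frame_end
        labels.append(item not in seen_in_frame and leaves_frame)
        seen_in_frame.add(item)
    return labels
```

For example, with the stream `a b a c d d e a` and frames of four arrivals, the labels mark `b` and `c` in the first frame and `e` and `a` in the second, since each of those occurs exactly once in its frame.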

Paper Structure (15 sections, 1 theorem, 4 equations, 5 figures, 2 tables)

Key Result

Theorem 1

Let $\mathcal{A}$ be an algorithm for $(W,\varepsilon - \frac{2}{W})$-WFrequency. Then LWCSS, using $\mathcal{A}$ as its underlying algorithm, solves $(W,\varepsilon)$-WFrequency.
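A quick way to read the $\frac{2}{W}$ term, assuming $(W,\varepsilon)$-WFrequency means every estimate is within an additive $\varepsilon W$ of the item's true frequency in the last $W$ arrivals (Definition 1 is not reproduced here, so this reading is an assumption):

$$\left(\varepsilon - \frac{2}{W}\right) W = \varepsilon W - 2,$$

so the underlying algorithm's error budget is exactly two counts below the one LWCSS must meet, which leaves room for the at most two undercounts per window that filtering can introduce.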

Figures (5)

  • Figure 1: WCSS algorithm overview (adapted from a figure in ben2016heavy)
  • Figure 2: Next arrival prediction in sliding window setting.
  • Figure 3: Average single-item ratio vs. frame size on the real-world traces described in the evaluation section.
  • Figure 4: Accuracy comparison of WCSS and LWCSS vs. memory (megabytes) on real datasets (Chicago, NY, and SJ)
  • Figure 5: Query and update performance using Chicago dataset and setting $W=2^{13}$.

Theorems & Definitions (3)

  • Definition 1
  • Theorem 1
  • Proof