Detecting Flow Gaps in Data Streams

Siyuan Dong, Yuxuan Tian, Wenhan Ma, Tong Yang, Chenye Zhang, Yuhan Wu, Kaicheng Yang, Yaojing Wang

TL;DR

This work defines a new data-stream task: monitoring the variation of per-flow values to detect flow gaps. It introduces GapFilter, a Sketch-based framework with two variants, GapFilter-SO (speed-oriented) and GapFilter-AO (accuracy-oriented), which use similarity absorption and a civilian-suspect mechanism to combine broad monitoring of all flows with closer monitoring of suspicious flows. The authors prove accuracy guarantees under memory constraints, and CPU-based experiments on four real datasets show GapFilter achieving higher recall and F1 scores with substantially less memory than a Straw-man baseline, with GapFilter-SO delivering the highest throughput. Target applications include real-time QoS assurance and network anomaly detection, and the authors release open-source implementations.

Abstract

Data stream monitoring is a crucial task which has a wide range of applications. The majority of existing research in this area can be broadly classified into two types, monitoring value sum and monitoring value cardinality. In this paper, we define a third type, monitoring value variation, which can help us detect flow gaps in data streams. To realize this function, we propose GapFilter, leveraging the idea of Sketch for achieving speed and accuracy. To the best of our knowledge, this is the first work to detect flow gaps in data streams. Two key ideas of our work are the similarity absorption technique and the civilian-suspect mechanism. The similarity absorption technique helps in reducing memory usage and enhancing speed, while the civilian-suspect mechanism further boosts accuracy by organically integrating broad monitoring of overall flows with meticulous monitoring of suspicious flows. We have developed two versions of GapFilter. Speed-Oriented GapFilter (GapFilter-SO) emphasizes speed while maintaining satisfactory accuracy. Accuracy-Oriented GapFilter (GapFilter-AO) prioritizes accuracy while ensuring considerable speed. We provide a theoretical proof demonstrating that GapFilter secures high accuracy with minimal memory usage. Further, extensive experiments were conducted to assess the accuracy and speed of our algorithms. The results reveal that GapFilter-AO requires, on average, 1/32 of the memory to match the accuracy of the Straw-man solution. GapFilter-SO operates at a speed 3 times faster than the Straw-man solution. All associated source code has been open-sourced and is available on GitHub.
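To make the task of monitoring value variation concrete, the following is a minimal exact detector. It is neither GapFilter nor the paper's Straw-man solution. It assumes that each item carries a flow ID and an integer sequence number, and that a flow gap occurs when two consecutive items of the same flow differ by more than a threshold. The function name `detect_flow_gaps`, the `threshold` parameter, and this gap condition are illustrative assumptions, since the formal definitions are not reproduced here.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

Item = Tuple[Hashable, int]            # (flow_id, sequence_number), assumed layout
Gap = Tuple[Hashable, int, int]        # (flow_id, previous_seq, current_seq)


def detect_flow_gaps(stream: Iterable[Item], threshold: int) -> List[Gap]:
    """Report every point where a flow's sequence number jumps by more
    than `threshold` relative to that flow's previous item.

    Exact but memory-hungry: one dictionary entry is kept per flow.
    """
    last_seq: Dict[Hashable, int] = {}
    gaps: List[Gap] = []
    for flow_id, seq in stream:
        prev = last_seq.get(flow_id)
        # Assumed gap condition: the per-flow value advanced too far.
        if prev is not None and seq - prev > threshold:
            gaps.append((flow_id, prev, seq))
        last_seq[flow_id] = seq
    return gaps


# Hypothetical stream: flow "a" skips from 2 to 9, flow "b" is contiguous.
stream = [("a", 1), ("b", 10), ("a", 2), ("a", 9), ("b", 11), ("b", 12)]
gaps = detect_flow_gaps(stream, threshold=3)
print(gaps)
```

Because this baseline stores state for every flow, its memory grows with the number of concurrent flows, which is the cost a Sketch-based design aims to avoid.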

Paper Structure

This paper contains 38 sections, 3 theorems, 1 equation, 13 figures, 1 table, 2 algorithms.

Key Result

Lemma 1

We denote the probability of $i$ flows distributed in $i$ different sets $A_{j_1}, A_{j_2}, A_{j_3},\cdots, A_{j_i}$ as $P_{diff}(i)$. For $M<d$, we have $P_{diff}(M)>(1-\frac{M}{d})^{M-d}e^{-M}$.
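The bound in Lemma 1 can be checked numerically. The sketch below assumes that each of the $M$ flows is mapped independently and uniformly at random to one of $d$ sets; that hashing model is an assumption, as it is not stated in this summary. Under it, the exact probability is $\prod_{k=0}^{M-1}(1-\frac{k}{d})$. The function names are illustrative.

```python
import math


def p_diff_exact(m: int, d: int) -> float:
    """Exact probability that m flows land in m distinct sets out of d,
    assuming independent, uniform placement (an assumption of this sketch)."""
    p = 1.0
    for k in range(m):
        p *= 1.0 - k / d
    return p


def p_diff_lower_bound(m: int, d: int) -> float:
    """Lower bound from Lemma 1: (1 - m/d)^(m - d) * e^(-m), for 0 < m < d."""
    if not 0 < m < d:
        raise ValueError("the bound requires 0 < m < d")
    return (1.0 - m / d) ** (m - d) * math.exp(-m)


# Compare the exact value with the bound for a few (m, d) pairs.
for m, d in [(2, 4), (10, 100), (50, 1000)]:
    exact = p_diff_exact(m, d)
    bound = p_diff_lower_bound(m, d)
    print(f"M={m:3d} d={d:5d} exact={exact:.4f} bound={bound:.4f}")
```

Under this model the inequality follows from comparing $\sum_{k=0}^{M-1}\ln(1-\frac{k}{d})$ with the integral $\int_0^M \ln(1-\frac{x}{d})\,dx$: the integrand is decreasing, so the sum of left-endpoint values exceeds the integral, and exponentiating the integral gives the right-hand side of the lemma.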

Figures (13)

  • Figure 1: Illustration of GapFilter-SO. This is a GapFilter-SO with $d=4$ buckets and each bucket has $w=4$ cells. $e_1$, $e_2$, $e_3$, $e_4$ arrive in order and are mapped to different buckets. The final result after the arrival of these four items lies on the right.
  • Figure 2: Monitoring operations in GapFilter-AO.
  • Figure 3: Concurrency circumstances of all/abnormal flows on different datasets.
  • Figure 4: Distributions of flow length and gap size.
  • Figure 5: Effect of $w$ on GapFilter-SO on different datasets.
  • ...and 8 more figures

Theorems & Definitions (5)

  • Definition 1
  • Definition 2
  • Lemma 1
  • Lemma 2
  • Theorem 1