Cardinality is Not Enough: Super Host Detection via Segmented Cardinality Estimation

Yilin Zhao, Jiawei Huang, Xianshi Su, Weihe Li, Xin Li, Yan Liu, Jiacheng Xie, Qichen Su, Jin Ye, Wanchun Jiang, Jianxin Wang

Abstract

Accurately detecting super hosts, which establish connections to a large number of distinct peers, is important for mitigating web attacks and ensuring high-quality web service. Existing sketch-based approaches estimate the number of distinct connections, called flow cardinality, from full IP addresses. They ignore the fact that a malicious or victim super host often communicates with hosts within the same subnet, which results in high false positive rates and low accuracy. Although hierarchical-structure-based approaches can capture flow cardinality within a subnet, they inherently suffer from high memory usage. To address these limitations, we propose SegSketch, a segmented cardinality estimation approach that employs a lightweight halved-segment hashing strategy to infer common prefix lengths of IP addresses and estimates cardinality within a subnet to enhance detection accuracy under constrained memory. Experiments driven by real-world traces demonstrate that SegSketch improves F1-Score by up to 8.04x compared to state-of-the-art solutions, particularly under small memory budgets.

Paper Structure

This paper contains 29 sections, 2 theorems, 4 equations, 17 figures, 3 tables, 1 algorithm.

Key Result

Theorem 1

For any host $x$ connecting to a targeted subnet, the error between its true and estimated subnet cardinality $C(x)$ and $\hat{C}(x)$ may exceed the expected error $\varepsilon$ introduced by the Linear Counting algorithm. The probability of this event is bounded as follows: where $\varepsilon$, $M$ and $U$ are three hash-strategy-dependent variables that satisfy Table tab:math. In Table tab:math
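Theorem 1 refers to the error introduced by the Linear Counting algorithm. As background, the following is a minimal sketch of the classic Linear Counting estimator, which hashes each item into an $m$-bit bitmap and estimates the number of distinct items as $\hat{n} = -m \ln(z/m)$, where $z$ is the number of bits still zero. It is not SegSketch's data structure: the function name, the bitmap size `m`, and the use of SHA-256 as the hash are all assumptions made for illustration.

```python
import hashlib
import math


def linear_counting_estimate(items, m=4096):
    """Estimate the number of distinct items using an m-bit bitmap.

    Hypothetical helper for illustration; the bitmap size and hash
    function are assumptions, not values taken from the paper.
    """
    bitmap = [False] * m
    for item in items:
        # Hash the item to a bit position; duplicates map to the same bit.
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % m
        bitmap[index] = True

    zeros = bitmap.count(False)
    if zeros == 0:
        # Bitmap is saturated: the estimator is undefined, so report infinity.
        return float("inf")
    # Linear Counting estimate: -m * ln(fraction of zero bits).
    return -m * math.log(zeros / m)


if __name__ == "__main__":
    # 500 distinct peer addresses, each seen three times.
    peers = [f"10.0.{i // 256}.{i % 256}" for i in range(500)] * 3
    print(round(linear_counting_estimate(peers)))
```

Because repeated items set the same bit, the estimate depends only on the set of distinct items. Accuracy degrades as the bitmap fills up, which is why the bitmap size must be chosen relative to the expected cardinality.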

Figures (17)

  • Figure 1: Sketch-based flow cardinality estimation approach.
  • Figure 2: Hierarchical approach for cardinality estimation.
  • Figure 3: Super spreader detection performance.
  • Figure 4: Data structure.
  • Figure 5: Example of common prefix length estimation using halved-segment hashing with IP segment width $G = 8$ bits.
  • ...and 12 more figures

Theorems & Definitions (2)

  • Theorem 1
  • Theorem 2