LLM-Sketch: Enhancing Network Sketches with LLM

Yuanpeng Li, Zhen Xu, Zongwei Lv, Yannan Hu, Yong Cui, Tong Yang

TL;DR

LLM-Sketch introduces a two-tier sketch architecture augmented by an LLM-based flow classifier that uses full packet headers to predict whether an arriving packet's flow will become a large flow. By separating large and small flows into a heavy key-value (KV) part and a light Count-Min sketch (CMS) part, and by applying a soft-label, header-informed prediction with a lock mechanism, it achieves significantly higher accuracy under tight memory budgets. Theoretical analysis bounds the end-to-end error given classifier accuracy and CMS collisions, while experiments across CAIDA, MAWI, and IMC-DC demonstrate up to about 7.5× improvement in accuracy over state-of-the-art methods for flow size, heavy hitter, and hierarchical heavy hitter queries. These results, along with open-source code, indicate strong practical potential for adaptive, ML-assisted streaming sketches in real-world networks.

Abstract

Network stream mining is fundamental to many network operations. Sketches, as compact data structures that offer low memory overhead with bounded accuracy, have emerged as a promising solution for network stream mining. Recent studies attempt to optimize sketches using machine learning; however, these approaches face the challenges of lacking adaptivity to dynamic networks and incurring high training costs. In this paper, we propose LLM-Sketch, based on the insight that fields beyond the flow IDs in packet headers can also help infer flow sizes. By using a two-tier data structure and separately recording large and small flows, LLM-Sketch improves accuracy while minimizing memory usage. Furthermore, it leverages fine-tuned large language models (LLMs) to reliably estimate flow sizes. We evaluate LLM-Sketch on three representative tasks, and the results demonstrate that LLM-Sketch outperforms state-of-the-art methods by achieving a $7.5\times$ accuracy improvement.
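The two-tier idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class names `CountMinSketch` and `TwoTierSketch`, the parameters `heavy_capacity` and `threshold`, and the `predict_large` callable are all hypothetical, and the classifier is a plain function standing in for the fine-tuned LLM. The lock mechanism and soft labels are not modelled, and the rule of sending flows to the light part once the heavy part is full is an assumption made here for simplicity.

```python
import hashlib
from typing import Callable, Dict, Hashable


class CountMinSketch:
    """Light part: `depth` rows of `width` counters; a query takes the row-wise minimum."""

    def __init__(self, width: int, depth: int) -> None:
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, key: Hashable, row: int) -> int:
        # One independent-looking hash per row, derived by salting with the row number.
        digest = hashlib.blake2b(
            repr(key).encode(), digest_size=8, salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, key: Hashable, count: int = 1) -> None:
        for r in range(self.depth):
            self.rows[r][self._index(key, r)] += count

    def query(self, key: Hashable) -> int:
        # Collisions can only inflate counters, so the minimum never underestimates.
        return min(self.rows[r][self._index(key, r)] for r in range(self.depth))


class TwoTierSketch:
    """Heavy KV part for predicted-large flows, CMS light part for everything else."""

    def __init__(
        self,
        heavy_capacity: int,
        width: int,
        depth: int,
        predict_large: Callable[[dict], float],  # hypothetical stand-in for the LLM classifier
        threshold: float = 0.5,                  # hypothetical decision threshold
    ) -> None:
        self.heavy: Dict[Hashable, int] = {}
        self.heavy_capacity = heavy_capacity
        self.light = CountMinSketch(width, depth)
        self.predict_large = predict_large
        self.threshold = threshold

    def insert(self, flow_id: Hashable, header: dict) -> None:
        if flow_id in self.heavy:
            # Already tracked exactly in the heavy part.
            self.heavy[flow_id] += 1
        elif (
            len(self.heavy) < self.heavy_capacity
            and self.predict_large(header) >= self.threshold
        ):
            # Predicted large and there is room: start exact tracking.
            self.heavy[flow_id] = 1
        else:
            # Predicted small, or heavy part full (assumed policy): count approximately.
            self.light.add(flow_id)

    def query(self, flow_id: Hashable) -> int:
        # A flow may have packets in both parts, so the two counts are summed.
        return self.heavy.get(flow_id, 0) + self.light.query(flow_id)


# Usage with a toy classifier that treats long packets as a sign of a large flow.
sketch = TwoTierSketch(
    heavy_capacity=4,
    width=64,
    depth=3,
    predict_large=lambda header: 1.0 if header["length"] > 1000 else 0.0,
)
for _ in range(50):
    sketch.insert("flow-a", {"length": 1500})
for _ in range(3):
    sketch.insert("flow-b", {"length": 64})
```

In this toy setup, flows held in the heavy part are counted exactly, while flows in the light part share counters and can only be overestimated, which is why keeping large flows out of the CMS reduces collision error for the small ones.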

Paper Structure

This paper contains 19 sections, 3 theorems, 7 equations, 13 figures.

Key Result

theorem 1

The probability that a large flow is fully accurate (i.e., tracked with zero error) in LLM-Sketch is where $A$ is the classifier's accuracy for large flows, i.e., the probability that a large flow is correctly identified as large, $w_{light}$ and $d_{light}$ are the width and depth of the light part (CMS), and $N_{light}$ is the number of flows that end up in the light part. $P_{\mathrm{CMS}}(w, d, N

Figures (13)

  • Figure 1: The Count-Min sketch.
  • Figure 2: Workflow of LLM-Sketch.
  • Figure 3: An example of LLM-Sketch.
  • Figure 4: Accuracy vs. # bucket size.
  • Figure 5: Accuracy vs. heavy ratio.
  • ...and 8 more figures

Theorems & Definitions (7)

  • definition 1
  • definition 2
  • definition 3
  • definition 4
  • theorem 1
  • theorem 2
  • theorem 3