LLM-Sketch: Enhancing Network Sketches with LLM
Yuanpeng Li, Zhen Xu, Zongwei Lv, Yannan Hu, Yong Cui, Tong Yang
TL;DR
LLM-Sketch introduces a two-tier sketch architecture augmented by an LLM-based flow classifier that uses full packet headers to predict whether an arriving packet belongs to a flow that will become large. It separates large and small flows into a heavy key-value (KV) part and a light Count-Min Sketch (CMS) part, and it applies a soft-label, header-informed prediction together with a lock mechanism, which yields significantly higher accuracy under tight memory budgets. Theoretical analysis bounds the end-to-end error in terms of classifier accuracy and CMS collisions, and experiments on CAIDA, MAWI, and IMC-DC show up to about 7.5× improvement in accuracy over state-of-the-art methods for flow size, heavy hitter, and hierarchical heavy hitter queries. These results, along with open-source code, indicate strong practical potential for adaptive, ML-assisted streaming sketches in real-world networks.
Abstract
Network stream mining is fundamental to many network operations. Sketches, compact data structures that offer low memory overhead with bounded accuracy, have emerged as a promising solution for network stream mining. Recent studies attempt to optimize sketches using machine learning; however, these approaches adapt poorly to dynamic networks and incur high training costs. In this paper, we propose LLM-Sketch, based on the insight that fields beyond the flow IDs in packet headers can also help infer flow sizes. By using a two-tier data structure that records large and small flows separately, LLM-Sketch improves accuracy while minimizing memory usage. Furthermore, it leverages fine-tuned large language models (LLMs) to reliably estimate flow sizes. We evaluate LLM-Sketch on three representative tasks, and the results demonstrate that LLM-Sketch outperforms state-of-the-art methods, achieving up to a $7.5\times$ accuracy improvement.
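To make the two-tier idea concrete, the following is a minimal Python sketch of a heavy key-value part backed by a light Count-Min Sketch, with a pluggable classifier deciding where a new flow goes. It is an illustration under stated assumptions, not the authors' implementation: the names `TwoTierSketch`, `predict_large`, and `threshold` are hypothetical, the classifier is a stand-in for the fine-tuned LLM, and the lock mechanism and any eviction policy are omitted because this summary does not specify them.

```python
import hashlib
from typing import Callable, Dict, List


class CountMinSketch:
    """Light part: a standard Count-Min Sketch with `depth` hashed rows."""

    def __init__(self, depth: int, width: int) -> None:
        self.depth = depth
        self.width = width
        self.rows: List[List[int]] = [[0] * width for _ in range(depth)]

    def _index(self, key: str, row: int) -> int:
        # One salted hash per row; the salt makes the rows independent.
        salt = row.to_bytes(8, "little")
        digest = hashlib.blake2b(key.encode(), digest_size=8, salt=salt).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, key: str, count: int = 1) -> None:
        for r in range(self.depth):
            self.rows[r][self._index(key, r)] += count

    def query(self, key: str) -> int:
        # The minimum over rows never underestimates the true count.
        return min(self.rows[r][self._index(key, r)] for r in range(self.depth))


class TwoTierSketch:
    """Heavy KV part for predicted-large flows, CMS for everything else."""

    def __init__(
        self,
        heavy_capacity: int,
        cms_depth: int,
        cms_width: int,
        predict_large: Callable[[dict], float],  # hypothetical classifier hook
        threshold: float = 0.5,                  # assumed decision threshold
    ) -> None:
        self.heavy: Dict[str, int] = {}
        self.heavy_capacity = heavy_capacity
        self.light = CountMinSketch(cms_depth, cms_width)
        self.predict_large = predict_large
        self.threshold = threshold

    def insert(self, flow_id: str, header: dict) -> None:
        # Flows already tracked exactly stay in the heavy part.
        if flow_id in self.heavy:
            self.heavy[flow_id] += 1
            return
        # Admit a new flow to the heavy part only if there is room and the
        # classifier's soft label says it is likely to become large.
        has_room = len(self.heavy) < self.heavy_capacity
        if has_room and self.predict_large(header) >= self.threshold:
            self.heavy[flow_id] = 1
            return
        self.light.add(flow_id)

    def query(self, flow_id: str) -> int:
        # A flow may have packets in both parts, so the estimates are summed.
        return self.heavy.get(flow_id, 0) + self.light.query(flow_id)
```

In this sketch the classifier sees the header of every packet whose flow is not yet in the heavy part, so a flow first judged small can still be promoted later; its earlier packets remain in the CMS, which is why `query` adds the two estimates.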
