Lens: A Knowledge-Guided Foundation Model for Network Traffic
Xiaochang Li, Chen Qian, Qineng Wang, Jiangtao Kong, Yuchen Wang, Ziyu Yao, Bo Ji, Long Cheng, Gang Zhou, Huajie Shao
TL;DR
Lens is a knowledge-guided foundation model for network traffic that combines KG-MSP pretraining with textual context and context-aware finetuning to address semantic gaps and distribution shifts. Built on an encoder-decoder Transformer (T5), it tokenizes traffic using a network-aware BBPE vocabulary and masks critical metadata to learn robust representations. The approach achieves strong results across 12 classification tasks (average accuracy of about 96.3%) and 5 generation tasks, with notable gains in extending to novel classes and in generating high-fidelity traffic for network simulation and fuzzing. The work targets practical use in network security and management, and the authors plan to open-source the code upon publication.
Abstract
Network traffic refers to the amount of data being sent and received over the Internet or any system that connects computers. Analyzing network traffic is vital for security and management, yet remains challenging due to the heterogeneity of plain-text packet headers and encrypted payloads. To capture the latent semantics of traffic, recent studies have adopted Transformer-based pretraining techniques to learn network representations from massive traffic data. However, these methods pretrain on purely data-driven tasks and overlook network knowledge (for example, they mask partial digits of indivisible network port numbers for prediction), which limits semantic understanding. In addition, they struggle to extend classification to new classes during finetuning because of distribution shift. Motivated by these limitations, we propose Lens, a unified knowledge-guided foundation model for both network traffic classification and generation. In pretraining, we propose a Knowledge-Guided Mask Span Prediction (KG-MSP) method with textual context for learning knowledge-enriched representations. To extend to new classes in finetuning, we reframe traffic classification as a closed-ended generation task and introduce context-aware finetuning to adapt to the distribution shift. Evaluation results across various benchmark datasets demonstrate that Lens achieves superior performance on both classification and generation tasks. For traffic classification, Lens substantially outperforms competitive baselines on 8 out of 12 tasks, with an average accuracy of 96.33%, and extends to novel classes with significantly better performance. For traffic generation, Lens generates higher-fidelity network traffic for network simulation, achieving up to 30.46% higher accuracy and 33.3% higher F1 in fuzzing tests. We will open-source the code upon publication.
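To make the port-number example concrete, the following minimal sketch contrasts masking a single digit of a port with masking the whole field as one span. It is an illustration under stated assumptions, not the Lens implementation: the field names, the values, the `<M>` placeholder, and the helper functions are hypothetical, and the `<extra_id_0>` sentinel is borrowed from the common T5 tokenizer convention rather than confirmed by the text above.

```python
import random

# Hypothetical header fields; names and values are illustrative only.
FIELDS = [("src_port", "443"), ("dst_port", "51324"), ("ttl", "64")]


def digit_level_mask(value, rng, mask="<M>"):
    """Mask one random digit of a value.

    This splits an indivisible field such as a port number, which is the
    behaviour the abstract criticises in purely data-driven masking.
    """
    i = rng.randrange(len(value))
    return value[:i] + mask + value[i + 1:]


def field_level_mask(fields, target, sentinel="<extra_id_0>"):
    """Mask the entire value of one field as a single span.

    Returns the masked field list and the value the model should predict.
    The sentinel string is an assumption based on T5-style span corruption.
    """
    masked = [(name, sentinel if name == target else value)
              for name, value in fields]
    label = dict(fields)[target]
    return masked, label


if __name__ == "__main__":
    rng = random.Random(0)
    # Digit-level: the model would see a fragment of the port.
    print(digit_level_mask("51324", rng))
    # Field-level: the port is hidden and predicted as one unit.
    print(field_level_mask(FIELDS, "dst_port"))
```

With field-level masking the prediction target is the complete port value, so the model is never asked to guess an isolated digit that carries no meaning on its own.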
