Lens: A Knowledge-Guided Foundation Model for Network Traffic

Xiaochang Li, Chen Qian, Qineng Wang, Jiangtao Kong, Yuchen Wang, Ziyu Yao, Bo Ji, Long Cheng, Gang Zhou, Huajie Shao

TL;DR

Lens is a knowledge-guided foundation model for network traffic that combines KG-MSP pretraining with textual context and context-aware finetuning to address semantic gaps and distribution shifts. Built on an encoder-decoder Transformer (T5), it tokenizes traffic using a network-aware BBPE vocabulary and masks critical metadata to learn robust representations. The approach achieves strong results across 12 classification tasks (about 96.3% average accuracy) and 5 generation tasks, with notable gains in extending to novel classes and in generating high-fidelity traffic for network simulation and fuzzing. The work targets practical benefits for network security and management, and the authors plan to open-source the code upon publication.

Abstract

Network traffic refers to the amount of data being sent and received over the Internet or any system that connects computers. Analyzing network traffic is vital for security and management, yet remains challenging due to the heterogeneity of plain-text packet headers and encrypted payloads. To capture the latent semantics of traffic, recent studies have adopted Transformer-based pretraining techniques to learn network representations from massive traffic data. However, these methods pretrain on purely data-driven tasks and overlook network knowledge; for example, they mask partial digits of indivisible network port numbers for prediction, which limits semantic understanding. In addition, they struggle to extend classification to new classes during finetuning due to the distribution shift. Motivated by these limitations, we propose Lens, a unified knowledge-guided foundation model for both network traffic classification and generation. In pretraining, we propose a Knowledge-Guided Mask Span Prediction (KG-MSP) method with textual context for learning knowledge-enriched representations. To extend to new classes in finetuning, we reframe traffic classification as a closed-ended generation task and introduce context-aware finetuning to adapt to the distribution shift. Evaluation results across various benchmark datasets demonstrate that Lens achieves superior performance on both classification and generation tasks. For traffic classification, Lens substantially outperforms competitive baselines on 8 out of 12 tasks, with an average accuracy of 96.33%, and extends to novel classes with significantly better performance. For traffic generation, Lens generates higher-fidelity network traffic for network simulation, with gains of up to 30.46% in accuracy and 33.3% in F1 in fuzzing tests. We will open-source the code upon publication.
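
The contrast between digit-level and field-level masking can be illustrated with a minimal sketch. It is not the authors' implementation: the field names, the `mask_fields` helper, and the text layout are hypothetical, and the `<extra_id_N>` sentinels simply follow the usual T5 span-corruption convention. The point is only that a port number is masked and predicted as one indivisible unit.

```python
# Hypothetical sketch of field-level (knowledge-guided) span masking.
# Field names and formatting are assumptions, not the paper's actual schema.

def mask_fields(fields, mask_names):
    """Replace each selected field value with a T5-style sentinel token.

    fields: list of (name, value) pairs in packet order.
    mask_names: set of field names whose whole value is masked.
    Returns (encoder_input, decoder_target) as strings.
    """
    source, target = [], []
    sentinel = 0
    for name, value in fields:
        if name in mask_names:
            tag = f"<extra_id_{sentinel}>"
            source.append(f"{name}: {tag}")   # whole value hidden from the encoder
            target.append(f"{tag} {value}")   # decoder must recover the full value
            sentinel += 1
        else:
            source.append(f"{name}: {value}")
    return " | ".join(source), " ".join(target)


packet = [
    ("ip.proto", "6"),
    ("tcp.srcport", "51514"),
    ("tcp.dstport", "443"),
    ("tcp.flags", "0x018"),
]
src, tgt = mask_fields(packet, {"tcp.dstport"})
print(src)
print(tgt)
```

Here the destination port is hidden in full, so the model is never asked to guess, say, the last digit of a port whose other digits are visible.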

Paper Structure (26 sections, 1 equation, 4 figures, 17 tables)

Figures (4)

  • Figure 1: The overall framework of Lens. (a) Network flows are extracted, parsed with Tshark, anonymized, and tokenized using our network-specific tokenizer. (b) Lens is pretrained with Knowledge-Guided Masked Span Prediction (KG-MSP) and auxiliary natural-language context. (c) In finetuning, Lens performs downstream classification and generation tasks via context-aware finetuning.
  • Figure 2: The core model architecture of Lens for pretraining, with both the encoder and the decoder. 1) The encoder takes in masked network traffic (header and payload) and textual template context. 2) The decoder recovers the masked span tokens in both traffic and context auto-regressively, based on the encoder representations.
  • Figure 3: Fuzzing performance on IoT attack detection. Machine-learning models trained on Lens-generated traffic achieve consistently higher accuracy and F1 than those trained on baselines’ generated traffic.
  • Figure 4: Example inputs for classification and generation. For classification, the input includes parsed network traffic and a task context listing label options. For generation, the input contains a masked packet and a task description.
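
The classification input described for Figure 4 can be sketched as follows. This is a rough illustration under stated assumptions: the prompt wording and the helper names `build_classification_input` and `resolve_label` are hypothetical and do not reproduce the paper's actual template. It shows why listing label options in the context makes the task closed-ended, and why adding a new class only changes the option list rather than an output layer.

```python
# Hypothetical sketch: classification as closed-ended generation.
# The prompt wording is an assumption, not the template used by Lens.

def build_classification_input(traffic, labels):
    """Combine parsed traffic with a task context that lists the label options."""
    options = ", ".join(labels)
    return f"Task: classify the traffic. Options: {options}. Traffic: {traffic}"


def resolve_label(generated, labels):
    """Map generated text back to one of the listed options, or None if no match."""
    text = generated.strip().lower()
    for label in labels:
        if label.lower() == text:
            return label
    return None


labels = ["Chat", "VPN", "Streaming"]
prompt = build_classification_input("tcp.dstport: 443 | tcp.flags: 0x018", labels)

# Extending to a novel class only appends to the option list.
extended_prompt = build_classification_input(
    "tcp.dstport: 443 | tcp.flags: 0x018", labels + ["FileTransfer"]
)
print(prompt)
print(resolve_label(" vpn ", labels))
```

Outputs that match no listed option resolve to `None`, which is one simple way to keep the generation closed-ended.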