Talk Like a Packet: Rethinking Network Traffic Analysis with Transformer Foundation Models
Samara Mayhoub, Chuan Heng Foh, Mahdi Boloursaz Mashhadi, Mohammad Shojafar, Rahim Tafazolli
TL;DR
This work introduces Transformer-based traffic foundation models that are pre-trained on unlabeled network traffic and fine-tuned for downstream tasks such as traffic classification, traffic characteristic prediction, and traffic generation. It presents a unified pre-training and fine-tuning pipeline for learning rich traffic representations, together with a taxonomy of model architectures (encoder-only, MAE-based, encoder–decoder, decoder-only, and hybrids). The paper demonstrates generalization across these three task families, where the foundation models compare favorably to non-foundation baselines, and discusses datasets, representation modalities, and structure-aware encoding strategies. It also outlines future research directions in computational efficiency, latency, explainability, and reasoning, highlighting the potential of foundation models for scalable and data-efficient intelligent network analysis.
Abstract
Inspired by the success of Transformer-based models in natural language processing, this paper investigates their potential as foundation models for network traffic analysis. We propose a unified pre-training and fine-tuning pipeline for traffic foundation models. Through fine-tuning, we demonstrate that these models generalize across various downstream tasks, including traffic classification, traffic characteristic prediction, and traffic generation. We also compare against non-foundation baselines and show that the foundation-model backbones achieve improved performance. Moreover, we categorize existing models based on their architecture, input modality, and pre-training strategy. Our findings show that these models can effectively learn traffic representations and perform well with limited labeled data, highlighting their potential in future intelligent network analysis systems.
