Feasibility of State Space Models for Network Traffic Generation

Andrew Chu, Xi Jiang, Shinan Liu, Arjun Bhagoji, Francesco Bronzino, Paul Schmitt, Nick Feamster

TL;DR

The paper addresses the scarcity and limited realism of available network traces by proposing Mamba, a selective structured state-space model (SSM), to generate packet-level synthetic traces, framing the task as unsupervised sequence generation. It argues that SSMs scale linearly with sequence length and handle longer contexts than transformer- and diffusion-based methods, enabling long traces with finer-grained, byte-level fidelity. Using tokenized PCAP flows and seed-conditioned generation, the approach achieves higher statistical similarity to real traffic than state-of-the-art baselines and can produce traces exceeding $1000$ packets, with relatively low memorization. The work also outlines open challenges, including targeted generation, integration of temporal information, and robust evaluation on downstream networking tasks, and presents a practical path toward realistic, privacy-preserving synthetic data for networking research and deployment.

Abstract

Many problems in computer networking rely on parsing collections of network traces (e.g., traffic prioritization, intrusion detection). Unfortunately, the availability and utility of these collections are limited due to privacy concerns, data staleness, and low representativeness. While methods for generating data to augment collections exist, they often fall short in replicating the quality of real-world traffic. In this paper, we i) survey the evolution of traffic simulators/generators and ii) propose the use of state-space models, specifically Mamba, for packet-level, synthetic network trace generation by modeling it as an unsupervised sequence generation problem. Early evaluation shows that state-space models can generate synthetic network traffic with higher statistical similarity to real traffic than the state-of-the-art. Our approach thus has the potential to reliably generate realistic, informative synthetic network traces for downstream tasks.
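
To make the "unsupervised sequence generation" framing concrete, the following minimal sketch shows one way a flow of raw packets could be turned into a single token sequence and into next-token training pairs. It is an illustration under stated assumptions, not the paper's tokenizer: the byte-level vocabulary, the special tokens `PKT_SEP` and `FLOW_END`, the helper names, and the next-token objective are all hypothetical choices made for this example.

```python
from typing import List, Tuple

# Hypothetical special tokens (assumptions for illustration only).
PKT_SEP = 256   # marks the end of one packet
FLOW_END = 257  # marks the end of the whole flow


def tokenize_flow(packets: List[bytes]) -> List[int]:
    """Flatten a flow (list of raw packet bytes) into one token sequence."""
    tokens: List[int] = []
    for pkt in packets:
        tokens.extend(pkt)       # each byte becomes a token in 0..255
        tokens.append(PKT_SEP)   # packet boundary
    tokens.append(FLOW_END)
    return tokens


def detokenize_flow(tokens: List[int]) -> List[bytes]:
    """Invert tokenize_flow; bytes after the last PKT_SEP are dropped."""
    packets: List[bytes] = []
    current = bytearray()
    for tok in tokens:
        if tok == FLOW_END:
            break
        if tok == PKT_SEP:
            packets.append(bytes(current))
            current = bytearray()
        else:
            current.append(tok)
    return packets


def next_token_pairs(tokens: List[int]) -> Tuple[List[int], List[int]]:
    """Shift the sequence by one to get (inputs, targets) without labels."""
    return tokens[:-1], tokens[1:]


# Two tiny, made-up packets standing in for a captured flow.
flow = [b"\x45\x00", b"\x45\x00\x00\x28"]
toks = tokenize_flow(flow)
inputs, targets = next_token_pairs(toks)
```

Under this framing, no labels are needed: the targets are the same sequence shifted by one position, and a seed for generation would simply be a prefix of such a token sequence that the model is asked to continue.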

Paper Structure (34 sections, 3 figures, 4 tables)

This paper contains 34 sections, 3 figures, 4 tables.

Figures (3)

  • Figure 1: Timeline of synthetic network trace generation methods.
  • Figure 2: Architecture for the Mamba block, described in the architecture section. $\sigma$ denotes the SiLU/Swish non-linear activation. $\otimes$ denotes element-wise multiplication.
  • Figure 3: Model training and generation process.