MIETT: Multi-Instance Encrypted Traffic Transformer for Encrypted Traffic Classification

Xu-Yang Chen, Lu Han, De-Chuan Zhan, Han-Jia Ye

TL;DR

MIETT tackles encrypted traffic classification by moving beyond token-level analysis to capture flow-level dynamics. It introduces a Multi-Instance Encrypted Traffic Transformer with Two-Level Attention that models intra-packet and inter-packet dependencies, aided by novel pre-training tasks (Packet Relative Position Prediction and Flow Contrastive Learning) along with Masked Flow Prediction. The model leverages a frozen packet-attention backbone from prior work and learns robust flow representations through per-flow CLS tokens, achieving state-of-the-art results across five datasets. This approach improves generalization to unseen traffic and offers a scalable, flow-aware framework for encrypted traffic analysis with practical implications for security and network management.

Abstract

Network traffic includes data transmitted across a network, such as web browsing and file transfers, and is organized into packets (small units of data) and flows (sequences of packets exchanged between two endpoints). Classifying encrypted traffic is essential for detecting security threats and optimizing network management. Recent advancements have highlighted the superiority of foundation models in this task, particularly for their ability to leverage large amounts of unlabeled data and demonstrate strong generalization to unseen data. However, existing methods that focus on token-level relationships fail to capture broader flow patterns, as tokens, defined as sequences of hexadecimal digits, typically carry limited semantic information in encrypted traffic. These flow patterns, which are crucial for traffic classification, arise from the interactions between packets within a flow, not just their internal structure. To address this limitation, we propose a Multi-Instance Encrypted Traffic Transformer (MIETT), which adopts a multi-instance approach where each packet is treated as a distinct instance within a larger bag representing the entire flow. This enables the model to capture both token-level and packet-level relationships more effectively through Two-Level Attention (TLA) layers, improving the model's ability to learn complex packet dynamics and flow patterns. We further enhance the model's understanding of temporal and flow-specific dynamics by introducing two novel pre-training tasks: Packet Relative Position Prediction (PRPP) and Flow Contrastive Learning (FCL). After fine-tuning, MIETT achieves state-of-the-art (SOTA) results across five datasets, demonstrating its effectiveness in classifying encrypted traffic and understanding complex network behaviors. Code is available at https://github.com/Secilia-Cxy/MIETT.
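
To make the multi-instance idea concrete, the following is a minimal sketch of one Two-Level Attention step over a flow stored as `flow[packet][token]`. It is not the authors' implementation: it uses a single attention head with identity query/key/value projections, and it assumes that the flow-level step attends across packets at the same token position. The names `self_attention` and `two_level_attention` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention over a list of equal-length vectors.

    Assumption: identity projections (queries = keys = values = inputs).
    """
    dim = len(vectors[0])
    outputs = []
    for query in vectors:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
        )
    return outputs


def two_level_attention(flow):
    """Apply packet attention, then flow attention, to flow[packet][token]."""
    # Level 1 (packet attention): tokens attend to tokens in the same packet.
    flow = [self_attention(packet) for packet in flow]

    # Level 2 (flow attention): for each token position, the packets attend
    # to one another. Attending at matching positions is an assumption here.
    n_packets, n_tokens = len(flow), len(flow[0])
    per_position = [
        self_attention([flow[p][t] for p in range(n_packets)])
        for t in range(n_tokens)
    ]
    # Transpose back to the [packet][token] layout.
    return [[per_position[t][p] for t in range(n_tokens)] for p in range(n_packets)]
```

The output has the same `[packet][token][dim]` layout as the input, so several such layers can be stacked. Splitting attention into two levels keeps each attention call small: one call per packet over its tokens, then one call per token position over the packets, instead of a single call over every token in the flow.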

Paper Structure

This paper contains 39 sections, 11 equations, 5 figures, 5 tables, 4 algorithms.

Figures (5)

  • Figure 1: Encrypted traffic classification task description. Raw traffic is first divided into session flows, with each flow further segmented into a sequence of packets. A packet typically consists of a header and a payload. The task is to classify the type of a given flow.
  • Figure 2: Data preprocessing. The raw traffic (PCAP trace) is first split into session flows and then further divided into individual packets. To protect data privacy, each packet is anonymized by masking the source and destination IP addresses and port numbers (replacing them with 0). The packet is then converted to its hexadecimal form, which is tokenized using a bi-gram model.
  • Figure 3: Overall architecture of the Multi-Instance Encrypted Traffic Transformer (MIETT). After passing through the MIETT encoder, the flow representation is processed by $M$ Two-Level Attention (TLA) layers, each comprising a packet attention mechanism and a flow attention mechanism.
  • Figure 4: Overview of pre-training tasks. (a) Masked Flow Prediction (MFP) Task: The model is tasked with predicting the original content of masked tokens using the context provided by the unmasked tokens. (b) Packet Relative Position Prediction (PRPP) Task: The model's objective is to determine, for each pair of packets (i, j), whether packet i precedes packet j. (c) Flow Contrastive Learning (FCL) Task: The goal is to ensure that packets within the same flow (positive pairs) are more similar in the embedding space, while packets from different flows (negative pairs) are less similar.
  • Figure 5: Impact of the Number of Packets.
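
The preprocessing described in Figure 2 and the pair labels of the PRPP task in Figure 4(b) can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' pipeline: the packet is assumed to start at an IPv4 header (no link-layer header) followed by TCP or UDP, a bi-gram is assumed to mean two adjacent bytes taken with a stride of one byte, and the PRPP input is assumed to be a shuffled flow whose original packet positions are known. The function names are hypothetical.

```python
def anonymize_ipv4(packet: bytes) -> bytes:
    """Zero the source/destination IP addresses and ports of an IPv4 packet.

    Assumptions: the buffer starts at the IPv4 header and carries TCP or UDP,
    whose first four bytes are the source and destination ports.
    """
    buf = bytearray(packet)
    header_len = (buf[0] & 0x0F) * 4  # IHL field, in 32-bit words
    if len(buf) < header_len + 4:
        raise ValueError("packet too short to contain transport ports")
    buf[12:20] = bytes(8)  # source IP (bytes 12-15) and destination IP (16-19)
    buf[header_len:header_len + 4] = bytes(4)  # source and destination ports
    return bytes(buf)


def bigram_tokens(packet: bytes) -> list:
    """Convert a packet to hex and emit overlapping two-byte tokens."""
    hex_bytes = [f"{b:02x}" for b in packet]
    return [hex_bytes[i] + hex_bytes[i + 1] for i in range(len(hex_bytes) - 1)]


def prpp_labels(original_positions):
    """Pairwise PRPP targets: entry [i][j] is 1 if packet i precedes packet j.

    `original_positions[k]` is the position that the k-th packet of the
    (assumed shuffled) input held in the original flow.
    """
    n = len(original_positions)
    return [
        [1 if original_positions[i] < original_positions[j] else 0 for j in range(n)]
        for i in range(n)
    ]
```

For example, `bigram_tokens(bytes.fromhex("4500003c"))` yields `["4500", "0000", "003c"]`, and `prpp_labels([2, 0, 1])` yields `[[0, 0, 0], [1, 0, 1], [1, 0, 0]]`.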