Mean Masked Autoencoder with Flow-Mixing for Encrypted Traffic Classification

Xiao Liu, Xiaowei Fu, Fuxiang Huang, Lei Zhang

Abstract

Network traffic classification using self-supervised pre-training models based on Masked Autoencoders (MAE) has demonstrated great potential. However, existing methods are confined to isolated byte-level reconstruction of individual flows and lack adequate perception of the multi-granularity contextual relationships in traffic. To address this limitation, we propose Mean MAE (MMAE), a teacher-student MAE paradigm with a flow-mixing strategy for building an encrypted-traffic pre-training model. MMAE employs a self-distillation mechanism for teacher-student interaction, where the teacher provides unmasked flow-level semantic supervision to advance the student from local byte reconstruction to multi-granularity comprehension. To break the information bottleneck of individual flows, we introduce a dynamic Flow Mixing (FlowMix) strategy that replaces the traditional random masking mechanism. By constructing challenging cross-flow mixed samples with interference, it compels the model to learn discriminative representations from distorted tokens. Furthermore, we design a Packet-importance aware Mask Predictor (PMP) equipped with an attention-bias mechanism that leverages packet-level side-channel statistics to dynamically mask tokens with high semantic density. Extensive experiments on multiple datasets covering encrypted applications, malware, and attack traffic demonstrate that MMAE achieves state-of-the-art performance. The code is available at https://github.com/lx6c78/MMAE.
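The teacher-student self-distillation design outlined above can be made concrete with a short sketch. The following is a minimal, speculative PyTorch rendering under common assumptions, not the authors' implementation: the teacher is an exponential-moving-average (EMA) copy of the student, the student encodes a masked/mixed view while the teacher sees the unmasked flow, and the total loss combines byte-level reconstruction with flow-level feature alignment. All names (DistillMAE, ema_update, the mean-pooling choice) are illustrative.

```python
# Minimal sketch of a teacher-student MAE trained with self-distillation.
# ASSUMPTIONS: EMA teacher, MSE reconstruction + feature-distillation loss;
# the actual MMAE objective and architecture may differ.
import copy
import torch
import torch.nn.functional as F

class DistillMAE(torch.nn.Module):
    def __init__(self, encoder, decoder, ema_momentum=0.999):
        super().__init__()
        self.student_enc, self.decoder = encoder, decoder
        self.teacher_enc = copy.deepcopy(encoder)  # frozen EMA copy of the student
        for p in self.teacher_enc.parameters():
            p.requires_grad_(False)
        self.m = ema_momentum

    @torch.no_grad()
    def ema_update(self):
        # teacher <- m * teacher + (1 - m) * student, called after each step
        for pt, ps in zip(self.teacher_enc.parameters(),
                          self.student_enc.parameters()):
            pt.lerp_(ps, 1.0 - self.m)

    def forward(self, mixed_tokens, clean_tokens, target_bytes):
        z_s = self.student_enc(mixed_tokens)       # masked / cross-flow mixed view
        with torch.no_grad():
            z_t = self.teacher_enc(clean_tokens)   # unmasked flow-level view
        recon = self.decoder(z_s)                  # byte-level reconstruction
        loss_rec = F.mse_loss(recon, target_bytes)
        # flow-level semantic supervision: align pooled student/teacher features
        loss_kd = F.mse_loss(z_s.mean(dim=1), z_t.mean(dim=1))
        return loss_rec + loss_kd
```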

Paper Structure

This paper contains 28 sections, 32 equations, 8 figures, 6 tables, 2 algorithms.

Figures (8)

  • Figure 1: Comparison of reconstruction loss during pre-training between MAE and its variant with a Self-Distillation strategy (i.e., MAE+SD). Introducing flow-level semantics via SD yields a consistently lower loss.
  • Figure 2: Comparison of pre-training paradigms. (a) Standard MAE with a random masking strategy on an isolated single flow. (b) MMAE (ours): a novel teacher-student pre-training architecture with a flow-mixing strategy, where the teacher is a copy of the student and the two interact via self-distillation under unmasked flow-level semantic supervision from the teacher.
  • Figure 3: Comparison of hierarchical semantic extraction capabilities between MAE and the proposed MMAE. MAE relies solely on a single flow and random masking for reconstruction. Our MMAE incorporates cross-flow mixing and packet-level side-channel statistical features. Within the self-distillation teacher-student architecture, flow-level reference features are used as supervision for reconstruction and mask prediction tasks.
  • Figure 4: Flowchart of the proposed MMAE, which mainly comprises FlowMix and teacher-student twin MAEs. MMAE incorporates a traffic pre-processing unit, a statistics-based flow matcher, and a cross-flow mixing unit (i.e., FlowMix). Within the self-distillation teacher-student twin-MAE architecture, flow-level reference features are exploited as supervision for the reconstruction and mask-prediction tasks.
  • Figure 5: Architecture of the Packet-importance aware Mask Predictor (PMP). PMP leverages packet-level side-channel priors to generate a low-rank attention bias via a token-aware gating mechanism. This bias dynamically modulates self-attention to identify and mask challenging, semantically dense regions. (A hedged sketch of this mechanism follows the figure list.)
  • ...and 3 more figures
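Figure 5's PMP couples packet-level side-channel statistics with self-attention through a low-rank, token-gated bias. The paper's exact formulation is not reproduced in this overview, so the sketch below shows one plausible instantiation: per-packet statistics are projected to two low-rank factors whose product biases the attention logits, scaled by a token-aware gate. All module names, tensor shapes, and the choice of sigmoid gating are assumptions.

```python
# Speculative sketch of self-attention with a low-rank bias derived from
# packet side-channel statistics (cf. Figure 5); not the authors' code.
import torch

class BiasedSelfAttention(torch.nn.Module):
    def __init__(self, dim, stat_dim, rank=4):
        super().__init__()
        self.qkv = torch.nn.Linear(dim, 3 * dim)
        self.to_u = torch.nn.Linear(stat_dim, rank)  # left low-rank factor
        self.to_v = torch.nn.Linear(stat_dim, rank)  # right low-rank factor
        self.gate = torch.nn.Linear(dim, 1)          # token-aware gating
        self.scale = dim ** -0.5

    def forward(self, x, stats):
        # x: (B, N, dim) token features; stats: (B, N, stat_dim) per-packet stats
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        logits = q @ k.transpose(-2, -1) * self.scale     # (B, N, N)
        u, w = self.to_u(stats), self.to_v(stats)         # (B, N, rank) each
        bias = u @ w.transpose(-2, -1)                    # low-rank bias (B, N, N)
        g = torch.sigmoid(self.gate(x))                   # per-token gate (B, N, 1)
        logits = logits + g * bias                        # gated bias injection
        attn = logits.softmax(dim=-1)
        return attn @ v
```

Under this reading, tokens whose biased rows attract high attention mass would be the "semantically dense" candidates that PMP prefers to mask, though the paper's actual selection rule may differ.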