
TrafficMoE: Heterogeneity-aware Mixture of Experts for Encrypted Traffic Classification

Qing He, Xiaowei Fu, Lei Zhang

Abstract

Encrypted traffic classification is a critical task for network security. While deep learning has advanced this field, the occlusion of payload semantics by encryption severely challenges standard modeling approaches. Most existing frameworks rely on static and homogeneous pipelines that apply uniform parameter sharing and static fusion strategies across all inputs. This one-size-fits-all static design is inherently flawed: by forcing structured headers and randomized payloads into a unified processing pipeline, it inevitably entangles the raw protocol signals with stochastic encryption noise, thereby degrading fine-grained discriminative features. In this paper, we propose TrafficMoE, a framework that breaks through the bottleneck of static modeling by establishing a Disentangle-Filter-Aggregate (DFA) paradigm. Specifically, to resolve the structural conflict between components, the architecture disentangles headers and payloads using dual-branch sparse Mixture-of-Experts (MoE), enabling modality-specific modeling. To mitigate the impact of stochastic noise, an uncertainty-aware filtering mechanism is introduced to quantify reliability and selectively suppress high-variance representations. Finally, to overcome the limitations of static fusion, a routing-guided strategy dynamically aggregates cross-modality features, adaptively weighting contributions based on traffic context. With this DFA paradigm, TrafficMoE maximizes representational efficiency by focusing solely on the most discriminative traffic features. Extensive experiments on six datasets demonstrate that TrafficMoE consistently outperforms state-of-the-art methods, validating the necessity of heterogeneity-aware modeling in encrypted traffic analysis. The source code is publicly available at https://github.com/Posuly/TrafficMoE_main.
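To make the Disentangle-Filter-Aggregate flow concrete, the following is a minimal PyTorch-style sketch of the three steps described in the abstract. All module names, dimensions, the top-k routing, the entropy-based gate, and the fusion rule are illustrative assumptions for exposition, not the authors' implementation (see the linked repository for the released code).

```python
# Illustrative sketch of the DFA paradigm: disentangle (dual-branch sparse MoE),
# filter (entropy-gated reliability), aggregate (routing-guided fusion).
# Everything below is an assumed, simplified instantiation, not TrafficMoE itself.
import torch
import torch.nn as nn


class SparseMoEBranch(nn.Module):
    """One modality-specific branch: a router sends each token to its top-k experts."""

    def __init__(self, d_model: int = 256, n_experts: int = 4, k: int = 2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, d_model), nn.GELU(),
                           nn.Linear(d_model, d_model)) for _ in range(n_experts)]
        )
        self.k = k

    def forward(self, x: torch.Tensor):
        # x: (batch, tokens, d_model)
        probs = self.router(x).softmax(dim=-1)        # per-token routing distribution
        top_w, top_i = probs.topk(self.k, dim=-1)     # sparse top-k expert selection
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            # weight of expert e wherever it was selected, zero elsewhere
            w = (top_w * (top_i == e)).sum(dim=-1, keepdim=True)
            out = out + w * expert(x)
        return out, probs


def entropy_gate(h: torch.Tensor, probs: torch.Tensor) -> torch.Tensor:
    """Suppress tokens whose routing distribution is dispersed (high Shannon entropy)."""
    entropy = -(probs * probs.clamp_min(1e-9).log()).sum(dim=-1)        # (batch, tokens)
    g = 1.0 - entropy / torch.log(torch.tensor(float(probs.size(-1))))  # g -> 1 keep, g -> 0 drop
    return h * g.unsqueeze(-1)


class TrafficDFASketch(nn.Module):
    """Header/payload disentanglement, uncertainty filtering, routing-guided aggregation."""

    def __init__(self, d_model: int = 256, n_classes: int = 10):
        super().__init__()
        self.header_branch = SparseMoEBranch(d_model)
        self.payload_branch = SparseMoEBranch(d_model)
        self.modality_weight = nn.Linear(2, 2)       # toy routing-guided fusion weights
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, header_tokens: torch.Tensor, payload_tokens: torch.Tensor):
        h, p_h = self.header_branch(header_tokens)   # disentangle: modality-specific experts
        p, p_p = self.payload_branch(payload_tokens)
        h = entropy_gate(h, p_h)                     # filter: drop high-variance tokens
        p = entropy_gate(p, p_p)
        # aggregate: weight each modality by the confidence of its own routing
        conf = torch.stack([p_h.amax(-1).mean(-1), p_p.amax(-1).mean(-1)], dim=-1)
        w = self.modality_weight(conf).softmax(dim=-1)          # (batch, 2) fusion weights
        fused = w[:, 0, None] * h.mean(dim=1) + w[:, 1, None] * p.mean(dim=1)
        return self.classifier(fused)


if __name__ == "__main__":
    model = TrafficDFASketch()
    header = torch.randn(8, 40, 256)     # e.g. 40 header tokens per flow (assumed)
    payload = torch.randn(8, 128, 256)   # e.g. 128 payload tokens per flow (assumed)
    print(model(header, payload).shape)  # torch.Size([8, 10])
```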



Figures (8)

  • Figure 1: Comparison between existing modeling paradigms and the proposed TrafficMoE framework. Existing paradigms (a) typically process heterogeneous traffic components in a unified manner with static fusion strategies, whereas TrafficMoE (b) explicitly disentangles headers and payloads, incorporates uncertainty-aware filtering (UF), and performs conditional aggregation (CA) guided by MoE routing probabilities for adaptive and context-aware integration.
  • Figure 2: Overview of the TrafficMoE framework. The dual-branch MoE architecture explicitly models heterogeneous traffic components via dedicated Header and Payload branches, each capturing intra-modality patterns. The uncertainty-aware filter suppresses unreliable tokens, i.e., noisy traffic components, and the conditional aggregation module fuses the final purified features across the two modalities (header vs. payload) in a routing-guided manner for encrypted traffic classification.
  • Figure 3: End-to-end preprocessing pipeline for encrypted traffic. Raw traffic is first segmented into flows using canonical 5-tuple session identification. These flows are further decomposed into packet-level units through packet splitting. Each packet undergoes byte-level cropping and zero-padding to produce fixed-dimensional header and payload segments. The resulting sequence of fixed-size packets is then partitioned into non-overlapping strides, enabling consistent, length-normalized inputs for subsequent neural processing while preserving temporal ordering and structural heterogeneity. (A minimal preprocessing sketch follows this figure list.)
  • Figure 4: The basic idea of UF, which quantifies the alignment uncertainty via Shannon entropy ($H$). Sharp distributions in green mean low entropy $H$ and identify reliable metadata to be retained (i.e., the filter $g \to 1$), while dispersed distributions in red mean high entropy $H$ and identify noisy components to be suppressed (i.e., the filter $g \to 0$). (A worked numeric example of this gate follows the figure list.)
  • Figure 5: Training pipeline of the proposed framework. In the pre-training stage, masked language modeling (MLM) is used to learn contextual representations from header and payload sequences without supervision. In the fine-tuning stage, the pretrained encoders are fine-tuned end-to-end on labeled traffic data through a standard cross-entropy classification objective.
  • ...and 3 more figures
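The preprocessing in Figure 3 can be illustrated with a rough, self-contained sketch: packets are grouped into flows by a direction-agnostic 5-tuple key, and each packet's header and payload bytes are cropped and zero-padded to fixed lengths. The concrete lengths (HEADER_LEN, PAYLOAD_LEN, PACKETS_PER_FLOW), the record layout, and the stride partitioning being omitted are assumptions for exposition, not the paper's exact settings.

```python
# Illustrative flow segmentation and byte-level crop/pad (assumed parameters).
from collections import defaultdict

HEADER_LEN = 64        # bytes kept from each packet header (assumed)
PAYLOAD_LEN = 256      # bytes kept from each packet payload (assumed)
PACKETS_PER_FLOW = 8   # packets retained per flow (assumed)


def canonical_5tuple(src_ip, src_port, dst_ip, dst_port, proto):
    """Direction-agnostic flow key so both directions of a session map to one flow."""
    a, b = (src_ip, src_port), (dst_ip, dst_port)
    return (proto,) + (a + b if a <= b else b + a)


def crop_pad(data: bytes, length: int) -> bytes:
    """Byte-level cropping followed by zero-padding to a fixed size."""
    return data[:length].ljust(length, b"\x00")


def build_flows(packets):
    """packets: iterable of (src_ip, src_port, dst_ip, dst_port, proto, header, payload)."""
    flows = defaultdict(list)
    for src_ip, src_port, dst_ip, dst_port, proto, header, payload in packets:
        key = canonical_5tuple(src_ip, src_port, dst_ip, dst_port, proto)
        if len(flows[key]) < PACKETS_PER_FLOW:
            flows[key].append(
                (crop_pad(header, HEADER_LEN), crop_pad(payload, PAYLOAD_LEN))
            )
    return flows


# Tiny synthetic example: two packets from opposite directions collapse to one flow.
pkts = [
    ("10.0.0.1", 443, "10.0.0.2", 51000, "TCP", b"\x45\x00", b"\x17\x03\x03"),
    ("10.0.0.2", 51000, "10.0.0.1", 443, "TCP", b"\x45\x00", b"\x17\x03\x03"),
]
print({k: len(v) for k, v in build_flows(pkts).items()})   # one flow holding 2 packets
```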
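As a concrete illustration of the gating idea in Figure 4 (the exact normalization below is our assumption, not taken from the paper): for a distribution $p$ over $N$ outcomes, the Shannon entropy is $H(p) = -\sum_i p_i \log p_i$, and a natural gate is $g = 1 - H(p)/\log N$. A sharp distribution such as $p = (0.97, 0.01, 0.01, 0.01)$ gives $H \approx 0.17$ and $g \approx 0.88$, so the token is retained, whereas the uniform $p = (0.25, 0.25, 0.25, 0.25)$ gives $H = \log 4 \approx 1.39$ and $g = 0$, so the token is suppressed.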