MERLOT: A Distilled LLM-based Mixture-of-Experts Framework for Scalable Encrypted Traffic Classification

Yuxuan Chen, Rongpeng Li, Zhifeng Zhao, Honggang Zhang

TL;DR

This work presents MERLOT, a scalable mixture-of-experts (MoE) refinement of a distilled large language model optimized for encrypted traffic classification. It classifies encrypted traffic directly from the final decoder token, using contextual feature embedding as input.

Abstract

We present MERLOT, a scalable mixture-of-experts (MoE) refinement of a distilled large language model optimized for encrypted traffic classification. By applying model distillation in a teacher-student paradigm, compact models derived from GPT-2-base retain high classification accuracy while minimizing computational cost. These models serve as specialized experts in an MoE architecture and are dynamically assigned by a gating network. Unlike generation-based methods, our approach classifies encrypted traffic directly from the final decoder token, using contextual feature embedding as input. Experiments on 10 datasets show superior or competitive performance relative to state-of-the-art models while significantly reducing resource demands, underscoring MERLOT's effectiveness and robustness.
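The routing and classification idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the class names (`TinyCausalExpert`, `Top1Gate`), the mean-pooled gate input, the top-1 routing rule, and the cumulative-mean "decoder" standing in for a distilled GPT-2 student are all assumptions made for illustration. The only point it aims to show is that a gate picks one expert, and that expert produces class logits from the hidden state of the final token instead of generating text.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over the last axis."""
    shifted = x - x.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


class TinyCausalExpert:
    """Hypothetical stand-in for one distilled decoder expert.

    A real expert would be a compact transformer decoder; here a causal
    running mean plus one linear layer plays that role (an assumption).
    """

    def __init__(self, embed_dim, hidden_dim, num_classes, rng):
        self.W_h = rng.normal(scale=0.02, size=(embed_dim, hidden_dim))
        self.W_out = rng.normal(scale=0.02, size=(hidden_dim, num_classes))
        self.b_out = np.zeros(num_classes)

    def logits(self, embeddings):
        # embeddings: (seq_len, embed_dim)
        steps = np.arange(1, embeddings.shape[0] + 1)[:, None]
        # Causal context: position t only sees positions <= t.
        causal_mean = np.cumsum(embeddings, axis=0) / steps
        hidden = np.tanh(causal_mean @ self.W_h)
        # Classify from the final token's hidden state only.
        return hidden[-1] @ self.W_out + self.b_out


class Top1Gate:
    """Hypothetical gating network: mean-pool the input, pick one expert."""

    def __init__(self, embed_dim, num_experts, rng):
        self.W = rng.normal(scale=0.02, size=(embed_dim, num_experts))

    def route(self, embeddings):
        pooled = embeddings.mean(axis=0)
        probs = softmax(pooled @ self.W)
        return int(np.argmax(probs)), probs


def moe_classify(embeddings, gate, experts):
    """Route the input to one expert and return its class prediction."""
    expert_idx, _ = gate.route(embeddings)
    class_probs = softmax(experts[expert_idx].logits(embeddings))
    return expert_idx, int(np.argmax(class_probs)), class_probs


# Usage with random data standing in for an embedded traffic flow.
rng = np.random.default_rng(0)
embed_dim, hidden_dim, num_classes, num_experts = 16, 32, 5, 3
gate = Top1Gate(embed_dim, num_experts, rng)
experts = [
    TinyCausalExpert(embed_dim, hidden_dim, num_classes, rng)
    for _ in range(num_experts)
]
flow_embeddings = rng.normal(size=(12, embed_dim))
expert_idx, predicted_class, class_probs = moe_classify(
    flow_embeddings, gate, experts
)
```

Because only the selected expert runs for a given input, the inference cost in this sketch is that of one compact model plus the gate, which is the kind of saving the abstract attributes to combining distillation with MoE routing.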

Paper Structure

This paper contains 11 sections, 4 equations, 4 figures, and 4 tables.

Figures (4)

  • Figure 1: Overview of the MERLOT architecture.
  • Figure 2: Example of contextual feature embedding.
  • Figure 3: t-SNE visualizations of input embeddings and classification outputs. Left: true labels. Right: labels predicted by the gating function and expert models in the MoE.
  • Figure 4: Performance variation with the number of layers in the student model.

Theorems & Definitions (4)

  • Remark
  • Remark
  • Remark
  • Remark