TAO-Net: Two-stage Adaptive OOD Classification Network for Fine-grained Encrypted Traffic Classification

Zihao Wang, Wei Peng, Junming Zhang, Jian Li, Wenxin Fang

TL;DR

TAO-Net is a two-stage adaptive framework for encrypted traffic classification that handles both In-Distribution (ID) and Out-of-Distribution (OOD) traffic. The first stage uses a hybrid detector that combines inter-layer transformation smoothness with PCA-based feature analysis to separate ID from OOD traffic. The second stage routes ID traffic to a Transformer classifier and passes OOD traffic to a Large Language Model (LLM) guided by a Semantic-enhanced Prompt Strategy (SPS), which generates fine-grained OOD labels. On the CHNAPP, ISCXVPN, and ISCXTor datasets, TAO-Net achieves macro-precision of 96.81–97.70% and macro-F1 of 96.77–97.68%, compared with 44.73–86.30% macro-precision for previous methods. The results indicate that recasting OOD traffic classification as a generation task, with controlled prompting and stage-specific processing, supports identification of emerging applications, which is relevant to network security monitoring and threat detection.

Abstract

Encrypted traffic classification aims to identify applications or services by analyzing network traffic data. One of the critical challenges is the continuous emergence of new applications, which generates Out-of-Distribution (OOD) traffic patterns that deviate from known categories and are not well represented by predefined models. Current approaches rely on predefined categories, which limits their effectiveness in handling unknown traffic types. Although some methods mitigate this limitation by assigning all unknown traffic to a single "Other" category, they do not provide fine-grained classification. In this paper, we propose a Two-stage Adaptive OOD classification Network (TAO-Net) that achieves accurate classification for both In-Distribution (ID) and OOD encrypted traffic. The method uses a two-stage design: the first stage employs a hybrid OOD detection mechanism that integrates transformer-based inter-layer transformation smoothness and feature analysis to distinguish between ID and OOD traffic, while the second stage leverages large language models with a novel semantic-enhanced prompt strategy to transform OOD traffic classification into a generation task, enabling flexible fine-grained classification without relying on predefined labels. Experiments on three datasets demonstrate that TAO-Net achieves 96.81–97.70% macro-precision and 96.77–97.68% macro-F1, outperforming previous methods that only reach 44.73–86.30% macro-precision, particularly in identifying emerging network applications.
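
The two-stage routing described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the summary does not give the scoring formulas, so the roughness proxy (mean cosine distance between consecutive layer outputs), the externally supplied `pca_error`, the weight `alpha`, the `threshold`, and all function names are hypothetical. The two callables stand in for the Transformer classifier and the LLM with SPS.

```python
import math
from typing import Callable, Sequence, Tuple

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """Return 1 - cosine similarity; zero vectors count as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def layer_roughness(hidden_states: Sequence[Vector]) -> float:
    """Assumed proxy for (lack of) inter-layer smoothness:
    mean cosine distance between consecutive layer outputs."""
    pairs = list(zip(hidden_states, hidden_states[1:]))
    if not pairs:
        return 0.0
    return sum(cosine_distance(a, b) for a, b in pairs) / len(pairs)


def hybrid_ood_score(roughness: float, pca_error: float, alpha: float = 0.5) -> float:
    """Assumed combination rule: weighted sum of two scores scaled to [0, 1]."""
    return alpha * roughness + (1.0 - alpha) * pca_error


def route_flow(
    hidden_states: Sequence[Vector],
    pca_error: float,
    id_classifier: Callable[[], str],
    ood_labeler: Callable[[], str],
    threshold: float = 0.5,
    alpha: float = 0.5,
) -> Tuple[str, str]:
    """Stage 1 decides ID vs OOD; stage 2 calls the matching labeler."""
    score = hybrid_ood_score(layer_roughness(hidden_states), pca_error, alpha)
    if score > threshold:
        return ("OOD", ood_labeler())   # stand-in for the LLM + SPS path
    return ("ID", id_classifier())      # stand-in for the Transformer path


if __name__ == "__main__":
    smooth = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
    rough = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
    print(route_flow(smooth, 0.1, lambda: "known-app", lambda: "generated-label"))
    print(route_flow(rough, 0.9, lambda: "known-app", lambda: "generated-label"))
```

Only one of the two second-stage callables runs per flow, which mirrors the point that OOD traffic never reaches the closed-set classifier and ID traffic never incurs an LLM call.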

Paper Structure

This paper contains 31 sections, 10 equations, 5 figures, 5 tables, 1 algorithm.

Figures (5)

  • Figure 1: Comparison between previous ID traffic classification models (left) and our proposed TAO-Net (right). Previous models classify OOD traffic into a single "Other" category, while TAO-Net can identify specific applications in OOD traffic by leveraging LLMs' generation capabilities through a novel Semantic-enhanced Prompt Strategy (SPS), offering improved security monitoring.
  • Figure 2: Framework of TAO-Net. Stage one performs OOD detection through a hybrid mechanism that integrates inter-layer transformation smoothness and feature analysis. Stage two conducts adaptive classification: ID traffic follows a transformer-based classification path for known categories, while OOD traffic is processed through an LLM with SPS, enabling fine-grained identification of emerging applications.
  • Figure 3: Performance comparison of different models across four metrics (macro-precision, abbreviated M-Prec; Macro F1; Micro F1; and Recall) on three datasets. TAO-Net consistently achieves superior results across all metrics, particularly in handling OOD traffic. The performance gap is most pronounced when classifying VPN-encrypted traffic (ISCXVPN dataset).
  • Figure 4: Confusion matrices comparing model performance across the CHNAPP, ISCXVPN, and ISCXTor datasets. Darker colors indicate higher prediction accuracy. The red and green boxes highlight numerous non-zero elements in the upper and lower triangular regions of GPT-4o's matrices, which result from the lack of an OOD detection stage (Stage 1), while TAO-Net's two-stage design keeps these regions predominantly zero.
  • Figure 5: Performance comparison of SPS modes (Strict, Complete, Extended) with PacRep-baseline and GPT-4o-baseline across three datasets. For each dataset we report three metrics (M-Prec, Macro F1, Micro F1). Strict mode consistently outperforms both baselines and the other modes, demonstrating the effectiveness of constrained generation space in LLM-based traffic classification.
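
The metrics named in Figures 3 and 5 can be computed as in the sketch below, assuming single-label multi-class predictions. The function name `macro_micro_metrics` and the toy labels are illustrative only and are not taken from the paper. Macro scores average per-class values with equal weight, so rare classes count as much as frequent ones; micro F1 pools the counts over all classes first.

```python
from collections import Counter
from typing import Dict, Sequence


def macro_micro_metrics(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Compute macro-precision, macro-recall, macro-F1, and micro-F1."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    # Micro F1 pools counts across classes before computing the score.
    total_tp = sum(tp.values())
    total_fp = sum(fp.values())
    total_fn = sum(fn.values())
    micro_den = 2 * total_tp + total_fp + total_fn
    n = len(labels)
    return {
        "macro_precision": sum(precisions) / n if n else 0.0,
        "macro_recall": sum(recalls) / n if n else 0.0,
        "macro_f1": sum(f1s) / n if n else 0.0,
        "micro_f1": 2 * total_tp / micro_den if micro_den else 0.0,
    }


if __name__ == "__main__":
    truth = ["a", "a", "a", "b"]
    preds = ["a", "a", "b", "b"]
    print(macro_micro_metrics(truth, preds))
```

In the single-label setting every misclassification adds one false positive and one false negative, so micro F1 equals plain accuracy. This is one reason macro scores are the more informative figures when class sizes are uneven.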