Universal Embedding Function for Traffic Classification via QUIC Domain Recognition Pretraining: A Transfer Learning Success

Jan Luxemburk, Karel Hynek, Richard Plný, Tomáš Čejka

TL;DR

This work addresses encrypted traffic classification by proposing a universal embedding function that maps packet sequences to a discriminative 256-dimensional space, enabling simple k-NN classification and robust transfer learning. The embedding is pretrained on a challenging domain-recognition task using QUIC SNI domains and ArcFace-based objectives with Sub-center dynamic margins and semi-balanced sampling, producing embeddings that generalize across ten downstream TC tasks. The approach achieves state-of-the-art results on nine of ten tasks, with an average improvement of 6.4% over SOTA, and reveals intriguing insights such as strong performance from an input-space baseline in some settings. By releasing the pretrained model, codebase, and architecture, the work provides a practical, transferable framework for encrypted traffic analysis and potential extensions to other network-monitoring tasks.

Abstract

Encrypted traffic classification (TC) methods must adapt to new protocols and extensions as well as to advancements in other machine learning fields. In this paper, we adopt a transfer learning setup best known from computer vision. We first pretrain an embedding model on a complex task with a large number of classes and then transfer it to seven established TC datasets. The pretraining task is recognition of SNI domains in encrypted QUIC traffic, which in itself is a challenge for network monitoring due to the growing adoption of TLS Encrypted Client Hello. Our training pipeline -- featuring a disjoint class setup, ArcFace loss function, and a modern deep learning architecture -- aims to produce universal embeddings applicable across tasks. A transfer method based on model fine-tuning surpassed SOTA performance on nine of ten downstream TC tasks, with an average improvement of 6.4%. Furthermore, a comparison with a baseline method using raw packet sequences revealed unexpected findings with potential implications for the broader TC field. We released the model architecture, trained weights, and codebase for transfer learning experiments.
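The k-NN classification in embedding space mentioned in the TL;DR can be illustrated with a minimal sketch. It assumes that flows have already been mapped to embedding vectors, that similarity is measured with cosine similarity, and that the prediction is a majority vote among the k nearest database entries. The helper names (`cosine_similarity`, `knn_predict`), the 3-dimensional toy vectors, and the labels are hypothetical and chosen for readability; they are not taken from the released codebase, and the real embeddings are 256-dimensional.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def knn_predict(query, database, k=3):
    """Majority vote among the k database embeddings most similar to the query.

    `database` is a list of (embedding, label) pairs. Ties in the vote are
    resolved in favour of the label whose member is nearest to the query.
    """
    ranked = sorted(
        database,
        key=lambda item: cosine_similarity(query, item[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy database of labelled embeddings (illustrative values only).
database = [
    ([1.0, 0.0, 0.0], "video"),
    ([0.9, 0.1, 0.0], "video"),
    ([0.0, 1.0, 0.0], "web"),
    ([0.0, 0.9, 0.1], "web"),
    ([0.0, 0.0, 1.0], "chat"),
]

print(knn_predict([0.8, 0.2, 0.0], database, k=3))
print(knn_predict([0.0, 0.1, 1.0], database, k=1))
```

Because the classifier has no trainable parameters, adapting it to a new downstream task only requires embedding that task's labelled flows into a new database.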

Paper Structure

This paper contains 53 sections, 1 equation, 8 figures, 8 tables.

Figures (8)

  • Figure 1: A complete processing pipeline starting with network flows as input. The embedding function $\Phi$, which is implemented as a neural network, maps flows into a 256-dimensional vector space. The visualized ArcFace head is used during training to optimize the neural network, which is composed of a backbone model and a compression neck.
  • Figure 2: An overview of the experimental setup, highlighting the purpose of the disjoint domain split along with the database and query preparation for validation and testing.
  • Figure 3: The architecture of the 30pktTCNET backbone model consists of four main components: a stem, convolutional blocks, global pooling, and feature refinement. The main processing is done in the convolutional blocks, which include four Bottleneck Residual Blocks described in detail in Figure 4. Each block has a different configuration of the following parameters: the number of output channels (e.g., 256c), kernel size (e.g., 7k), and dropout rate.
  • Figure 4: Diagram of the Bottleneck Residual Block. $k$: the kernel size of the main convolution, $C_{out}$: the number of output channels, $r$: the dropout rate. The number of channels of the main convolution $C_{mid}$ is set to $\frac{C_{out}}{4}$. All convolutions use a stride of 1 and automatic padding, so the spatial dimension is preserved. Convolutions do not use biases.
  • Figure 5: The impact of the $\lambda_{sampler}$ balancing parameter of the training sampler. Both vertical axes (top1-acc and recall) span a range of 2% to make the shapes of the lines comparable.
  • ...and 3 more figures
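
The ArcFace head named in the Figure 1 caption can be sketched as follows. This is a minimal illustration of the standard ArcFace formulation with a single fixed additive angular margin, not the Sub-center dynamic-margin variant mentioned in the TL;DR. The function names, the scale of 30, the margin of 0.5, and the 2-dimensional toy vectors are assumptions made for the example.

```python
import math


def l2_normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def arcface_logits(embedding, class_centers, target, scale=30.0, margin=0.5):
    """Scaled cosine logits with an angular margin added to the target class.

    Embedding and class centers are L2-normalized, so each logit depends only
    on the angle theta between them. For the target class, cos(theta) is
    replaced by cos(theta + margin), which forces the embedding to lie closer
    to its class center before the loss becomes small.
    Simplification: the case theta + margin > pi is not handled specially.
    """
    e = l2_normalize(embedding)
    logits = []
    for j, center in enumerate(class_centers):
        cos_theta = sum(a * b for a, b in zip(e, l2_normalize(center)))
        cos_theta = max(-1.0, min(1.0, cos_theta))  # guard acos against rounding
        if j == target:
            cos_theta = math.cos(math.acos(cos_theta) + margin)
        logits.append(scale * cos_theta)
    return logits


def cross_entropy(logits, target):
    """Numerically stable softmax cross-entropy for one sample."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


# Toy example: one embedding, two class centers (illustrative values only).
embedding = [1.0, 0.2]
centers = [[1.0, 0.0], [0.0, 1.0]]

plain = arcface_logits(embedding, centers, target=0, margin=0.0)
with_margin = arcface_logits(embedding, centers, target=0, margin=0.5)
print(plain, cross_entropy(plain, 0))
print(with_margin, cross_entropy(with_margin, 0))
```

The margin lowers only the target-class logit, so the same embedding incurs a higher loss than under plain scaled-cosine softmax. This pressure toward tighter, better-separated classes is what makes ArcFace a common choice when the embeddings will later be compared by distance.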