Universal Embedding Function for Traffic Classification via QUIC Domain Recognition Pretraining: A Transfer Learning Success

Jan Luxemburk; Karel Hynek; Richard Plný; Tomáš Čejka

TL;DR

This work addresses encrypted traffic classification (TC) by proposing a universal embedding function that maps packet sequences to a discriminative 256-dimensional space, enabling simple k-NN classification and robust transfer learning. The embedding model is pretrained on a challenging domain-recognition task (identifying QUIC SNI domains) using an ArcFace-based objective with sub-center and dynamic-margin extensions and semi-balanced sampling, and the resulting embeddings generalize across ten downstream TC tasks. The approach achieves state-of-the-art results on nine of the ten tasks, with an average improvement of 6.4% over the previous SOTA, and reveals unexpected insights, such as strong performance from an input-space baseline in some settings. By releasing the model architecture, pretrained weights, and codebase, the work provides a practical, transferable framework for encrypted traffic analysis, with potential extensions to other network-monitoring tasks.
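
As an illustration of the k-NN classification step, here is a minimal pure-Python sketch. It assumes that embeddings are compared by cosine similarity (a common choice for ArcFace-trained models, but not confirmed by the text above) and that ties in the vote are broken by summed similarity. The function names, class labels, and the two-dimensional toy vectors standing in for the 256-dimensional embeddings are all hypothetical.

```python
import math
from collections import defaultdict


def l2_normalize(vec):
    """Scale a vector to unit length so that dot products equal cosines."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def knn_classify(query, ref_embeddings, ref_labels, k=3):
    """Majority vote among the k reference embeddings most similar to `query`.

    Vote ties are broken by the summed similarity of the tied classes.
    """
    if len(ref_embeddings) != len(ref_labels):
        raise ValueError("ref_embeddings and ref_labels must have equal length")
    # Rank every reference sample by cosine similarity to the query.
    ranked = sorted(
        ((cosine_similarity(query, ref), label)
         for ref, label in zip(ref_embeddings, ref_labels)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = defaultdict(int)
    sim_sum = defaultdict(float)
    for sim, label in ranked[:k]:
        votes[label] += 1
        sim_sum[label] += sim
    return max(votes, key=lambda label: (votes[label], sim_sum[label]))


if __name__ == "__main__":
    # Toy 2-D "embeddings" standing in for 256-D ones (hypothetical data).
    refs = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
    labels = ["video", "video", "chat", "chat"]
    print(knn_classify([0.95, 0.15], refs, labels, k=3))  # video
```

The same routine could in principle be run on flattened raw packet sequences instead of learned embeddings, which is one plausible reading of the input-space baseline; the paper's exact baseline setup may differ.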

Abstract

Encrypted traffic classification (TC) methods must adapt to new protocols and extensions as well as to advancements in other machine learning fields. In this paper, we adopt a transfer learning setup best known from computer vision. We first pretrain an embedding model on a complex task with a large number of classes and then transfer it to seven established TC datasets. The pretraining task is recognition of SNI domains in encrypted QUIC traffic, which in itself is a challenge for network monitoring due to the growing adoption of TLS Encrypted Client Hello. Our training pipeline -- featuring a disjoint class setup, ArcFace loss function, and a modern deep learning architecture -- aims to produce universal embeddings applicable across tasks. A transfer method based on model fine-tuning surpassed SOTA performance on nine of ten downstream TC tasks, with an average improvement of 6.4%. Furthermore, a comparison with a baseline method using raw packet sequences revealed unexpected findings with potential implications for the broader TC field. We released the model architecture, trained weights, and codebase for transfer learning experiments.
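
To make the ArcFace objective more concrete, the following is a minimal single-sample sketch in pure Python of the additive angular margin loss with sub-centers: each class keeps several sub-center weight vectors, the closest one defines the class angle, and the target class angle is enlarged by a margin before a scaled softmax cross-entropy. The per-class `margins` dict is only a placeholder for dynamic margins (the actual margin schedule is not specified here), and the scale value, function names, toy vectors, and the clamp at π are illustrative assumptions rather than the authors' implementation.

```python
import math


def _cosine(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def subcenter_arcface_loss(embedding, class_centers, label, margins, scale=30.0):
    """ArcFace loss for a single sample (illustrative sketch).

    class_centers: dict mapping class -> list of sub-center weight vectors.
    margins: dict mapping class -> additive angular margin in radians; only
             the margin of the target class `label` is applied.
    """
    logits = {}
    for cls, centers in class_centers.items():
        # Sub-center variant: a class is represented by its closest sub-center.
        cos_theta = max(_cosine(embedding, c) for c in centers)
        cos_theta = max(-1.0, min(1.0, cos_theta))  # keep acos in its domain
        if cls == label:
            # Additive angular margin on the target class, clamped at pi as a
            # simple guard (reference implementations use other fallbacks).
            theta = min(math.acos(cos_theta) + margins[cls], math.pi)
            cos_theta = math.cos(theta)
        logits[cls] = scale * cos_theta
    # Softmax cross-entropy, computed stably via log-sum-exp.
    top = max(logits.values())
    log_sum_exp = top + math.log(sum(math.exp(v - top) for v in logits.values()))
    return log_sum_exp - logits[label]


if __name__ == "__main__":
    centers = {"a": [[1.0, 0.0], [0.0, -1.0]], "b": [[0.0, 1.0]]}  # hypothetical
    emb = [0.6, 0.8]
    no_margin = subcenter_arcface_loss(emb, centers, "a", {"a": 0.0, "b": 0.0})
    with_margin = subcenter_arcface_loss(emb, centers, "a", {"a": 0.5, "b": 0.5})
    print(no_margin < with_margin)  # True: the margin makes the target harder
```

The margin only penalizes the target class, so the model must pull embeddings well inside their class region to keep the loss low; this is what makes ArcFace embeddings suitable for simple nearest-neighbor lookup.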
