Universal Embedding Function for Traffic Classification via QUIC Domain Recognition Pretraining: A Transfer Learning Success
Jan Luxemburk, Karel Hynek, Richard Plný, Tomáš Čejka
TL;DR
This work addresses encrypted traffic classification (TC) by proposing a universal embedding function that maps packet sequences to a discriminative 256-dimensional space, enabling simple k-NN classification and robust transfer learning. The embedding model is pretrained on a challenging task, recognition of SNI domains in QUIC traffic, using an ArcFace-based objective with Sub-center dynamic margins and semi-balanced sampling, and the resulting embeddings generalize across ten downstream TC tasks. The approach surpasses the prior state of the art (SOTA) on nine of the ten tasks, with an average improvement of 6.4%, and shows that a baseline operating directly in the input space performs unexpectedly well in some settings. The released pretrained model, codebase, and architecture provide a practical, transferable framework for encrypted traffic analysis, with potential extensions to other network-monitoring tasks.
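The k-NN step mentioned above can be illustrated with a minimal sketch. It assumes embeddings have already been computed by some model; the 2-dimensional toy vectors (standing in for 256-dimensional ones), the class labels, the helper names, and the choice of cosine similarity are all illustrative assumptions rather than details taken from the paper.

```python
import math
from collections import Counter


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine_similarity(a, b):
    """Cosine similarity between two vectors of equal length."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def knn_predict(query, train_embeddings, train_labels, k=3):
    """Majority vote among the k training embeddings most similar to the query."""
    scored = [
        (cosine_similarity(query, emb), label)
        for emb, label in zip(train_embeddings, train_labels)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    top_labels = [label for _, label in scored[:k]]
    return Counter(top_labels).most_common(1)[0][0]


# Hypothetical toy data: 2-D vectors in place of real 256-D embeddings.
train_embeddings = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
train_labels = ["video", "video", "web", "web"]

prediction = knn_predict([0.8, 0.3], train_embeddings, train_labels, k=3)
```

If the embedding space is discriminative, samples of the same class lie close together, so a classifier this simple needs no further training on the downstream task.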
Abstract
Encrypted traffic classification (TC) methods must adapt to new protocols and extensions as well as to advancements in other machine learning fields. In this paper, we adopt a transfer learning setup best known from computer vision. We first pretrain an embedding model on a complex task with a large number of classes and then transfer it to seven established TC datasets. The pretraining task is recognition of SNI domains in encrypted QUIC traffic, which is itself a challenge for network monitoring due to the growing adoption of TLS Encrypted Client Hello. Our training pipeline, featuring a disjoint class setup, the ArcFace loss function, and a modern deep learning architecture, aims to produce universal embeddings applicable across tasks. A transfer method based on model fine-tuning surpassed SOTA performance on nine of ten downstream TC tasks, with an average improvement of 6.4%. Furthermore, a comparison with a baseline method using raw packet sequences revealed unexpected findings with potential implications for the broader TC field. We released the model architecture, trained weights, and codebase for transfer learning experiments.
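For readers unfamiliar with ArcFace, the following is a minimal sketch of the standard additive angular margin: the embedding and the class weight vectors are L2-normalized, a margin is added to the angle of the target class, and the cosines are scaled before a softmax cross-entropy. It shows plain ArcFace with a fixed margin, not the Sub-center or dynamic-margin variants; the `scale` and `margin` values, the function names, and the toy vectors are illustrative assumptions, not settings from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def arcface_logits(embedding, class_weights, target, scale=30.0, margin=0.5):
    """Scaled cosine logits with an angular margin added to the target class.

    Simplification: the case theta + margin > pi, which practical
    implementations handle separately, is not treated here.
    """
    emb = l2_normalize(embedding)
    logits = []
    for idx, weight in enumerate(class_weights):
        w = l2_normalize(weight)
        cos_theta = sum(x * y for x, y in zip(emb, w))
        cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding
        if idx == target:
            cos_theta = math.cos(math.acos(cos_theta) + margin)
        logits.append(scale * cos_theta)
    return logits


def cross_entropy(logits, target):
    """Numerically stable softmax cross-entropy for a single sample."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[target]


# Hypothetical toy example: a 2-D embedding and two class weight vectors.
embedding = [1.0, 0.0]
class_weights = [[1.0, 0.0], [0.0, 1.0]]

no_margin = arcface_logits(embedding, class_weights, target=0, margin=0.0)
with_margin = arcface_logits(embedding, class_weights, target=0, margin=0.5)
loss_no_margin = cross_entropy(no_margin, 0)
loss_with_margin = cross_entropy(with_margin, 0)
```

The margin lowers only the target-class logit, so the loss stays higher until the embedding sits well inside its class region; this is what pushes the model toward compact, well-separated classes in the embedding space.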
