M3S-UPD: Efficient Multi-Stage Self-Supervised Learning for Fine-Grained Encrypted Traffic Classification with Unknown Pattern Discovery
Yali Yuan, Yu Huang, Xingjian Zeng, Hantao Mei, Guang Cheng
TL;DR
This work tackles encrypted traffic classification under data scarcity and evolving unknown patterns by introducing M3S-UPD, a four-stage self-supervised framework that unifies learning for known classes with discovery of unknown patterns. It leverages probabilistic embeddings, density-based clustering via DBSCAN, and a spatial-distribution alignment strategy to assign auxiliary labels to unlabeled data while isolating unknowns, followed by a reliability-driven update using high-confidence pseudo-labels. The method demonstrates competitive performance against state-of-the-art baselines on public Tor datasets, with strong capability to detect unknown traffic without synthetic samples and to adapt through expert-labeled unknowns. Overall, M3S-UPD offers a practical path to continuous, open-world encrypted traffic analysis in real networks, addressing data scarcity and concept drift while maintaining stable updates.
Abstract
The growing complexity of encrypted network traffic presents dual challenges for modern network management: accurate multiclass classification of known applications and reliable detection of unknown traffic patterns. Although deep learning models show promise in controlled environments, their real-world deployment is hindered by data scarcity, concept drift, and operational constraints. This paper proposes M3S-UPD, a novel Multi-Stage Self-Supervised Unknown-aware Packet Detection framework that integrates semi-supervised learning with representation analysis. Our approach removes the artificial separation between classification and detection tasks through a four-phase iterative process: 1) probabilistic embedding generation, 2) clustering-based structure discovery, 3) distribution-aligned outlier identification, and 4) confidence-aware model updating. Key innovations include a self-supervised unknown detection mechanism that requires neither synthetic samples nor prior knowledge, and a continuous learning architecture that is resistant to performance degradation. Experimental results show that M3S-UPD outperforms existing methods on the few-shot encrypted traffic classification task while also achieving competitive performance on the zero-shot unknown traffic discovery task.
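
To make phases 2 and 3 of the process above more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes embeddings are already available as small 2-D points (phase 1 is skipped), uses a plain textbook DBSCAN written from scratch, and replaces the paper's spatial-distribution alignment with a simple majority vote of labeled samples inside each cluster. The function names, the `purity` threshold, the class names "web" and "chat", and the toy data are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN. Returns one cluster id per point, -1 for noise."""
    n = len(points)
    # Precompute eps-neighborhoods (each point is its own neighbor).
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisional noise, may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # only core points expand the cluster
    return labels


def assign_auxiliary_labels(embeddings, known_labels, eps=0.5, min_pts=3, purity=0.8):
    """Hypothetical stand-in for distribution alignment.

    known_labels[i] is a class name for labeled samples and None otherwise.
    A cluster inherits a class only if its labeled members agree strongly
    enough; unlabeled samples elsewhere (including noise) become "unknown".
    """
    clusters = dbscan(embeddings, eps, min_pts)
    votes = {}
    for c, y in zip(clusters, known_labels):
        if c != -1 and y is not None:
            votes.setdefault(c, Counter())[y] += 1
    cluster_class = {}
    for c, counter in votes.items():
        label, count = counter.most_common(1)[0]
        if count / sum(counter.values()) >= purity:
            cluster_class[c] = label
    aux = [
        y if y is not None else cluster_class.get(c, "unknown")
        for c, y in zip(clusters, known_labels)
    ]
    return clusters, aux


# Toy data: two known classes with a few labels each, one unlabeled dense
# group, and one isolated point.
embeddings = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
    (10.0, 0.0), (10.1, 0.0), (10.0, 0.1),
    (20.0, 20.0),
]
known = ["web", "web", None, None, "chat", "chat", None, None,
         None, None, None, None]

clusters, aux = assign_auxiliary_labels(embeddings, known)
for point, c, label in zip(embeddings, clusters, aux):
    print(point, c, label)
```

In this sketch, unlabeled points that share a cluster with labeled samples receive that class as an auxiliary label, while the unlabeled dense group and the isolated point are both flagged as unknown. Phase 4 (confidence-aware updating) is not covered here; how the paper scores pseudo-label reliability is not specified in this abstract.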
