Non-uniformity is All You Need: Efficient and Timely Encrypted Traffic Classification With ECHO
Shilo Daum, Tal Shapira, Anat Bremler-Barr, David Hay
TL;DR
This work tackles encrypted traffic classification by addressing its memory and latency bottlenecks with two techniques. Hyperparameter Optimization of binnings (HO) learns non-uniform, data- and model-aware bin boundaries during training, which shrinks flow representations without sacrificing accuracy. Early Classification (EC) uses a cascade of classifiers with multiple exit times and a confidence threshold, cutting average classification latency by up to 90% while maintaining accuracy. The combined approach, ECHO, applies HO to packet-size bins and, in some variants, arrival-time bins, and is evaluated on three publicly available datasets. The results show significant gains in classification efficiency and throughput, with practical implications for large-scale network monitoring and security deployments. The work also emphasizes interpretability of the learned bin boundaries and provides reproducibility plans for the community.
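The early-exit idea behind EC can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `classify_early`, the stage layout, the stub classifiers, and the threshold values are all hypothetical, and it assumes each stage returns a label together with a confidence score.

```python
def classify_early(stages, threshold):
    """Return (label, exit_time) from the first sufficiently confident stage.

    stages: list of (exit_time, predict_fn) pairs sorted by increasing
            exit_time; predict_fn() returns (label, confidence).
    threshold: minimum confidence needed to exit before the last stage.
    (Hypothetical interface, for illustration only.)
    """
    last_index = len(stages) - 1
    for index, (exit_time, predict) in enumerate(stages):
        label, confidence = predict()
        # Exit as soon as a stage is confident enough; the last stage
        # always answers so that every flow receives a label.
        if confidence >= threshold or index == last_index:
            return label, exit_time


# Stub classifiers standing in for models trained on 1 s, 5 s and 15 s
# of traffic (made-up outputs, not measured results).
stages = [
    (1.0, lambda: ("video", 0.60)),
    (5.0, lambda: ("video", 0.93)),
    (15.0, lambda: ("voip", 0.99)),
]

print(classify_early(stages, threshold=0.9))    # ('video', 5.0)
print(classify_early(stages, threshold=0.995))  # ('voip', 15.0)
```

Under these assumptions, a lower threshold lets more flows exit at an early stage, which trades confidence for shorter collection time.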
Abstract
With 95% of Internet traffic now encrypted, an effective approach to classifying this traffic is crucial for network security and management. This paper introduces ECHO, a novel optimization process for ML/DL-based encrypted traffic classification. ECHO targets both classification time and memory utilization and incorporates two innovative techniques. The first component, HO (Hyperparameter Optimization of binnings), aims at creating efficient traffic representations. While previous research often uses representations that map packet sizes and packet arrival times to fixed-sized bins, we show that non-uniform binnings are significantly more efficient. These non-uniform binnings are derived by employing a hyperparameter optimization algorithm in the training stage. HO significantly improves accuracy given a required representation size, or, equivalently, achieves comparable accuracy using smaller representations. Then, we introduce EC (Early Classification of traffic), which enables faster classification using a cascade of classifiers adapted for different exit times, where classification is based on the level of confidence. EC reduces the average classification latency by up to 90%. Remarkably, this method not only maintains classification accuracy but also, in certain cases, improves it. Using three publicly available datasets, we demonstrate that the combined method, Early Classification with Hyperparameter Optimization (ECHO), leads to a significant improvement in classification efficiency.
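To make the contrast between fixed-size and non-uniform binnings concrete, the following minimal sketch histograms the same packet sizes under both schemes. The packet sizes and the non-uniform boundaries are hand-picked for illustration; they are not produced by the HO procedure, and the helper names `bin_histogram` and `uniform_edges` are hypothetical.

```python
from bisect import bisect_right


def bin_histogram(values, edges):
    """Count values per bin given sorted interior bin boundaries.

    With k edges there are k + 1 bins:
    (-inf, e0), [e0, e1), ..., [e_last, +inf).
    """
    counts = [0] * (len(edges) + 1)
    for value in values:
        counts[bisect_right(edges, value)] += 1
    return counts


def uniform_edges(low, high, n_bins):
    """Interior boundaries of n_bins equal-width bins over [low, high]."""
    width = (high - low) / n_bins
    return [low + i * width for i in range(1, n_bins)]


# Made-up packet sizes in bytes: many small packets, a few near the MTU.
packet_sizes = [40, 52, 52, 60, 64, 1200, 1460, 1500, 1500]

uniform = uniform_edges(0, 1500, 4)  # [375.0, 750.0, 1125.0]
non_uniform = [50, 62, 1300]         # hypothetical, hand-picked boundaries

uniform_hist = bin_histogram(packet_sizes, uniform)
non_uniform_hist = bin_histogram(packet_sizes, non_uniform)

print(uniform_hist)      # [5, 0, 0, 4]
print(non_uniform_hist)  # [1, 3, 2, 3]
```

With the same number of bins, the uniform scheme leaves two bins empty and merges all small packets into one, while the non-uniform boundaries separate them. In the paper the boundaries are instead selected by a hyperparameter optimization algorithm during training.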
