Less is More: Simplifying Network Traffic Classification Leveraging RFCs

Nimesha Wickramasinghe, Arash Shaghaghi, Elena Ferrari, Sanjay Jha

TL;DR

This work tackles encrypted network traffic classification by challenging the reliance on payload-based or transformed representations that conflict with RFC specifications. It introduces NetMatrix, a compact tabular representation built from three RFC-aligned features (IP Total Length, TTL, and Inter-Arrival Time) taken from five encrypted packets, avoiding payloads and noisy headers. Paired with a vanilla XGBoost classifier, the LiM pipeline shows that a simple, RFC-consistent approach can reach accuracy (0.942) competitive with state-of-the-art methods such as YaTC (0.963) while sharply reducing resource usage: latency of about 0.0005 s/sample, memory of about 196 MiB, energy reported as roughly 0 W, and throughput of about 86k samples/s. These results underscore the practicality of RFC-aligned, minimal representations for real-time, resource-constrained encrypted traffic classification, and point to future work on broader datasets and additional classifier comparisons.

Abstract

The rapid growth of encryption has significantly enhanced privacy and security while posing challenges for network traffic classification. Recent approaches address these challenges by transforming network traffic into text or image formats to leverage deep-learning models originally designed for natural language processing and computer vision. However, these transformations often contradict network protocol specifications, introduce noisy features, and result in resource-intensive processes. To overcome these limitations, we propose NetMatrix, a minimalistic tabular representation of network traffic that eliminates noisy attributes and focuses on meaningful features drawn from RFC (Request for Comments) definitions. By combining NetMatrix with a vanilla XGBoost classifier, we implement a lightweight approach, LiM ("Less is More"), that achieves classification performance on par with state-of-the-art methods such as ET-BERT and YaTC. Experimental evaluations demonstrate that LiM reduces resource consumption by orders of magnitude compared to the selected baselines. Overall, this study underscores the effectiveness of simplicity in traffic representation and machine learning model selection, paving the way towards resource-efficient network traffic classification.
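
To make the representation concrete, the following is a minimal sketch of how one NetMatrix-style row could be assembled from the three features named in the TL;DR (IP Total Length, TTL, and Inter-Arrival Time) over the first five packets of a flow. It is not the authors' implementation: the `Packet` record, the `netmatrix_row` helper, the choice of 0 for the first packet's inter-arrival time, and the zero-padding of short flows are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Packet:
    """Hypothetical per-packet record; field names are illustrative only."""
    timestamp: float   # capture time in seconds
    total_length: int  # IPv4 Total Length header field, in bytes
    ttl: int           # IPv4 Time to Live header field


def netmatrix_row(packets: Sequence[Packet], n_packets: int = 5) -> List[float]:
    """Flatten the first n_packets into [length, ttl, iat] triples."""
    row: List[float] = []
    prev_ts = None
    for pkt in packets[:n_packets]:
        # Assumption: the first packet has no predecessor, so its IAT is 0.
        iat = 0.0 if prev_ts is None else pkt.timestamp - prev_ts
        row.extend([float(pkt.total_length), float(pkt.ttl), iat])
        prev_ts = pkt.timestamp
    # Assumption: flows shorter than n_packets are zero-padded to a fixed width.
    row.extend([0.0] * (3 * n_packets - len(row)))
    return row


# A three-packet flow yields a 15-value row: 9 observed values and 6 zeros.
flow = [Packet(0.0, 60, 64), Packet(0.5, 1500, 64), Packet(0.75, 52, 128)]
row = netmatrix_row(flow)
```

Rows of this shape, one per flow, form a plain numeric table that a gradient-boosted tree classifier such as XGBoost can consume directly, with no tokenization or image conversion step.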

Paper Structure

This paper contains 14 sections and 2 tables.