Lossy Compression of Network Feature Data: When Less Is Enough

Fabio Palmese; Gabriele Merlach; Damiano Ravalico; Martino Trevisan; Alessandro E. C. Redondi

TL;DR

Simple, semantics-preserving compression techniques expose stable operating regions that balance storage efficiency and task performance, which makes compression a first-class design dimension in scalable network monitoring systems.

Abstract

Network traffic analysis increasingly relies on feature-based representations to support monitoring and security in the presence of pervasive encryption. Although features are more compact than raw packet traces, their storage has become a scalability bottleneck from large-scale core networks to resource-constrained Internet of Things (IoT) environments. This article investigates task-aware lossy compression strategies that reduce the storage footprint of traffic features while preserving analytics accuracy. Using website classification in core networks and device identification in IoT environments as representative use cases, we show that simple, semantics-preserving compression techniques expose stable operating regions that balance storage efficiency and task performance. These results highlight compression as a first-class design dimension in scalable network monitoring systems.

Paper Structure (21 sections, 3 figures, 1 table)

Introduction
Background and State of the Art
Packet-Level Traffic Compression
Flow-Level Monitoring and Feature-Based Representations
Dimensionality Reduction and Learning-Based Approaches
IoT Traffic Monitoring and Forensic Pipelines
Open Challenges
Core Network Traffic Analysis
Use Case: Domain Classification
Storage Footprint and Compression Strategies
Accuracy–Storage Tradeoffs
Implications for Core Network Monitoring
IoT Traffic Analysis
Use Case: IoT Device Identification
Storage Footprint and Feature Selection
...and 6 more sections

Figures (3)

Figure 1: Conventional network traffic analysis pipelines typically apply compression only to raw packet traces, while extracted network feature data are stored at full numerical precision (top). This work examines an alternative design in which compression (potentially including lossy steps) is applied directly to network feature data, reducing storage requirements while preserving the utility of downstream analytics (bottom).
Figure 2: Accuracy–storage tradeoff for core network domain classification. Feature-wise scalar quantization achieves substantial additional storage reduction with limited accuracy loss, while PCA-based compression leads to inferior operating points. The shaded area indicates a practical operating region ($4\text{--}6\times$ storage reduction). From left to right, the markers refer to quantizing each feature with 32, 16, 8, 4, and 2 bits.
Figure 3: Accuracy–storage tradeoff for IoT device identification. Feature selection combined with scalar quantization enables near-maximal accuracy at very low storage rates per device; the shaded area highlights a practical operating region suitable for long-term retention on resource-constrained IoT gateways. From left to right, the markers refer to quantizing each feature with 2, 4, 8, 16, and 32 bits.
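
The captions for Figures 2 and 3 refer to quantizing each feature with a fixed number of bits. The exact quantizer is not specified on this page, so the following is only a minimal sketch that assumes uniform min-max scalar quantization applied independently to each feature column. The helper names (`quantize_features`, `dequantize_features`, `storage_reduction`) are hypothetical and used purely for illustration.

```python
import numpy as np


def quantize_features(X, bits):
    """Uniform per-feature scalar quantization (illustrative assumption,
    not necessarily the scheme used in the paper).

    X is an (n_samples, n_features) array; each column is mapped
    independently to integer codes in [0, 2**bits - 1].
    """
    X = np.asarray(X, dtype=np.float64)
    lo = X.min(axis=0)
    hi = X.max(axis=0)
    # Constant features get a dummy span of 1.0 to avoid division by zero.
    span = np.where(hi > lo, hi - lo, 1.0)
    levels = (1 << bits) - 1
    codes = np.rint((X - lo) / span * levels).astype(np.uint32)
    return codes, lo, span


def dequantize_features(codes, lo, span, bits):
    """Map integer codes back to approximate feature values."""
    levels = (1 << bits) - 1
    return codes / levels * span + lo


def storage_reduction(original_bits, quantized_bits):
    """Ratio of per-feature storage before and after quantization."""
    return original_bits / quantized_bits


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 5))  # synthetic stand-in for traffic features
    for bits in (16, 8, 4, 2):
        codes, lo, span = quantize_features(X, bits)
        X_hat = dequantize_features(codes, lo, span, bits)
        max_err = np.abs(X_hat - X).max()
        print(bits, storage_reduction(32, bits), max_err)
```

Under this assumption, moving from 32-bit to 8-bit features corresponds to a 4x reduction in raw feature storage, and the per-feature reconstruction error is bounded by half a quantization step. Whether a given bit depth preserves task accuracy still has to be measured on the downstream classifier.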
