AutoFlow: An Autoencoder-based Approach for IP Flow Record Compression with Minimal Impact on Traffic Classification
Adrian Pekar
TL;DR
The paper tackles the cost of storing and analyzing massive volumes of IP flow records by proposing an autoencoder-based compression method that preserves downstream utility. The autoencoder learns a 16-dimensional latent representation from 21 flow features, and a Random Forest classifier operates directly on the compressed representation without a decompression step. The reported compression ratio is $1.313\times$, with $99.27\%$ classification accuracy versus $99.77\%$ on uncompressed data, a drop of $0.50$ percentage points. On a real-world university network dataset, the results show that flow records can be reduced in size with only a small loss in classification performance, which supports scalable, real-time network monitoring. The work highlights the trade-off between compression efficiency and analytical fidelity, and points to future work on architecture optimization and encrypted-traffic handling.
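The figures in the summary can be related with a little arithmetic. The sketch below assumes that the compression ratio is simply the ratio of input features to latent dimensions (21/16 = 1.3125) and that every value occupies the same number of bytes. Neither assumption is stated in the text, and the helper name `compression_ratio` is illustrative.

```python
# Figures quoted in the summary above.
N_FEATURES = 21          # flow features per record
LATENT_DIM = 16          # size of the learned latent representation
ACC_RAW = 99.77          # accuracy (%) on uncompressed records
ACC_COMPRESSED = 99.27   # accuracy (%) on compressed records


def compression_ratio(n_in: int, n_out: int) -> float:
    """Input width divided by latent width.

    Assumption: each value is stored with the same number of bytes,
    so the ratio of counts equals the ratio of sizes.
    """
    return n_in / n_out


ratio = compression_ratio(N_FEATURES, LATENT_DIM)
size_saving = 1.0 - 1.0 / ratio        # fraction of stored values removed
acc_drop = ACC_RAW - ACC_COMPRESSED    # difference in percentage points

print(f"ratio: {ratio:.4f}x")
print(f"values removed: {size_saving:.1%}")
print(f"accuracy drop: {acc_drop:.2f} percentage points")
```

Under these assumptions, the latent representation drops 5 of every 21 stored values, a little under a quarter, in exchange for half a percentage point of accuracy.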
Abstract
Network monitoring generates massive volumes of IP flow records, posing significant challenges for storage and analysis. This paper presents a novel deep learning-based approach to compressing these records using autoencoders, enabling direct analysis of compressed data without requiring decompression. Unlike traditional compression methods, our approach reduces data volume while retaining the utility of compressed data for downstream analysis tasks, including distinguishing modern application protocols and encrypted traffic from popular services. Through extensive experiments on a real-world network traffic dataset, we demonstrate that our autoencoder-based compression achieves a 1.313x reduction in data size while maintaining 99.27% accuracy in a multi-class traffic classification task, compared to 99.77% accuracy with uncompressed data. This marginal decrease in performance is offset by substantial gains in storage and processing efficiency. The implications of this work extend to more efficient network monitoring and scalable, real-time network management solutions.
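To make the idea of analyzing compressed records directly more concrete, here is a minimal sketch of the compression step. It is a toy illustration under several assumptions, not the architecture evaluated in the paper: the autoencoder is purely linear with no biases, it is trained with plain stochastic gradient descent, and the input is synthetic random data standing in for normalized flow records. Only the dimensions (21 features, 16 latent values) come from the TL;DR; all names and hyperparameters are hypothetical.

```python
import random

random.seed(0)
N_FEATURES, LATENT_DIM = 21, 16      # dimensions taken from the TL;DR
N_SAMPLES, EPOCHS, LR = 64, 30, 0.02  # hypothetical toy settings

# Synthetic stand-in for normalized flow records (NOT real traffic data).
data = [[random.uniform(-1.0, 1.0) for _ in range(N_FEATURES)]
        for _ in range(N_SAMPLES)]


def init(rows, cols):
    """Small random weight matrix stored as a list of rows."""
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)]
            for _ in range(rows)]


W_enc = init(LATENT_DIM, N_FEATURES)   # 16 x 21 encoder weights
W_dec = init(N_FEATURES, LATENT_DIM)   # 21 x 16 decoder weights


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def encode(x):
    """Compress one 21-value record into a 16-value latent code."""
    return matvec(W_enc, x)


def decode(z):
    """Reconstruct a 21-value record from its latent code."""
    return matvec(W_dec, z)


def mse(records):
    """Mean squared reconstruction error per feature."""
    total = 0.0
    for x in records:
        x_hat = decode(encode(x))
        total += sum((a - b) ** 2 for a, b in zip(x_hat, x)) / N_FEATURES
    return total / len(records)


loss_before = mse(data)
for _ in range(EPOCHS):
    for x in data:
        z = encode(x)
        x_hat = decode(z)
        # Gradient of 0.5 * squared error w.r.t. the reconstruction.
        err = [a - b for a, b in zip(x_hat, x)]
        # Back-propagate to the latent code before touching the decoder.
        grad_z = [sum(W_dec[i][j] * err[i] for i in range(N_FEATURES))
                  for j in range(LATENT_DIM)]
        for i in range(N_FEATURES):
            for j in range(LATENT_DIM):
                W_dec[i][j] -= LR * err[i] * z[j]
        for j in range(LATENT_DIM):
            for k in range(N_FEATURES):
                W_enc[j][k] -= LR * grad_z[j] * x[k]
loss_after = mse(data)

# Only the latent codes would be stored and passed to a classifier.
compressed = [encode(x) for x in data]
print(f"reconstruction MSE: {loss_before:.4f} -> {loss_after:.4f}")
print(f"stored values per record: {len(compressed[0])} of {N_FEATURES}")
```

In a setup like the one the abstract describes, the `compressed` vectors, rather than the reconstructed records, would serve as the classifier's input, so the decoder is needed only during training. Because the synthetic data here has no low-dimensional structure, the reconstruction error says nothing about what is achievable on real flow records.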
