A streaming algorithm and hardware accelerator for top-K flow detection in network traffic
Carolina Gallardo-Pavesi, Yaime Fernández, Javier E. Soto, Cecilia Hernández, Miguel Figueroa
TL;DR
This paper addresses real-time top-$K$ flow detection in high-speed networks under tight on-chip memory constraints. It introduces a streaming algorithm that combines a modified TowerSketch with conservative updates and a fixed-size approximate priority queue to identify the top-$K$ flows and estimate their frequencies in a single pass. On CAIDA traces, it achieves a precision above $0.94$ for $K\leq 32{,}768$ and an average relative error below $1.96\%$, outperforming prior sketch-based and priority-queue-based methods. An FPGA accelerator implemented on an AMD XCU280 runs at $392$ MHz, processes one packet per cycle, and sustains line rates above $200$ Gbps while using only a small fraction of the device resources, which makes it easy to integrate into data-plane hardware.
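To make the single-pass structure concrete, the following Python sketch combines a TowerSketch-style counter array with conservative update and a fixed-size top-K table. It is a minimal illustration under stated assumptions, not the authors' design. The level layout (level i holds `w0 >> i` counters of `8 << i` bits, so every level uses the same memory) follows our reading of the original TowerSketch. The BLAKE2b-based hashing and all names (`TowerSketchCU`, `TopKTable`, `top_k_flows`, `toy_trace`) are hypothetical, and the paper's modifications to TowerSketch are not modeled. The top-K table is a plain evict-the-minimum map, swapped in for the paper's approximate priority queue.

```python
import hashlib


class TowerSketchCU:
    """TowerSketch-style counter array with conservative update (CU).

    Assumed layout (hypothetical, not the paper's modified variant): level i
    holds (w0 >> i) counters of (8 << i) bits, so every level uses the same
    memory. A counter that reaches its maximum value is treated as overflowed
    and is ignored by queries.
    """

    def __init__(self, w0=4096, levels=3, seed=0):
        self.widths = [w0 >> i for i in range(levels)]
        self.caps = [(1 << (8 << i)) - 1 for i in range(levels)]  # 255, 65535, ...
        self.rows = [[0] * w for w in self.widths]
        self.seed = seed

    def _index(self, key, level):
        # One independent hash per level, derived from a seeded BLAKE2b digest.
        data = f"{self.seed}:{level}:{key}".encode()
        digest = hashlib.blake2b(data, digest_size=8).digest()
        return int.from_bytes(digest, "little") % self.widths[level]

    def query(self, key):
        # Frequency estimate: minimum over the counters that have not overflowed.
        live = []
        for l, row in enumerate(self.rows):
            v = row[self._index(key, l)]
            if v < self.caps[l]:
                live.append(v)
        return min(live) if live else self.caps[-1]

    def update(self, key):
        # Conservative update: raise each live counter only up to (estimate + 1);
        # counters already above that value are left unchanged.
        new = self.query(key) + 1
        for l, row in enumerate(self.rows):
            j = self._index(key, l)
            if row[j] < self.caps[l]:
                row[j] = min(max(row[j], new), self.caps[l])
        return self.query(key)


class TopKTable:
    """Fixed-size candidate table that evicts its smallest entry.

    A simple stand-in for the paper's approximate priority queue: each
    eviction scans all K entries (O(K)).
    """

    def __init__(self, k):
        self.k = k
        self.table = {}

    def offer(self, key, estimate):
        if key in self.table or len(self.table) < self.k:
            self.table[key] = estimate
            return
        victim = min(self.table, key=self.table.get)
        if estimate > self.table[victim]:
            del self.table[victim]
            self.table[key] = estimate

    def top(self):
        return sorted(self.table.items(), key=lambda kv: kv[1], reverse=True)


def top_k_flows(stream, k, **sketch_args):
    """Single pass: update the sketch, then offer the new estimate to the table."""
    sketch, table = TowerSketchCU(**sketch_args), TopKTable(k)
    for flow_id in stream:
        table.offer(flow_id, sketch.update(flow_id))
    return table.top()


def toy_trace():
    # Three heavy flows (500, 300, 100 packets) interleaved with 2,000 one-packet flows.
    stream = []
    for i in range(100):
        stream += ["A"] * 5 + ["B"] * 3 + ["C"] + [f"m{20 * i + j}" for j in range(20)]
    return stream


if __name__ == "__main__":
    print(top_k_flows(toy_trace(), k=3))
```

On this toy trace the three heavy flows should be reported. Because a conservative update never lowers a live counter below a flow's true count, each reported estimate is an upper bound on that flow's true frequency.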
Abstract
Identifying the largest K flows in network traffic is an important task for applications such as flow scheduling and anomaly detection, which aim to improve network efficiency and security. However, accurately estimating flow frequencies is challenging because of the large number of flows and increasing network speeds. Hardware accelerators are often used for this task because of their high computational power, but their limited on-chip memory constrains their performance. Various sketch-based algorithms have been proposed to estimate traffic properties such as flow frequency with low memory usage and theoretical error bounds, but they often underperform on the skewed distribution of network traffic. In this work, we propose an algorithm for top-K identification that uses a modified TowerSketch and a priority queue array. On real traffic traces, it identifies the top-K flows, for K up to 32,768, with a precision above 0.94, and estimates their frequencies with an average relative error below 1.96%. We designed and implemented an accelerator for this algorithm on an AMD Virtex UltraScale+ XCU280 FPGA. The accelerator processes one packet per cycle at 392 MHz, reaching a minimum line rate of more than 200 Gbps.
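The reported figures can be reproduced with simple arithmetic and standard metric definitions, sketched below in Python. The function names are hypothetical. The line-rate calculation assumes that every cycle carries one minimum-size 64-byte packet and that preamble and inter-frame gap are not counted. The abstract does not state the packet size used, so this is our assumption. Precision is taken as the fraction of reported flows that are in the true top-K. The average relative error is averaged over the reported flows, which is an assumed definition: the paper may average over a different set of flows.

```python
def min_line_rate_gbps(clock_hz: float, min_pkt_bytes: int = 64) -> float:
    """Worst-case throughput with one packet per cycle, all packets of minimum size.

    Assumes 64-byte packets by default; preamble and inter-frame gap are not counted.
    """
    return clock_hz * min_pkt_bytes * 8 / 1e9


def precision(reported, true_top):
    """Fraction of reported flow IDs that belong to the true top-K set."""
    return len(set(reported) & set(true_top)) / len(reported)


def average_relative_error(estimates, true_counts):
    """Mean of |estimate - true| / true over the reported flows (assumed definition)."""
    errors = [abs(estimates[f] - true_counts[f]) / true_counts[f] for f in estimates]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    print(min_line_rate_gbps(392e6))
    print(precision(["a", "b", "x"], ["a", "b", "c"]))
    print(average_relative_error({"a": 110, "b": 90}, {"a": 100, "b": 100}))
```

At 392 MHz, this gives 392e6 × 64 × 8 bits ≈ 200.7 Gbps, which is consistent with the "more than 200 Gbps" figure. Counting the additional 20 bytes of preamble and inter-frame gap per frame (84 bytes on the wire) would give about 263 Gbps.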
