A streaming algorithm and hardware accelerator for top-K flow detection in network traffic

Carolina Gallardo-Pavesi, Yaime Fernández, Javier E. Soto, Cecilia Hernández, Miguel Figueroa

TL;DR

This paper addresses real-time top-$K$ flow detection in high-speed networks under tight on-chip memory. It introduces a streaming algorithm that combines a modified TowerSketch with conservative updates and a fixed-size approximate priority queue to identify top-$K$ flows and estimate their frequencies in a single pass. On CAIDA traces, it achieves precision exceeding $0.94$ for $K\leq 32{,}768$ and average relative error below $1.96\%$, outperforming prior sketches and priority-queue (PQ) structures. An FPGA accelerator implemented on an XCU280 runs at $392$ MHz, processes one packet per cycle, and sustains line rates above $200$ Gbps while using only a small fraction of device resources, enabling easy integration into data-plane hardware.
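
To make the combination of TowerSketch and conservative updates concrete, the following is a minimal Python sketch of the generic technique. It is not the paper's modified variant, whose changes are not described in this summary. The row layout follows the Figure 1 caption (three 8-bit rows, two 16-bit rows, one 32-bit row, equal memory per row). The memory budget per row, the BLAKE2b hashing, and the convention that a counter at its maximum value counts as overflowed are assumptions made for illustration.

```python
import hashlib


class TowerSketch:
    """Rows of saturating counters; narrower counters get more buckets."""

    def __init__(self, row_bits=(8, 8, 8, 16, 16, 32), row_memory_bits=1 << 16):
        # Equal memory per row: an 8-bit row holds 4x the buckets of a 32-bit row.
        self.rows = [[0] * (row_memory_bits // bits) for bits in row_bits]
        # Assumption: a counter sitting at its maximum value is treated as overflowed.
        self.limits = [(1 << bits) - 1 for bits in row_bits]

    def _slots(self, flow_id):
        # Hypothetical hashing: BLAKE2b salted with the row number.
        for row, buckets in enumerate(self.rows):
            digest = hashlib.blake2b(flow_id, digest_size=8, salt=bytes([row])).digest()
            yield row, int.from_bytes(digest, "big") % len(buckets)

    def estimate(self, flow_id):
        """Smallest counter that has not overflowed."""
        live = [self.rows[r][i] for r, i in self._slots(flow_id)
                if self.rows[r][i] < self.limits[r]]
        return min(live) if live else max(self.limits)

    def update(self, flow_id):
        """Conservative update: increment only the counters equal to the minimum."""
        live = [(r, i) for r, i in self._slots(flow_id)
                if self.rows[r][i] < self.limits[r]]
        if live:
            low = min(self.rows[r][i] for r, i in live)
            for r, i in live:
                if self.rows[r][i] == low:
                    self.rows[r][i] += 1
        return self.estimate(flow_id)


sketch = TowerSketch()
for _ in range(300):
    sketch.update(b"10.0.0.1:443")
# The 8-bit rows saturate at 255 and are skipped, so the wider rows answer.
print(sketch.estimate(b"10.0.0.1:443"))  # 300
```

Because only the minimum counters are incremented, collisions inflate the estimate less than a plain increment of every row would, while the estimate still never falls below the true count as long as at least one counter has not overflowed.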

Abstract

Identifying the largest K flows in network traffic is an important task for applications such as flow scheduling and anomaly detection, which aim to improve network efficiency and security. However, accurately estimating flow frequencies is challenging due to the large number of flows and increasing network speeds. Hardware accelerators are often used for this task due to their high computational power, but their limited on-chip memory constrains their performance. Various sketch-based algorithms have been proposed to estimate traffic properties such as frequency with lower memory usage and theoretical bounds, but they often underperform on the skewed distribution of network traffic. In this work, we propose an algorithm for top-K identification using a modified TowerSketch and a priority queue array. Tested on real traffic traces, we identify the top-K flows, with K up to 32,768, with a precision of more than 0.94, and estimate their frequency with an average relative error under 1.96%. We designed and implemented an accelerator for this algorithm on an AMD Virtex XCU280 UltraScale+ FPGA, which processes one packet per cycle at 392 MHz, reaching a minimum line rate of more than 200 Gbps.
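
The line-rate figure can be checked with simple arithmetic. The snippet below assumes that the minimum rate corresponds to minimum-size 64-byte Ethernet frames, with no preamble or inter-frame gap counted. The abstract does not state the packet size, so this is one plausible reading, not the authors' stated derivation.

```python
def line_rate_gbps(clock_mhz, packet_bytes):
    """Throughput when one packet of packet_bytes is accepted every clock cycle."""
    bits_per_second = clock_mhz * 1e6 * packet_bytes * 8
    return bits_per_second / 1e9


# Assumption: 64-byte packets, the minimum Ethernet frame size.
rate = line_rate_gbps(392, 64)
print(f"{rate:.1f} Gbps")  # 200.7 Gbps
```

Under this assumption, larger packets only raise the sustained rate, which is consistent with describing 200 Gbps as a minimum.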

Paper Structure

This paper contains 13 sections, 5 figures, 6 tables, 3 algorithms.

Figures (5)

  • Figure 1: TowerSketch structure. The sketch uses six rows: three with 8-bit counters, two with 16-bit counters and one with 32-bit counters. Each row uses the same amount of memory.
  • Figure 2: General architecture of the accelerator. The flow identifier of each packet is inserted into the TowerSketch, which estimates its frequency. The estimation is inserted into the PQA with a flow tag.
  • Figure 3: TowerSketch architecture. It reads the six counter values from memory blocks and extends them to 32 bits. It then compares them to determine which ones it must increment, and outputs a frequency estimation.
  • Figure 4: Read circuit for a row with 8-bit buckets. All memory blocks are read simultaneously and the 64-bit output is selected using a multiplexer. The counter value is extracted by shifting the word and keeping the least significant bits.
  • Figure 5: PQA update circuit. It uses one multiplexer per element to keep the elements of a queue sorted.
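
The read step in the Figure 4 caption and the per-element selection in the Figure 5 caption can be modeled in a few lines of Python. This is a software illustration under stated assumptions, not the circuits themselves: the function names are hypothetical, counters are assumed to be packed with slot 0 in the least significant bits, and the queue model is a plain sorted insert that ignores flow tags and any handling of a flow that is already queued.

```python
def extract_counter(word, slot, counter_bits=8):
    """Shift the packed word, then keep the least significant counter_bits."""
    mask = (1 << counter_bits) - 1
    # Assumption: slot 0 occupies the least significant bits of the word.
    return (word >> (slot * counter_bits)) & mask


def pq_insert(queue, item):
    """Insert into a descending fixed-size queue; the smallest entry drops off.

    Every output position chooses among three inputs, like a 3-to-1
    multiplexer: its current element, its left neighbor, or the new item.
    """
    out = []
    for pos, current in enumerate(queue):
        left = queue[pos - 1] if pos > 0 else None
        if current >= item:
            out.append(current)   # keep: the new item ranks below this slot
        elif left is None or left >= item:
            out.append(item)      # first slot that is smaller than the item
        else:
            out.append(left)      # shift right by one position
    return out


# A 64-bit word holding eight 8-bit counters with values 1..8.
word = 0x0807060504030201
print(extract_counter(word, 2))      # 3
print(pq_insert([9, 7, 5, 3], 6))    # [9, 7, 6, 5]
```

Each output position depends only on its own element, its left neighbor, and the new item, so all positions can be computed independently. A design like this maps naturally to one selection per element, as the Figure 5 caption describes.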