Memory-efficient Sketch Acceleration for Handling Large Network Flows on FPGAs

Zhaoyang Han, Yicheng Qian, Michael Zink, Miriam Leeser

TL;DR

This work tackles memory bottlenecks in Count-Min (CM) sketch implementations on FPGA-based NICs that process large volumes of network traffic. It introduces HBRICK, a hardware-friendly, variable-width counter design that extends the CM sketch to support larger hash tables with less overestimation. HBRICK is implemented with a P4 front end and HLS compute kernels and is validated at line rate on a 100 Gbps link using an AMD Alveo U280. Key contributions include the hardware-friendly multi-level counter design, parallel indexing with a fixed-latency update path, data packing for optional levels, and an overflow store backed by an associative memory, all integrated into an end-to-end P4+HLS workflow on the Open Cloud Testbed. Experimental results show improved BRAM efficiency, competitive throughput (a real-time throughput of ~92 Gbps and a theoretical ~195 Gbps for 64-byte packets), and favorable accuracy on skewed traffic, establishing a practical path toward scalable in-network analytics on FPGA NICs.
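To make the idea of a multi-level counter with an overflow store concrete, here is a minimal Python sketch. It is not HBRICK's bucket layout, which this summary does not specify; it assumes a hypothetical `TwoLevelCounterArray` in which each slot keeps only its low-order `base_bits` bits in a narrow counter and spills any carry into a dictionary that stands in for the associative-memory overflow store. The 14-bit base width in the usage example is chosen only because Figure 1 reports that most flows fit in 14 bits.

```python
class TwoLevelCounterArray:
    """Hypothetical two-level counter array (illustration only, not HBRICK).

    Each slot keeps its low-order `base_bits` bits in a narrow counter; any
    carry beyond that width is accumulated in a small overflow store (a dict
    standing in for an associative memory keyed by slot index).
    """

    def __init__(self, num_slots: int, base_bits: int) -> None:
        self.base_bits = base_bits
        self.base_mask = (1 << base_bits) - 1
        self.low = [0] * num_slots  # narrow counters, each < 2**base_bits
        self.overflow = {}          # slot index -> high-order part of the count

    def add(self, slot: int, amount: int) -> None:
        total = self.low[slot] + amount
        # Keep only the low-order bits in the narrow counter ...
        self.low[slot] = total & self.base_mask
        carry = total >> self.base_bits
        # ... and spill the carry into the overflow store, allocating an entry
        # only for slots that actually outgrow the narrow width.
        if carry:
            self.overflow[slot] = self.overflow.get(slot, 0) + carry

    def read(self, slot: int) -> int:
        # Reassemble the full count; low < 2**base_bits, so OR acts as addition.
        high = self.overflow.get(slot, 0)
        return (high << self.base_bits) | self.low[slot]


counters = TwoLevelCounterArray(num_slots=4, base_bits=14)
counters.add(0, 20000)  # exceeds 2**14 - 1, so one carry goes to the overflow store
counters.add(1, 64)     # small flow stays entirely in its narrow counter
```

The design choice this illustrates is that only the rare large flows consume overflow entries, so the bulk of the memory can be spent on many narrow counters rather than a few wide ones.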

Abstract

Sketch-based algorithms for network traffic monitoring have drawn increasing interest in recent years due to their sub-linear memory footprint and high accuracy. As the volume of network traffic grows, software-based sketch implementations cannot match the throughput of the incoming network flows. FPGA-based hardware sketches have shown better performance than software running on a CPU when handling these packets. Among the various sketch algorithms, the Count-Min sketch is one of the most popular and efficient. However, due to the limited amount of on-chip memory, FPGA-based Count-Min sketch accelerators suffer from performance drops as network traffic grows. In this work, we propose a hardware-friendly architecture with a variable-width memory counter for the Count-Min sketch. Our architecture provides a more compact way to store the sketch data structure, allowing us to support larger hash tables and reduce overestimation errors. The design makes use of a P4-based programmable data plane and the AMD OpenNIC shell. It is implemented and verified on the Open Cloud Testbed running on AMD Alveo U280s and can keep up with the 100 Gbit/s link speed.
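For reference, the baseline Count-Min sketch that the design builds on fits in a few lines of Python. This is the textbook structure, not the paper's FPGA implementation; the per-row salted BLAKE2b hash is an illustrative choice and not the hash function used on the FPGA, and counting bytes per flow key is an assumption made for the example.

```python
import hashlib


class CountMinSketch:
    """Textbook Count-Min sketch: `depth` rows of `width` counters each."""

    def __init__(self, width: int, depth: int) -> None:
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row: int, key: bytes) -> int:
        # One hash per row, made different by salting BLAKE2b with the row number
        # (illustrative choice; BLAKE2b accepts a salt of up to 16 bytes).
        digest = hashlib.blake2b(
            key, digest_size=8, salt=row.to_bytes(16, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def update(self, key: bytes, amount: int = 1) -> None:
        # Add the packet size (or 1 for packet counts) to one counter per row.
        for r in range(self.depth):
            self.rows[r][self._index(r, key)] += amount

    def query(self, key: bytes) -> int:
        # Collisions can only add to a counter, so every row over-estimates;
        # the minimum across rows is the tightest available estimate.
        return min(self.rows[r][self._index(r, key)] for r in range(self.depth))


cms = CountMinSketch(width=1024, depth=4)
cms.update(b"10.0.0.1:80->10.0.0.2:5555/tcp", 1500)
cms.update(b"10.0.0.1:80->10.0.0.2:5555/tcp", 1500)
cms.update(b"10.0.0.3:53->10.0.0.4:4444/udp", 64)
estimate = cms.query(b"10.0.0.1:80->10.0.0.2:5555/tcp")  # never below 3000
```

Because the estimate can only err upward, widening each row (more counters per row) lowers the chance of collisions; this is why packing more, narrower counters into the same on-chip memory, as the abstract describes, reduces overestimation.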

Paper Structure

This paper contains 17 sections, 1 equation, 8 figures, 5 tables, 3 algorithms.

Figures (8)

  • Figure 1: Minimum counter bit-width requirement distribution of 588K TCP/UDP flows from real network traces. The largest flow requires a 29-bit counter to store its total size, while most flows require only 14-bit counters.
  • Figure 2: Bucket design of the BRICK architecture and bucket migration. The red parts indicate bucket overflow migration when there are not enough $A_3$ sub-counter bits.
  • Figure 3: HBRICK's optimized bucket design
  • Figure 4: HBRICK Associative Memory Design. (a) demonstrates a single mapping between the 9-bit key and 72 entries. (b) shows an extension of the associative memory with a larger key width and capacity.
  • Figure 5: Overall HBRICK architecture and data paths. The optimized bucket design includes three stages: pre-processing, indexing, and value fetching.
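The metric in Figure 1, the minimum counter width needed to hold a flow's total size, is simply the bit length of that total. The sketch below assumes flow totals are given as byte counts, and the flow sizes in the usage example are hypothetical, not taken from the paper's traces.

```python
from collections import Counter
from typing import Iterable


def min_counter_bits(total: int) -> int:
    """Smallest unsigned counter width (in bits) that can store `total`."""
    if total < 0:
        raise ValueError("flow totals must be non-negative")
    # A zero count still needs a 1-bit counter.
    return max(1, total.bit_length())


def width_histogram(flow_totals: Iterable[int]) -> Counter:
    """Number of flows needing each counter width (a Figure 1 style histogram)."""
    return Counter(min_counter_bits(t) for t in flow_totals)


# Hypothetical per-flow byte totals.
hist = width_histogram([1500, 16383, 16384, 300_000_000])
```

Under the byte-count assumption, a 14-bit counter saturates at 2^14 - 1 = 16,383 bytes, while a flow that needs 29 bits carries between 2^28 = 268,435,456 and 2^29 - 1 = 536,870,911 bytes. Sizing every counter for the largest flow would therefore waste roughly half of each counter's bits on the many small flows.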