FENIX: Enabling In-Network DNN Inference with FPGA-Enhanced Programmable Switches

Xiangyu Gao, Tong Li, Yinchao Zhang, Ziqiang Wang, Xiangsheng Zeng, Su Yao, Ke Xu

TL;DR

FENIX tackles the challenge of achieving low latency, high throughput, and high accuracy for in-network ML by partitioning work between a switch-based Data Engine and an FPGA-based Model Engine. The Data Engine consists of a Flow Tracker, a probabilistic Rate Limiter built on a token-bucket mechanism, and a Buffer Manager, and it delivers feature vectors to the FPGA. The Model Engine executes DNN inference on the FPGA using a Vector I/O Processor and a DNN Inference Module with INT8 quantization. A carefully designed interfacing strategy bridges the throughput gap between switch ASICs and FPGAs, enabling per-flow inference and high overall throughput with minimal overhead. Experimental results on VPN-encrypted traffic and malware datasets show microsecond-level inference latency, multi-Tbps throughput, and classification accuracy exceeding state-of-the-art baselines, demonstrating the viability of FPGA-enhanced programmable switches for real-world, high-speed network analytics.

Abstract

Machine learning (ML) is increasingly used in network data planes for advanced traffic analysis, but existing solutions (such as FlowLens, N3IC, BoS) still struggle to simultaneously achieve low latency, high throughput, and high accuracy. To address these challenges, we present FENIX, a hybrid in-network ML system that performs feature extraction on programmable switch ASICs and deep neural network inference on FPGAs. FENIX introduces a Data Engine that leverages a probabilistic token bucket algorithm to control the sending rate of feature streams, effectively addressing the throughput gap between programmable switch ASICs and FPGAs. In addition, FENIX designs a Model Engine to enable high-accuracy deep neural network inference in the network, overcoming the difficulty of deploying complex models on resource-constrained switch chips. We implement FENIX on a programmable switch platform that integrates a Tofino ASIC and a ZU19EG FPGA directly, and evaluate it on real-world network traffic datasets. Our results show that FENIX achieves microsecond-level inference latency and multi-terabit throughput with low hardware overhead, and delivers over 90% accuracy on mainstream network traffic classification tasks, outperforming the state of the art.
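
To make the rate-limiting idea concrete, the following is a minimal software sketch of a probabilistic token bucket. It is not the FENIX implementation, which runs on the switch ASIC; the class name `ProbabilisticTokenBucket`, the parameters `rate`, `capacity`, and `send_prob`, and the rule "send only if a token is available and a random draw succeeds" are all assumptions chosen for illustration. The abstract states only that a probabilistic token bucket controls the sending rate of feature streams toward the FPGA.

```python
import random


class ProbabilisticTokenBucket:
    """Illustrative sketch only; names and policy are assumptions, not FENIX's design."""

    def __init__(self, rate, capacity, send_prob, rng=None):
        self.rate = rate            # tokens added per second (assumed unit)
        self.capacity = capacity    # maximum number of stored tokens
        self.send_prob = send_prob  # probability of sending when a token exists
        self.tokens = capacity      # start with a full bucket
        self.last = 0.0             # timestamp of the previous refill
        self.rng = rng or random.Random(0)

    def _refill(self, now):
        # Add tokens in proportion to elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now

    def allow(self, now):
        """Return True if a feature vector may be forwarded at time `now`."""
        self._refill(now)
        if self.tokens < 1.0:
            return False  # hard cap: no token, no send
        if self.rng.random() >= self.send_prob:
            return False  # probabilistic thinning; the token is kept
        self.tokens -= 1.0
        return True


if __name__ == "__main__":
    bucket = ProbabilisticTokenBucket(rate=1000.0, capacity=10.0, send_prob=0.5)
    # Offer 10,000 feature vectors over one simulated second.
    sent = sum(bucket.allow(i / 10000.0) for i in range(10000))
    print(f"forwarded {sent} of 10000 feature vectors")
```

In this sketch the token count bounds the forwarding rate to roughly `rate` sends per second plus the initial burst of `capacity`, while `send_prob` spreads the available tokens across arrivals instead of handing them all to the first packets in a burst.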

Paper Structure

This paper contains 20 sections, 2 equations, 11 figures, 4 tables, 1 algorithm.

Figures (11)

  • Figure 1: Design space in Intelligent Network.
  • Figure 2: The architecture of FENIX.
  • Figure 3:
  • Figure 4: Details in Flow Tracker.
  • Figure 5: Workflow of Rate Limiter.
  • ...and 6 more figures