SAF: Scalable Acceleration Framework for dynamic and flexible scaling of FPGAs

Masudul Hassan Quraishi, Michael Riera, Fengbo Ren, Aman Arora, Aviral Shrivastava

TL;DR

This paper tackles the scaling bottleneck of FPGA deployments by introducing SAF, an Ethernet-based framework that enables hot-plug, stand-alone FPGAs to connect to a remote host without a local CPU. SAF employs a custom shell and standalone accelerator protocols to support automatic discovery, multi-FPGA partial reconfiguration, memory management, and kernel execution over Ethernet. Empirical results with 20 Arria-10 FPGAs show SAF delivers up to 13× faster reconfiguration, 21% to 38% lower setup costs, and nearly linear performance scaling, along with 25% runtime and 27% energy reductions in on-demand scaling scenarios. The approach offers a practical path to scalable, cost-effective FPGA acceleration for cloud and edge workloads, leveraging remote hosting and network-based orchestration.

Abstract

FPGAs are gaining traction in cloud and edge computing environments due to their hardware flexibility, low latency, and low energy consumption. However, the existing FPGA hardware stack and host-FPGA connectivity do not allow flexible scaling or simultaneous reconfiguration of multiple devices, which limits the adoption of FPGAs at scale. In this paper, we present SAF, an Ethernet-based scalable acceleration framework that allows FPGAs to be hot-plugged into a network in a stand-alone fashion, without connecting to a local host CPU, which enables flexible scaling. SAF provides a custom FPGA shell and a set of Ethernet protocols that allow FPGAs to connect with a remote host to accelerate application kernels. SAF can configure multiple FPGAs simultaneously, which significantly reduces the reconfiguration time when scaling. We implemented the SAF framework using the Intel FPGA SDK for OpenCL and 20 Bittware 385A cards with Arria-10 FPGAs. We analyze a case study and conduct experiments to compare SAF with state-of-the-art multi-FPGA clusters. Results show that SAF provides 13× faster reconfiguration than sequential PCIe programming and reduces hardware setup costs by 38%, application runtime by 25%, and energy consumption by 27%. We evaluated the performance scalability of SAF using the PTRANS benchmark of the HPCC FPGA benchmark suite and showed an almost linear speedup for strong and weak scaling scenarios.

Paper Structure

This paper contains 23 sections, 7 figures, 4 tables.

Figures (7)

  • Figure 1: The system overview and benefits of SAF (right) compared to a representative SOTA multi-FPGA cluster (left). The cluster uses a hybrid of a host-FPGA indirect network and an FPGA-FPGA direct network, which provides superior performance but lacks flexibility in scaling. SAF uses only an indirect network with a remote host and provides hot-plug integration of FPGAs, which enables flexible scalability.
  • Figure 2: The high-level architecture of SAF showing the key components of the framework. The SAF custom shell and the kernels in the role are designed in such a way that they can communicate with the remote host application using standalone accelerator protocols.
  • Figure 3: The micro-architecture of SAF custom shell showing the interconnection between module interfaces, logic blocks, and kernels. The SAF custom shell analyzes and routes the payload data from the Ethernet packets to appropriate interfaces. The control kernels complement the custom shell to enable compliance with the standalone accelerator protocols.
  • Figure 4: Ethernet Auto-Discovery Packet sent by FPGAs to the remote host. Using this packet, the FPGAs can announce their presence on the network by sharing their unique MAC addresses, vendor IDs, and product IDs.
  • Figure 5: Execution flow diagram showing the sequence of host-FPGA communications using the standalone accelerator protocols listed in Table \ref{tab:packet}. This flow gives an overview of how SAF enables application acceleration on FPGAs.
  • ...and 2 more figures
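
The auto-discovery packet described in the Figure 4 caption can be illustrated with a small sketch. The caption only states that the packet carries the FPGA's MAC address, vendor ID, and product ID; everything else here is an assumption made for illustration: the 16-bit width of both IDs, the broadcast destination address, the use of the IEEE 802 local experimental EtherType 0x88B5 as a placeholder, and the function names `build_discovery_frame` and `parse_discovery_frame`. It is not the packet format defined by SAF.

```python
import struct

# Hypothetical layout: the source only says the packet announces a MAC
# address, a vendor ID, and a product ID. Field widths, the broadcast
# destination, and the EtherType below are assumptions for illustration.
BROADCAST_MAC = bytes.fromhex("ffffffffffff")
ETHERTYPE_PLACEHOLDER = 0x88B5      # IEEE 802 local experimental EtherType
HEADER = struct.Struct("!6s6sH")    # destination MAC, source MAC, EtherType
PAYLOAD = struct.Struct("!HH")      # vendor ID, product ID (assumed 16-bit)


def build_discovery_frame(src_mac: bytes, vendor_id: int, product_id: int) -> bytes:
    """Pack an announcement frame as an FPGA might send it (no padding or FCS)."""
    if len(src_mac) != 6:
        raise ValueError("MAC address must be 6 bytes")
    header = HEADER.pack(BROADCAST_MAC, src_mac, ETHERTYPE_PLACEHOLDER)
    return header + PAYLOAD.pack(vendor_id, product_id)


def parse_discovery_frame(frame: bytes) -> dict:
    """Decode the announcement on the remote host side."""
    _dst, src, ethertype = HEADER.unpack_from(frame, 0)
    if ethertype != ETHERTYPE_PLACEHOLDER:
        raise ValueError("not a discovery frame")
    vendor_id, product_id = PAYLOAD.unpack_from(frame, HEADER.size)
    return {"mac": src.hex(":"), "vendor_id": vendor_id, "product_id": product_id}
```

A remote host could keep a table keyed by the `mac` value returned from `parse_discovery_frame` to track which FPGAs have announced themselves. The sketch omits minimum-frame padding and the frame check sequence, which a network interface would normally handle.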