FPsPIN: An FPGA-based Open-Hardware Research Platform for Processing in the Network

Timo Schneider, Pengcheng Xu, Torsten Hoefler

TL;DR

FPsPIN is an open-source, FPGA-based, full-system implementation of the sPIN model. It enables end-to-end experiments on smart NICs and addresses the latency and data-processing challenges of high-rate networks. FPsPIN integrates PsPIN with the open-source Corundum FPGA NIC. This provides programmable on-NIC packet processing through header, packet, and tail handlers that can access host memory via DMA. The design is tailored to the FPGA so that it meets timing constraints. Demonstrations with ICMP/UDP ping-pong, reliable SLMP transfers, and MPI derived-datatype offload show substantial end-to-end benefits. They also show high overlap between NIC offload and host computation (about 96% and 98% in different scenarios), and they identify limitations and opportunities for hardware and software optimization. The platform is positioned as a cost-effective, portable research tool for determining which NIC-offload strategies yield real-world benefits in HPC and data-center environments, and for guiding future sNIC design, including virtualization and multi-tenancy.

Abstract

In the era of post-Moore computing, network offload emerges as a solution to two challenges: the imperative for low-latency communication and the push towards hardware specialisation. Various methods have been employed to offload protocol and data processing onto network interface cards (NICs), ranging from firmware modification to running full Linux on NICs for application execution. The sPIN project lets users define handlers that execute on the NIC upon packet arrival. Simulations show sPIN's potential across diverse workloads, but a full-system evaluation has been lacking. This work presents FPsPIN, a full FPGA-based implementation of sPIN. We showcase FPsPIN by offloading MPI datatype processing, achieving a 96% overlap ratio. FPsPIN offers researchers an adaptable, open-source platform for conducting end-to-end experiments on smart NICs.

Paper Structure (16 sections, 10 figures, 4 tables)

Figures (10)

  • Figure 1: Our FPsPIN prototype system and a performance comparison between this work and its predecessor, PsPIN [pspin].
  • Figure 2: The sPIN abstract machine model
  • Figure 3: The PULP sPIN architecture
  • Figure 4: Corundum architecture overview, from the Corundum documentation [corundum_doc]. The app block, in which FPsPIN is implemented, is highlighted in yellow.
  • Figure 5: FPsPIN architecture overview
  • ...and 5 more figures