Demystifying FPGA Hard NoC Performance

Sihao Liu, Jake Ke, Tony Nowatzki, Jason Cong

TL;DR

This work characterizes hardened NoC performance on AMD Versal FPGAs, revealing when hard NoCs deliver net benefits for cross-SLR communication and memory access. By building the BenchNoC framework and evaluating multiple placements, topologies (VNoC/HNoC), and traffic patterns across DRAM and HBM interfaces, the study demonstrates that hard NoCs reduce cross-SLR overhead and ease timing closure, while throughput is either maintained or degraded depending on traffic pattern and distance. Key findings are that read bandwidth is sensitive to distance, that write and stream bandwidth are not, that the NoC compiler has substantial limitations for large, widely spread networks, and that HBM bandwidth constraints confine peak throughput to nearby controllers. The results provide practical guidelines for FPGA programmers and highlight trade-offs, such as vertical routing penalties and the need for more robust NoC compilers. Overall, the work offers actionable insights and an open-source characterization toolchain to guide hardened NoC integration in FPGA-based systems.

Abstract

With the advent of modern multi-chiplet FPGA architectures, vendors have begun integrating hardened NoCs to address the scalability, resource usage, and frequency disadvantages of soft NoCs. However, as this work shows, effectively harnessing these hardened NoCs is not trivial. It requires detailed knowledge of the microarchitecture and how it relates to the physical design of the FPGA. Existing literature has provided in-depth analyses of NoCs in MPSoC devices, but few studies have systematically evaluated hardened NoCs in FPGAs, which have several unique implications. This work aims to bridge this knowledge gap by demystifying the performance and design trade-offs of hardened NoCs on FPGAs. Our work performs a detailed performance analysis of hard (and soft) NoCs under different settings, including diverse NoC topologies, routing strategies, traffic patterns, and different external memories under various NoC placements. In the context of Versal FPGAs, our results show that using hardened NoCs in multi-SLR designs can reduce expensive cross-SLR link usage by up to 30-40%, eliminate general-purpose logic overhead, and remove most critical paths caused by large on-chip crossbars. However, under certain aggressive traffic patterns, the frequency advantage of the hardened NoC is outweighed by inefficiencies in the network microarchitecture. We also observe suboptimal solutions from the NoC compiler and distinct performance variations between the vertical and horizontal interconnects, underscoring the need for careful design. These findings serve as practical guidelines for effectively integrating hardened NoCs and highlight important trade-offs for future FPGA-based systems.

Paper Structure

This paper contains 36 sections, 13 figures, 1 table.

Figures (13)

  • Figure 1: Versal FPGA Network-on-Chip Overview. The NoC is non-uniform along the vertical (VNoC) and horizontal (HNoC) dimensions. The VNoC bridges SLR boundaries, reducing the need for cross-die bandwidth and easing the associated timing challenges. The HNoC is provisioned for connecting external memory interfaces such as DRAM or HBM controllers.
  • Figure 2: NoC Architecture Details and Routing Schemes
  • Figure 3: Different Location Setups with Floorplan
  • Figure 4: BenchNoC Toolchain and NoC configuration example
  • Figure 5: Heat maps of (a) AXI-MM read-only and (b) write-only/AXI-S throughput from corner source locations (green dot) to all possible destinations on a VP1802. Warmer (red/orange) hues indicate lower throughput, while cooler (green) hues represent higher throughput.
  • ...and 8 more figures