Towards Generalized On-Chip Communication for Programmable Accelerators in Heterogeneous Architectures

Joseph Zuckerman, John-David Wellman, Ajay Vanamali, Manish Shankar, Gabriele Tombesi, Karthik Swaminathan, Kevin Lee, Mohit Kapur, Robert Philhower, Pradip Bose, Luca P. Carloni

TL;DR

The paper tackles the challenge of flexible, efficient on-chip communication for programmable accelerators in heterogeneous SoCs. It introduces hardware-based enhancements to the ESP platform: flexible per-burst P2P communication, a multicast NoC, coherence-based synchronization, an expanded accelerator interface, and an ISA extension for DMA control. Most of these features are validated in FPGA prototypes, showing modest area overheads (especially for multicast) and substantial speedups (up to 203% with 16 destinations on large data workloads). The work enables tighter data forwarding and synchronization across accelerators with minimal changes to the accelerators themselves, and the authors plan to integrate it into the ESP mainline release.

Abstract

We present several enhancements to the open-source ESP platform to support flexible and efficient on-chip communication for programmable accelerators in heterogeneous SoCs. These enhancements include 1) a flexible point-to-point communication mechanism between accelerators, 2) a multicast NoC that supports data forwarding to multiple accelerators simultaneously, 3) accelerator synchronization leveraging the SoC's coherence protocol, 4) an accelerator interface that offers fine-grained control over the communication mode used, and 5) an example ISA extension to support our enhancements. Our solution adds negligible area to the SoC architecture and requires minimal changes to the accelerators themselves. We have validated most of these features in complex FPGA prototypes and plan to include them in the open-source release of ESP in the coming months.

Paper Structure (5 sections, 6 figures)

This paper contains 5 sections and 6 figures.

Figures (6)

  • Figure 1: Three distinct data access modes for an accelerator in a 3x3 tile heterogeneous SoC.
  • Figure 2: The ESP accelerator socket with an instance of a programmable accelerator.
  • Figure 3: Signals of the 4 latency-insensitive channels of the ESP accelerator interface.
  • Figure 4: Area of a single NoC router with different bitwidths and maximum multicast destinations.
  • Figure 5: Evaluated 3x4 SoC with 1 CPU tile, 1 Memory tile, 1 IO tile, and 17 traffic generator accelerators.
  • ...and 1 more figure