A Prototype-Based Framework to Design Scalable Heterogeneous SoCs with Fine-Grained DFS

Gabriele Montanaro, Andrea Galimberti, Davide Zoni

TL;DR

The paper addresses the rapid exploration and run-time optimization of large, heterogeneous SoCs on FPGA platforms. It introduces Vespa, an extension of the ESP framework that adds multi-replica accelerator tiles, frequency islands with configurable dynamic frequency scaling (DFS), and a run-time monitoring infrastructure to support design-space exploration (DSE) and run-time adaptation. Experiments on 4×4-tile SoCs implemented on a Virtex-7 FPGA with CHStone accelerators show that throughput scales with replication (up to 3.58× for 4× replication) and demonstrate how DFS can be used to analyze interconnect traffic and memory behavior. As an open-source toolchain, Vespa enables scalable prototyping and rapid DSE for large FPGA-based heterogeneous SoCs.

Abstract

Frameworks for the agile development of modern system-on-chips are crucial to dealing with the complexity of designing such architectures. The open-source Vespa framework for designing large, FPGA-based, multi-core heterogeneous system-on-chips enables a faster and more flexible design space exploration of such architectures and their run-time optimization. Vespa, built on ESP, introduces the capabilities to instantiate multiple replicas of the same accelerator in a single network-on-chip node and to partition the system-on-chips into frequency islands with independent dynamic frequency scaling actuators, as well as a dedicated run-time monitoring infrastructure. Experiments on 4-by-4 tile-based system-on-chips demonstrate the possibility of effectively exploring a multitude of solutions that differ in the replication of accelerators, the clock frequencies of the frequency islands, and the tiles' placement, as well as monitoring a variety of statistics related to the traffic on the interconnect and the accelerators' performance at run time.

Paper Structure

This paper contains 10 sections, 4 figures, 1 table.

Figures (4)

  • Figure 1: Architecture of a generic Vespa SoC with multi-replica accelerator (MRA) tiles and configurable-DFS frequency islands. Example with CPU, MEM, and MRA tiles in frequency island m, I/O tile and interconnect in frequency island n, and resynchronizers (Resync) at the boundaries of frequency islands. Legend: M AXI master, S slave, K replication factor of MRA tile.
  • Figure 2: Floorplan of an instance of the Vespa SoC architecture. Legend: NoC in blue, I/O in violet, CPU in cyan, TGs in red, MEM in green, A1 (dfsin) in yellow, A2 (gsm) in orange.
  • Figure 3: Throughput of 4×-replication compute-bound (adpcm) and memory-bound (dfmul) accelerators placed in the A2 tile at different numbers of active TG cores.
  • Figure 4: Incoming memory traffic while varying, at run time, the clock frequencies of the islands that include the A1 and A2 tiles, the NoC interconnect and MEM tile, and the TG tiles.