Open-Source Heterogeneous SoCs for AI: The PULP Platform Experience

Francesco Conti, Angelo Garofalo, Davide Rossi, Giuseppe Tagliavini, Luca Benini

TL;DR

This article focuses on the PULP experience in designing heterogeneous AI acceleration SoCs, an endeavour encompassing SoC architecture definition; development, verification, and integration of acceleration IPs; front- and back-end VLSI design; testing; and development of AI deployment software.

Abstract

Since 2013, the PULP (Parallel Ultra-Low Power) Platform project has been one of the most active and successful initiatives in designing research IPs and releasing them as open source. Its portfolio now ranges from processor cores to networks-on-chip, peripherals, SoC templates, and full hardware accelerators. In this article, we focus on the PULP experience in designing heterogeneous AI acceleration SoCs, an endeavour encompassing SoC architecture definition; development, verification, and integration of acceleration IPs; front- and back-end VLSI design; testing; and development of AI deployment software.

Paper Structure (10 sections, 9 figures)


Figures (9)

  • Figure 1: Example of the difference between a model of SoC design based on closed-source IP and one exploiting an open model. Exploiting an open-source model lowers the non-recurring engineering costs of designing the IPs that do not carry most of an SoC's value. This lowers the access barriers and frees up funding for the development of the high-value IPs, such as key proprietary IPs developed by a startup.
  • Figure 2: Left: template of a heterogeneous PULP cluster. Right: internal organization of an HWPE streamer and controller blocks.
  • Figure 3: Xpulpnn datapath integrated in RI5CY in the Marsellus SoC. The special-purpose NN-RF register file is fed by the load-store unit and feeds the dot-product unit without impacting the general-purpose register file (GP-RF), which considerably improves internal data reuse.
  • Figure 4: Example of two HWPE architectures. Left: N-EUREKA, dedicated to extreme-edge quantized AI; right: RedMulE, dedicated to high-performance edge inference & training.
  • Figure 5: Analysis of a selection of PULP chips taped out between 2015 and 2022, exploiting architectural heterogeneity as ISA extensions (red) or HWPEs (blue). A) Microphotographs of the taped-out prototypes (to scale with one another); B) Performance (Gop/s), power (mW), and energy efficiency of the considered PULP SoCs at their respective highest-performance and highest-energy-efficiency operating points; C) Peak performance (Gop/s) versus SoC area devoted to computation; D) Normalized performance per bit (e.g., 1 Gop/s @ 4b-weight, 8b-input precision = 32 Gbop/s) versus power (mW) at the highest-performance operating point.
  • ...and 4 more figures
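
The per-bit normalization mentioned in the Figure 5D caption can be illustrated with a small sketch. It assumes, based only on the single example given in the caption (1 Gop/s at 4b weights and 8b inputs = 32 Gbop/s), that the normalized figure is the raw throughput multiplied by the product of the weight and input bit widths. The function name `bit_normalized_throughput` is hypothetical and does not come from the paper.

```python
def bit_normalized_throughput(gops: float, weight_bits: int, input_bits: int) -> float:
    """Convert throughput in Gop/s to bit-normalized throughput in Gbop/s.

    Assumption (inferred from the Figure 5D caption example, not confirmed
    by the paper): Gbop/s = Gop/s * weight_bits * input_bits.
    """
    if weight_bits <= 0 or input_bits <= 0:
        raise ValueError("bit widths must be positive")
    return gops * weight_bits * input_bits


# The caption's own example: 1 Gop/s with 4b weights and 8b inputs.
caption_example = bit_normalized_throughput(1.0, weight_bits=4, input_bits=8)
print(caption_example)  # 32.0
```

Under this assumption, a chip running low-precision arithmetic is not credited with more work per operation than one running at higher precision, which is presumably why the caption plots normalized rather than raw throughput against power.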