Toward Open-Source Chiplets for HPC and AI: Occamy and Beyond

Paul Scheffler, Thomas Benz, Tim Fischer, Lorenzo Leone, Sina Arjmandpour, Luca Benini

TL;DR

This work proposes a concrete roadmap, built around open 2.5D RISC-V manycores, for closing the performance gap between open-source chiplet designs and proprietary HPC/AI silicon. Starting with Occamy, a silicon-proven dual-chiplet system in 12 nm with 432 cores and a hierarchical crossbar, the authors demonstrate baseline compute density and identify interconnect limitations. They then scale to Ramora, which adopts a scalable 2D mesh NoC and achieves a peak of 1.29 TFLOP/s in double precision (DP) along with higher bandwidth utilization. Finally, they conceptualize Ogopogo, a quad-chiplet design in 7 nm with HBM3 that delivers 10.3 DP TFLOP/s and a node-normalized compute density 19% above that of Nvidia’s B200. The paper also discusses end-to-end openness, outlining the open simulation, EDA, and PDK challenges that must be addressed to realize fully open chiplet ecosystems. Collectively, the results indicate that open-source 2.5D designs can reach competitive HPC/AI performance, and they identify the practical bottlenecks that remain on the path to end-to-end openness.

Abstract

We present a roadmap for open-source chiplet-based RISC-V systems targeting high-performance computing and artificial intelligence, aiming to close the performance gap to proprietary designs. Starting with Occamy, the first open, silicon-proven dual-chiplet RISC-V manycore in 12nm FinFET, we scale to Ramora, a mesh-NoC-based dual-chiplet system, and to Ogopogo, a 7nm quad-chiplet concept architecture achieving state-of-the-art compute density. Finally, we explore possible avenues to extend openness beyond logic-core RTL into simulation, EDA, PDKs, and off-die PHYs.

Paper Structure

This paper contains 16 sections, 18 figures, 1 table.

Figures (18)

  • Figure 1: Overview of the openness-performance tradeoff, exemplified by various open-source systems including those discussed in this paper.
  • Figure 2: Wk. core
  • Figure 3: Compute cluster
  • Figure 4: Chiplet and system
  • Figure 6: Cluster layout
  • ...and 13 more figures