Challenges and Opportunities to Enable Large-Scale Computing via Heterogeneous Chiplets

Zhuoping Yang, Shixin Ji, Xingzhen Chen, Jinming Zhuang, Weifeng Zhang, Dharmesh Jani, Peipei Zhou

TL;DR

This paper surveys the challenges and opportunities of enabling large-scale computing through heterogeneous chiplets. It analyzes how diverse AI workloads drive distinct resource demands and argues that chiplet-based systems can accelerate time-to-market and reduce costs by disaggregating monolithic dies, while also introducing design, packaging, security, and software challenges. The authors discuss a wide range of hardware design issues (chiplet interfaces, interposers, pre-silicon simulators, packaging) and software challenges (unified programming models, runtimes, and tooling), and offer potential solutions such as standardized interconnects (e.g., UCIe), active interposers, TEEs, and MLIR-based programming frameworks. The work highlights the importance of integrated solutions and standards to harness chiplets for scalable AI workloads, providing a roadmap for researchers and industry to pursue secure, efficient, and programmable heterogeneous systems.

Abstract

Fast-evolving artificial intelligence (AI) algorithms such as large language models are driving ever-increasing computing demands in today's data centers. Heterogeneous computing with domain-specific architectures (DSAs) brings many opportunities for scaling computing systems both up and out. In particular, heterogeneous chiplet architectures are favored because they sustain this scaling while reducing the design complexity and cost of traditional monolithic chip design. However, success hinges on how computing resources are interconnected and how heterogeneous chiplets are orchestrated. In this paper, we first discuss the diversity and evolving demands of different AI workloads. We then discuss how chiplets bring better cost efficiency and shorter time to market. Next, we discuss the challenges of establishing chiplet interface standards, packaging, and security. Finally, we discuss the software programming challenges in chiplet systems.

Paper Structure (14 sections, 1 equation, 1 figure, 2 tables)