Chiplet-Gym: Optimizing Chiplet-based AI Accelerator Design with Reinforcement Learning
Kaniz Mishty, Mehdi Sadi
TL;DR
This work addresses PPAC (Power, Performance, Area, and manufacturing Cost) optimization for chiplet-based AI accelerators. Its co-design framework, Chiplet-Gym, wraps an analytical PPAC model in an OpenAI Gym environment and searches for design points with reinforcement learning (PPO) and simulated annealing (SA). The design space spans 2.5D and 5.5D packaging, chiplet resource allocation, and placement, and the approach is validated on MLPerf benchmarks. At iso-area, a 3D-stacked chiplet configuration delivers up to $1.52\times$ the throughput of a monolithic baseline at $0.27\times$ the energy and $0.01\times$ the die cost, while package cost rises to about $1.62\times$. The analytical models cover yield, inter-chiplet latency, bandwidth, energy, and packaging cost; running multiple RL seeds and SA runs makes the optimizer more robust and raises confidence that the reported design point is close to the global optimum. Overall, the results indicate that chiplet-based AI accelerators can offer substantial performance and energy-efficiency gains while reducing die manufacturing cost, at the price of a moderately more expensive package.
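The die-cost saving quoted above comes largely from yield: smaller dies are less likely to contain a killer defect. The sketch below illustrates this with the widely used negative-binomial yield model. It is not taken from the paper; the defect density, clustering parameter `alpha`, and the 800 mm² area budget are illustrative assumptions, and the function names are hypothetical.

```python
def negative_binomial_yield(die_area_mm2, defects_per_mm2=0.001, alpha=3.0):
    """Fraction of good dies: Y = (1 + A * D0 / alpha) ** (-alpha).

    defects_per_mm2 (D0) and alpha are assumed values, not from the paper.
    """
    return (1.0 + die_area_mm2 * defects_per_mm2 / alpha) ** (-alpha)


def relative_die_cost(total_area_mm2, n_chiplets):
    """Silicon cost per good mm^2, relative to a defect-free process.

    Each of the n dies has area A/n and costs (A/n) / Y, so the total is A / Y;
    dividing by A leaves 1 / Y. Wafer-edge loss, test cost, and packaging
    cost are deliberately ignored in this sketch.
    """
    die_area = total_area_mm2 / n_chiplets
    return 1.0 / negative_binomial_yield(die_area)


# Compare one large die against the same area split into 8 chiplets.
monolithic = relative_die_cost(800.0, 1)
chiplets = relative_die_cost(800.0, 8)
print(f"monolithic: {monolithic:.2f}, 8 chiplets: {chiplets:.2f}")
```

With these assumed parameters, splitting the same silicon area into eight dies lowers the cost of good silicon noticeably. The reduction reported in the paper also reflects factors that this toy model leaves out.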
Abstract
Modern Artificial Intelligence (AI) workloads demand computing systems with large silicon area to sustain throughput and competitive performance. However, prohibitive manufacturing costs and yield limitations at advanced technology nodes, together with die sizes approaching the reticle limit, make such large monolithic dies impractical. With recent innovations in advanced packaging technologies, chiplet-based architectures have gained significant attention in the AI hardware domain. Yet the vast design space of chiplet-based AI accelerators and the absence of a system- and package-level co-design methodology make it difficult for designers to find the optimum design point in terms of Power, Performance, Area, and manufacturing Cost (PPAC). This paper presents Chiplet-Gym, a Reinforcement Learning (RL)-based optimization framework that explores the design space of chiplet-based AI accelerators, encompassing resource allocation, placement, and packaging architecture. We analytically model the PPAC of the chiplet-based AI accelerator and integrate the model into an OpenAI Gym environment to evaluate design points. We also explore non-RL-based optimization approaches and combine the two to ensure the robustness of the optimizer. At iso-area, the optimizer-suggested design point achieves $1.52\times$ the throughput, $0.27\times$ the energy, and $0.01\times$ the die cost of its monolithic counterpart, while incurring only $1.62\times$ the package cost.
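To make the "analytical model inside a Gym environment" idea concrete, here is a minimal sketch under stated assumptions. It mirrors the Gym `reset`/`step` convention without importing the library, uses a made-up scalar score in place of the paper's PPAC model, and searches a tiny design space with simulated annealing only (the paper also uses PPO). Every constant, weight, and name below (`ToyChipletEnv`, `toy_ppac_reward`, the chiplet counts, the two packaging options) is hypothetical.

```python
import math
import random

PACKAGES = ("2.5D", "3D")           # hypothetical subset of packaging options
CHIPLET_COUNTS = (1, 2, 4, 8, 16)   # hypothetical splits of a fixed area budget
TOTAL_AREA_MM2 = 800.0              # illustrative iso-area budget


def toy_ppac_reward(n_chiplets, package):
    """Toy score (higher is better); all coefficients are assumptions."""
    die_area = TOTAL_AREA_MM2 / n_chiplets
    die_yield = (1.0 + die_area * 0.001 / 3.0) ** -3.0   # negative-binomial yield
    die_cost = 1.0 / die_yield                           # cost per good mm^2
    link_penalty = 0.02 if package == "3D" else 0.05     # assumed comm. overhead
    throughput = 1.0 / (1.0 + link_penalty * (n_chiplets - 1))
    package_cost = 1.0 + (0.08 if package == "3D" else 0.04) * (n_chiplets - 1)
    return throughput - 0.5 * die_cost - 0.2 * package_cost


class ToyChipletEnv:
    """Gym-style wrapper: an action is a full design point, one step per episode."""

    def reset(self):
        return (0, 0)  # placeholder observation

    def step(self, action):
        count_idx, pkg_idx = action
        reward = toy_ppac_reward(CHIPLET_COUNTS[count_idx], PACKAGES[pkg_idx])
        return action, reward, True, {}  # observation, reward, done, info


def simulated_annealing(env, steps=500, t_start=1.0, t_end=0.01, seed=0):
    """Search the discrete design space with a geometric cooling schedule."""
    rng = random.Random(seed)
    current = (rng.randrange(len(CHIPLET_COUNTS)), rng.randrange(len(PACKAGES)))
    current_r = env.step(current)[1]
    best, best_r = current, current_r
    for i in range(steps):
        temp = t_start * (t_end / t_start) ** (i / steps)
        cand = list(current)
        if rng.random() < 0.5:   # move to a neighbouring chiplet count
            cand[0] = min(max(cand[0] + rng.choice((-1, 1)), 0),
                          len(CHIPLET_COUNTS) - 1)
        else:                    # switch packaging option
            cand[1] = 1 - cand[1]
        cand = tuple(cand)
        cand_r = env.step(cand)[1]
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if cand_r >= current_r or rng.random() < math.exp((cand_r - current_r) / temp):
            current, current_r = cand, cand_r
        if current_r > best_r:
            best, best_r = current, current_r
    return best, best_r


best, score = simulated_annealing(ToyChipletEnv())
print(CHIPLET_COUNTS[best[0]], PACKAGES[best[1]], round(score, 4))
```

Because the environment exposes only `reset` and `step`, the same object could in principle be handed to an RL agent instead of the annealer, which is the kind of optimizer interchangeability the abstract describes.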
