RLPlanner: Reinforcement Learning based Floorplanning for Chiplets with Fast Thermal Analysis
Yuanyuan Duan, Xingchen Liu, Zhiping Yu, Hanming Wu, Leilai Shao, Xiaolei Zhu
TL;DR
RLPlanner addresses thermal-aware floorplanning for chiplet-based 2.5D systems by combining a fast, physics-informed thermal model with reinforcement learning. A PPO-based agent with an RND exploration bonus jointly minimizes total wirelength and maximum temperature, using a three-part architecture: the environment, the policy/value networks, and a thermal reward calculator. The fast thermal model achieves an MAE of about 0.25 K and a speedup of more than 127x over HotSpot, which makes end-to-end optimization efficient. On benchmarks and synthetic systems, RLPlanner improves the combined objective by about 20.3% over TAP-2.5D with HotSpot (and by about 9.25% over the fast model baselines) at similar runtimes.
Abstract
Chiplet-based systems have gained significant attention in recent years due to their low cost and competitive performance. As the complexity and compactness of a chiplet-based system increase, careful consideration must be given to microbump assignments, interconnect delays, and thermal limitations during the floorplanning stage. This paper introduces RLPlanner, an efficient early-stage floorplanning tool for chiplet-based systems with a novel fast thermal evaluation method. RLPlanner employs advanced reinforcement learning to jointly minimize total wirelength and temperature. To alleviate the time-consuming thermal calculations, RLPlanner incorporates the developed fast thermal evaluation method to expedite the iterations and optimizations. Comprehensive experiments demonstrate that our proposed fast thermal evaluation method achieves a mean absolute error (MAE) of 0.25 K and delivers over 120x speed-up compared to the open-source thermal solver HotSpot. When integrated with our fast thermal evaluation method, RLPlanner achieves an average improvement of 20.28% in minimizing the target objective (a combination of wirelength and temperature), within a similar running time, compared to the classic simulated annealing method with HotSpot.
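
The target objective combines wirelength and temperature. As a rough illustration only, the Python sketch below scores a floorplan with a weighted sum of normalized wirelength and peak temperature and returns its negative as a reward. The function names, the Manhattan center-to-center wirelength estimate, the normalization references (`wl_ref`, `temp_ref_k`), and the `weight` parameter are all assumptions made for this example; they are not taken from the RLPlanner implementation, and the peak temperature is assumed to come from some external thermal model.

```python
from typing import Dict, Tuple

Position = Tuple[float, float]  # chiplet center (x, y), e.g. in mm


def total_wirelength(centers: Dict[str, Position],
                     nets: Dict[Tuple[str, str], int]) -> float:
    """Sum of Manhattan center-to-center distances, weighted by wire count.

    Assumption: each net connects two chiplets and is measured between
    chiplet centers; a real tool may use microbump locations instead.
    """
    total = 0.0
    for (a, b), wires in nets.items():
        (xa, ya), (xb, yb) = centers[a], centers[b]
        total += wires * (abs(xa - xb) + abs(ya - yb))
    return total


def combined_reward(wirelength: float, max_temp_k: float,
                    wl_ref: float, temp_ref_k: float,
                    weight: float = 0.5) -> float:
    """Negative weighted sum of normalized wirelength and peak temperature.

    Assumption: both terms are normalized by reference values so that they
    are comparable; `weight` trades wirelength against temperature.
    """
    cost = weight * wirelength / wl_ref + (1.0 - weight) * max_temp_k / temp_ref_k
    return -cost


# Hypothetical two-chiplet system joined by 10 wires.
centers = {"cpu": (0.0, 0.0), "hbm": (3.0, 4.0)}
nets = {("cpu", "hbm"): 10}
wl = total_wirelength(centers, nets)
reward = combined_reward(wl, max_temp_k=350.0, wl_ref=70.0, temp_ref_k=350.0)
```

With this formulation, a placement that lowers either the wirelength or the peak temperature (holding the other fixed) receives a higher reward, which is the behavior a joint minimization needs from its reward signal.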
