Reinforcement Learning Policy as Macro Regulator Rather than Macro Placer

Ke Xue, Ruo-Tong Chen, Xi Lin, Yunqi Shi, Shixiong Kai, Siyuan Xu, Chao Qian

TL;DR

This work reframes reinforcement learning for chip macro placement as a regulator that refines existing layouts rather than placing macros from scratch, enabling richer state information and denser rewards. By integrating a RegularMask that captures placement regularity and combining it with wirelength-based rewards, MaskRegulate achieves substantial PPA improvements and superior regularity across ICCAD 2015 benchmarks, with demonstrated generalization to unseen chips. The method leverages grid-based state representations and PPO, and is validated against strong baselines using proxy metrics and commercial tools, highlighting practical impact for industrial chip design. Overall, the regulator-based RL approach advances efficient, generalizable optimization in placement and suggests a promising direction for RL in manufacturing-oriented design tasks.

Abstract

In modern chip design, placement, which positions millions of circuit modules, is an essential step that significantly influences power, performance, and area (PPA) metrics. Recently, reinforcement learning (RL) has emerged as a promising technique for improving placement quality, especially macro placement. However, current RL-based placement methods suffer from long training times, low generalization ability, and an inability to guarantee PPA results. A key issue lies in the problem formulation, i.e., using RL to place from scratch, which results in limited useful information and inaccurate rewards during the training process. In this work, we propose an approach that utilizes RL for the refinement stage, which allows the RL policy to learn how to adjust existing placement layouts, thereby receiving sufficient information for the policy to act and obtaining relatively dense and precise rewards. Additionally, we introduce the concept of regularity during training, which is considered an important metric in the chip design industry but is often overlooked in current RL placement methods. We evaluate our approach on the ISPD 2005 and ICCAD 2015 benchmarks, comparing the global half-perimeter wirelength and regularity of our proposed method against several competitive approaches. In addition, we test the PPA performance using commercial software, showing that RL as a regulator can achieve significant PPA improvements. Our RL regulator can fine-tune placements from any method and enhance their quality. Our work opens up new possibilities for the application of RL in placement, providing a more effective and efficient approach to optimizing chip design. Our code is available at https://github.com/lamda-bbo/macro-regulator.
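The half-perimeter wirelength (HPWL) used as a proxy metric above has a standard definition: for each net, take the bounding box of its pins and sum the box's width and height. The following is a minimal sketch of that definition only, not the authors' implementation; the function name `hpwl`, the dictionary-of-pins data layout, and the toy coordinates are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y) pin coordinate; hypothetical layout


def hpwl(nets: Dict[str, List[Point]]) -> float:
    """Sum, over all nets, of the half-perimeter of each net's pin bounding box."""
    total = 0.0
    for pins in nets.values():
        if len(pins) < 2:
            # A net with fewer than two pins contributes no wirelength.
            continue
        xs = [x for x, _ in pins]
        ys = [y for _, y in pins]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


# Toy example (made-up coordinates): two nets on a small canvas.
nets = {
    "n1": [(0.0, 0.0), (3.0, 4.0)],
    "n2": [(1.0, 1.0), (2.0, 5.0), (4.0, 2.0)],
}
before = hpwl(nets)

# Moving one pin of n2 closer to the others shrinks that net's bounding box,
# which is the kind of change a refinement step would be rewarded for.
nets["n2"][1] = (2.0, 2.0)
after = hpwl(nets)
print(before, after)
```

Because the metric depends only on pin bounding boxes, it can be re-evaluated cheaply after each adjustment to an existing layout, which fits the dense-reward argument made for the refinement formulation.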

Paper Structure

This paper contains 25 sections, 1 equation, 6 figures, 9 tables.

Figures (6)

  • Figure 1: Placement layouts and congestion of (a) MaskPlace and (b) MaskRegulate on superblue1 from the ICCAD 2015 benchmark [iccad15], where the red points indicate the congestion-critical regions. (c) Comparison of two crucial PPA metrics, congestion and total negative slack (TNS), among MaskRegulate, DREAMPlace [lin2020dreamplace], AutoDMP [agnesina2023autodmp], WireMask-BBO [wiremask-bbo], and MaskPlace [lai2022maskplace], where lower values indicate better performance.
  • Figure 2: Overview of MaskRegulate. MaskRegulate shares a similar architecture with MaskPlace [lai2022maskplace], except for the MDP formulation and the integration of regularity in the state and reward.
  • Figure 3: Illustration of the chip canvas, PositionMask, WireMask, and RegularMask. We use the bottom-left corner of a module to denote its location.
  • Figure 4: Illustration of MaskRegulate regulators with varying $\alpha$ values (ranging from 0.1 to 0.9).
  • Figure 5: Placement layouts and congestion of different methods on the eight ICCAD 2015 benchmarks. The congestion results are obtained by Cadence Innovus, where red points indicate the congestion-critical regions.
  • ...and 1 more figure