Retrieval-Guided Reinforcement Learning for Boolean Circuit Minimization
Animesh Basak Chowdhury, Marco Romanelli, Benjamin Tan, Ramesh Karri, Siddharth Garg
TL;DR
This work tackles logic synthesis optimization, where the quality of the result depends on the synthesis recipe. It introduces ABC-RL, a retrieval-guided reinforcement learning framework that blends a pre-trained policy with online MCTS through a tunable parameter $α$, set according to how similar the new netlist is to the training data. ABC-RL uses nearest-neighbor retrieval on GNN embeddings to compute a distance $δ_{G_0}$ and maps it to $α$ with a sigmoid. It consistently outperforms state-of-the-art ML methods and pure MCTS on the MCNC and EPFL benchmarks, achieving up to $24.8\%$ QoR gains and up to $9\times$ iso-QoR speed-ups. The results highlight the potential of retrieval-guided RL to address distribution shift across designs, with practical impact for faster and higher-quality logic synthesis in hardware design.
Abstract
Logic synthesis, a pivotal stage in chip design, entails optimizing chip specifications encoded in hardware description languages like Verilog into highly efficient implementations using Boolean logic gates. The process involves a sequential application of logic minimization heuristics (a "synthesis recipe"), whose ordering significantly impacts crucial metrics such as area and delay. Designs span a broad spectrum of complexity, from variations of past designs (e.g., adders and multipliers) to entirely novel configurations (e.g., innovative processor instructions), so an effective synthesis recipe has traditionally required human expertise and intuition. This study conducts a thorough examination of learning and search techniques for logic synthesis and finds a surprising result: pre-trained agents, when confronted with entirely novel designs, may veer off course and detrimentally affect the search trajectory. We present ABC-RL, which uses a tuned $α$ parameter to adjust recommendations from pre-trained agents during the search process. The parameter $α$ is computed from similarity scores obtained through nearest-neighbor retrieval from the training dataset, and ABC-RL yields superior synthesis recipes tailored to a wide array of hardware designs. Our findings show substantial improvements in the quality of result (QoR) of synthesized circuits, of up to 24.8% compared to state-of-the-art techniques. Furthermore, ABC-RL achieves up to a 9x reduction in runtime at iso-QoR compared to current state-of-the-art methodologies.
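The retrieval step and the sigmoid mapping to $α$ described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the use of cosine distance, the threshold `delta_th`, and the `temperature` are hypothetical choices made for illustration, and the embeddings are plain lists standing in for GNN outputs. The sketch also assumes that a larger distance to the training set yields a larger $α$.

```python
import math


def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity) between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def nearest_neighbor_distance(query, train_embeddings):
    """Distance from the query embedding to its closest training embedding.

    Plays the role of delta_{G_0}; cosine distance is an assumption here.
    """
    return min(cosine_distance(query, e) for e in train_embeddings)


def alpha_from_distance(delta, delta_th=0.1, temperature=0.05):
    """Sigmoid mapping from distance to alpha in (0, 1).

    delta_th and temperature are hypothetical hyperparameters: alpha is 0.5
    at delta == delta_th and approaches 1 as the design becomes more novel.
    """
    return 1.0 / (1.0 + math.exp(-(delta - delta_th) / temperature))


if __name__ == "__main__":
    # Toy 2-D vectors standing in for embeddings of training netlists.
    train = [[1.0, 0.0], [0.0, 1.0]]
    for query in ([1.0, 0.0], [-1.0, 0.0]):
        delta = nearest_neighbor_distance(query, train)
        print(query, delta, alpha_from_distance(delta))
```

Under these assumptions, a netlist whose embedding coincides with a training example gets an $α$ below 0.5, while one far from every training example gets an $α$ near 1.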
