DG-RePlAce: A Dataflow-Driven GPU-Accelerated Analytical Global Placement Framework for Machine Learning Accelerators
Andrew B. Kahng, Zhiang Wang
TL;DR
DG-RePlAce addresses the scalability and quality-of-results challenges of global placement for ML accelerators with a GPU-accelerated framework, built on OpenROAD, that exploits the dataflow and datapath regularity of these designs. It integrates physical hierarchy extraction, dataflow-driven initial distribution, and datapath constraints into a parallel analytical placement flow, using virtual connections and pseudo nets to guide placement. On Tabla/GeneSys designs, it reduces routed wirelength by 10% (7%) and total negative slack by 31% (34%) on average compared with RePlAce (DREAMPlace), with faster global placement and on-par total runtime relative to DREAMPlace. Post-route gains on the TILOS MacroPlacement Benchmarks suggest the approach may extend beyond ML accelerators. By aligning the layout with the design's dataflow, the work targets fast, high-quality placement for datapath-rich accelerators.
Abstract
Global placement is a fundamental step in VLSI physical design. The wide use of 2D processing element (PE) arrays in machine learning accelerators poses new challenges of scalability and Quality of Results (QoR) for state-of-the-art academic global placers. In this work, we develop DG-RePlAce, a new and fast GPU-accelerated global placement framework built on top of the OpenROAD infrastructure, which exploits the inherent dataflow and datapath structures of machine learning accelerators. Experimental results with a variety of machine learning accelerators using a commercial 12nm enablement show that, compared with RePlAce (DREAMPlace), our approach reduces routed wirelength by 10% (7%) and total negative slack (TNS) by 31% (34%) on average, with faster global placement and on-par total runtime relative to DREAMPlace. Empirical studies on the TILOS MacroPlacement Benchmarks further show that the post-route improvements over RePlAce and DREAMPlace may extend beyond the motivating application of machine learning accelerators.
