DG-RePlAce: A Dataflow-Driven GPU-Accelerated Analytical Global Placement Framework for Machine Learning Accelerators

Andrew B. Kahng; Zhiang Wang

TL;DR

DG-RePlAce addresses the scalability and QoR challenges of global placement for ML accelerators with a GPU-accelerated framework, built on OpenROAD, that exploits the dataflow and datapath regularity of these designs. It integrates physical hierarchy extraction, dataflow-driven initial global distribution, and datapath constraints construction into a parallel analytical placement flow, using virtual connections and pseudo nets to guide placement. Results on Tabla and GeneSys designs show consistent improvements in routed wirelength and TNS over RePlAce and DREAMPlace, with faster global placement and total runtime on par with DREAMPlace. Post-route gains on the TILOS MacroPlacement Benchmarks suggest that the approach may extend beyond ML accelerators. By aligning the layout with the design's dataflow, the work targets fast, high-quality placement for datapath-rich accelerators.

Abstract

Global placement is a fundamental step in VLSI physical design. The wide use of 2D processing element (PE) arrays in machine learning accelerators poses new challenges of scalability and Quality of Results (QoR) for state-of-the-art academic global placers. In this work, we develop DG-RePlAce, a new and fast GPU-accelerated global placement framework built on top of the OpenROAD infrastructure, which exploits the inherent dataflow and datapath structures of machine learning accelerators. Experimental results with a variety of machine learning accelerators using a commercial 12nm enablement show that, compared with RePlAce (DREAMPlace), our approach achieves an average reduction in routed wirelength by 10% (7%) and total negative slack (TNS) by 31% (34%), with faster global placement and on-par total runtimes relative to DREAMPlace. Empirical studies on the TILOS MacroPlacement Benchmarks further demonstrate that post-route improvements over RePlAce and DREAMPlace may reach beyond the motivating application to machine learning accelerators.

Paper Structure (17 sections, 5 equations, 12 figures, 8 tables, 1 algorithm)

Introduction
Preliminaries
Systolic Array Structure
Electrostatics-Based Placement
Dataflow-Driven Placement
Our Approach
Physical Hierarchy Extraction
Dataflow-Driven Initial Global Distribution
Datapath Constraints Construction
Parallel Analytical Placement
Experimental Results
Results on Machine Learning Accelerators
Runtime Comparison Against DREAMPlace
Comparison with Hier-RTLMP
Ablation Study
...and 2 more sections

Figures (12)

Figure 1: Illustrative execution flow of a systolic array-based machine learning accelerator (figure reproduced from [EsmaeilzadehGGGK21]).
Figure 2: Overview of the proposed DG-RePlAce flow.
Figure 3: Dataflow visualization of the Tabla01 design [EsmaeilzadehGGGK21].
Figure 4: Illustration of the bloat-shrink approach for reducing $cluster\_overflow$. Left: density overflow caused by overlap between clusters A and B; Right: removal of overlap by shrinking clusters A and B.
Figure 5: Datapath constraints construction on the 2D PE array.
...and 7 more figures
