Analytical Heterogeneous Die-to-Die 3D Placement with Macros
Yuxuan Zhao, Peiyu Liao, Siting Liu, Jiaxi Jiang, Yibo Lin, Bei Yu
TL;DR
The paper introduces an analytical framework for 3D mixed-size placement in heterogeneous face-to-face (F2F) bonded 3D ICs. It couples a dedicated density model with a bistratal wirelength formulation and a novel 3D preconditioner so that macros and standard cells can be handled together in a single 3D solution space. A MILP-based macro-rotation assignment and a GPU-accelerated implementation enable fast convergence and high-quality results: a 5.9% quality-score improvement over the ICCAD 2023 contest first-place winner with a 4.0x runtime speedup, and further validation on modern RISC-V designs showing substantial wirelength reductions and large runtime gains. The flow combines 3D mixed-size global placement, macro-rotation optimization, multi-die 2D placement, and legalization/detailed placement, and uses adaptive 3D density accumulation with a 3D prefix-sum technique to handle macro density efficiently. Overall, the approach advances efficient, scalable placement for heterogeneous 3D ICs by explicitly modeling die-to-die interfaces, hybrid bonding terminals (HBTs), and macro interactions in a GPU-accelerated, optimization-driven pipeline.
Abstract
This paper presents an innovative approach to 3D mixed-size placement in heterogeneous face-to-face (F2F) bonded 3D ICs. We propose an analytical framework that utilizes a dedicated density model and a bistratal wirelength model, effectively handling macros and standard cells in a 3D solution space. A novel 3D preconditioner is developed to resolve the topological and physical gap between macros and standard cells. Additionally, we propose a mixed-integer linear programming (MILP) formulation for macro rotation to optimize wirelength. Our framework is implemented with full-scale GPU acceleration, leveraging an adaptive 3D density accumulation algorithm and an incremental wirelength gradient algorithm. Experimental results on the ICCAD 2023 contest benchmarks demonstrate that our framework achieves a 5.9% quality score improvement over the first-place winner with a 4.0x runtime speedup. Additional experiments on modern RISC-V designs further validate the generalizability and superiority of our framework.
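As a rough illustration of the 3D prefix-sum idea mentioned in the TL;DR, the sketch below builds a generic summed-volume table over a 3D bin grid and answers box-sum queries in constant time. It is not the paper's algorithm, whose details are not given in this summary; the function names `build_prefix_3d` and `box_sum`, the `grid[x][y][z]` layout, and the half-open box convention are all assumptions made for the example. The relevance to macros is that a macro may overlap many bins, so a constant-time box query avoids iterating over every covered bin.

```python
def build_prefix_3d(grid):
    """Build a summed-volume table for grid[x][y][z] (hypothetical layout).

    The result P has shape (nx+1, ny+1, nz+1), and P[i][j][k] holds the
    sum of grid over the box [0, i) x [0, j) x [0, k).
    """
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    P = [[[0.0] * (nz + 1) for _ in range(ny + 1)] for _ in range(nx + 1)]
    for x in range(nx):
        for y in range(ny):
            for z in range(nz):
                # 3D inclusion-exclusion over the seven already-known neighbors.
                P[x + 1][y + 1][z + 1] = (
                    grid[x][y][z]
                    + P[x][y + 1][z + 1] + P[x + 1][y][z + 1] + P[x + 1][y + 1][z]
                    - P[x][y][z + 1] - P[x][y + 1][z] - P[x + 1][y][z]
                    + P[x][y][z]
                )
    return P


def box_sum(P, x0, y0, z0, x1, y1, z1):
    """Sum of the original grid over the half-open box [x0,x1) x [y0,y1) x [z0,z1)."""
    return (
        P[x1][y1][z1]
        - P[x0][y1][z1] - P[x1][y0][z1] - P[x1][y1][z0]
        + P[x0][y0][z1] + P[x0][y1][z0] + P[x1][y0][z0]
        - P[x0][y0][z0]
    )
```

Building the table costs one pass over the grid, after which each query touches exactly eight table entries regardless of box size. This sequential version only shows the arithmetic; a GPU implementation would typically compute the table with parallel scans along each axis.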
