Critical Path Aware Timing-Driven Global Placement for Large-Scale Heterogeneous FPGAs
He Jiang, Yi Guo, Shikai Guo, Huijiang Liu, Xiaochen Li, Ning Wang, Zhixiong Di
TL;DR
TD-Placer is a timing-driven global placement framework for large-scale heterogeneous FPGAs that jointly models net topology, delay, and routing context in a graph-based representation. It introduces a lightweight static timing analysis and a depth- and timing-aware weighting scheme that steer placement toward critical paths while preserving wirelength efficiency. This yields consistent CPD and WNS improvements over state-of-the-art placers and an average CPD comparable to that of recent AMD Vivado versions. Experiments on seven open FPGA designs show CPD reductions of about 4–5% and WNS improvements of up to 12%, and the end-to-end delay predictions are accurate (MAE 0.097 ns, RMSE 0.171 ns, R^2 0.80). The code is open source, which allows reuse and adaptation to other FPGA targets and design flows.
Abstract
Timing optimization during global placement is critical for achieving optimal circuit performance and remains a key challenge in modern Field-Programmable Gate Array (FPGA) design. As FPGA designs scale and heterogeneous resources increase, dense interconnects introduce significant resistive and capacitive effects, making timing closure increasingly difficult. Existing methods struggle to construct accurate timing models because of multi-factor nonlinear constraints and the load and crosstalk coupling effects that arise in multi-pin driving scenarios. To address these challenges, we propose TD-Placer, a critical-path-aware, timing-driven global placement framework. It leverages graph-based representations to capture global net interactions and employs a nonlinear model that integrates diverse timing-related features for precise delay prediction, thereby improving overall placement quality for FPGAs. TD-Placer adopts a quadratic placement objective that minimizes wirelength while incorporating a timing term constructed by a lightweight algorithm, enabling efficient, high-quality timing optimization. To handle net-level timing contention, it also employs a finer-grained weighting scheme that reduces the Critical Path Delay (CPD) smoothly. Extensive experiments were carried out on seven real-world open-source FPGA projects with LUT counts ranging from 60K to 400K. The results demonstrate that TD-Placer achieves an average 10% improvement in Worst Negative Slack (WNS) and a 5% reduction in CPD compared to the state-of-the-art method, with an average CPD comparable (×1.01) to that of the commercial AMD Vivado across five versions (2020.2 to 2024.2). The code and dataset are publicly available.
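As a rough illustration of how a timing term can enter a quadratic wirelength objective, the sketch below scales each net's quadratic cost by a slack-derived weight. It is a minimal example under stated assumptions, not TD-Placer's actual formulation: the function names `net_weight` and `quadratic_cost`, the parameter `alpha`, the linear criticality formula, and the clique net model with 1/(k-1) scaling are all hypothetical choices made for illustration.

```python
from itertools import combinations


def net_weight(slack_ns, clock_period_ns, alpha=2.0):
    """Hypothetical criticality weight (an assumption, not the paper's scheme).

    Nets with non-negative slack keep weight 1.0; the weight grows
    linearly with the amount of negative slack relative to the clock period.
    """
    criticality = max(0.0, -slack_ns) / clock_period_ns
    return 1.0 + alpha * criticality


def quadratic_cost(positions, nets, weights):
    """Weighted quadratic wirelength under a clique net model.

    positions: dict mapping cell name -> (x, y)
    nets:      list of nets, each a list of cell names
    weights:   one weight per net, e.g. produced by net_weight
    """
    total = 0.0
    for pins, w in zip(nets, weights):
        if len(pins) < 2:
            continue  # a single-pin net contributes no wirelength
        scale = w / (len(pins) - 1)  # one common clique scaling
        for a, b in combinations(pins, 2):
            (xa, ya), (xb, yb) = positions[a], positions[b]
            total += scale * ((xa - xb) ** 2 + (ya - yb) ** 2)
    return total


# Toy usage: the first net violates timing by 1 ns at a 10 ns clock period.
positions = {"a": (0.0, 0.0), "b": (3.0, 4.0), "c": (0.0, 4.0)}
nets = [["a", "b"], ["a", "b", "c"]]
slacks_ns = [-1.0, 0.5]
weights = [net_weight(s, clock_period_ns=10.0) for s in slacks_ns]
cost = quadratic_cost(positions, nets, weights)
```

Because the critical net carries a larger weight, a quadratic solver minimizing this cost pulls its pins closer together than those of non-critical nets, which is the general mechanism by which timing-driven net weighting trades a little wirelength for shorter critical paths.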
