CTS-Bench: Benchmarking Graph Coarsening Trade-offs for GNNs in Clock Tree Synthesis

Barsat Khadka, Kawsher Roxy, Md Rubel Ahmed

TL;DR

CTS-Bench addresses the lack of public CTS benchmarks by providing 4,860 datapoints across five architectures, with paired raw and clustered graphs and 15 ground-truth CTS metrics. It demonstrates that aggressive graph coarsening can reduce GPU memory usage by up to $17.2\times$ and accelerate training by up to $3\times$, but often at the cost of losing clock-skew information, yielding negative $R^2$ in zero-shot evaluation. The work introduces a normalized Pareto-gap metric and a reproducible OpenLane-based data-generation pipeline, enabling CTS-aware benchmarking of GNNs and accelerators under realistic constraints. It lays the groundwork for task-aware coarsening and placement–CTS feedback mechanisms to balance scalability with CTS fidelity in ML-EDA workflows.

Abstract

Graph Neural Networks (GNNs) are increasingly explored for physical design analysis in Electronic Design Automation, particularly for modeling Clock Tree Synthesis behavior such as clock skew and buffering complexity. However, practical deployment remains limited due to the prohibitive memory and runtime cost of operating on raw gate-level netlists. Graph coarsening is commonly used to improve scalability, yet its impact on CTS-critical learning objectives is not well characterized. This paper introduces CTS-Bench, a benchmark suite for systematically evaluating the trade-offs between graph coarsening, prediction accuracy, and computational efficiency in GNN-based CTS analysis. CTS-Bench consists of 4,860 converged physical design solutions spanning five architectures and provides paired raw gate-level and clustered graph representations derived from post-placement designs. Using clock skew prediction as a representative CTS task, we demonstrate a clear accuracy-efficiency trade-off. While graph coarsening reduces GPU memory usage by up to 17.2x and accelerates training by up to 3x, it also removes structural information essential for modeling clock distribution, frequently resulting in negative $R^2$ scores under zero-shot evaluation. Our findings indicate that generic graph clustering techniques can fundamentally compromise CTS learning objectives, even when global physical metrics remain unchanged. CTS-Bench enables principled evaluation of CTS-aware graph coarsening strategies, supports benchmarking of GNN architectures and accelerators under realistic physical design constraints, and provides a foundation for developing learning-assisted CTS analysis and optimization techniques.
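To make the "negative $R^2$" result concrete, the sketch below computes the coefficient of determination, $R^2 = 1 - SS_{res}/SS_{tot}$, for two hypothetical predictors. The skew values and the helper name `r2_score` are illustrative assumptions, not data or code from the paper. The point is only that $R^2$ drops below zero whenever a model's squared error exceeds that of always predicting the mean of the targets.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # model's squared error
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # error of the mean predictor
    return 1.0 - ss_res / ss_tot

# Hypothetical clock-skew targets (ns) for an unseen design; NOT from the paper.
skew_true = np.array([0.10, 0.12, 0.11, 0.13])

# Baseline: always predict the mean of the targets (R^2 is zero by construction).
pred_mean = np.full_like(skew_true, skew_true.mean())

# A predictor with a constant offset, e.g. one that fails to transfer zero-shot.
pred_biased = skew_true + 0.05

r2_mean = r2_score(skew_true, pred_mean)
r2_biased = r2_score(skew_true, pred_biased)
print(f"mean predictor R^2:   {r2_mean:.2f}")
print(f"biased predictor R^2: {r2_biased:.2f}")
```

Read this way, a negative score under zero-shot evaluation means the model did worse on the unseen design than a constant mean-value guess would have.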

Paper Structure (11 sections, 2 equations, 6 figures, 3 tables)

Figures (6)

  • Figure 1: CTS-Bench benchmarks the trade-off between memory efficiency and model fidelity.
  • Figure 2: CTS-Bench data generation pipeline.
  • Figure 3: Raw nodes are nodes in the ground-truth graphs, and clustered nodes are nodes after coarsening. Vertical variance shows sensitivity to placement-specific logic distribution.
  • Figure 4: Efficiency benchmark comparing Raw and Clustered models in terms of Peak VRAM and execution time.
  • Figure 5: MAE Accuracy (bars, left y-axis) and Spatial $R^2$ Fidelity (markers, right y-axis) across Skew, Power, and Wire metrics for the Raw Workload.
  • ...and 1 more figure