Table of Contents
Fetching ...

GROOT: Graph Edge Re-growth and Partitioning for the Verification of Large Designs in Logic Synthesis

Kiran Thorat, Hongwu Peng, Yuebo Luo, Xi Xie, Shaoyi Huang, Amit Hasan, Jiahui Zhao, Yingjie Li, Zhijie Shi, Cunxi Yu, Caiwen Ding

TL;DR

GROOT tackles the scalability bottleneck of logic-synthesis verification by fusing domain-aware graph representations with a partitioned, boundary-aware GNN workflow and specialized single-GPU kernels. It redesigns node embeddings for AIGs, uses METIS-based graph partitioning with a boundary-edge re-growth mechanism to preserve inter-partition communication, and employs HD/LD CUDA kernels to accelerate processing. The approach yields substantial memory reductions and high verification accuracy on very large multipliers, with runtime gains close to or surpassing state-of-the-art GPU methods while enabling single-GPU operation. The combination of EDA insights, graph ML, and GPU-optimized kernels demonstrates strong practical potential for scalable chip verification.

Abstract

Traditional verification methods in chip design are highly time-consuming and computationally demanding, especially for large scale circuits. Graph neural networks (GNNs) have gained popularity as a potential solution to improve verification efficiency. However, there lacks a joint framework that considers all chip design domain knowledge, graph theory, and GPU kernel designs. To address this challenge, we introduce GROOT, an algorithm and system co-design framework that contains chip design domain knowledge and redesigned GPU kernels, to improve verification efficiency. More specifically, we create node features utilizing the circuit node types and the polarity of the connections between the input edges to nodes in And-Inverter Graphs (AIGs). We utilize a graph partitioning algorithm to divide the large graphs into smaller sub-graphs for fast GPU processing and develop a graph edge re-growth algorithm to recover verification accuracy. We carefully profile the EDA graph workloads and observe the uniqueness of their polarized distribution of high degree (HD) nodes and low degree (LD) nodes. We redesign two GPU kernels (HD-kernel and LD-kernel), to fit the EDA graph learning workload on a single GPU. We compare the results with state-of-the-art (SOTA) methods: GAMORA, a GNN-based approach, and the traditional ABC framework. Results show that GROOT achieves a significant reduction in memory footprint (59.38 %), with high accuracy (99.96%) for a very large CSA multiplier, i.e. 1,024 bits with a batch size of 16, which consists of 134,103,040 nodes and 268,140,544 edges. We compare GROOT with GPU-based GPU Kernel designs SOTAs such as cuSPARSE, MergePath-SpMM, and GNNAdvisor. We achieve up to 1.104x, 5.796x, and 1.469x improvement in runtime, respectively.

GROOT: Graph Edge Re-growth and Partitioning for the Verification of Large Designs in Logic Synthesis

TL;DR

GROOT tackles the scalability bottleneck of logic-synthesis verification by fusing domain-aware graph representations with a partitioned, boundary-aware GNN workflow and specialized single-GPU kernels. It redesigns node embeddings for AIGs, uses METIS-based graph partitioning with a boundary-edge re-growth mechanism to preserve inter-partition communication, and employs HD/LD CUDA kernels to accelerate processing. The approach yields substantial memory reductions and high verification accuracy on very large multipliers, with runtime gains close to or surpassing state-of-the-art GPU methods while enabling single-GPU operation. The combination of EDA insights, graph ML, and GPU-optimized kernels demonstrates strong practical potential for scalable chip verification.

Abstract

Traditional verification methods in chip design are highly time-consuming and computationally demanding, especially for large scale circuits. Graph neural networks (GNNs) have gained popularity as a potential solution to improve verification efficiency. However, there lacks a joint framework that considers all chip design domain knowledge, graph theory, and GPU kernel designs. To address this challenge, we introduce GROOT, an algorithm and system co-design framework that contains chip design domain knowledge and redesigned GPU kernels, to improve verification efficiency. More specifically, we create node features utilizing the circuit node types and the polarity of the connections between the input edges to nodes in And-Inverter Graphs (AIGs). We utilize a graph partitioning algorithm to divide the large graphs into smaller sub-graphs for fast GPU processing and develop a graph edge re-growth algorithm to recover verification accuracy. We carefully profile the EDA graph workloads and observe the uniqueness of their polarized distribution of high degree (HD) nodes and low degree (LD) nodes. We redesign two GPU kernels (HD-kernel and LD-kernel), to fit the EDA graph learning workload on a single GPU. We compare the results with state-of-the-art (SOTA) methods: GAMORA, a GNN-based approach, and the traditional ABC framework. Results show that GROOT achieves a significant reduction in memory footprint (59.38 %), with high accuracy (99.96%) for a very large CSA multiplier, i.e. 1,024 bits with a batch size of 16, which consists of 134,103,040 nodes and 268,140,544 edges. We compare GROOT with GPU-based GPU Kernel designs SOTAs such as cuSPARSE, MergePath-SpMM, and GNNAdvisor. We achieve up to 1.104x, 5.796x, and 1.469x improvement in runtime, respectively.

Paper Structure

This paper contains 14 sections, 3 equations, 10 figures, 2 tables, 1 algorithm.

Figures (10)

  • Figure 1: (a) Extremely high GPU memory requirements on large circuit graphs in EDA. CSA multiplier with different bits and batch sizes, (b) Comparison of verification methods.
  • Figure 2: Overview of the tasks: (a) Circuit to Transitional Graph Conversion, (b) Graph Input Embedding and Node Label generation based on transitional graphs, (c) Large Graph Partitions to Solve GPU memory issues, (d) Graph Neural Network Architecture, (e) Node classification.
  • Figure 3: Input to EDA graph flow: (a) Two-bit multiplier netlist, (b) AIG representation of two-bit multiplier using ABC (the dotted line represents inverted inputs to node), (c) Node features (selected nodes shown), (d) EDA graph of two-bit CSA multiplier, (e) Ground truth labels for the GNN model. (f) EDA graph embedding with node features of two-bit CSA multiplier.
  • Figure 4: GPU Kernel Design for EDA.
  • Figure 5: Detailed process of LD-kernel, from degree-sorting, row-assembling, block-partitioning, to warp-wise multiplication and summation, with block-wise parallelism.
  • ...and 5 more figures