DeepGate4: Efficient and Effective Representation Learning for Circuit Design at Scale

Ziyang Zheng, Shan Huang, Jianyuan Zhong, Zhengyuan Shi, Guohao Dai, Ningyi Xu, Qiang Xu

TL;DR

DeepGate4 tackles the scalability gap in circuit representation learning by combining a partitioned cone-based updating strategy with a GAT-based sparse transformer and structural encodings, augmented by a CUDA-based inference kernel. This design achieves sub-linear memory growth and near-linear runtime in practice, enabling training on large AIGs and inference on circuits with hundreds of thousands up to a million gates. It delivers state-of-the-art results on the ITC99 and EPFL benchmarks and provides substantial efficiency gains via Fused-DeepGate4. The work demonstrates strong generalization to large-scale circuit analysis tasks, offering practical impact for scalable EDA workloads.
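The cone-based partitioning mentioned above relies on extracting fan-in cones from the AIG. The sketch below is a minimal illustration under stated assumptions, not the authors' partitioning algorithm: it assumes the AIG is given as a hypothetical `fanins` dict mapping each AND node to its fan-in node IDs, and the helper names `fanin_cone`, `cone_partitions`, and `overlap` are illustrative. Nodes that fall into more than one cone form the overlap region shared between partitions.

```python
from collections import deque


def fanin_cone(fanins, root, max_depth):
    """Nodes in the transitive fan-in cone of `root`, at most `max_depth` levels back.

    `fanins` maps each AND node to its fan-in node IDs (hypothetical format);
    primary inputs simply have no entry.
    """
    cone = {root}
    frontier = deque([(root, 0)])
    while frontier:                      # breadth-first walk toward the inputs
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue
        for src in fanins.get(node, []):
            if src not in cone:
                cone.add(src)
                frontier.append((src, depth + 1))
    return cone


def cone_partitions(fanins, roots, max_depth):
    """One bounded-depth fan-in cone per chosen root node."""
    return {r: fanin_cone(fanins, r, max_depth) for r in roots}


def overlap(partitions):
    """Nodes that belong to more than one cone (the shared overlap region)."""
    seen, shared = set(), set()
    for cone in partitions.values():
        shared |= cone & seen
        seen |= cone
    return shared
```

For example, with `fanins = {4: [0, 1], 5: [2, 3], 6: [4, 5], 7: [4, 2]}`, the depth-2 cones rooted at nodes 6 and 7 share the nodes {0, 1, 2, 4}. How DeepGate4 actually chooses roots and cone depth is not specified here.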

Abstract

Circuit representation learning has become pivotal in electronic design automation, enabling critical tasks such as testability analysis, logic reasoning, power estimation, and SAT solving. However, existing models face significant challenges in scaling to large circuits due to limitations such as over-squashing in graph neural networks and the quadratic complexity of transformer-based models. To address these issues, we introduce DeepGate4, a scalable and efficient graph transformer specifically designed for large-scale circuits. DeepGate4 incorporates several key innovations: (1) an update strategy tailored for circuit graphs, which reduces memory complexity to sub-linear and is adaptable to any graph transformer; (2) a GAT-based sparse transformer with global and local structural encodings for AIGs; and (3) an inference acceleration CUDA kernel that fully exploits the unique sparsity patterns of AIGs. Our extensive experiments on the ITC99 and EPFL benchmarks show that DeepGate4 significantly surpasses state-of-the-art methods, achieving 15.5% and 31.1% performance improvements over the next-best models. Furthermore, the Fused-DeepGate4 variant reduces runtime by 35.1% and memory usage by 46.8%, making it highly efficient for large-scale circuit analysis. These results demonstrate the potential of DeepGate4 to handle complex EDA tasks while offering superior scalability and efficiency. Code is available at https://github.com/zyzheng17/DeepGate4-ICLR-25.
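To make the contrast with quadratic attention concrete, here is a minimal NumPy sketch of single-head, GAT-style attention in which each node attends only over its incoming edges, so the score array has one entry per edge (O(E)) rather than one per node pair (O(N^2)). It is an illustration under stated assumptions, not DeepGate4's transformer: the structural encodings, multi-head attention, and the exact sparsity pattern used in the paper are omitted, and `gat_sparse_attention` is a hypothetical name.

```python
import numpy as np


def gat_sparse_attention(h, edges, W, a, slope=0.2):
    """Single-head GAT-style attention restricted to the given edges.

    h: (N, F) node features; edges: (E, 2) int array of (src, dst) pairs;
    W: (F, D) projection; a: (2 * D,) attention vector.
    Each dst node attends only over its incoming edges: memory is O(E), not O(N^2).
    """
    z = h @ W                                    # (N, D) projected features
    src, dst = edges[:, 0], edges[:, 1]
    d = z.shape[1]
    # Unnormalized score per edge: LeakyReLU(a^T [z_dst || z_src])
    e = z[dst] @ a[:d] + z[src] @ a[d:]
    e = np.where(e > 0, e, slope * e)
    # Numerically stable softmax over the incoming edges of each dst node
    e_max = np.full(z.shape[0], -np.inf)
    np.maximum.at(e_max, dst, e)
    w = np.exp(e - e_max[dst])
    denom = np.zeros(z.shape[0])
    np.add.at(denom, dst, w)
    alpha = w / denom[dst]
    # Weighted sum of source features into each dst node; nodes without
    # incoming edges (e.g. primary inputs) stay at zero
    out = np.zeros_like(z)
    np.add.at(out, dst, alpha[:, None] * z[src])
    return out, alpha
```

In an AIG every AND gate has exactly two fan-ins, so E grows linearly with the number of gates. Callers who want a node to attend to itself can append `(i, i)` self-loop edges, as in the original GAT formulation.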

Paper Structure

This paper contains 27 sections, 14 equations, 8 figures, 11 tables, and 2 algorithms.

Figures (8)

  • Figure 1: The overview of DeepGate2 and DeepGate3
  • Figure 2: The overall pipeline of our method. In our training pipeline, embedding exchange is implemented through the following two operations: Push (GPU to CPU): after encoding a mini-batch, the online node embeddings are saved as offline historical embeddings. Pull (CPU to GPU): before encoding a mini-batch, the offline historical embeddings are used to initialize the online node embeddings in the overlap region.
  • Figure 3: Observation.
  • Figure 4: The updating process when the mini-batch size is 1.
  • Figure 5: Transformer Architecture
  • ...and 3 more figures
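The Push/Pull exchange described in the Figure 2 caption can be sketched as a CPU-side cache of historical embeddings. This is a simplified sketch under stated assumptions: embeddings are NumPy arrays, node IDs are integer NumPy arrays, the GPU/CPU transfer itself is not modeled, and `HistoricalEmbeddingStore` with its `push`/`pull` methods is a hypothetical interface rather than the code in the DeepGate4 repository.

```python
import numpy as np


class HistoricalEmbeddingStore:
    """Offline (CPU-side) cache of node embeddings shared across mini-batches."""

    def __init__(self, num_nodes, dim):
        self.emb = np.zeros((num_nodes, dim), dtype=np.float32)
        self.valid = np.zeros(num_nodes, dtype=bool)   # rows that have been pushed

    def pull(self, node_ids, online):
        """Initialize online embeddings of already-encoded nodes (the overlap region).

        Rows for nodes that were never pushed are left unchanged.
        """
        mask = self.valid[node_ids]
        online[mask] = self.emb[node_ids[mask]]
        return online

    def push(self, node_ids, online):
        """Save freshly encoded online embeddings back to the offline store."""
        self.emb[node_ids] = online
        self.valid[node_ids] = True
```

In a training loop one would call `pull` on a mini-batch's node IDs before encoding and `push` after encoding, so nodes in the overlap region start from embeddings computed in earlier mini-batches.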