GraphNet: A Large-Scale Computational Graph Dataset for Tensor Compiler Research

Xinqi Li, Yiqun Liu, Shan Jiang, Enrong Zheng, Huaijin Zheng, Wenhao Dai, Haodong Deng, Dianhai Yu, Yanjun Ma

TL;DR

GraphNet introduces a large-scale, real-world dataset of $2.7\mathrm{k}$ computational graphs spanning six task domains to enable systematic, cross-framework tensor-compiler evaluation. It proposes two metrics, the Speedup Score $S_t$ and the Error-aware Speedup Score $ES_t$, that jointly quantify runtime speedup, numerical correctness, and failure types under tunable tolerance levels, and demonstrates these metrics by benchmarking CINN and TorchInductor on CV and NLP workloads. The paper details a three-stage GraphNet pipeline (graph extraction, validation, and compiler evaluation) and a set of dataset constraints to ensure runnable, serializable, decomposable, statically analyzable graphs, with open-source tooling for extraction, validation, and evaluation. By providing a unified, reproducible platform across frameworks and backends, GraphNet aims to drive principled compiler optimization and to facilitate AI-assisted compiler research and high-level IR translation. The authors outline a roadmap to broaden framework and hardware support, refine task granularity, and extend to distributed scenarios, enhancing the dataset’s utility for the research and development of next-generation tensor compilers.

Abstract

We introduce GraphNet, a dataset of 2.7K real-world deep learning computational graphs with rich metadata, spanning six major task categories across multiple deep learning frameworks. To evaluate tensor compiler performance on these samples, we propose the benchmark metric Speedup Score S(t), which jointly considers runtime speedup and execution correctness under tunable tolerance levels, offering a reliable measure of general optimization capability. Furthermore, we extend S(t) to the Error-aware Speedup Score ES(t), which incorporates error information and helps compiler developers identify key performance bottlenecks. In this report, we benchmark the default tensor compilers, CINN for PaddlePaddle and TorchInductor for PyTorch, on computer vision (CV) and natural language processing (NLP) samples to demonstrate the practicality of GraphNet. The full construction pipeline with graph extraction and compiler evaluation tools is available at https://github.com/PaddlePaddle/GraphNet .

Paper Structure

This paper contains 21 sections, 3 theorems, 22 equations, 6 figures, 6 tables.

Key Result

Proposition B.1

The macro-level Speedup Score $S_t$ defined in Eq. \ref{eq:St-macro-form} is equivalent to the geometric mean of per-sample rectified speedups.
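The geometric-mean form stated in the proposition can be illustrated with a minimal sketch. It assumes the per-sample rectified speedups have already been computed; the rectification rule itself (Definition B.1) is not reproduced here. The function name `geometric_mean_speedup` and the sample values are hypothetical, chosen only for illustration.

```python
import math


def geometric_mean_speedup(rectified_speedups):
    """Geometric mean of positive per-sample speedups, computed in log space.

    Assumption: each entry is an already-rectified speedup (> 0). How the
    rectification is done is defined in the paper and is not modeled here.
    """
    if not rectified_speedups:
        raise ValueError("need at least one sample")
    if any(s <= 0 for s in rectified_speedups):
        raise ValueError("rectified speedups must be positive")
    # Averaging logarithms avoids overflow/underflow of a long product.
    log_sum = sum(math.log(s) for s in rectified_speedups)
    return math.exp(log_sum / len(rectified_speedups))


# Illustrative values only, not results from the paper.
samples = [2.0, 0.5, 4.0, 1.0]
score = geometric_mean_speedup(samples)

# The same quantity written as the N-th root of the product.
root_of_product = math.prod(samples) ** (1.0 / len(samples))
```

Working in log space is a common design choice for geometric means: a single slow sample pulls the score down multiplicatively, and the sum of logs stays numerically stable even for thousands of graphs.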

Figures (6)

  • Figure 1: Statistical properties of the GraphNet dataset. (a) The distribution of computational graphs across six major task categories, showing that Computer Vision (47.8%) and Natural Language Processing (39.5%) are the dominant domains. (b) and (c) Histograms showing the distribution of operator counts (on a log$_2$ scale) for CV and NLP models, respectively. Both categories show a high concentration of graphs with operator counts around $2^9$ (512).
  • Figure 2: Speedup Score $S_t$ on NVIDIA H20 for CV and NLP workloads. The vertical axis shows $S_t$, which integrates speedup, pass rate, and failure penalties into a unified score. Higher $S_t$ indicates better compiler performance under the correctness-aware speedup metric defined in Equation \ref{eq:St-macro-form}. The horizontal axis $t$ represents different numerical tolerance levels used for correctness checks, where larger $t$ implies more relaxed thresholds.
  • Figure 3: Error-aware Speedup Score $ES_t$ for CV and NLP workloads. The vertical axis shows $ES_t$ values, capturing compiler performance with increasing fault tolerance. Higher $ES_t$ indicates better compiler performance under the error-tolerant speedup metric defined in Equation \ref{eq:ESt-macro-form}. The horizontal axis $t$ represents different tolerance levels: for $t \le 0$, it reflects numerical correctness thresholds (as in $S_t$); for $t > 0$, it encodes the categories of tolerated errors: $t \geq 1$ tolerates accuracy mismatches, $t \geq 2$ tolerates runtime crashes, and $t \geq 3$ tolerates compilation failures.
  • Figure 4: GraphNet Workflow Overview. The workflow consists of three stages: (1) Graph Extraction: Traces and captures computational graphs from DL models (e.g., PaddlePaddle, PyTorch); (2) Graph Validation: Performs a consistency check via re-extraction and re-execution to ensure usability; and (3) Compiler Evaluation: Uses the validated graphs from the dataset to benchmark the runtime speedup and correctness of various compiler backends.
  • Figure 5: GraphNet Sample Composition. A user's model (left), wrapped by the @graph_net extractor, is symbolically traced to generate a standardized set of files. This set forms a complete sample, including the high-level IR of the computation graph (model.py), metadata for inputs and weights (input_meta.py, weight_meta.py), and other components such as optional custom operator code.
  • ...and 1 more figure
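The tolerance levels described in the Figure 3 caption can be sketched as a small lookup. This is an illustration under stated assumptions, not the GraphNet tooling: the function name `tolerated_error_types` and the category strings are hypothetical, and $t$ is assumed to take integer values.

```python
def tolerated_error_types(t):
    """Return the error categories tolerated at tolerance level t.

    Mirrors the Figure 3 caption: t >= 1 tolerates accuracy mismatches,
    t >= 2 also tolerates runtime crashes, and t >= 3 also tolerates
    compilation failures. For t <= 0 no error category is tolerated and
    t only selects a numerical correctness threshold (as in S_t).
    The category names below are hypothetical labels.
    """
    tolerated = set()
    if t >= 1:
        tolerated.add("accuracy_mismatch")
    if t >= 2:
        tolerated.add("runtime_crash")
    if t >= 3:
        tolerated.add("compilation_failure")
    return tolerated


# Each level tolerates everything the previous level did, plus one category.
levels = {t: tolerated_error_types(t) for t in range(-2, 4)}
```

Because the levels are cumulative, the set of tolerated errors can only grow as $t$ increases, which matches the caption's description of performance measured "with increasing fault tolerance".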

Theorems & Definitions (7)

  • Definition B.1: Rectified Speedup
  • Proposition B.1
  • Proof
  • Definition C.1
  • Proposition C.1
  • Definition C.2
  • Proposition C.2