An Efficient Subgraph GNN with Provable Substructure Counting Power

Zuoyu Yan; Junru Zhou; Liangcai Gao; Zhi Tang; Muhan Zhang

TL;DR

This work targets efficient, provable substructure counting with graph neural networks. It introduces ESC-GNN, which augments backbones with precomputed distance-based structural embeddings over rooted $k$-tuples, enabling subgraph counting without repeatedly applying GNNs to all subgraphs. Theoretical results (Theorems 3.2–3.4, 4.2–4.5) characterize counting and distinguishing power relative to the Weisfeiler-Leman hierarchy and show ESC-GNN can count a range of connected and induced substructures while remaining more scalable than full subgraph GNNs. Empirically, ESC-GNN achieves strong real-world performance on molecular and TU benchmarks, with favorable space/time trade-offs and clear ablations illustrating the benefits of global message passing and all components of the structural encoding. Overall, the approach provides a practical, provably expressive model for substructure-aware graph learning with reduced computational cost.

Abstract

We investigate how to enhance the representation power of graph neural networks (GNNs) through their ability to count substructures. Recent advances have seen the adoption of subgraph GNNs, which partition an input graph into numerous subgraphs and then apply a GNN to each one to augment the graph's overall representation. Despite their ability to identify various substructures, subgraph GNNs are hindered by significant computational and memory costs. In this paper, we tackle a critical question: is it possible for GNNs to count substructures both efficiently and provably? Our approach begins with a theoretical demonstration that the distance to rooted nodes in subgraphs is key to boosting the counting power of subgraph GNNs. To avoid repeatedly applying a GNN across all subgraphs, we introduce precomputed structural embeddings that encapsulate this crucial distance information. Experiments validate that our proposed model retains the counting power of subgraph GNNs while running significantly faster.
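To make the idea of "distance to rooted nodes" concrete, the following is a minimal sketch of one way such distance information could be precomputed for a rooted 2-tuple (an edge). It is not the authors' structural embedding: the function names, the `max_hops` cutoff, and the choice to measure distances in the full graph rather than inside an extracted subgraph are all assumptions made for illustration. Each node receives the label (distance to the first root, distance to the second root), and the labels are summarized as a histogram.

```python
from collections import Counter, deque


def bfs_distances(adj, source):
    """Shortest-path distances from `source` in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def rooted_edge_distance_histogram(adj, u, v, max_hops):
    """Histogram of (d(u, w), d(v, w)) labels over nodes w near the rooted edge.

    Hypothetical helper: only nodes within `max_hops` of both roots are kept.
    """
    dist_u = bfs_distances(adj, u)
    dist_v = bfs_distances(adj, v)
    hist = Counter()
    for w in adj:
        if w in dist_u and w in dist_v:
            if dist_u[w] <= max_hops and dist_v[w] <= max_hops:
                hist[(dist_u[w], dist_v[w])] += 1
    return hist


# Toy graph: edge (0, 1) with two common neighbours (2 and 3) and a pendant node 4.
adj = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1], 3: [0, 1, 4], 4: [3]}
hist = rooted_edge_distance_histogram(adj, 0, 1, max_hops=2)

# Nodes labelled (1, 1) are common neighbours of both roots, so this entry
# equals the number of triangles passing through the rooted edge.
triangles_through_edge = hist[(1, 1)]  # 2 for this toy graph
```

In this sketch the histogram is computed once per rooted edge in preprocessing, which illustrates why distance labels can carry substructure counts without running a GNN on every subgraph.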

Paper Structure (15 sections, 10 theorems, 7 equations, 6 figures, 7 tables)

The Weisfeiler-Leman algorithm and Message Passing Neural Network
The Proof of Theorem 3.2
The Proof of Theorem 3.4
The Proof of Theorem 4.4
The Proof of Theorem 4.2
The Proof of Theorem 4.3
The Proof of Theorem 4.5
Additional Discussion on the Expressive Power of ESC-GNN
Experimental Details
Evaluation on Real-World Datasets
Results
Ablation Study
Evaluation on the Space Cost
Counting Substructures on Other Datasets
Limitations and the Assets We Used

Key Result

Theorem 2.1

For any $k\geq 2$, there exists a pair of graphs $G$ and $H$ such that $G$ contains a $(k+1)$-clique as a subgraph while $H$ does not, and $k$-WL cannot distinguish $G$ from $H$.
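The $4\times 4$ Rook graph and the Shrikhande graph named in Figure 1 are a well-known pair of this flavour: both are strongly regular with the same parameters, yet only the Rook graph contains 4-cliques. The sketch below builds both graphs from their standard constructions and checks those two facts by brute force. It does not run any WL test, and which value of $k$ the pair witnesses depends on the WL indexing convention, which this summary does not state; the function names are illustrative.

```python
from itertools import combinations, product


def rook_graph(n=4):
    """n x n Rook graph: cells are adjacent if they share a row or a column."""
    verts = list(product(range(n), repeat=2))
    adj = {v: set() for v in verts}
    for a, b in combinations(verts, 2):
        if a[0] == b[0] or a[1] == b[1]:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def shrikhande_graph():
    """Cayley graph on Z_4 x Z_4 with connection set +-(1,0), +-(0,1), +-(1,1)."""
    n = 4
    steps = [(1, 0), (0, 1), (1, 1)]
    verts = list(product(range(n), repeat=2))
    adj = {v: set() for v in verts}
    for i, j in verts:
        for di, dj in steps:
            w = ((i + di) % n, (j + dj) % n)
            adj[(i, j)].add(w)
            adj[w].add((i, j))
    return adj


def srg_parameters(adj):
    """Sets of degrees, common-neighbour counts of adjacent pairs, and of non-adjacent pairs."""
    degrees = {len(nbrs) for nbrs in adj.values()}
    lam = {len(adj[a] & adj[b]) for a in adj for b in adj[a]}
    mu = {len(adj[a] & adj[b]) for a, b in combinations(adj, 2) if b not in adj[a]}
    return degrees, lam, mu


def count_cliques(adj, size):
    """Brute-force count of cliques with exactly `size` vertices."""
    return sum(
        1
        for nodes in combinations(adj, size)
        if all(b in adj[a] for a, b in combinations(nodes, 2))
    )


rook = rook_graph()
shrikhande = shrikhande_graph()

same_parameters = srg_parameters(rook) == srg_parameters(shrikhande)
rook_4cliques = count_cliques(rook, 4)
shrikhande_4cliques = count_cliques(shrikhande, 4)
```

Both graphs also have the same number of triangles, so any method that separates them must see beyond degree and triangle statistics; the 4-clique count is one such quantity.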

Figures (6)

Figure 1: (a) the $4\times 4$ Rook graph and (b) the Shrikhande graph
Figure 2: Examples of 4-cycles that pass the rooted edges. In these figures, the rooted 2-tuples are colored blue.
Figure 3: Examples where ESC-GNN cannot subgraph-count 5-cycles. In these figures, the rooted 2-tuples are colored blue.
Figure 4: Examples of stars that pass the rooted edges. In these figures, the rooted 2-tuples are colored blue.
Figure 5: Examples where ESC-GNN cannot induced-subgraph-count 5-stars. In these figures, the rooted 2-tuples are colored blue.
...and 1 more figures

Theorems & Definitions (17)

Theorem 2.1
proof
Lemma 2.2
Lemma 2.3
Lemma 2.4
Remark 2.5
Theorem 3.1
proof
Theorem 4.1
proof
...and 7 more
