Graph Classification via Reference Distribution Learning: Theory and Practice

Zixiao Wang; Jicong Fan

TL;DR

This work tackles graph classification without global pooling by treating each graph's node embeddings as a discrete distribution. It introduces Graph Reference Distribution Learning (GRDL), which learns $K$ discriminative reference distributions and classifies a graph by the maximum mean discrepancy (MMD), computed with a Gaussian kernel, between its node embeddings and each reference; the whole model is trained end to end. The authors derive a generalization bound for GRDL, discuss how model choices (GNN depth, reference size, and kernel) influence performance, and show that GRDL generalizes better than pooling-based GNNs while training and running inference at least an order of magnitude faster than leading competitors on large-scale datasets. Experiments on benchmark datasets show state-of-the-art or competitive accuracy at significantly lower time cost, and ablations validate key design choices.

Abstract

Graph classification is a challenging problem because it is difficult to quantify the similarity between graphs or to represent graphs as vectors, although several methods based on graph kernels or graph neural networks (GNNs) exist. Graph kernels often suffer from high computational costs and manual feature engineering, while GNNs commonly rely on global pooling operations, which risk losing structural or semantic information. This work introduces Graph Reference Distribution Learning (GRDL), an efficient and accurate graph classification method. GRDL treats each graph's latent node embeddings given by GNN layers as a discrete distribution and classifies the graph directly, without global pooling, based on the maximum mean discrepancy to adaptively learned reference distributions. To fully understand this new model (the existing theories do not apply) and guide its configuration (e.g., the network architecture and the size, number, and regularization of the references) for practical use, we derive generalization error bounds for GRDL and verify them numerically. More importantly, our theoretical and numerical results both show that GRDL has a stronger generalization ability than GNNs with global pooling operations. Experiments on moderate-scale and large-scale graph datasets show the superiority of GRDL over the state-of-the-art and highlight its efficiency: it is at least 10 times faster than leading competitors in both training and inference.
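The classification rule described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the kernel form $\exp(-\theta\|x-y\|^2)$, the parameter name `theta`, the biased MMD estimator, the use of negative squared MMD as the similarity, and the one-reference-per-class labelling are all assumptions made for illustration, and the references here are fixed arrays rather than learned parameters.

```python
import numpy as np

def gaussian_kernel(X, Y, theta=1.0):
    """Kernel matrix with entries exp(-theta * ||x - y||^2) for rows x of X, y of Y.
    The parametrization by `theta` is an assumption of this sketch."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-theta * np.maximum(sq_dists, 0.0))

def mmd_squared(H, D, theta=1.0):
    """Biased estimate of the squared MMD between node embeddings H (n x d)
    and a reference point set D (m x d), both seen as uniform discrete distributions."""
    return (
        gaussian_kernel(H, H, theta).mean()
        - 2.0 * gaussian_kernel(H, D, theta).mean()
        + gaussian_kernel(D, D, theta).mean()
    )

def classify(H, references, theta=1.0):
    """Return the index of the most similar reference and all similarity scores.
    Similarity is taken to be the negative squared MMD (an assumption)."""
    scores = np.array([-mmd_squared(H, D, theta) for D in references])
    return int(np.argmax(scores)), scores

# Toy usage: one graph with 6 node embeddings and two hypothetical references.
rng = np.random.default_rng(0)
H = rng.normal(size=(6, 4))
references = [rng.normal(size=(3, 4)), rng.normal(loc=3.0, size=(3, 4))]
label, scores = classify(H, references, theta=0.1)
print(label, scores)
```

Because every node embedding enters the kernel sums directly, no pooling step collapses the graph into a single vector before the comparison.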

Paper Structure (48 sections, 27 theorems, 119 equations, 6 figures, 13 tables, 1 algorithm)

Introduction
Proposed Approach
Model Framework
Design of $\mathcal{F}_G$ and $\mathcal{F}_D$
Algorithm Implementation
Theoretical Analysis
Preliminaries
Main Results
Bound Discussion and Numerical Verification
Related Work
Numerical Experiments
Graph Classification Benchmark
Datasets
Time Cost Comparison
Graph Visualization
...and 33 more sections

Key Result

Theorem 3.2

Let $n=\min_i n_i$, $c = \|\tilde{\mathbf{A}}\|_\sigma$, and $\bar{d}=\max_{i,l}d_i^{(l)}$. Denote $R_G := c^{2L}\|\mathbf{X}\|_2^2\ln(2\bar{d}^2) \left(\prod_{l=1}^L\left(\prod_{i=1}^r \kappa^{(l)}_i\right)^2\right) \left(\sum_{l=1}^L\sum_{i=1}^r\left(\frac{b^{(l)}_i}{\kappa^{(l)}_i}\right)^{2/3}\right)^3$. For graphs $\mathcal{G}=\{(G_i,\ldots\}$ […], where $v_1 = \frac{64\theta K R_G\mu^2}{n}$, $v_2 = Km\bar{d}$, and $v_3 = \frac{24\sqrt{\theta N}\,b\cdots}{\cdots}$ […]
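To get a feel for how the quantities in the theorem scale, the fully stated terms $R_G$, $v_1$, and $v_2$ can be evaluated numerically. The sketch below is only arithmetic on the formulas as written: the function names are hypothetical, the inputs ($c$, $\|\mathbf{X}\|_2$, $\kappa^{(l)}_i$, $b^{(l)}_i$, $\theta$, $K$, $\mu$, $m$, $n$) are treated as plain numbers because the excerpt does not define all of them, and $v_3$ is omitted because its expression is cut off.

```python
import math

def r_g(c, L, x_norm, d_bar, kappa, b):
    """Evaluate R_G as written in the theorem.
    kappa[l][i] and b[l][i] hold the per-layer constants for l = 1..L, i = 1..r."""
    assert len(kappa) == L and len(b) == L
    prod_term = 1.0   # prod_l (prod_i kappa_i^(l))^2
    sum_term = 0.0    # sum_l sum_i (b_i^(l) / kappa_i^(l))^(2/3)
    for kappa_l, b_l in zip(kappa, b):
        prod_term *= math.prod(kappa_l) ** 2
        sum_term += sum((b_li / k_li) ** (2.0 / 3.0)
                        for b_li, k_li in zip(b_l, kappa_l))
    return (c ** (2 * L) * x_norm ** 2 * math.log(2 * d_bar ** 2)
            * prod_term * sum_term ** 3)

def v1(theta, K, R_G, mu, n):
    """v_1 = 64 * theta * K * R_G * mu^2 / n."""
    return 64.0 * theta * K * R_G * mu ** 2 / n

def v2(K, m, d_bar):
    """v_2 = K * m * d_bar."""
    return K * m * d_bar

# Hypothetical values, chosen only to exercise the formulas.
R = r_g(c=1.5, L=3, x_norm=10.0, d_bar=32,
        kappa=[[1.2, 1.1]] * 3, b=[[2.0, 2.0]] * 3)
print(R, v1(theta=1.0, K=4, R_G=R, mu=1.0, n=20), v2(K=4, m=5, d_bar=32))
```

The factor $c^{2L}$ and the product over layers make $R_G$ grow multiplicatively with depth $L$, while $v_1$ shrinks as the minimum graph size $n$ grows.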

Figures (6)

Figure 1: The GRDL framework. Classification involves using a GNN $f_G$ to encode a graph's information into a node embedding distribution. The similarities between the node embeddings and $K$ reference distributions are calculated by the reference module $f_D$. The graph is assigned the label of the reference that exhibits the highest similarity.
Figure 2: Average training time per epoch (in seconds). Our GRDL is 10 times faster than OT-GNN and TFGW.
Figure 3: T-SNE visualization of MUTAG embeddings and reference distributions given by GRDL. Each dot denotes a graph and each square denotes a reference distribution.
Figure 4: Training data misclassification rate on MUTAG (left) and IMDB-BINARY (right) with different numbers of references for each class ($P$). The effect of $P$ on the training misclassification rates is not obvious.
Figure 5: The blue and orange lines denote the training error $\hat{L}_{\gamma}$ and validation error $L_{\gamma}$, respectively, of GRDL with $r \in \{2, 3, 4\}$ and $L\in \{3, 4, 5, 6\}$.
...and 1 more figure
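The encoder stage in the Figure 1 caption, where a GNN $f_G$ turns a graph into a set of node embeddings, can be sketched as follows. This is a simplified stand-in rather than the paper's architecture: the self-loop-augmented adjacency `A + I`, sum aggregation, ReLU after every linear map, and the function name `encode_nodes` are assumptions. The layout of `weights` ($L$ message-passing layers, each followed by an $r$-layer MLP) mirrors the $L$ and $r$ that appear in the Figure 5 caption.

```python
import numpy as np

def relu(Z):
    return np.maximum(Z, 0.0)

def encode_nodes(A, X, weights):
    """Map a graph (adjacency A, node features X) to per-node embeddings.

    weights: list of L entries, one per message-passing layer; each entry is a
    list of r weight matrices forming that layer's MLP.
    Returns an (n_nodes, d) array: one embedding per node, with no global pooling.
    """
    A_tilde = A + np.eye(A.shape[0])  # assumption: add self-loops before aggregating
    H = X
    for layer_mlp in weights:
        H = A_tilde @ H               # sum features over each node's neighborhood
        for W in layer_mlp:
            H = relu(H @ W)           # r-layer MLP applied node-wise
    return H

# Toy usage: a 3-node path graph with one-hot features, L = 2 and r = 2.
rng = np.random.default_rng(0)
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
X = np.eye(3)
weights = [
    [rng.normal(size=(3, 8)), rng.normal(size=(8, 8))],
    [rng.normal(size=(8, 8)), rng.normal(size=(8, 8))],
]
H = encode_nodes(A, X, weights)
print(H.shape)
```

The rows of `H` are what the reference module $f_D$ would compare against the $K$ reference distributions; the graph then receives the label of the most similar reference.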

Theorems & Definitions (51)

Example 2.1
Theorem 3.2: Generalization bound of GRDL
Theorem 3.3
Remark 3.4
Theorem A.1: Generalization bound of GIN
Theorem B.1
proof
Lemma E.1
Lemma E.2: Lemma A.5 of Bartlett et al. (2017), reformulated
Definition F.1: Correctness of Classification
...and 41 more
