Graph Classification via Reference Distribution Learning: Theory and Practice

Zixiao Wang, Jicong Fan

TL;DR

This work tackles graph classification by avoiding global pooling and instead treating each graph's node embeddings as a discrete distribution. It introduces Graph Reference Distribution Learning (GRDL), which learns $K$ discriminative references and classifies graphs based on MMD-based similarities between node embeddings and these references, with an end-to-end trainable Gaussian-kernel setup. The authors provide a generalization bound for GRDL, discuss how model choices (GNN depth, reference size, and kernel) influence performance, and demonstrate that GRDL can generalize better than pooling-based GNNs while achieving at least an order of magnitude faster training and inference on large-scale datasets. Empirical results on extensive benchmarks show GRDL achieving state-of-the-art or competitive accuracy with significantly reduced time costs, along with ablations validating key design choices. Overall, GRDL offers a principled, efficient, and scalable approach to graph classification with theoretical guarantees and practical impact for large graph datasets.
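For readers unfamiliar with MMD, the similarity mentioned above can be made concrete with the standard (biased) empirical estimator of the squared maximum mean discrepancy between two finite point sets. The notation below is assumed for illustration and is not taken from the paper: $H$ is the set of node embeddings of one graph, $D$ is one reference, and $k_\theta$ is a Gaussian kernel with parameter $\theta$. The paper's exact parameterization may differ.

```latex
% Assumed notation (illustrative only):
%   H = {h_1, ..., h_n}  node embeddings of one graph
%   D = {d_1, ..., d_m}  support points of one reference distribution
%   theta > 0            Gaussian kernel parameter
\[
\mathrm{MMD}^2(H, D)
= \frac{1}{n^2}\sum_{i=1}^{n}\sum_{i'=1}^{n} k_\theta(h_i, h_{i'})
- \frac{2}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m} k_\theta(h_i, d_j)
+ \frac{1}{m^2}\sum_{j=1}^{m}\sum_{j'=1}^{m} k_\theta(d_j, d_{j'}),
\qquad
k_\theta(x, y) = \exp\!\left(-\theta\,\|x - y\|_2^2\right).
\]
```

A smaller value means the two point sets are closer as distributions, so a natural similarity score is $-\mathrm{MMD}^2(H, D)$.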

Abstract

Graph classification is a challenging problem owing to the difficulty in quantifying the similarity between graphs or representing graphs as vectors, though there have been a few methods using graph kernels or graph neural networks (GNNs). Graph kernels often suffer from computational costs and manual feature engineering, while GNNs commonly utilize global pooling operations, risking the loss of structural or semantic information. This work introduces Graph Reference Distribution Learning (GRDL), an efficient and accurate graph classification method. GRDL treats each graph's latent node embeddings given by GNN layers as a discrete distribution, enabling direct classification without global pooling, based on maximum mean discrepancy to adaptively learned reference distributions. To fully understand this new model (the existing theories do not apply) and guide its configuration (e.g., network architecture, references' sizes, number, and regularization) for practical use, we derive generalization error bounds for GRDL and verify them numerically. More importantly, our theoretical and numerical results both show that GRDL has a stronger generalization ability than GNNs with global pooling operations. Experiments on moderate-scale and large-scale graph datasets show the superiority of GRDL over the state-of-the-art, emphasizing its remarkable efficiency, being at least 10 times faster than leading competitors in both training and inference stages.
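The classification step described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the node embeddings and references are random arrays standing in for GNN outputs and learned references, there is one reference per class, and the names `gaussian_kernel`, `mmd2`, `classify`, and the parameter `theta` are hypothetical.

```python
import numpy as np

def gaussian_kernel(X, Y, theta=1.0):
    """Pairwise Gaussian kernel exp(-theta * ||x - y||^2) between rows of X and Y."""
    sq_dists = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-theta * np.maximum(sq_dists, 0.0))

def mmd2(X, Y, theta=1.0):
    """Biased empirical estimate of the squared MMD between two point sets."""
    return (gaussian_kernel(X, X, theta).mean()
            - 2.0 * gaussian_kernel(X, Y, theta).mean()
            + gaussian_kernel(Y, Y, theta).mean())

def classify(node_embeddings, references, theta=1.0):
    """Return the index of the most similar reference and all similarity scores.

    Similarity is taken as the negative squared MMD (an assumption of this sketch).
    """
    sims = np.array([-mmd2(node_embeddings, D, theta) for D in references])
    return int(np.argmax(sims)), sims

# Toy data (assumed): two references with 4 points each in a 3-dimensional space,
# and one graph whose 10 node embeddings lie near the second reference.
rng = np.random.default_rng(0)
references = [rng.normal(-2.0, 0.5, size=(4, 3)), rng.normal(2.0, 0.5, size=(4, 3))]
graph_embeddings = rng.normal(2.0, 0.5, size=(10, 3))

pred, sims = classify(graph_embeddings, references)
print(pred, sims)
```

Note that the graph is never reduced to a single vector: the whole set of node embeddings is compared with each reference. In the method itself, the references and the GNN are trained jointly, which this sketch does not attempt.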

Paper Structure

This paper contains 48 sections, 27 theorems, 119 equations, 6 figures, 13 tables, and 1 algorithm.

Key Result

Theorem 3.2

Let $n=\min_i n_i$, $c = \|\tilde{\mathbf{A}}\|_\sigma$, and $\bar{d}=\max_{i,l}d_i^{(l)}$. Denote $R_G := c^{2L}\|\mathbf{X}\|_2^2\ln(2\bar{d}^2) \left(\prod_{l=1}^L\left(\prod_{i=1}^r \kappa^{(l)}_i\right)^2\right) \left(\sum_{l=1}^L\sum_{i=1}^r\left(\frac{b^{(l)}_i}{\kappa^{(l)}_i}\right)^{2/3}\right)^3$. For graphs $\mathcal{G}=\{(G_i,\ldots)\}$ [...] where $v_1 = \frac{64\theta K R_G\mu^2}{n}$, $v_2 = Km\bar{d}$, and $v_3 = \frac{24\sqrt{\theta N}\,b\,\cdots}{\cdots}$ [...]

Figures (6)

  • Figure 1: The GRDL framework. Classification involves using a GNN $f_G$ to encode a graph's information into a node embedding distribution. The similarities between the node embeddings and $K$ reference distributions are calculated by the reference module $f_D$. The graph is assigned the label of the reference that exhibits the highest similarity.
  • Figure 2: Average training time per epoch (in seconds). Our GRDL is 10 times faster than OT-GNN and TFGW.
  • Figure 3: t-SNE visualization of MUTAG embeddings and reference distributions given by GRDL. Each dot denotes a graph and each square denotes a reference distribution.
  • Figure 4: Training data misclassification rate on MUTAG (left) and IMDB-BINARY (right) with different numbers of references for each class ($P$). The effect of $P$ on the training misclassification rates is not obvious.
  • Figure 5: The blue and orange lines denote the training error $\hat{L}_{\gamma}$ and validation error $L_{\gamma}$, respectively, of GRDL with $r \in \{2, 3, 4\}$ and $L\in \{3, 4, 5, 6\}$.
  • ...and 1 more figure

Theorems & Definitions (51)

  • Example 2.1
  • Theorem 3.2: Generalization bound of GRDL
  • Theorem 3.3
  • Remark 3.4
  • Theorem A.1: Generalization bound of GIN
  • Theorem B.1
  • proof
  • Lemma E.1
  • Lemma E.2: Lemma A.5 of bartlett2017spectrally, reformulated
  • Definition F.1: Correctness of Classification
  • ...and 41 more