Nonparametric Teaching for Graph Property Learners

Chen Zhang, Weixin Bu, Zeyi Ren, Zhengwu Liu, Yik-Chung Wu, Ngai Wong

TL;DR

Graph property learners are costly to train because they must learn the implicit mapping $f^*:\mathbb{G}\to\mathcal{Y}$ from graphs to properties. GraNT reframes this as nonparametric teaching, showing that parameter-space updates for a graph neural network induce a functional gradient flow in an RKHS, with a dynamic Graph Neural Tangent Kernel $K_{\theta^t}$ that converges to the structure-aware canonical kernel $K$. The paper provides theory linking structure-aware parameter updates to the nonparametric teaching paradigm, and introduces a greedy GraNT algorithm that selects graphs with large gradient impact to accelerate convergence. Empirically, GraNT reduces training time by $30.97\%$ to $47.30\%$ across graph- and node-level regression and classification tasks while preserving or improving generalization, across multiple graph datasets and architectures. This work broadens nonparametric teaching to graph-structured data and offers a practical path to faster graph-property learning in domains like chemistry and biology.
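To make the link between parameter updates and the kernel concrete, the block below writes out a standard first-order (neural-tangent-style) expansion. It is a sketch, not the paper's exact statement: the learning rate $\eta$, the loss $\mathcal{L}$, and the $N$ training graphs $\bm{G}_i$ with targets $y_i$ are notation assumed here for illustration.

```latex
\begin{aligned}
\theta^{t+1} &= \theta^{t} - \frac{\eta}{N}\sum_{i=1}^{N}
  \frac{\partial \mathcal{L}\big(f_{\theta^t}(\bm{G}_i), y_i\big)}{\partial f_{\theta^t}(\bm{G}_i)}
  \,\frac{\partial f_{\theta^t}(\bm{G}_i)}{\partial \theta^t}, \\
f_{\theta^{t+1}}(\bm{G}) - f_{\theta^{t}}(\bm{G})
  &\approx \left\langle \frac{\partial f_{\theta^t}(\bm{G})}{\partial \theta^t},\;
  \theta^{t+1}-\theta^{t} \right\rangle \\
  &= -\frac{\eta}{N}\sum_{i=1}^{N}
  \frac{\partial \mathcal{L}\big(f_{\theta^t}(\bm{G}_i), y_i\big)}{\partial f_{\theta^t}(\bm{G}_i)}
  \, K_{\theta^t}(\bm{G}_i, \bm{G}), \\
K_{\theta^t}(\bm{G}_i,\bm{G}) &= \left\langle
  \frac{\partial f_{\theta^t}(\bm{G}_i)}{\partial \theta^t},
  \frac{\partial f_{\theta^t}(\bm{G})}{\partial \theta^t} \right\rangle .
\end{aligned}
```

Read this way, a gradient step on the parameters moves the function $f_{\theta^t}$ along a kernel-weighted combination of per-graph loss derivatives, which is the form a functional gradient step takes in an RKHS.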

Abstract

Inferring properties of graph-structured data, e.g., the solubility of molecules, essentially involves learning the implicit mapping from graphs to their properties. This learning process is often costly for graph property learners like Graph Convolutional Networks (GCNs). To address this, we propose a paradigm called Graph Neural Teaching (GraNT) that reinterprets the learning process through a novel nonparametric teaching perspective. Specifically, the latter offers a theoretical framework for teaching implicitly defined (i.e., nonparametric) mappings via example selection. Such an implicit mapping is realized by a dense set of graph-property pairs, with the GraNT teacher selecting a subset of them to promote faster convergence in GCN training. By analytically examining the impact of graph structure on parameter-based gradient descent during training, and recasting the evolution of GCNs--shaped by parameter updates--through functional gradient descent in nonparametric teaching, we show for the first time that teaching graph property learners (i.e., GCNs) is consistent with teaching structure-aware nonparametric learners. These new findings readily commit GraNT to enhancing learning efficiency of the graph property learner, showing significant reductions in training time for graph-level regression (-36.62%), graph-level classification (-38.19%), node-level regression (-30.97%) and node-level classification (-47.30%), all while maintaining its generalization performance.
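The example-selection idea can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the paper's algorithm: the "learner" is a single weight applied to a mean readout of scalar node features (a hypothetical stand-in for a GCN), and the teacher ranks graphs by absolute residual as a proxy for how much each graph would move the learner. The names `graph_summary`, `select_by_discrepancy`, `teach`, and `pool` are all invented for this sketch.

```python
def graph_summary(graph):
    # Hypothetical stand-in for a GCN readout: mean of scalar node features.
    return sum(graph) / len(graph)


def select_by_discrepancy(weight, pool, k):
    """Pick the k (graph, target) pairs that the current learner fits worst."""
    def residual(pair):
        graph, target = pair
        return abs(weight * graph_summary(graph) - target)
    return sorted(pool, key=residual, reverse=True)[:k]


def teach(pool, k=2, lr=0.1, rounds=200, weight=0.0):
    """Greedy teaching loop: each round trains only on the selected graphs."""
    for _ in range(rounds):
        batch = select_by_discrepancy(weight, pool, k)
        # Gradient of the mean squared-error loss 0.5 * (w * s - y)^2 over the batch.
        grad = sum((weight * graph_summary(g) - y) * graph_summary(g)
                   for g, y in batch) / len(batch)
        weight -= lr * grad
    return weight


# Toy pool: each graph is a list of scalar node features; targets are 2 * mean.
pool = [([1.0, 3.0], 4.0), ([0.5, 0.5, 0.5], 1.0),
        ([2.0, 4.0, 6.0, 8.0], 10.0), ([1.0], 2.0)]
learned_weight = teach(pool)
```

Each round touches only `k` graphs rather than the whole pool, which is where the training-time saving would come from in this simplified setting.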

Paper Structure

This paper contains 18 sections, 5 theorems, 51 equations, 17 figures, 7 tables, 1 algorithm.

Key Result

Lemma 3

(Chain rule for functional gradients) For differentiable functions $G(F): \mathbb{R}\mapsto\mathbb{R}$ that depend on functionals $F(f):\mathcal{H}\mapsto\mathbb{R}$, the expression $\nabla_f G(F(f)) = \frac{\partial G(F(f))}{\partial F(f)}\cdot\nabla_f F(f)$ is typically referred to as the chain rule.
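As a worked instance, consider a squared loss on a single graph. The choices below are assumptions made for illustration: $F$ is taken to be the evaluation functional $E_{\bm{G}}(f)=f(\bm{G})$ on an RKHS $\mathcal{H}$ with kernel $K$, $y$ is the target property, and $G(F)=\tfrac{1}{2}(F-y)^2$.

```latex
\begin{aligned}
E_{\bm{G}}(f) &= f(\bm{G}) = \big\langle f,\, K(\bm{G},\cdot) \big\rangle_{\mathcal{H}}
  \quad\Longrightarrow\quad \nabla_f E_{\bm{G}}(f) = K(\bm{G},\cdot), \\
\nabla_f\, \tfrac{1}{2}\big(f(\bm{G}) - y\big)^2
  &= \frac{\partial\, \tfrac{1}{2}\big(E_{\bm{G}}(f) - y\big)^2}{\partial E_{\bm{G}}(f)}
     \cdot \nabla_f E_{\bm{G}}(f) \\
  &= \big(f(\bm{G}) - y\big)\, K(\bm{G},\cdot).
\end{aligned}
```

The functional gradient is thus the residual $f(\bm{G})-y$ times the kernel section at $\bm{G}$, so graphs with larger residuals contribute larger functional gradients.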

Figures (17)

  • Figure 1: An illustration of the implicit mapping $f^*$ between a graph $\bm{G}$ and its property $f^*(\bm{G})$, where $f^0$ denotes the mapping of the initial graph property learner, e.g., an initialized GCN.
  • Figure 2: A workflow illustration of a two-layer flexible GCN with a four-node graph $\bm{G}$ as input.
  • Figure 3: Validation set performance for graph-level tasks: ZINC (regression) and ogbg-molhiv (classification).
  • Figure 4: Validation set performance for node-level tasks: gen-reg (regression) and gen-cls (classification).
  • Figure 5: Graphical illustration of GNTK computation: $K_{\theta}(\bm{G}_{(3)},\bm{G}'_{(4)})=\left\langle\frac{\partial f_{\theta}(\bm{G})}{\partial \theta},\frac{\partial f_{\theta}(\bm{G}')}{\partial \theta} \right\rangle=\frac{\partial f_{\theta}(\bm{G})}{\partial \bm{W}^{(1)}_{(1,1)}}\frac{\partial f_{\theta}(\bm{G}')}{\partial \bm{W}^{(1)}_{(1,1)}}+\cdots+\frac{\partial f_{\theta}(\bm{G})}{\partial \bm{W}^{(1)}_{(\kappa_1d,h_1)}}\frac{\partial f_{\theta}(\bm{G}')}{\partial \bm{W}^{(1)}_{(\kappa_1d,h_1)}}+\frac{\partial f_{\theta}(\bm{G})}{\partial \bm{W}^{(2)}_{(1)}}\frac{\partial f_{\theta}(\bm{G}')}{\partial \bm{W}^{(2)}_{(1)}}+\cdots+\frac{\partial f_{\theta}(\bm{G})}{\partial \bm{W}^{(2)}_{(\kappa_2h_1)}}\frac{\partial f_{\theta}(\bm{G}')}{\partial \bm{W}^{(2)}_{(\kappa_2h_1)}}$.
  • ...and 12 more figures
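The GNTK in Figure 5 is an inner product of parameter gradients, summed over every weight entry. A minimal Python sketch of that computation follows. It makes several simplifying assumptions that are not from the paper: a toy two-layer GCN with tanh activation, raw adjacency with self-loops, a mean readout, and finite-difference gradients in place of automatic differentiation. The names `gcn_output`, `param_gradient`, and `empirical_gntk` are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_output(theta, adj, feats, hidden):
    """Scalar output of a toy two-layer GCN with mean readout (illustrative only)."""
    d = len(feats[0])
    w1 = [theta[i * hidden:(i + 1) * hidden] for i in range(d)]  # d x hidden
    w2 = theta[d * hidden:]                                      # hidden entries
    h = [[math.tanh(v) for v in row] for row in matmul(matmul(adj, feats), w1)]
    node_out = [sum(x * w for x, w in zip(row, w2)) for row in matmul(adj, h)]
    return sum(node_out) / len(node_out)


def param_gradient(theta, adj, feats, hidden, eps=1e-6):
    """Central finite-difference estimate of d f_theta(G) / d theta."""
    grad = []
    for k in range(len(theta)):
        up = theta[:k] + [theta[k] + eps] + theta[k + 1:]
        down = theta[:k] + [theta[k] - eps] + theta[k + 1:]
        grad.append((gcn_output(up, adj, feats, hidden)
                     - gcn_output(down, adj, feats, hidden)) / (2 * eps))
    return grad


def empirical_gntk(theta, graph_a, graph_b, hidden):
    """K_theta(G, G') = <d f(G)/d theta, d f(G')/d theta>."""
    grad_a = param_gradient(theta, *graph_a, hidden)
    grad_b = param_gradient(theta, *graph_b, hidden)
    return sum(x * y for x, y in zip(grad_a, grad_b))


# A 3-node and a 4-node path graph (adjacency with self-loops, 2 features per node).
g3 = ([[1, 1, 0], [1, 1, 1], [0, 1, 1]],
      [[1.0, 0.0], [0.5, -1.0], [0.0, 2.0]])
g4 = ([[1, 1, 0, 0], [1, 1, 1, 0], [0, 1, 1, 1], [0, 0, 1, 1]],
      [[0.2, 1.0], [-1.0, 0.3], [0.7, 0.7], [1.5, -0.5]])
theta = [0.5, -0.3, 0.8, 0.1, 1.0, -0.7]  # flattened W1 (2 x 2) followed by w2 (2)
k_ab = empirical_gntk(theta, g3, g4, hidden=2)
```

Because the weights are shared across nodes, the two graphs may have different node counts and still yield gradients of the same length, which is what lets the kernel compare a three-node graph with a four-node one.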

Theorems & Definitions (7)

  • Definition 1
  • Definition 2
  • Lemma 3
  • Lemma 4
  • Theorem 5
  • Proposition 6
  • Lemma 7