Decouple Graph Neural Networks: Train Multiple Simple GNNs Simultaneously Instead of One

Hongyuan Zhang, Yanan Zhu, Xuelong Li

TL;DR

Inspired by the classical backpropagation algorithm, a backward training (BT) mechanism is developed that lets the earlier modules perceive the later ones. This avoids the purely unidirectional information delivery of forward training (FT) and trains the shallow modules sufficiently together with the deeper ones.

Abstract

Graph neural networks (GNNs) suffer from severe inefficiency, mainly caused by the exponential growth of node dependency as the number of layers increases. This severely limits the application of stochastic optimization algorithms, so training a GNN is usually time-consuming. To address this problem, we propose to decouple a multi-layer GNN into multiple simple modules for more efficient training. The training procedure comprises classical forward training (FT) and a newly designed backward training (BT). Under the proposed framework, each module is simple enough to be trained efficiently in FT by stochastic algorithms without distorting graph information. To avoid the purely unidirectional information delivery of FT and to sufficiently train shallow modules together with the deeper ones, we develop a backward training mechanism that makes the earlier modules perceive the later modules. BT adds reversed information delivery to the decoupled modules alongside the forward information delivery. To investigate how the decoupling and greedy training affect the representational capacity, we theoretically prove that the error produced by linear modules will not accumulate on unsupervised tasks in most cases. The theoretical and experimental results show that the proposed framework is highly efficient with reasonable performance.
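
To make the "exponential growth of node dependency" concrete, the following minimal sketch counts how many nodes a single node depends on after a given number of message-passing layers. The helper names `receptive_field_size` and `binary_tree` and the toy binary-tree graph are illustrative assumptions, not part of the paper.

```python
from collections import deque


def receptive_field_size(adj, node, num_layers):
    """Count the nodes within num_layers hops of node (including node itself)."""
    seen = {node}
    frontier = deque([(node, 0)])
    while frontier:
        u, depth = frontier.popleft()
        if depth == num_layers:
            continue  # do not expand beyond the requested number of layers
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                frontier.append((v, depth + 1))
    return len(seen)


def binary_tree(num_nodes):
    """Undirected complete binary tree stored as an adjacency list (toy graph)."""
    adj = {i: [] for i in range(num_nodes)}
    for child in range(1, num_nodes):
        parent = (child - 1) // 2
        adj[parent].append(child)
        adj[child].append(parent)
    return adj


adj = binary_tree(31)
# Dependency set of the root for 0, 1, 2, 3, 4 layers.
sizes = [receptive_field_size(adj, 0, num_layers) for num_layers in range(5)]
print(sizes)  # [1, 3, 7, 15, 31]
```

On this toy graph the dependency set roughly doubles with each extra layer, which is why a mini-batch of a deep GNN can end up touching most of the graph, whereas a single simple module only needs a shallow neighborhood.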

Paper Structure

This paper contains 26 sections, 5 theorems, 51 equations, 8 figures, 7 tables, 1 algorithm.

Key Result

Theorem 4.1

Let $\delta = 1- \cos(\theta_*/2)$ and $\theta_* = \theta ((\bm P - \bm X \bm X^T) \bm Q, \bm Q (\bm P - \bm X \bm X^T))$ where $\bm Q = \bm U_o \bm U_o^T - \bm I / 2$ and $o = \min({\rm rank}(\bm X), k)$. Under Assumption assumption_commute, if $\|\bm P - \bm H \bm H^T\| = \varepsilon \leq \mathcal

Figures (8)

  • Figure 1: Illustration of a stacked graph neural network decoupled from an $L$-layer GNN. To train each module individually, some loss (e.g., semi-supervised loss, unsupervised loss, contrastive loss) is required and it is denoted by $\mathcal{L}_{FT}^{(t)}$. To let the shallow modules perceive the deeper ones, $\mathcal{M}_t$ passes back the expected input features to $\mathcal{M}_{t-1}$ during the backward training. The divergence between the features output by $\mathcal{M}_t$ and the expected features of $\mathcal{M}_{t-1}$ formulates the BT loss $\mathcal{L}_{BT}^{(t)}$.
  • Figure 2: Performance and training efficiency of several scalable GNNs. The efficiency metric is computed as "Consuming Time / # Iterations", where the consumed time is measured from the moment the data is loaded into RAM. The first row shows the results of node clustering and the second row shows the results of node classification. Note that SGC is slower than SGNN under this metric because, owing to the design of BT, SGNN updates its parameters more times than SGC per graph preprocessing.
  • Figure 3: Visualization of an SGNN composed of 3 modules and of a 3-layer GCN on node classification. For SGNN, the output of $\mathcal{M}_3$ is visualized. For GCN, the output of the final GCN layer is visualized.
  • Figure 4: Visualization of a trained SGNN composed of 3 modules on node classification of Cora and Citeseer. The first row shows the visualization on Cora and the second row shows the visualization on Citeseer.
  • Figure 5: Visualization of a trained SGNN composed of 3 modules on node clustering of Cora and Citeseer. The first row shows the visualization on Cora and the second row shows the visualization on Citeseer.
  • ...and 3 more figures
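
The backward-training idea in the Figure 1 caption can be sketched as follows: a deeper module passes back the input features it would have preferred, and the shallower module is penalized for deviating from them. This is a minimal sketch under stated assumptions, not the paper's implementation. The linear module, the least-squares FT loss, the single gradient step used to obtain the "expected" input, and all names (`ft_loss`, `expected_input`, `bt_loss`, `P`, `H`, `W`, `Y`) are hypothetical choices made for illustration.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def scale(A, c):
    return [[c * a for a in row] for row in A]


def half_sq_norm(A):
    return 0.5 * sum(a * a for row in A for a in row)


def ft_loss(P, Z, W, Y):
    """Assumed FT loss of a linear module: 0.5 * ||P Z W - Y||_F^2."""
    return half_sq_norm(sub(matmul(matmul(P, Z), W), Y))


def expected_input(P, H, W, Y, step=0.1):
    """Assumed rule: one gradient step on the deeper module's loss w.r.t. its input."""
    residual = sub(matmul(matmul(P, H), W), Y)
    grad = matmul(matmul(transpose(P), residual), transpose(W))
    return sub(H, scale(grad, step))


def bt_loss(H_prev, Z):
    """Divergence between the shallow module's output and the expected input."""
    return half_sq_norm(sub(H_prev, Z))


# Toy 3-node path graph with a row-normalized propagation matrix (assumption).
P = [[0.5, 0.5, 0.0], [1 / 3, 1 / 3, 1 / 3], [0.0, 0.5, 0.5]]
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # output of the shallower module
W = [[1.0, 0.0], [0.0, 1.0]]              # weights of the deeper module
Y = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # toy targets for the deeper module

Z = expected_input(P, H, W, Y)  # features the deeper module would rather receive
print("FT loss with actual input:  ", ft_loss(P, H, W, Y))
print("FT loss with expected input:", ft_loss(P, Z, W, Y))
print("BT loss for shallower module:", bt_loss(H, Z))
```

In a full training loop, the shallower module would minimize its own FT loss plus a weighted BT term of this kind, so that information flows backward as well as forward between the decoupled modules.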

Theorems & Definitions (11)

  • Definition 3.1
  • Definition 4.1
  • Theorem 4.1
  • Theorem 4.2
  • Corollary 4.1
  • Lemma 7.1
  • proof
  • proof
  • proof
  • Corollary 7.1
  • ...and 1 more