Fast and Effective GNN Training through Sequences of Random Path Graphs

Francesco Bonchi, Claudio Gentile, Francesco Paolo Nerini, André Panisson, Fabio Vitale

TL;DR

This work tackles the scalability and generalization challenges of graph neural networks (GNNs) for node classification, especially when labels are scarce. It introduces GERN, a training framework that progressively refines GNN weights over a sequence of ultra-sparse Random Path Graphs (RPGs) derived from Random Spanning Trees (RSTs) and weighted by effective resistance, which speeds up training while preserving critical topology. By operating on RPGs and using parallelized Approximate RSTs (A-RSTs), GERN mitigates over-squashing and over-smoothing and improves test accuracy in small-data regimes across multiple benchmarks. Empirical results show substantial speedups and competitive or superior accuracy compared to baselines. Ablations validate the RPG-based linearization and the beneficial effect of RPG ensembles on regularization and information propagation.

Abstract

We present GERN, a novel scalable framework for training GNNs in node classification tasks, based on effective resistance, a standard tool in spectral graph theory. Our method progressively refines the GNN weights on a sequence of random spanning trees suitably transformed into path graphs which, despite their simplicity, are shown to retain essential topological and node information of the original input graph. The sparse nature of these path graphs substantially lightens the computational burden of GNN training. This not only enhances scalability but also improves accuracy in subsequent test phases, especially under small training set regimes, which are of great practical importance, as in many real-world scenarios labels may be hard to obtain. In these settings, our framework yields very good results as it effectively counters the training deterioration caused by overfitting when the training set is small. Our method also addresses common issues like over-squashing and over-smoothing while avoiding under-reaching phenomena. Although our framework is flexible and can be deployed in several types of GNNs, in this paper we focus on graph convolutional networks and carry out an extensive experimental investigation on a number of real-world graph benchmarks, where we achieve simultaneous improvement of training speed and test accuracy over a wide pool of representative baselines.
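
Effective resistance, the quantity the abstract builds on, has a standard definition: treating each edge as a unit resistor, $R(u,v) = (e_u - e_v)^\top L^{+} (e_u - e_v)$, where $L$ is the graph Laplacian. In an unweighted graph, $R(u,v)$ of an edge also equals the probability that the edge appears in a uniformly random spanning tree. The following is a minimal sketch of that definition only, not the authors' implementation; the function name `effective_resistances` and the toy graph are assumptions made for illustration, and the excerpt does not say how GERN computes or applies these weights.

```python
from fractions import Fraction


def effective_resistances(n, edges):
    """Effective resistance of each edge of a connected, unweighted graph.

    Illustrative helper (hypothetical name). Grounds node n-1, inverts the
    reduced Laplacian exactly by Gauss-Jordan elimination, and applies
    R(u, v) = M[u][u] + M[v][v] - 2 * M[u][v], where M is that inverse
    padded with zeros for the grounded node.
    """
    m = n - 1
    # Augmented matrix [L_reduced | I] with exact rational entries.
    aug = [[Fraction(0)] * (2 * m) for _ in range(m)]
    for i in range(m):
        aug[i][m + i] = Fraction(1)
    for u, v in edges:
        for a, b in ((u, v), (v, u)):
            if a < m:
                aug[a][a] += 1          # degree term
                if b < m:
                    aug[a][b] -= 1      # adjacency term
    # Gauss-Jordan elimination: turn the left half into the identity.
    for col in range(m):
        pivot = next(r for r in range(col, m) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(m):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]

    def inv(i, j):
        # Entries involving the grounded node are zero.
        return aug[i][m + j] if i < m and j < m else Fraction(0)

    return {(u, v): inv(u, u) + inv(v, v) - 2 * inv(u, v) for u, v in edges}


# Toy graph (an assumption): a triangle 0-1-2 with a pendant node 3.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
res = effective_resistances(4, edges)
# Triangle edges have resistance 2/3; the pendant (bridge) edge has 1.
```

The resistances sum to $n - 1 = 3$, as expected from the spanning-tree interpretation: every spanning tree has exactly $n - 1$ edges, and a bridge such as $(2, 3)$ belongs to all of them.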

Paper Structure (15 sections, 2 theorems, 6 equations, 7 figures, 13 tables, 1 algorithm)

Key Result

Theorem 1

Let $G = (V,E)$ be any undirected and unweighted graph, with $n = |V|$. Then for all $K \le n$ there exists a randomized labeling $y$ of $G$ such that for all (deterministic or randomized) algorithms $A$, the expected number of prediction mistakes made by $A$ in the above sequential node classificat

Figures (7)

  • Figure 1: (Left) Input graph $G$ with $n=9$ nodes, each one belonging to one of $c=3$ possible classes (yellow, orange, and light blue). This example emphasizes homophily: $G$ can be partitioned into 3 uniformly-colored cliques. (Center) An RST (thick edges) $T$ of $G$, along with a depth-first visit of $T$, starting from the top-left orange node. The numbers indicate the visit order. (Right) An RPG of $G$ computed from the spanning tree $T$. The linearization in this case produces a path graph that can be split into stretches of uniformly labeled/colored nodes.
  • Figure 2: Over-squashing. The input graph (a) and the typical bottleneck caused by message passing over a large number of distant nodes (b). By contrast, message passing on a path graph (c) involves far fewer nodes that are distant from the one at hand.
  • Figure 4: Over-smoothing and over-squashing metrics (see main text) against the number of layers for GCNs on the Cora and OGBN-arXiv datasets (higher is better). The number of hidden channels is set to 128. For over-smoothing, the RST and RPG lines largely overlap.
  • Figure 5: Validation loss trends when training with 20 nodes per class on the OGBN-arXiv dataset.
  • Figure 6: Learning curves, from first to last row: train accuracy, test accuracy, train loss, test loss. Red is GERN-GCN, green is GCN. The number of hidden channels is set to 256. Curves are the average over 10 runs.
  • ...and 2 more figures
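
The construction described in the Figure 1 caption (draw a random spanning tree, visit it depth-first, read off a path) can be sketched as follows. This is one plausible reading under stated assumptions, not the paper's algorithm: the Aldous-Broder random walk stands in for the RST generator (the paper's A-RST procedure is not described in this excerpt), consecutive nodes in the visit order are simply linked, no edge weights are computed, and all function names and the 9-node toy graph are hypothetical.

```python
import random


def random_spanning_tree(adj, rng):
    """Aldous-Broder sketch: random walk that keeps first-entrance edges."""
    nodes = list(adj)
    current = rng.choice(nodes)
    visited = {current}
    tree = {u: [] for u in nodes}
    while len(visited) < len(nodes):
        nxt = rng.choice(adj[current])
        if nxt not in visited:
            # First time the walk enters nxt: the edge joins the tree.
            visited.add(nxt)
            tree[current].append(nxt)
            tree[nxt].append(current)
        current = nxt
    return tree


def dfs_order(tree, root):
    """Depth-first (preorder) visit order of a tree, starting at root."""
    order, seen, stack = [], {root}, [root]
    while stack:
        u = stack.pop()
        order.append(u)
        for v in reversed(tree[u]):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return order


def random_path_graph(adj, root, rng):
    """Link consecutive nodes of the DFS order into a path (assumed rule)."""
    order = dfs_order(random_spanning_tree(adj, rng), root)
    # Note: in this sketch a path edge need not be an edge of the input graph.
    return order, list(zip(order, order[1:]))


# Toy graph (an assumption): three 3-cliques joined by edges 2-3 and 5-6.
adj = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4, 6],
    6: [5, 7, 8], 7: [6, 8], 8: [6, 7],
}
order, path_edges = random_path_graph(adj, root=0, rng=random.Random(0))
```

The resulting path has $n - 1$ edges regardless of how dense the input graph is, which is the source of the sparsity the summary refers to. Drawing a fresh tree each time yields a sequence of different paths over the same node set.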

Theorems & Definitions (2)

  • Theorem 1
  • Theorem 2