Towards Efficient Training of Graph Neural Networks: A Multiscale Approach

Eshed Gal, Moshe Eliasof, Carola-Bibiane Schönlieb, Ivan I. Kyrchei, Eldad Haber, Eran Treister

TL;DR

The paper tackles the high computational burden of training Graph Neural Networks on large graphs by introducing a multiscale training framework that leverages graph coarsening and subgraph strategies. It presents three core mechanisms (Coarse-to-Fine training, Sub-to-Full training, and Multiscale Gradient Computation), all designed to share weights across scales and to reduce the number of expensive message-passing (MP) operations. A sketching-based theoretical analysis for the linear case underpins the reduced-graph approach, while extensive experiments in transductive and inductive settings on diverse datasets demonstrate substantial reductions in FLOPs with comparable or improved predictive performance. The framework is shown to be broadly compatible with multiple GNN architectures and pooling methods, offering a practical path to scalable graph learning.
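
To make the Coarse-to-Fine idea concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes a linear one-layer GNN (prediction A_norm X theta) with squared loss, GCN-style normalization, random pooling that keeps a quarter of the nodes (so the coarse graph is the induced subgraph P^T A P), and labels available on the kept nodes. The helper names (`norm_adj`, `loss_and_grad`, `gd`), the step counts, and the MP-cost proxy (nonzeros of the propagation matrix touched per step) are hypothetical choices. The point is that a single weight matrix `theta` is trained on the coarse graph and then refined on the full graph.

```python
import numpy as np

def norm_adj(A):
    """GCN-style D^{-1/2}(A + I)D^{-1/2}; this normalization is an assumption."""
    A_hat = A + np.eye(len(A))
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def loss_and_grad(A_norm, X, Y, theta):
    """Linear one-layer GNN Y_hat = A_norm X theta with mean squared loss."""
    Z = A_norm @ X                                   # one message-passing (MP) step
    R = Z @ theta - Y
    return 0.5 * np.sum(R ** 2) / len(Y), Z.T @ R / len(Y)

def gd(A_norm, X, Y, theta, steps):
    """Gradient descent with step 1/L; returns weights and a proxy MP cost."""
    Z = A_norm @ X                                   # only used for the step size (not counted)
    lr = len(Y) / np.linalg.norm(Z, 2) ** 2          # 1/L for this quadratic loss
    for _ in range(steps):
        theta = theta - lr * loss_and_grad(A_norm, X, Y, theta)[1]
    return theta, steps * np.count_nonzero(A_norm)   # nonzeros touched per step

rng = np.random.default_rng(0)
n, n_c, f = 400, 100, 8
U = np.triu(rng.random((n, n)) < 0.02, k=1).astype(float)
A = U + U.T                                          # random undirected graph
X = rng.standard_normal((n, f))
A_f = norm_adj(A)
Y = A_f @ X @ rng.standard_normal((f, 1)) + 0.1 * rng.standard_normal((n, 1))

# Random pooling: keep n_c nodes; the coarse graph is the induced subgraph P^T A P.
idx = np.sort(rng.choice(n, size=n_c, replace=False))
A_c, X_c, Y_c = norm_adj(A[np.ix_(idx, idx)]), X[idx], Y[idx]

theta0 = np.zeros((f, 1))
theta_c, cost_coarse = gd(A_c, X_c, Y_c, theta0, steps=200)   # cheap coarse phase
theta_cf, cost_fine = gd(A_f, X, Y, theta_c, steps=20)         # short fine phase, same weights
theta_ref, cost_ref = gd(A_f, X, Y, theta0, steps=220)         # fine-only baseline

def fine_loss(th):
    return loss_and_grad(A_f, X, Y, th)[0]

print(f"coarse-to-fine: loss={fine_loss(theta_cf):.4f}, MP cost={cost_coarse + cost_fine}")
print(f"fine only:      loss={fine_loss(theta_ref):.4f}, MP cost={cost_ref}")
```

Because both phases use step size 1/L on a convex quadratic, the fine phase never increases the fine loss. How close the two final losses are depends on the graph, the pooling ratio, and the step budget, so the printed numbers should not be read as reproducing the paper's results.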

Abstract

Graph Neural Networks (GNNs) have become powerful tools for learning from graph-structured data, finding applications across diverse domains. However, as graph sizes and connectivity increase, standard GNN training methods face significant computational and memory challenges, limiting their scalability and efficiency. In this paper, we present a novel framework for efficient multiscale training of GNNs. Our approach leverages hierarchical graph representations and subgraphs, enabling the integration of information across multiple scales and resolutions. By utilizing coarser graph abstractions and subgraphs, each with fewer nodes and edges, we significantly reduce computational overhead during training. Building on this framework, we propose a suite of scalable training strategies, including coarse-to-fine learning, subgraph-to-full-graph transfer, and multiscale gradient computation. We also provide some theoretical analysis of our methods and demonstrate their effectiveness across various datasets and learning tasks. Our results show that multiscale training can substantially accelerate GNN training for large scale problems while maintaining, or even improving, predictive performance.
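
The abstract names multiscale gradient computation without detailing it. One way to read the idea, sketched below under our own assumptions, is a two-level telescoping estimator in the spirit of multilevel Monte Carlo: the fine gradient is written as the coarse gradient plus a fine-minus-coarse correction, and the expensive correction is recomputed only every `refresh` steps. The linear model, random pooling, learning rate, and refresh schedule are hypothetical; the paper's algorithm may combine levels differently.

```python
import numpy as np

def norm_adj(A):
    """GCN-style D^{-1/2}(A + I)D^{-1/2} (an assumed normalization)."""
    A_hat = A + np.eye(len(A))
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def grad(A_norm, X, Y, theta):
    """Gradient of 0.5/n * ||A_norm X theta - Y||^2 for a linear one-layer GNN."""
    Z = A_norm @ X
    return Z.T @ (Z @ theta - Y) / len(Y)

rng = np.random.default_rng(1)
n, n_c, f = 300, 75, 6
U = np.triu(rng.random((n, n)) < 0.03, k=1).astype(float)
A = U + U.T
X = rng.standard_normal((n, f))
A_f = norm_adj(A)
Y = A_f @ X @ rng.standard_normal((f, 1)) + 0.1 * rng.standard_normal((n, 1))

idx = np.sort(rng.choice(n, size=n_c, replace=False))        # random pooling
A_c, X_c, Y_c = norm_adj(A[np.ix_(idx, idx)]), X[idx], Y[idx]
nnz_f, nnz_c = np.count_nonzero(A_f), np.count_nonzero(A_c)

theta = np.zeros((f, 1))
lr, refresh, steps = 0.05, 10, 200
mp_cost, refresh_errors = 0, []
for t in range(steps):
    g_c = grad(A_c, X_c, Y_c, theta)          # cheap coarse gradient, every step
    mp_cost += nnz_c
    if t % refresh == 0:
        g_f = grad(A_f, X, Y, theta)          # expensive fine gradient, only occasionally
        mp_cost += nnz_f
        correction = g_f - g_c                # fine-minus-coarse correction term
    g_est = g_c + correction                  # telescoping estimate of the fine gradient
    if t % refresh == 0:
        refresh_errors.append(np.max(np.abs(g_est - g_f)))   # exact up to rounding here
    theta -= lr * g_est

print(f"MP cost: {mp_cost} vs fine-only {steps * nnz_f}")
```

At refresh steps the estimate equals the fine gradient exactly; in between, the lagged correction is an approximation whose quality depends on how well the coarse loss mimics the fine one.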

Paper Structure

This paper contains 34 sections, 3 theorems, 23 equations, 5 figures, 28 tables, 5 algorithms.

Key Result

Theorem 4.1

Let ${\boldsymbol \theta}^*$ be the solution of equation (gnnopt_main) and let ${\boldsymbol \theta}_C^*$ be the solution of equation (gnns_opt_main). Let $\epsilon>0$, and let ${\bf P}^{\top}$ be a sparse subspace embedding with $O(\frac{c^2}{\epsilon})$ rows, where $c$ is the size of the featur
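
Theorem 4.1 is cut off above, so the sketch below only illustrates the generic sketch-and-solve behaviour that sparse subspace embeddings provide in the linear case, not the theorem's exact bound. We assume the linear GNN reduces to least squares, min over theta of ||Z theta - y|| with Z = A_norm X, and that the reduced problem applies the embedding to the residual. CountSketch, one standard sparse subspace embedding, stands in for ${\bf P}^{\top}$; the helper `countsketch` and the sizes `n`, `c`, `m` are hypothetical choices.

```python
import numpy as np

def countsketch(M, m, rng):
    """Apply a CountSketch S (m x n) to M without forming S: each row of M is
    added, with a random sign, to one uniformly chosen row of the output."""
    rows = rng.integers(0, m, size=M.shape[0])
    signs = rng.choice([-1.0, 1.0], size=M.shape[0])
    SM = np.zeros((m, M.shape[1]))
    np.add.at(SM, rows, signs[:, None] * M)
    return SM

rng = np.random.default_rng(0)
n, c, m = 2000, 5, 600                       # nodes, feature size, sketch rows (arbitrary)
U = np.triu(rng.random((n, n)) < 0.005, k=1).astype(float)
A = U + U.T + np.eye(n)
A_norm = A / A.sum(axis=1, keepdims=True)    # row-normalized propagation (assumed)
X = rng.standard_normal((n, c))
Z = A_norm @ X                               # linear GNN: prediction is Z @ theta
y = Z @ rng.standard_normal((c, 1)) + 0.1 * rng.standard_normal((n, 1))

# Full problem: min_theta ||Z theta - y||.
theta_star = np.linalg.lstsq(Z, y, rcond=None)[0]

# Reduced problem: min_theta ||S (Z theta - y)||; Z and y must share one sketch.
SZy = countsketch(np.hstack([Z, y]), m, rng)
theta_C = np.linalg.lstsq(SZy[:, :c], SZy[:, c:], rcond=None)[0]

def residual(th):
    return np.linalg.norm(Z @ th - y)

ratio = residual(theta_C) / residual(theta_star)
print(f"residual ratio (reduced / full): {ratio:.4f}")
```

With `m` well above `c`, the ratio is typically close to 1; shrinking `m` toward `c` makes it grow, which is the trade-off that the number of rows in the theorem controls.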

Figures (5)

  • Figure 1: Graph coarsening examples. Left two: random pooling; right two: Top-k pooling (highest-degree nodes selected). Orange nodes indicate selected coarse nodes.
  • Figure 2: Coarse graph using subgraph pooling. Orange nodes indicate selected coarse nodes; the red node is the root for the ego-network.
  • Figure 3: Illustration of the Multiscale Gradient Computation algorithm introduced in the section on multiscale gradients.
  • Figure 4: The Q-tips data set. The data contain three types of "q-tip", and the goal is to label the stick as type 1, 2, or 3.
  • Figure 5: Comparison between losses using multiscale and single-level training. Coarse levels are generated using random coarsening.
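
Figures 1 and 2 mention three ways of choosing coarse nodes: random pooling, Top-k pooling by degree, and ego-network (subgraph) pooling around a root. The sketch below implements each as a node-selection rule and forms the coarse graph as the induced subgraph P^T A P with a 0/1 selection matrix P. This is a simplifying assumption: the paper's coarsening may aggregate features or connect coarse nodes differently. All function names are hypothetical.

```python
import numpy as np
from collections import deque

def random_pooling(A, k, rng):
    """Random pooling: keep k distinct nodes chosen uniformly at random."""
    return np.sort(rng.choice(len(A), size=k, replace=False))

def topk_degree_pooling(A, k):
    """Top-k pooling: keep the k highest-degree nodes (ties broken by index)."""
    deg = A.sum(axis=1)
    return np.sort(np.argsort(-deg, kind="stable")[:k])

def ego_network(A, root, hops):
    """Subgraph pooling: nodes within `hops` edges of `root`, found by BFS."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        if dist[u] == hops:
            continue
        for v in np.flatnonzero(A[u]):
            v = int(v)
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return np.array(sorted(dist))

def coarsen(A, X, idx):
    """Selection matrix P (n x k); coarse graph P^T A P and features P^T X."""
    P = np.zeros((len(A), len(idx)))
    P[idx, np.arange(len(idx))] = 1.0
    return P.T @ A @ P, P.T @ X

# Toy graph: a star centred at node 0, plus a path 4-5-6-7 hanging off node 4.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (4, 5), (5, 6), (6, 7)]
A = np.zeros((8, 8))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0
X = np.arange(16, dtype=float).reshape(8, 2)

print(random_pooling(A, 3, np.random.default_rng(0)))
print(topk_degree_pooling(A, 2))   # [0 4]
print(ego_network(A, 4, 1))        # [0 4 5]
A_c, X_c = coarsen(A, X, ego_network(A, 4, 1))
print(A_c)
```

In the toy graph, node 0 has degree 4 and nodes 4, 5, 6 tie at degree 2, so Top-2 pooling keeps nodes 0 and 4; the 1-hop ego-network of node 4 is {0, 4, 5}, and its induced coarse graph is the path 0-4-5.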

Theorems & Definitions (3)

  • Theorem 4.1
  • Theorem A.1 (Woodruff, 2014, Thm. 23)
  • Theorem A.2