Towards Efficient Training of Graph Neural Networks: A Multiscale Approach
Eshed Gal, Moshe Eliasof, Carola-Bibiane Schönlieb, Ivan I. Kyrchei, Eldad Haber, Eran Treister
TL;DR
The paper tackles the high computational burden of training Graph Neural Networks on large graphs by introducing a multiscale training framework that leverages graph coarsening and subgraph strategies. It presents three core mechanisms—Coarse-to-Fine training, Sub-to-Full training, and Multiscale Gradient Computation—all designed to share weights across scales and reduce expensive MP operations. A sketching-based theoretical analysis for the linear case underpins the reduced-graph approach, while extensive experiments across transductive and inductive settings on diverse datasets demonstrate substantial FLOPs reductions with comparable or improved predictive performance. The framework is shown to be broadly compatible with multiple GNN architectures and pooling methods, offering a practical path to scalable graph learning.
Abstract
Graph Neural Networks (GNNs) have become powerful tools for learning from graph-structured data, finding applications across diverse domains. However, as graph sizes and connectivity increase, standard GNN training methods face significant computational and memory challenges, limiting their scalability and efficiency. In this paper, we present a novel framework for efficient multiscale training of GNNs. Our approach leverages hierarchical graph representations and subgraphs, enabling the integration of information across multiple scales and resolutions. By utilizing coarser graph abstractions and subgraphs, each with fewer nodes and edges, we significantly reduce computational overhead during training. Building on this framework, we propose a suite of scalable training strategies, including coarse-to-fine learning, subgraph-to-full-graph transfer, and multiscale gradient computation. We also provide some theoretical analysis of our methods and demonstrate their effectiveness across various datasets and learning tasks. Our results show that multiscale training can substantially accelerate GNN training for large scale problems while maintaining, or even improving, predictive performance.
