MuseGNN: Forming Scalable, Convergent GNN Layers that Minimize a Sampling-Based Energy

Haitian Jiang; Renjie Liu; Zengfeng Huang; Yichuan Wang; Xiao Yan; Zhenkun Cai; Minjie Wang; David Wipf

TL;DR

MuseGNN tackles scaling of energy-based unfolded GNNs by embedding a sampling-based graph-regularized energy into the learning objective. It defines an offline-subgraph energy $\ell_{\text{muse}}(\mathbb{Y}, M)$ and optimizes via alternating minimization over subgraph embeddings and shared node summaries, with an online mean estimator linking subgraphs. The authors establish convergence guarantees for both the upper-level training and the lower-level energy descent under specific settings and demonstrate stability and competitive accuracy on extremely large homogeneous graphs, including benchmarks exceeding 1 TB. This approach delivers scalable, interpretable GNN layers that retain expressive power and competitive performance without prohibitive memory requirements.
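The alternating scheme described above can be illustrated with a small NumPy sketch. This is not the authors' implementation: the exact form of the energy is not given in this summary, so the code assumes a quadratic fidelity term, a Laplacian smoothness term weighted by `lam`, and a quadratic penalty weighted by `gamma` that ties each subgraph's embeddings to the shared summary `M`. All function names are hypothetical, and the summary update uses an exact per-node mean rather than an online estimator.

```python
import numpy as np

def laplacian(A):
    """Combinatorial graph Laplacian D - A."""
    return np.diag(A.sum(axis=1)) - A

def muse_energy(Ys, M, Fs, Ls, node_ids, lam, gamma):
    """Assumed energy: sum over subgraphs of fidelity + smoothness + coupling to M."""
    total = 0.0
    for Y, F, L, idx in zip(Ys, Fs, Ls, node_ids):
        total += np.sum((Y - F) ** 2) + lam * np.trace(Y.T @ L @ Y)
        total += gamma * np.sum((Y - M[idx]) ** 2)
    return total

def update_embeddings(Ys, M, Fs, Ls, node_ids, lam, gamma):
    """One gradient step per subgraph with M held fixed (step = 1 / Lipschitz bound)."""
    new_Ys = []
    for Y, F, L, idx in zip(Ys, Fs, Ls, node_ids):
        lip = 2.0 * (1.0 + gamma + lam * np.linalg.eigvalsh(L).max())
        grad = 2.0 * (Y - F) + 2.0 * lam * (L @ Y) + 2.0 * gamma * (Y - M[idx])
        new_Ys.append(Y - grad / lip)
    return new_Ys

def update_summary(Ys, node_ids, num_nodes, dim):
    """Minimizer over M with Ys fixed: per-node mean over the subgraphs containing it."""
    sums = np.zeros((num_nodes, dim))
    counts = np.zeros(num_nodes)
    for Y, idx in zip(Ys, node_ids):
        np.add.at(sums, idx, Y)
        np.add.at(counts, idx, 1.0)
    return sums / np.maximum(counts, 1.0)[:, None]

# Toy setup (hypothetical): 5 nodes, two overlapping 3-node path subgraphs.
rng = np.random.default_rng(0)
num_nodes, dim, lam, gamma = 5, 2, 1.0, 1.0
node_ids = [np.array([0, 1, 2]), np.array([2, 3, 4])]
path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
Ls = [laplacian(path), laplacian(path)]
Fs = [rng.normal(size=(3, dim)) for _ in node_ids]  # stand-in for f(X_s; W)

Ys = [F.copy() for F in Fs]
M = update_summary(Ys, node_ids, num_nodes, dim)
energies = [muse_energy(Ys, M, Fs, Ls, node_ids, lam, gamma)]
for _ in range(10):
    Ys = update_embeddings(Ys, M, Fs, Ls, node_ids, lam, gamma)
    M = update_summary(Ys, node_ids, num_nodes, dim)
    energies.append(muse_energy(Ys, M, Fs, Ls, node_ids, lam, gamma))
```

Under these assumptions both updates are descent steps on the same objective, so the recorded `energies` never increase. Setting `gamma = 0` decouples the subgraphs entirely.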

Abstract

Among the many variants of graph neural network (GNN) architectures capable of modeling data with cross-instance relations, an important subclass involves layers designed such that the forward pass iteratively reduces a graph-regularized energy function of interest. In this way, node embeddings produced at the output layer dually serve as both predictive features for solving downstream tasks (e.g., node classification) and energy function minimizers that inherit transparent, exploitable inductive biases and interpretability. However, scaling GNN architectures constructed in this way remains challenging, in part because the convergence of the forward pass may involve models with considerable depth. To tackle this limitation, we propose a sampling-based energy function and scalable GNN layers that iteratively reduce it, guided by convergence guarantees in certain settings. We also instantiate a full GNN architecture based on these designs, and the model achieves competitive accuracy and scalability when applied to the largest publicly-available node classification benchmark exceeding 1TB in size. Our source code is available at https://github.com/haitian-jiang/MuseGNN.
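As a concrete illustration of a forward pass that iteratively reduces a graph-regularized energy, the following sketch runs gradient descent on a standard quadratic energy, with each step playing the role of one layer. The specific energy $\|Y - F\|_F^2 + \lambda\,\mathrm{tr}(Y^\top L Y)$, the step-size rule, and all names are assumptions chosen for illustration; they are not taken from the MuseGNN code.

```python
import numpy as np

def energy(Y, F, L, lam):
    """Graph-regularized energy: ||Y - F||_F^2 + lam * tr(Y^T L Y)."""
    return np.sum((Y - F) ** 2) + lam * np.trace(Y.T @ L @ Y)

def unfolded_forward(F, L, lam=1.0, num_layers=20):
    """Apply num_layers gradient-descent steps on the energy, starting from Y = F."""
    # The gradient 2(Y - F) + 2*lam*L@Y is Lipschitz with constant
    # 2 * (1 + lam * largest eigenvalue of L); use its inverse as the step size.
    step = 1.0 / (2.0 * (1.0 + lam * np.linalg.eigvalsh(L).max()))
    Y = F.copy()
    energies = [energy(Y, F, L, lam)]
    for _ in range(num_layers):
        grad = 2.0 * (Y - F) + 2.0 * lam * (L @ Y)
        Y = Y - step * grad  # one "layer": mix own features with neighbors
        energies.append(energy(Y, F, L, lam))
    return Y, energies

# Toy example (hypothetical data): 4-node path graph with 2-dimensional features.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
F = np.random.default_rng(0).normal(size=(4, 2))  # stand-in for f(X; W)
Y, energies = unfolded_forward(F, L)
```

With this step size the energy is non-increasing from layer to layer, which is why the number of layers needed for convergence, and hence the model depth, becomes a scalability concern.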

Paper Structure (45 sections, 10 theorems, 31 equations, 5 figures, 7 tables, 1 algorithm)

Introduction
Background and Motivation
GNN Architectures from Unfolded Optimization
Notation.
GNN Basics.
Moving to Unfolded GNNs.
Why Unfolded GNNs?
Scalability Challenges and Candidate Solutions
Graph-Regularized Energy Functions Infused with Sampling
Offline Sampling Foundation
Energy Function Formulation
Notable Limiting Cases
From Sampling-based Energies to the MuseGNN Framework
Convergence Analysis of MuseGNN
Global Convergence with $\gamma = 0$.
...and 30 more sections

Key Result

Proposition 3.1

Suppose we have $m$ subgraphs $(\mathcal{V}_1,\mathcal{E}_1),\ldots,(\mathcal{V}_m,\mathcal{E}_m)$ constructed independently such that $\forall s=1,\ldots,m, \forall u,v\in \mathcal{V}, \Pr[v\in\mathcal{V}_s]=\Pr[v\in\mathcal{V}_s\mid u\in\mathcal{V}_s]=p; (i,j)\in\mathcal{E}_s \iff i\in\mathcal{V}_

Figures (5)

Figure 1: MuseGNN vs. existing methods on largest graphs (LG), where 'top acc.' refers to top LG accuracy. Note that the convergence guarantee and greater energy expressivity are specifically defined w.r.t. UGNN models, hence 'N/A' for non-UGNNs without a lower-level energy.
Figure 2: Building on analysis from expressiveness, it is possible to achieve increased expressiveness via energy functions based on sampled subgraphs (as incorporated by MuseGNN).
Figure 3: Convergence of the upper-level loss on ogbn-arxiv dataset for 20 epochs with penalty factor $\gamma=1$.
Figure: MuseGNN Training Procedure
Figure: Training speed (epoch time) in seconds; hardware configurations are given in the experimental details section.

Theorems & Definitions (21)

Proposition 3.1
Definition 5.1
Theorem 5.2
Theorem 5.3
Proposition C.1
proof
proof
Lemma E.1
proof
Lemma E.2
...and 11 more
