Learning on Large Graphs using Intersecting Communities

Ben Finkelshtein, İsmail İlkan Ceylan, Michael Bronstein, Ron Levie

TL;DR

This work tackles the memory bottleneck of graph neural networks on large graphs by introducing intersecting community graphs (ICG) to approximate input graphs with a fixed, low-rank structure. It proves a constructive semi-regularity result: fitting a low-rank ICG to a graph by minimizing Frobenius error guarantees a small cut-metric error, with a community count $K=O(\text{poly}(1/\epsilon))$ that is independent of graph size for dense graphs. Learning then proceeds in two stages: offline ICG fitting with gradient-based methods (and Subgraph SGD for scalability), followed by online learning on the ICG plus node signals using novel ICG-NN architectures that operate in $O(N)$ time per layer, in contrast to the $O(E)$ of MPNNs. Empirically, the approach delivers competitive or state-of-the-art performance on node classification and spatio-temporal tasks while offering substantial runtime and memory benefits, especially on very large graphs. The framework also supports efficient learning on dynamic graphs through the spatio-temporal extension, making it promising for real-world dense networks where traditional GNNs struggle with memory constraints.
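The offline fitting stage can be illustrated with a toy sketch. It assumes the ICG has the form $Q\,\mathrm{diag}(r)\,Q^\top$ with soft affiliations $Q\in[0,1]^{N\times K}$ and community weights $r\in\mathbb{R}^K$. The function name `fit_icg`, the sigmoid parameterization of $Q$, the loss normalization, and the plain full-batch gradient descent are illustrative assumptions, not the authors' implementation; Subgraph SGD is not shown.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_icg(A, K, steps=300, lr=0.05, seed=0):
    """Toy gradient descent on ||A - Q diag(r) Q^T||_F^2 / N^2 (hypothetical helper)."""
    N = A.shape[0]
    rng = np.random.default_rng(seed)
    Z = 0.1 * rng.normal(size=(N, K))  # logits; Q = sigmoid(Z) stays in (0, 1)
    r = np.full(K, 0.5)                # community weights
    losses = []
    for _ in range(steps):
        Q = sigmoid(Z)
        R = (Q * r) @ Q.T - A          # residual; symmetric when A is symmetric
        losses.append(float((R ** 2).sum()) / N**2)
        RQ = R @ Q
        grad_Q = 4.0 * RQ * r / N**2               # uses symmetry of R
        grad_r = 2.0 * (RQ * Q).sum(axis=0) / N**2
        Z -= lr * grad_Q * Q * (1.0 - Q)           # chain rule through the sigmoid
        r -= lr * grad_r
    return sigmoid(Z), r, losses

# Toy input: two overlapping cliques (with self-loops, for simplicity).
N = 60
A = np.zeros((N, N))
A[:35, :35] = 1.0
A[25:, 25:] = 1.0
Q, r, losses = fit_icg(A, K=2)
```

The recorded `losses` track the normalized squared Frobenius error, which is the quantity the summary says controls the cut-metric error.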

Abstract

Message Passing Neural Networks (MPNNs) are a staple of graph machine learning. MPNNs iteratively update each node's representation in an input graph by aggregating messages from the node's neighbors, which necessitates a memory complexity of the order of the number of graph edges. This complexity might quickly become prohibitive for large graphs provided they are not very sparse. In this paper, we propose a novel approach to alleviate this problem by approximating the input graph as an intersecting community graph (ICG) -- a combination of intersecting cliques. The key insight is that the number of communities required to approximate a graph does not depend on the graph size. We develop a new constructive version of the Weak Graph Regularity Lemma to efficiently construct an approximating ICG for any input graph. We then devise an efficient graph learning algorithm operating directly on ICG in linear memory and time with respect to the number of nodes (rather than edges). This offers a new and fundamentally different pipeline for learning on very large non-sparse graphs, whose applicability is demonstrated empirically on node classification tasks and spatio-temporal data processing.
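To make the linear-in-nodes claim concrete, here is a minimal sketch. It assumes an ICG of the form $Q\,\mathrm{diag}(r)\,Q^\top$ with $Q\in[0,1]^{N\times K}$; the helper name `icg_apply` is hypothetical. Applying this operator to a node signal needs only products with the $N\times K$ matrix $Q$, so the $N\times N$ matrix is never formed and the cost is $O(NKD)$ rather than proportional to the number of edges.

```python
import numpy as np

def icg_apply(Q, r, X):
    """Compute (Q diag(r) Q^T) X without materializing the N x N matrix."""
    pooled = Q.T @ X                # (K, D): aggregate node features per community
    scaled = r[:, None] * pooled    # (K, D): weight each community
    return Q @ scaled               # (N, D): broadcast back to the nodes

rng = np.random.default_rng(0)
N, K, D = 200, 5, 3
Q = rng.random((N, K))              # soft community affiliations in [0, 1)
r = rng.normal(size=K)              # community weights
X = rng.normal(size=(N, D))         # node signal

# Dense reference, built here only to check the result on a small example.
dense = (Q * r) @ Q.T
assert np.allclose(dense @ X, icg_apply(Q, r, X))
```

For fixed $K$ and $D$, both memory and time scale with $N$, which matches the contrast the abstract draws with edge-based message passing.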

Paper Structure (85 sections, 13 theorems, 95 equations, 8 figures, 12 tables)

This paper contains 85 sections, 13 theorems, 95 equations, 8 figures, 12 tables.

Key Result

Theorem 3.1

Let $({\bm{A}},{\bm{S}})$ be a $D$-channel graph-signal of $N$ nodes, where $\mathrm{deg}({\bm{A}})=E'$. Let $K\in{\mathbb{N}}$, $\delta>0$, and ${\mathcal{Q}}$ be a soft affiliation model. Consider the matrix-signal cut norm with weights $\alpha,\beta\geq 0$ not both zero, and the matrix-signal Fr

Figures (8)

  • Figure 1: Top: adjacency matrix of a simple graph. Bottom: approximating 5 community ICG.
  • Figure 2: Runtime of K-ICG$_{\mathrm u}$-NN (for K=100) as a function of GCN forward pass duration on graphs $G \sim \mathrm{ER}(n, p(n)=0.5)$.
  • Figure 3: ROC AUC of ICG$_{\mathrm u}$-NN and an MLP as a function of the $\%$ nodes removed from the graph.
  • Figure 4: Test ROC AUC of tolokers (left) and test accuracy of squirrel (right) as a function of the number of communities.
  • Figure 5: Cut-norm as a function of Frobenius norm on the tolokers (left) and squirrel (right) datasets. The number of communities used is indicated next to each point.
  • ...and 3 more figures

Theorems & Definitions (24)

  • Definition 2.1
  • Definition 3.1
  • Definition 3.2
  • Theorem 3.1
  • Proposition 4.1
  • Theorem A.1: Weak Regularity Lemma
  • Definition B.1
  • Definition B.2
  • Definition B.3
  • Theorem B.1
  • ...and 14 more