Graph Sparsification via Mixture of Graphs

Guibin Zhang; Xiangguo Sun; Yanwei Yue; Chonghe Jiang; Kun Wang; Tianlong Chen; Shirui Pan

TL;DR

This paper introduces Mixture-of-Graphs (MoG), which borrows the concept of Mixture-of-Experts (MoE) to dynamically select a tailored pruning solution for each node. MoG incorporates multiple sparsifier experts, each characterized by its own sparsity level and pruning criterion.

Abstract

Graph Neural Networks (GNNs) have demonstrated superior performance across various graph learning tasks but face significant computational challenges when applied to large-scale graphs. One effective approach to mitigate these challenges is graph sparsification, which involves removing non-essential edges to reduce computational overhead. However, previous graph sparsification methods often rely on a single global sparsity setting and uniform pruning criteria, failing to provide customized sparsification schemes for each node's complex local context. In this paper, we introduce Mixture-of-Graphs (MoG), leveraging the concept of Mixture-of-Experts (MoE), to dynamically select tailored pruning solutions for each node. Specifically, MoG incorporates multiple sparsifier experts, each characterized by unique sparsity levels and pruning criteria, and selects the appropriate experts for each node. Subsequently, MoG performs a mixture of the sparse graphs produced by different experts on the Grassmann manifold to derive an optimal sparse graph. One notable property of MoG is its entirely local nature, as it depends on the specific circumstances of each individual node. Extensive experiments on four large-scale OGB datasets and two superpixel datasets, equipped with five GNN backbones, demonstrate that MoG (I) identifies subgraphs at higher sparsity levels ($8.67\%\sim 50.85\%$), with performance equal to or better than the dense graph, (II) achieves $1.47-2.62\times$ speedup in GNN inference with negligible performance drop, and (III) boosts ``top-student'' GNN performance ($1.02\%\uparrow$ on RevGNN+\textsc{ogbn-proteins} and $1.74\%\uparrow$ on DeeperGCN+\textsc{ogbg-ppa}).
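The per-node expert selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the expert list, the `inverse_degree` criterion, the function names, and the hand-set router logits are all hypothetical, and the Grassmann-manifold mixture is replaced here by a plain gate-weighted average of edge scores and sparsity levels.

```python
import math

def jaccard(adj, u, v):
    """Jaccard similarity of the neighbor sets of u and v."""
    a, b = adj[u], adj[v]
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def inverse_degree(adj, u, v):
    """Hypothetical second criterion: favor edges to low-degree neighbors."""
    return 1.0 / len(adj[v])

# Each expert pairs a pruning criterion with a sparsity level
# (the fraction of a node's edges that the expert would drop).
EXPERTS = [
    (jaccard, 0.25),
    (jaccard, 0.75),
    (inverse_degree, 0.25),
    (inverse_degree, 0.75),
]

def route(logits, k):
    """Top-k gating: softmax over the k largest logits, no weight elsewhere."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)
    exp = {i: math.exp(logits[i] - m) for i in top}
    z = sum(exp.values())
    return {i: e / z for i, e in exp.items()}

def sparsify_node(adj, u, logits, k=2):
    """Return the neighbors of u that survive the mixed pruning decision."""
    gates = route(logits, k)
    nbrs = sorted(adj[u])
    # Assumption: mix experts by gate-weighted averaging, a simplification
    # standing in for the Grassmann-manifold mixture used in the paper.
    score = {
        v: sum(g * EXPERTS[i][0](adj, u, v) for i, g in gates.items())
        for v in nbrs
    }
    sparsity = sum(g * EXPERTS[i][1] for i, g in gates.items())
    n_keep = max(1, math.ceil((1.0 - sparsity) * len(nbrs)))
    ranked = sorted(nbrs, key=lambda v: score[v], reverse=True)
    return set(ranked[:n_keep])

# Toy graph: node 0 is a hub with four neighbors.
adj = {0: {1, 2, 3, 4}, 1: {0, 2}, 2: {0, 1}, 3: {0}, 4: {0}}
kept_mild = sparsify_node(adj, 0, logits=[2.0, 0.0, 2.0, 0.0])
kept_aggressive = sparsify_node(adj, 0, logits=[0.0, 2.0, 0.0, 2.0])
```

In this toy setting the router logits are fixed by hand; in a trained model they would come from a learned function of each node's features, so that different nodes end up with different sparsity levels and criteria.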

Paper Structure (51 sections, 30 equations, 3 figures, 16 tables, 1 algorithm)

Introduction
Technical Background
Notations & Problem Formulation
Graph Neural Networks
Graph Sparsification
Mixture of Experts
Methodology
Overview
Routing to Diverse Experts
Customized Sparsifier Modeling
Graph Mixture on Grassmann Manifold
Training and Optimization
Additional Loss Functions
Complexity Analysis
Experiments
...and 36 more sections

Figures (3)

Figure 1: (Left) We illustrate the $k$-hop neighborhood expansion rates for nodes 6 and 14, which are proportional to the number of messages they receive as the GNN layers deepen; (Middle) The local patterns of different nodes vary, hence the attributions of edge pruning may also differ. For instance, pruning $(v_1,v_2)$ might be due to its non-bridge identity, while pruning $(v_5,v_6)$ could be attributed to its non-homophilic nature; (Right) The overview of our proposed MoG.
Figure 2: The overview of our proposed method. MoG primarily comprises ego-graph decomposition, expert routing, sparsifier customization, and the final graph mixture. For simplicity, we only showcase three pruning criteria including Jaccard similarity, gradient magnitude, and effective resistance.
Figure 3: The trade-off between inference speedup and model performance for MoG and other sparsifiers. The first and second rows represent results on GraphSAGE and DeeperGCN, respectively. The gray pentagon represents the performance of the original GNN without sparsification.

Theorems & Definitions (1)

Definition 1: Grassmann manifold
