Two Heads Are Better Than One: Boosting Graph Sparse Training via Semantic and Topological Awareness

Guibin Zhang; Yanwei Yue; Kun Wang; Junfeng Fang; Yongduo Sui; Kai Wang; Yuxuan Liang; Dawei Cheng; Shirui Pan; Tianlong Chen

TL;DR

Graph Sparse Training (GST) tackles GNN graph sparsification by preserving both topological integrity and semantic consistency. It first builds an anchor graph from full-graph training, then continuously refines a sparse graph through joint edge pruning and regrowth, guided by eigenvalue-based topological criteria and KL-divergence-based semantic criteria under the Equilibria Sparsification Principle. Combining topology-guided and semantic-guided sparsification lets GST reach high sparsity with negligible performance loss, preserve spectral properties, and substantially speed up GNN inference across six datasets and five backbones. Empirically, GST is also robust to adversarial edge perturbations and improves graph lottery ticket discovery, which suggests broad applicability, including on-device GNN deployment.
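As a rough illustration of what a KL-divergence-based semantic signal could look like, the sketch below compares the class distributions predicted with the anchor (full) graph against those predicted with a sparse graph. The function names `softmax` and `mean_kl`, the use of per-node logits, and the averaging over nodes are all assumptions made for this example; the summary above does not spell out the paper's exact semantic criterion.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def mean_kl(anchor_logits, sparse_logits, eps=1e-12):
    """Mean over nodes of KL(p_anchor || p_sparse).

    anchor_logits, sparse_logits: arrays of shape (num_nodes, num_classes).
    Hypothetical helper: a stand-in for a semantic-drift measure, not the
    paper's definition.
    """
    p = softmax(anchor_logits)
    q = softmax(sparse_logits)
    # eps guards against log(0) for extremely confident predictions.
    per_node = np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=1)
    return float(np.mean(per_node))

if __name__ == "__main__":
    anchor = np.array([[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]])
    sparse = np.array([[1.5, 0.7, -0.5], [0.1, 0.2, 0.3]])
    print(mean_kl(anchor, sparse))
```

A value near zero means the sparse graph yields nearly the same predictive distributions as the anchor; larger values indicate more semantic drift.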

Abstract

Graph Neural Networks (GNNs) excel in various graph learning tasks but face computational challenges when applied to large-scale graphs. A promising solution is to remove non-essential edges to reduce the computational overhead of GNNs. Previous literature generally falls into two categories: topology-guided and semantic-guided. The former maintains certain graph topological properties yet often underperforms on GNNs due to low integration with neural network training. The latter performs well at lower sparsity on GNNs but faces performance collapse at higher sparsity levels. With this in mind, we take the first step to propose a new research line and concept termed Graph Sparse Training (GST), which dynamically manipulates sparsity at the data level. Specifically, GST initially constructs a topology & semantic anchor at a low training cost, then performs dynamic sparse training to align the sparse graph with the anchor. We introduce the Equilibria Sparsification Principle to guide this process, effectively balancing the preservation of both topological and semantic information. Ultimately, GST produces a sparse graph with maximum topological integrity and no performance degradation. Extensive experiments on 6 datasets and 5 backbones showcase that GST (I) identifies subgraphs at higher graph sparsity levels (1.67%~15.85% $\uparrow$) than state-of-the-art sparsification methods, (II) preserves more key spectral properties, (III) achieves 1.27-3.42$\times$ speedup in GNN inference and (IV) successfully helps graph adversarial defense and graph lottery tickets.
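The dynamic sparse training described in the abstract can be pictured as repeatedly swapping edges in and out of a fixed-size edge mask. The sketch below is a minimal illustration under stated assumptions: `combine_scores` uses a plain weighted sum with a hypothetical weight `alpha` as a stand-in for balancing topological and semantic information, and `prune_and_regrow` swaps `k` edges per step. Neither is taken from the paper, and the Equilibria Sparsification Principle itself is not reproduced here.

```python
import numpy as np

def combine_scores(topo_scores, sem_scores, alpha=0.5):
    """Hypothetical balance of the two criteria: a simple weighted sum."""
    return alpha * topo_scores + (1.0 - alpha) * sem_scores

def prune_and_regrow(mask, scores, k):
    """Swap k edges while keeping the number of kept edges constant.

    mask   : boolean array, True where an edge is currently kept
    scores : float array, higher means more worth keeping (assumed given)
    k      : number of edges to swap in this step
    """
    mask = mask.copy()
    active = np.flatnonzero(mask)
    inactive = np.flatnonzero(~mask)
    k = min(k, active.size, inactive.size)
    if k == 0:
        return mask
    # Prune: the k kept edges with the lowest scores.
    drop = active[np.argsort(scores[active])[:k]]
    # Regrow: the k removed edges with the highest scores.
    grow = inactive[np.argsort(scores[inactive])[-k:]]
    mask[drop] = False
    mask[grow] = True
    return mask

if __name__ == "__main__":
    mask = np.array([True, True, True, False, False, False])
    topo = np.array([0.1, 0.9, 0.5, 0.8, 0.2, 0.7])
    sem = np.array([0.1, 0.9, 0.5, 0.8, 0.2, 0.7])
    print(prune_and_regrow(mask, combine_scores(topo, sem), k=1))
```

Because every pruned edge is matched by a regrown one, the sparsity level stays fixed across steps while the set of kept edges keeps changing.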

Paper Structure (35 sections, 22 equations, 13 figures, 13 tables, 1 algorithm)

This paper contains 35 sections, 22 equations, 13 figures, 13 tables, 1 algorithm.

Introduction
Related Work
Methodology
Notations and Formulations
Framework Overview
Pursuing Anchor Graph
Dynamical Sparse Graph Training
Topological & Semantical Criterion Design
Experiments
Experiment Setup
GST Excels In Combating Sparsity ($\mathcal{RQ}$1)
Spectral Preservation Helps Sparsification ($\mathcal{RQ}$2)
GST Significantly Accelerates Computations ($\mathcal{RQ}$3)
High Robustness and Versatility of GST ($\mathcal{RQ}$4)
Ablation & Parameter Sensitivity Analysis ($\mathcal{RQ}$5)
...and 20 more sections

Figures (13)

Figure 1: Graph sparsifier (UGS & Spectral Radius) comparison on Ogbn-Proteins using 3-layer GraphSAGE at varying graph sparsity levels $\{10\%,20\%,\cdots,60\%\}$. (Left) ROC-AUC score after different levels of sparsification. (Right) The spectral preservation ratio of the obtained sparse graph.
Figure 2: (Left) The overview of GST; (Right) The detailed pipeline of GST. GST dynamically adjusts and updates the sparse graph, guided by an anchor graph from full-graph training, to optimize topological and semantic preservation, and finally yields a sparse subgraph at the desired sparsity along with admirable accuracy, storage saving, and inference speedup.
Figure 3: Performance comparison of graph sparsification methods on Citeseer (First Row) and PubMed (Second Row) under different sparsity levels. The gray dashed line represents the original baseline performance.
Figure 4: The relative error of the top-200 and bottom-200 eigenvalues on PubMed+GCN, i.e., $\frac{\lambda_i - \lambda'_i}{\lambda_i}$, sparsified by different methods at sparsity level $20\%$ and $50\%$.
Figure 5: The inference latency on Ogbn-Proteins with different sparsifiers when their performance loss is negligible ($\leq 1\%$).
...and 8 more figures
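
The relative eigenvalue error quoted in the Figure 4 caption, $\frac{\lambda_i - \lambda'_i}{\lambda_i}$, can be computed for a toy graph as sketched below. This is only an illustration: the caption does not say which matrix the eigenvalues come from, so the use of a dense symmetric adjacency matrix is an assumption, as are the function name `relative_eigen_error` and the choice to report NaN where $\lambda_i = 0$.

```python
import numpy as np

def relative_eigen_error(adj_full, adj_sparse, k):
    """Relative error (lam_i - lam'_i) / lam_i for the k largest eigenvalues.

    adj_full, adj_sparse: dense symmetric matrices of the same shape
    (assumed here to be adjacency matrices of the original and sparsified graph).
    """
    # eigvalsh returns eigenvalues in ascending order; reverse to take the top k.
    lam = np.linalg.eigvalsh(adj_full)[::-1][:k]
    lam_s = np.linalg.eigvalsh(adj_sparse)[::-1][:k]
    out = np.full(lam.shape, np.nan)
    nonzero = ~np.isclose(lam, 0.0)
    # The ratio is undefined where lam_i is zero, so those entries stay NaN.
    out[nonzero] = (lam[nonzero] - lam_s[nonzero]) / lam[nonzero]
    return out

if __name__ == "__main__":
    # Triangle graph, then the same graph with edge (0, 2) removed.
    triangle = np.array([[0.0, 1.0, 1.0],
                         [1.0, 0.0, 1.0],
                         [1.0, 1.0, 0.0]])
    path = np.array([[0.0, 1.0, 0.0],
                     [1.0, 0.0, 1.0],
                     [0.0, 1.0, 0.0]])
    print(relative_eigen_error(triangle, path, k=1))
```

For real graphs a sparse eigensolver would be the more practical choice; the dense routine is used here only to keep the example short.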
