SGS-GNN: A Supervised Graph Sparsification method for Graph Neural Networks
Siddhartha Shankar Das, Naheed Anjum Arafat, Muftiqur Rahman, S M Ferdous, Alex Pothen, Mahantesh M Halappanavar
TL;DR
SGS-GNN is a supervised graph sparsifier that learns a global edge distribution via EdgeMLP and samples sparse subgraphs from it for downstream GNNs. It couples two regularizers (an assortativity term that promotes edge homophily and a consistency term that aligns edge probabilities with embeddings) and augments the learned distribution with a degree-based prior to accelerate the search. It improves F1 by 4–7% in geometric mean over comparable sparsifiers at similar sparsity and, with only 20% of edges retained, by up to 30% on heterophilic graphs relative to the original graph, while converging faster than fixed-distribution approaches. These results are reported across 33 homophilic and heterophilic graphs, and the method scales through batch processing and ensemble inference. In practice, the approach reduces inference cost while preserving or improving predictive accuracy in challenging graph settings.
Abstract
We propose SGS-GNN, a novel supervised graph sparsifier that learns the sampling probability distribution of edges and samples sparse subgraphs of a user-specified size to reduce the computational costs required by GNNs for inference tasks on large graphs. SGS-GNN employs regularizers in the loss function to enhance homophily in sparse subgraphs, boosting the accuracy of GNNs on heterophilic graphs, where a significant number of the neighbors of a node have dissimilar labels. SGS-GNN also supports conditional updates of the probability distribution learning module based on a prior, which helps narrow the search space for sparse graphs. SGS-GNN requires fewer epochs to reach high accuracy because it learns the search space of subgraphs more effectively than methods using fixed distributions such as random sampling. Extensive experiments using 33 homophilic and heterophilic graphs demonstrate the following: (i) with only 20% of edges retained in the sparse subgraphs, SGS-GNN improves the F1-scores by a geometric mean of 4% relative to the original graph, and on heterophilic graphs the improvement in prediction accuracy is up to 30%; (ii) SGS-GNN outperforms state-of-the-art methods, improving F1-scores by 4-7% in geometric mean at similar sparsities of the sampled subgraphs; and (iii) compared to sparsifiers that employ fixed distributions, SGS-GNN requires about half the number of epochs to converge.
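The sampling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `scores` list stands in for the output of a learned edge-scoring module, the function names, the `1/deg(u) + 1/deg(v)` prior, and the linear `mix` between learned and prior distributions are all assumptions made for illustration, and the weighted sampling without replacement uses a standard exponential-keys trick rather than anything stated in the paper. `edge_homophily` computes the fraction of edges whose endpoints share a label, the quantity the regularizers are said to promote.

```python
import math
import random


def edge_homophily(edges, labels):
    """Fraction of edges whose two endpoints share a label."""
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


def degree_prior(edges, num_nodes):
    """Hypothetical degree-based prior: weight 1/deg(u) + 1/deg(v), normalized."""
    deg = [0] * num_nodes
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    weights = [1.0 / deg[u] + 1.0 / deg[v] for u, v in edges]
    total = sum(weights)
    return [w / total for w in weights]


def sample_sparse_subgraph(edges, scores, prior, keep_ratio=0.2, mix=0.5, seed=0):
    """Sample round(keep_ratio * |E|) distinct edges from a mixed distribution.

    `scores` are placeholder per-edge scores (assumed to come from a learned
    model); `mix` is an assumed interpolation weight toward the prior.
    """
    # Softmax over all edges gives one global distribution.
    top_score = max(scores)
    exps = [math.exp(s - top_score) for s in scores]
    norm = sum(exps)
    learned = [e / norm for e in exps]
    probs = [(1 - mix) * p + mix * q for p, q in zip(learned, prior)]

    k = max(1, round(keep_ratio * len(edges)))
    rng = random.Random(seed)
    # Exponential keys: the k smallest values of Exp(1)/p form a weighted
    # sample without replacement.
    keys = [rng.expovariate(1.0) / p if p > 0 else math.inf for p in probs]
    chosen = sorted(range(len(edges)), key=lambda i: keys[i])[:k]
    return [edges[i] for i in sorted(chosen)]


if __name__ == "__main__":
    # Toy graph: complete graph on 5 nodes with two label classes.
    edges = [(u, v) for u in range(5) for v in range(u + 1, 5)]
    labels = [0, 0, 1, 1, 0]
    # Placeholder scores that favor same-label edges.
    scores = [1.0 if labels[u] == labels[v] else -1.0 for u, v in edges]
    prior = degree_prior(edges, num_nodes=5)
    sub = sample_sparse_subgraph(edges, scores, prior, keep_ratio=0.2)
    print(sub, edge_homophily(edges, labels), edge_homophily(sub, labels))
```

Sampling without replacement keeps the subgraph at exactly the requested size, which matches the "user-specified size" in the abstract; in a trainable setting the scores would be updated from the downstream loss, which this sketch does not attempt.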
