Sparse Decomposition of Graph Neural Networks
Yaochen Hu, Mai Zeng, Ge Zhang, Pavel Rumiantsev, Liheng Ma, Yingxue Zhang, Mark Coates
TL;DR
SDGNN uses a sparse decomposition to approximate the embeddings of a target GNN at low online inference cost. It learns a feature transform $\phi(\cdot;\mathbf{W})$ and node-specific sparse weights $\boldsymbol{\theta}_z$, and represents each node $z$ as $\hat{g}(z,\mathbf{X}|\mathcal{G}) = \boldsymbol{\theta}_z^\top \phi(\mathbf{X};\mathbf{W})$, which gives a per-node inference complexity of $O(\bar{d}L)$, linear in the average node degree $\bar{d}$ and the number of GNN layers $L$. Optimization alternates between a Lasso-based phase for $\boldsymbol{\theta}_z$ and a gradient-based phase for $\mathbf{W}$, with mini-batching and candidate-set narrowing to scale to large graphs. On seven node-classification datasets and two spatio-temporal forecasting tasks, SDGNN closely matches or surpasses the target GNN while substantially reducing inference time, which makes online prediction with dynamic node features feasible. The result is a practical framework for deploying GNNs in real-time settings that balances expressive power against inference efficiency.
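To make the form $\boldsymbol{\theta}_z^\top \phi(\mathbf{X};\mathbf{W})$ concrete, the following is a minimal sketch of how a sparse weight vector limits the number of feature rows fetched at inference time. It assumes $\phi$ is a single linear map `X @ W`; the names `phi`, `sdgnn_embed`, `support`, and `weights` are hypothetical and chosen only for illustration, not taken from the authors' code.

```python
import numpy as np

def phi(X, W):
    """Hypothetical feature transform: assumed here to be one linear map."""
    return X @ W

def sdgnn_embed(z, X, W, support, weights):
    """Approximate embedding of node z as theta_z^T phi(X; W).

    support[z] holds the indices of nodes with non-zero weight in theta_z,
    weights[z] holds the matching non-zero values. Only the rows of X listed
    in support[z] are read, which is where the inference saving comes from.
    """
    idx = support[z]
    return weights[z] @ phi(X[idx], W)

rng = np.random.default_rng(0)
n, d_in, d_out = 6, 4, 3
X = rng.normal(size=(n, d_in))      # toy node features
W = rng.normal(size=(d_in, d_out))  # toy transform parameters

# Toy sparse weights for node 0: three non-zero entries out of six.
support = {0: np.array([0, 2, 5])}
weights = {0: np.array([0.5, -1.0, 2.0])}

sparse_result = sdgnn_embed(0, X, W, support, weights)

# Dense reference: the same theta_0 stored as a full-length vector.
theta0 = np.zeros(n)
theta0[support[0]] = weights[0]
dense_result = theta0 @ phi(X, W)
```

The sparse and dense computations agree, but the sparse version touches only three rows of `X` rather than all six.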
Abstract
Graph Neural Networks (GNNs) exhibit superior performance in graph representation learning, but their inference cost can be high because the aggregation operation can require a memory fetch for a very large number of nodes. This inference cost is the major obstacle to deploying GNN models for online prediction, where outputs must reflect potentially dynamic node features. To address this, we propose an approach that reduces the number of nodes included during aggregation. We achieve this through a sparse decomposition, learning to approximate node representations using a weighted sum of linearly transformed features of a carefully selected subset of nodes within the extended neighbourhood. The approach achieves linear complexity with respect to the average node degree and the number of layers in the graph neural network. We introduce an algorithm to compute the optimal parameters for the sparse decomposition, ensuring an accurate approximation of the original GNN model, and present effective strategies to reduce the training time and improve the learning process. We demonstrate via extensive experiments that our method outperforms other baselines designed for inference speedup, achieving significant accuracy gains at comparable inference times on both node classification and spatio-temporal forecasting tasks.
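The alternating scheme for fitting the sparse weights and the linear transform can be sketched as follows. This is an illustrative toy, not the authors' algorithm: it assumes a linear transform `X @ W`, uses ISTA (proximal gradient descent) as a simple stand-in for a Lasso solver, replaces the target GNN embeddings with a random toy matrix `H`, and does not restrict the candidate nodes to an extended neighbourhood. All function names and the value of `lam` are hypothetical.

```python
import numpy as np

def soft_threshold(A, t):
    """Proximal operator of the L1 norm; zeroes entries with magnitude below t."""
    return np.sign(A) * np.maximum(np.abs(A) - t, 0.0)

def objective(Theta, X, W, H, lam):
    """0.5 * ||Theta X W - H||_F^2 + lam * ||Theta||_1."""
    resid = Theta @ X @ W - H
    return 0.5 * np.sum(resid ** 2) + lam * np.abs(Theta).sum()

def theta_phase(Theta, X, W, H, lam, n_steps=50):
    """Lasso-style update of all rows theta_z with W held fixed (ISTA)."""
    Phi = X @ W
    L = np.linalg.norm(Phi, 2) ** 2 + 1e-12  # Lipschitz constant of the gradient
    for _ in range(n_steps):
        grad = (Theta @ Phi - H) @ Phi.T
        Theta = soft_threshold(Theta - grad / L, lam / L)
    return Theta

def w_phase(Theta, X, W, H, n_steps=50):
    """Gradient-descent update of W with the sparse weights held fixed."""
    A = Theta @ X
    L = np.linalg.norm(A, 2) ** 2 + 1e-12
    for _ in range(n_steps):
        W = W - A.T @ (A @ W - H) / L
    return W

rng = np.random.default_rng(0)
n, d_in, d_out = 8, 5, 3
X = rng.normal(size=(n, d_in))
# Toy stand-in for target GNN embeddings (an assumption for this sketch).
H = rng.normal(size=(n, n)) @ X @ rng.normal(size=(d_in, d_out))

Theta = np.zeros((n, n))            # row z plays the role of theta_z
W = rng.normal(size=(d_in, d_out))
lam = 0.1                           # hypothetical sparsity penalty

history = [objective(Theta, X, W, H, lam)]
for _ in range(5):
    Theta = theta_phase(Theta, X, W, H, lam)
    W = w_phase(Theta, X, W, H)
    history.append(objective(Theta, X, W, H, lam))
```

With step sizes set to the inverse Lipschitz constants, neither phase increases the penalised objective, so `history` is non-increasing. A larger `lam` drives more entries of `Theta` to zero, trading approximation accuracy for fewer feature fetches per node.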
