SIGMA: An Efficient Heterophilous Graph Neural Network with Fast Global Aggregation
Haoyu Liu, Ningyi Liao, Siqiang Luo
TL;DR
SIGMA tackles heterophily in graph neural networks by introducing a SimRank-based global aggregation whose similarity scores are computed once, in a precomputation step. The method combines topology and attributes through MLPs to form a node representation $H$. It then uses a precomputed SimRank matrix $S$ to perform global aggregation via $\widehat{Z}_u = \sum_v S(u,v) H_v$, followed by $Z_u = (1-\alpha) \widehat{Z}_u + \alpha H_u$. The authors prove that this approach captures distant but structurally similar nodes, which makes it robust under heterophily. They also report strong empirical results across 12 datasets, including a $5\times$ aggregation speedup over the best baseline on the large Pokec graph. A key practical contribution is top-$k$ pruning of SimRank, which reduces aggregation complexity to $\mathcal{O}(kn)$ and enables scalable deployment. Overall, SIGMA delivers state-of-the-art accuracy with superior efficiency and provides a solid theoretical basis for global, similarity-driven aggregation in heterophilous graphs.
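The two aggregation formulas above, combined with top-$k$ pruning, can be illustrated with a minimal pure-Python sketch. It assumes $S$ is given as a dense $n \times n$ list of lists and $H$ as an $n \times d$ list of lists. The function names `topk_prune` and `sigma_aggregate` are hypothetical. Keeping only positive scores, breaking ties by smaller node index, and leaving the retained scores unnormalized are assumptions, not details confirmed by the text.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def topk_prune(S: Matrix, k: int) -> List[List[Tuple[int, float]]]:
    """For every row u, keep the k largest positive entries S(u, v) as (v, score) pairs."""
    pruned = []
    for row in S:
        # Sort column indices by descending score; ties are broken by the smaller index.
        order = sorted(range(len(row)), key=lambda v: (-row[v], v))
        pruned.append([(v, row[v]) for v in order[:k] if row[v] > 0.0])
    return pruned


def sigma_aggregate(S: Matrix, H: Matrix, k: int, alpha: float) -> Matrix:
    """Compute Z_u = (1 - alpha) * sum_{v in topk(u)} S(u, v) * H_v + alpha * H_u."""
    d = len(H[0])
    Z = []
    for u, neighbors in enumerate(topk_prune(S, k)):
        # Global aggregation over at most k retained nodes: O(k * d) work per node.
        z_hat = [0.0] * d
        for v, score in neighbors:
            for j in range(d):
                z_hat[j] += score * H[v][j]
        # Residual mix with the node's own representation H_u.
        Z.append([(1 - alpha) * z_hat[j] + alpha * H[u][j] for j in range(d)])
    return Z


if __name__ == "__main__":
    S = [[1.0, 0.5, 0.1], [0.5, 1.0, 0.2], [0.1, 0.2, 1.0]]
    H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(sigma_aggregate(S, H, k=2, alpha=0.5))
```

With `k=2` and `alpha=0.5`, node 0 keeps itself and node 1, so $\widehat{Z}_0 = [1.0, 0.5]$ and $Z_0 = [1.0, 0.25]$. This sketch prunes a dense matrix only for readability. At Pokec scale, materializing all $n^2$ scores would be impractical. Once a row is pruned, it contributes at most $k$ terms, which is where the $\mathcal{O}(kn)$ aggregation cost (per feature dimension) comes from.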
Abstract
Graph neural networks (GNNs) have achieved great success in graph learning but suffer from performance loss under heterophily, i.e., when neighboring nodes are dissimilar, due to their local and uniform aggregation. Existing heterophilous GNNs incorporate long-range or global aggregations to distinguish nodes in the graph. However, these aggregations usually require iteratively maintaining and updating full-graph information, which limits their efficiency when applied to large-scale graphs. In this paper, we propose SIGMA, an efficient global heterophilous GNN aggregation that integrates the structural similarity measure SimRank. Our theoretical analysis shows that SIGMA inherently captures distant global similarity even under heterophily, which conventional approaches can only achieve after iterative aggregations. Furthermore, it enjoys efficient one-time computation with a complexity only linear in the number of nodes, $\mathcal{O}(n)$. Comprehensive evaluation demonstrates that SIGMA achieves state-of-the-art performance with superior aggregation and overall efficiency. Notably, it obtains a $5\times$ acceleration over the best baseline aggregation on the large-scale heterophilous dataset Pokec, which has over 30 million edges.
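To make the SimRank scores $S(u,v)$ concrete, the sketch below implements the textbook recursive definition of SimRank by naive fixed-point iteration on a small undirected graph. It is not SIGMA's precomputation. The naive iteration costs far more than the $\mathcal{O}(n)$ stated above and is meant only to show what the scores measure. The function name `simrank`, the decay factor `c = 0.6`, the iteration count, and the adjacency-dict input format (nodes labeled `0..n-1`) are assumptions.

```python
from typing import Dict, List


def simrank(adj: Dict[int, List[int]], c: float = 0.6, iters: int = 10) -> List[List[float]]:
    """Naive SimRank iteration on an undirected graph with nodes 0..n-1.

    S(u, u) = 1, and for u != v:
    S(u, v) = c / (|N(u)| * |N(v)|) * sum_{a in N(u)} sum_{b in N(v)} S(a, b).
    """
    n = len(adj)
    # Start from the identity: each node is fully similar only to itself.
    S = [[1.0 if u == v else 0.0 for v in range(n)] for u in range(n)]
    for _ in range(iters):
        new = [[1.0 if u == v else 0.0 for v in range(n)] for u in range(n)]
        for u in range(n):
            for v in range(u + 1, n):
                if not adj[u] or not adj[v]:
                    continue  # a node without neighbors keeps zero similarity to others
                total = sum(S[a][b] for a in adj[u] for b in adj[v])
                score = c * total / (len(adj[u]) * len(adj[v]))
                new[u][v] = new[v][u] = score  # SimRank is symmetric
        S = new
    return S


if __name__ == "__main__":
    # Path graph 0-1-2: nodes 0 and 2 are not adjacent but share neighbor 1.
    for row in simrank({0: [1], 1: [0, 2], 2: [1]}):
        print(row)
```

On the path 0-1-2, nodes 0 and 2 are not adjacent, yet they receive $S(0,2) = c = 0.6$ because they share the neighbor 1. The adjacent pairs (0, 1) and (1, 2) score 0. This small case mirrors the abstract's point that SimRank can relate distant, structurally similar nodes that plain neighbor aggregation would not group together.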
