SIGMA: An Efficient Heterophilous Graph Neural Network with Fast Global Aggregation

Haoyu Liu, Ningyi Liao, Siqiang Luo

TL;DR

SIGMA tackles heterophily in graph neural networks by introducing a SimRank-based global aggregation whose similarity scores are obtained in a one-time precomputation. The method combines topology and attributes through MLPs to form a node representation $\textbf{H}$, then uses a precomputed SimRank matrix $\textbf{S}$ to perform global aggregation via $\widehat{\textbf{Z}}_u = \sum_{v} \textbf{S}(u,v)\,\textbf{H}_v$, followed by $\textbf{Z}_u = (1-\alpha)\,\widehat{\textbf{Z}}_u + \alpha\,\textbf{H}_u$. The authors prove that this approach captures distant but structurally similar nodes, enabling robust performance under heterophily, and they report strong empirical results across 12 datasets with significant speedups, especially on large graphs such as Pokec. A key practical contribution is top-$k$ pruning of the SimRank matrix, which reduces aggregation complexity to $\mathcal{O}(kn)$ and enables scalable deployment. Overall, SIGMA delivers state-of-the-art accuracy with superior efficiency and provides a solid theoretical basis for global, similarity-driven aggregation in heterophilous graphs.
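The aggregation described in the TL;DR can be illustrated with a small dense sketch. It assumes an undirected graph, a SimRank decay factor $c = 0.6$, a mixing weight $\alpha = 0.5$, and naive fixed-point iteration to obtain $\textbf{S}$; this is not the paper's precomputation pipeline, and top-$k$ pruning is omitted. The names `simrank_matrix` and `sigma_aggregate` are hypothetical, and a random matrix stands in for the MLP output $\textbf{H}$.

```python
import numpy as np


def simrank_matrix(adj: np.ndarray, c: float = 0.6, iters: int = 20) -> np.ndarray:
    """Naive fixed-point SimRank for an undirected graph (dense, toy sizes only).

    Off-diagonal entries follow
    s(a, b) = c / (|N(a)| |N(b)|) * sum_{i in N(a), j in N(b)} s(i, j);
    the diagonal is pinned to 1.
    """
    deg = adj.sum(axis=0)
    # Column-normalised adjacency W[i, a] = A[i, a] / deg(a); isolated nodes keep a zero column.
    W = adj / np.where(deg == 0, 1, deg)
    S = np.eye(adj.shape[0])
    for _ in range(iters):
        S = c * (W.T @ S @ W)
        np.fill_diagonal(S, 1.0)
    return S


def sigma_aggregate(S: np.ndarray, H: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Global aggregation Z_hat = S H (Z_hat_u = sum_v S(u, v) H_v), mixed with H."""
    Z_hat = S @ H
    return (1 - alpha) * Z_hat + alpha * H


# Toy star graph: node 0 is linked to nodes 1 and 2.
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)
S = simrank_matrix(A, c=0.6)
H = np.random.default_rng(0).normal(size=(3, 4))  # stand-in for the MLP output H
Z = sigma_aggregate(S, H, alpha=0.5)
print(np.round(S, 3))
```

On this star graph, nodes 1 and 2 share their only neighbor, so the recurrence gives $s(1,2) = c = 0.6$, while the center and a leaf lie on opposite sides of a bipartite graph and keep $s(0,1) = 0$.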

Abstract

Graph neural networks (GNNs) achieve great success in graph learning but suffer from performance loss under heterophily, i.e., when neighboring nodes are dissimilar, due to their local and uniform aggregation. Existing heterophilous GNNs incorporate long-range or global aggregations to distinguish nodes in the graph. However, these aggregations usually require iteratively maintaining and updating full-graph information, which limits their efficiency when applied to large-scale graphs. In this paper, we propose SIGMA, an efficient global heterophilous GNN aggregation integrating the structural similarity measure SimRank. Our theoretical analysis shows that SIGMA inherently captures distant global similarity even under heterophily, which conventional approaches can only achieve after iterative aggregations. Furthermore, it enjoys an efficient one-time computation with complexity only linear in the node set size, $\mathcal{O}(n)$. Comprehensive evaluation demonstrates that SIGMA achieves state-of-the-art performance with superior aggregation and overall efficiency. Notably, it obtains a $5\times$ acceleration on the large-scale heterophily dataset Pokec, which has over 30 million edges, compared to the best baseline aggregation.
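The TL;DR attributes the $\mathcal{O}(kn)$ aggregation cost to top-$k$ pruning of the SimRank matrix. A minimal NumPy sketch of that storage layout follows, assuming the scores are available as a dense matrix purely for illustration (random values stand in for SimRank, and a real pipeline would emit the top-$k$ lists directly). The helpers `topk_table` and `aggregate_topk` are hypothetical. Each node stores $k$ node indices and $k$ scores, so memory is $\mathcal{O}(kn)$ and one aggregation over $d$-dimensional features costs $\mathcal{O}(knd)$ instead of the $\mathcal{O}(n^2 d)$ of a dense product.

```python
import numpy as np


def topk_table(S: np.ndarray, k: int):
    """Compress a dense score matrix into per-node tables of the k largest scores.

    Returns idx (n, k) with the kept node ids and vals (n, k) with their scores.
    """
    idx = np.argpartition(-S, k - 1, axis=1)[:, :k]  # k largest entries per row, unordered
    vals = np.take_along_axis(S, idx, axis=1)
    return idx, vals


def aggregate_topk(idx: np.ndarray, vals: np.ndarray, H: np.ndarray) -> np.ndarray:
    """Z_hat_u = sum over the k kept nodes v of S(u, v) * H_v; reads n * k rows of H."""
    gathered = H[idx]                                  # shape (n, k, d)
    return np.einsum("uk,ukd->ud", vals, gathered)


rng = np.random.default_rng(1)
n, k, d = 1000, 8, 16
S = rng.random((n, n))  # stand-in scores, not real SimRank values
H = rng.normal(size=(n, d))
idx, vals = topk_table(S, k)
Z_hat = aggregate_topk(idx, vals, H)
print(idx.shape, vals.shape, Z_hat.shape)
```

The gather-and-weighted-sum form touches exactly $nk$ feature rows, which is what makes the cost linear in $n$ for fixed $k$; a sparse CSR matrix with $k$ entries per row would be an equivalent layout.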

Paper Structure (16 sections, 4 theorems, 22 equations, 8 figures, 11 tables, 1 algorithm)

Key Result

Theorem 3.2

On a graph $G$ with SimRank matrix $\textbf{S}$ and an arbitrarily initialized node embedding matrix $\textbf{H}$, denote the SimRank-aggregated feature matrix by $\widehat{\textbf{Z}}=\textbf{SH}$. Then, for each node $u \in V$, we have:

Figures (8)

  • Figure 1: All sub-figures are from the Texas heterophily graph. (a) A toy example of global structural similarity: intuitively, two staff members have high similarity because they share similar neighbors. (b) Neighborhood-based local aggregation and (c) SIGMA aggregation. Node color represents the aggregation score with respect to the center node ($\blacktriangle$). Conventional aggregation focuses on neighboring nodes regardless of node label, while SIGMA succeeds in assigning high values to nodes with the same label ($\blacktriangle$).
  • Figure 2: Patterns of SimRank scores over intra-class and inter-class node pairs. The X-axis denotes the similarity score of a node pair and the Y-axis the density.
  • Figure 3: Architecture of SIGMA.
  • Figure 4: Convergence efficiency of SIGMA and leading baselines. The X-axis denotes the training time (s) and the Y-axis the accuracy (%).
  • Figure 5: Scalability evaluation of SIGMA and GloGNN. The X-axis denotes the graph edge scale at $\{\frac{3\times 10^8}{2.5^i}\}_{i=0}^8$ and the Y-axis the wall-clock time (s). Note that the X-axis is in log scale.
  • ...and 3 more figures
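The intra-class versus inter-class comparison behind Figure 2 can be approximated as follows. This sketch assumes a precomputed SimRank matrix and integer node labels; `split_pair_scores` and `pair_densities` are hypothetical helpers, the 4-node matrix is made up for illustration, and the 10-bin histogram on $[0, 1]$ is our choice rather than the paper's plotting setup.

```python
import numpy as np


def split_pair_scores(S: np.ndarray, labels: np.ndarray):
    """Split SimRank scores of unordered node pairs (u < v) into intra- and inter-class groups."""
    u, v = np.triu_indices(S.shape[0], k=1)  # each pair once, self-pairs excluded
    same = labels[u] == labels[v]
    scores = S[u, v]
    return scores[same], scores[~same]


def pair_densities(S: np.ndarray, labels: np.ndarray, bins: int = 10):
    """Histogram densities of intra- and inter-class scores on [0, 1]."""
    intra, inter = split_pair_scores(S, labels)
    edges = np.linspace(0.0, 1.0, bins + 1)  # SimRank scores lie in [0, 1]
    intra_density, _ = np.histogram(intra, bins=edges, density=True)
    inter_density, _ = np.histogram(inter, bins=edges, density=True)
    return edges, intra_density, inter_density


# Made-up example: nodes {0, 1} form class 0 and nodes {2, 3} form class 1.
S = np.array([[1.0, 0.6, 0.1, 0.0],
              [0.6, 1.0, 0.0, 0.1],
              [0.1, 0.0, 1.0, 0.5],
              [0.0, 0.1, 0.5, 1.0]])
labels = np.array([0, 0, 1, 1])
intra, inter = split_pair_scores(S, labels)
edges, intra_density, inter_density = pair_densities(S, labels)
print(intra.mean(), inter.mean())
```

Plotting `intra_density` and `inter_density` against the bin centers gives curves of the kind the caption describes; on a real graph the pair count grows quadratically, so sampling pairs would likely be needed.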

Theorems & Definitions (8)

  • Definition 3.1: Pairwise Random Walk SimRank
  • Theorem 3.2
  • Proof
  • Corollary 3.3
  • Proof
  • Theorem 3.4
  • Proof
  • Lemma 3.5: SimPush
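Definition 3.1 is titled "Pairwise Random Walk SimRank". A common random-walk reading of SimRank (Jeh and Widom's first-meeting formulation) estimates $s(u,v)$ as the expected value of $c^{\tau}$, where $\tau$ is the first step at which two walks started at $u$ and $v$, each moving to a uniformly random neighbor in lockstep, land on the same node. The sketch below assumes that formulation, an undirected adjacency list, and $c = 0.6$; it is plain Monte Carlo, not the SimPush algorithm named in Lemma 3.5, and `pairwise_walk_simrank` is a hypothetical name.

```python
import random


def pairwise_walk_simrank(neighbors, u, v, c=0.6, num_walks=10_000, max_len=20, seed=0):
    """Monte Carlo SimRank estimate: the average of c**tau over paired random walks.

    Both walks step to a uniformly random neighbour at the same time; tau is the
    first step at which they occupy the same node. Pairs that do not meet within
    max_len steps, or that reach a node without neighbours, contribute 0.
    """
    if u == v:
        return 1.0
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_walks):
        a, b = u, v
        for step in range(1, max_len + 1):
            if not neighbors[a] or not neighbors[b]:
                break
            a = rng.choice(neighbors[a])
            b = rng.choice(neighbors[b])
            if a == b:
                total += c ** step
                break
    return total / num_walks


triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
estimate = pairwise_walk_simrank(triangle, 0, 1)
print(round(estimate, 3))
```

On a triangle, symmetry gives the closed form $s(u,v) = c/(4-3c) \approx 0.273$ for $c = 0.6$, which the estimate approaches as `num_walks` grows; truncating walks at `max_len` steps biases the estimate downward by at most $c^{\text{max\_len}}$.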