SIGMA: An Efficient Heterophilous Graph Neural Network with Fast Global Aggregation

Haoyu Liu; Ningyi Liao; Siqiang Luo

TL;DR

SIGMA tackles heterophily in graph neural networks with a SimRank-based global aggregation whose similarity matrix is computed once, in a precomputation step. The method combines topology and attributes through MLPs to form a node representation H, then uses the precomputed SimRank matrix S to perform global aggregation via \widehat{Z}_u = \sum_v S(u,v) H_v, followed by Z_u = (1-\alpha) \widehat{Z}_u + \alpha H_u. The authors prove that this aggregation captures distant but structurally similar nodes, which makes it robust under heterophily, and they report strong empirical results across 12 datasets with significant speedups, especially on large graphs such as Pokec. A key practical contribution is top-k pruning of the SimRank matrix, which reduces aggregation complexity to \mathcal{O}(kn) and makes deployment scalable. Overall, SIGMA delivers state-of-the-art accuracy with superior efficiency and provides a theoretical basis for global, similarity-driven aggregation in heterophilous graphs.
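The two aggregation formulas above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the helper names `top_k_rows` and `sigma_aggregate` are hypothetical, the similarity values are made up rather than real SimRank scores, and H is taken as given instead of being produced by MLPs. Storing only k entries per row makes the \mathcal{O}(kn) cost of the aggregation visible in the loop structure.

```python
from typing import List, Tuple

Vector = List[float]
SparseRow = List[Tuple[int, float]]  # (node index v, score S(u, v))


def top_k_rows(S: List[Vector], k: int) -> List[SparseRow]:
    """Keep only the k largest entries in each row of a dense similarity matrix."""
    pruned = []
    for row in S:
        ranked = sorted(enumerate(row), key=lambda pair: pair[1], reverse=True)
        pruned.append(ranked[:k])
    return pruned


def sigma_aggregate(S_topk: List[SparseRow], H: List[Vector], alpha: float) -> List[Vector]:
    """Compute Z_u = (1 - alpha) * sum_v S(u, v) * H_v + alpha * H_u for every node u."""
    dim = len(H[0])
    Z = []
    for u, row in enumerate(S_topk):
        z_hat = [0.0] * dim
        # At most k terms per node, so the loop over all nodes costs O(k * n)
        # per feature dimension.
        for v, score in row:
            for d in range(dim):
                z_hat[d] += score * H[v][d]
        Z.append([(1 - alpha) * z_hat[d] + alpha * H[u][d] for d in range(dim)])
    return Z


# Toy inputs (hypothetical values, chosen only to exercise the formulas).
S = [
    [1.0, 0.0, 0.5],
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 1.0],
]
H = [[1.0, 0.0], [0.0, 1.0], [3.0, 0.0]]
Z = sigma_aggregate(top_k_rows(S, k=2), H, alpha=0.5)
```

In this toy case node 0 pulls in the embedding of node 2, which it is similar to, while node 1 keeps its own embedding because none of its retained scores to other nodes are nonzero.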

Abstract

Graph neural networks (GNNs) have achieved great success in graph learning but suffer performance loss under heterophily, i.e., when neighboring nodes are dissimilar, because of their local and uniform aggregation. Existing heterophilous GNNs incorporate long-range or global aggregations to distinguish nodes in the graph. However, these aggregations usually require iteratively maintaining and updating full-graph information, which limits their efficiency on large-scale graphs. In this paper, we propose SIGMA, an efficient global heterophilous GNN aggregation that integrates the structural similarity measure SimRank. Our theoretical analysis shows that SIGMA inherently captures distant global similarity even under heterophily, which conventional approaches can achieve only after iterative aggregations. Furthermore, it enjoys efficient one-time computation with complexity only linear in the number of nodes, $\mathcal{O}(n)$. Comprehensive evaluation demonstrates that SIGMA achieves state-of-the-art performance with superior aggregation and overall efficiency. Notably, it obtains a $5\times$ speedup over the best baseline aggregation on the large-scale heterophilous dataset Pokec, which has over 30 million edges.

Paper Structure (16 sections, 4 theorems, 22 equations, 8 figures, 11 tables, 1 algorithm)

Introduction
Preliminaries
SIGMA for Heterophily GNN Aggregation
Interpreting SimRank for heterophily
SIGMA Aggregation Workflow
Complexity Optimization and Analysis
Discussions of SIGMA with Related Works
Other Related Work
Experiments
Experiment Setup
Performance Comparison
Scalability and Efficiency Study
Components Evaluation
Grouping Effect Visualization
Iterative Aggregation Mechanism Exploration
...and 1 more section

Key Result

Theorem 3.2

On a graph $G$ with SimRank matrix $\textbf{S}$ and an arbitrarily initialized node embedding matrix $\textbf{H}$, denote the SimRank-aggregated feature matrix as $\widehat{\textbf{Z}}=\textbf{SH}$. For each node $u \in V$, we have:

Figures (8)

Figure 1: All sub-figures are from the Texas heterophilous graph. (a) A toy example of global structural similarity. Intuitively, two staff members have high similarity because they share similar neighbors. (b) Neighborhood-based local aggregation and (c) SIGMA aggregation. Node color represents the aggregation score with respect to the center node ($\blacktriangle$). Conventional aggregation focuses on neighboring nodes regardless of node label, while SIGMA succeeds in assigning high values to nodes with the same label ($\blacktriangle$).
Figure 2: Patterns of SimRank scores over intra-class and inter-class node pairs. The X-axis denotes the similarity score of a node pair and the Y-axis the density.
Figure 3: Architecture of SIGMA.
Figure 4: Convergence efficiency of SIGMA and leading baselines. The X-axis denotes the training time (s) and the Y-axis the accuracy (%).
Figure 5: Scalability Evaluation of SIGMA and GloGNN. X-axis denotes the graph edge scale at $\{\frac{3\times 10^8}{2.5^i}\}_{i=0}^8$ and Y-axis the wall clock time (s). Note that X-axis is in log-scale.
...and 3 more figures

Theorems & Definitions (8)

Definition 3.1: Pairwise Random Walk (SimRank)
Theorem 3.2
proof
Corollary 3.3
proof
Theorem 3.4
proof
Lemma 3.5 (SimPush)
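For readers unfamiliar with the similarity measure these results build on, the sketch below computes SimRank with the classic recursion: s(u,u) = 1, and for u ≠ v, s(u,v) = c / (|N(u)| |N(v)|) · Σ_{a ∈ N(u)} Σ_{b ∈ N(v)} s(a,b). It is a naive fixed-point iteration for illustration only, not the paper's pairwise random walk formulation or its efficient precomputation. The function name `simrank`, the decay factor c = 0.6, the iteration count, and the toy graph are all assumptions made for this example.

```python
from typing import List


def simrank(neighbors: List[List[int]], c: float = 0.6, iterations: int = 10) -> List[List[float]]:
    """Naive SimRank on an undirected graph given as adjacency lists."""
    n = len(neighbors)
    # Start from the identity: every node is fully similar to itself only.
    S = [[1.0 if u == v else 0.0 for v in range(n)] for u in range(n)]
    for _ in range(iterations):
        S_next = [[0.0] * n for _ in range(n)]
        for u in range(n):
            S_next[u][u] = 1.0
            for v in range(u + 1, n):
                if not neighbors[u] or not neighbors[v]:
                    continue  # a node without neighbors has similarity 0 to others
                # Average similarity over all pairs of neighbors, damped by c.
                total = sum(S[a][b] for a in neighbors[u] for b in neighbors[v])
                score = c * total / (len(neighbors[u]) * len(neighbors[v]))
                S_next[u][v] = S_next[v][u] = score
        S = S_next
    return S


# Toy graph (hypothetical): nodes 0 and 1 are not adjacent but share
# the same neighbors, 2 and 3.
neighbors = [[2, 3], [2, 3], [0, 1], [0, 1]]
S = simrank(neighbors)
```

On this toy graph the non-adjacent pair (0, 1) receives a positive score because the two nodes share neighbors, while the adjacent pair (0, 2) scores zero. That is the kind of distant structural similarity the captions and theorems above refer to.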
