The Heterophilic Snowflake Hypothesis: Training and Empowering GNNs for Heterophilic Graphs

Kun Wang, Guibin Zhang, Xinnan Zhang, Junfeng Fang, Xun Wu, Guohao Li, Shirui Pan, Wei Huang, Yuxuan Liang

TL;DR

The paper tackles the limitations of homophily assumptions in GNNs by introducing the Heterophily Snowflake Hypothesis (Hetero-S) and Heterophily-aware Early Stopping (HES), enabling node-specific receptive-field pruning guided by a proxy label predictor and a learned homophily mask. The authors provide theoretical backing via Graph Neural Tangent Kernel and SBM analyses, and demonstrate broad empirical gains across 10 graphs, 10 backbones, and depths up to 32 layers, achieving significant sparsity and faster inference without compromising accuracy. The approach is model-agnostic, integrates with both non-local and refinement-based GNNs, and yields practical benefits in deep heterophilic regimes, making it a versatile tool for robust graph representation learning. The work thus offers a principled framework for per-node adaptive aggregation in heterophilic graphs with strong implications for efficiency and scalability.

Abstract

Graph Neural Networks (GNNs) have become pivotal tools for a range of graph-based learning tasks. Most current GNN architectures assume homophily, whether explicitly or implicitly. Although this assumption is widely adopted, it does not hold universally, and learning effectiveness can suffer where it fails. In this paper, for the first time, we transfer the prevailing concept of "one node, one receptive field" to the heterophilic graph. By constructing a proxy label predictor, we give each node a latent prediction distribution, which helps connected nodes decide whether they should aggregate their associated neighbors. Ultimately, every node can have its own aggregation hop and pattern, much as each snowflake is unique and has its own characteristics. Based on these observations, we introduce the Heterophily Snowflake Hypothesis and provide an effective solution to guide and facilitate research on heterophilic graphs and beyond. We conduct comprehensive experiments including (1) main results on 10 graphs with varying heterophily ratios across 10 backbones; (2) scalability on various deep GNN backbones (SGC, JKNet, etc.) across a range of depths (2, 4, 6, 8, 16, and 32 layers); (3) comparison with the conventional snowflake hypothesis; (4) efficiency comparison with existing graph pruning algorithms. Our observations show that the framework acts as a versatile operator for diverse tasks. It can be integrated into various GNN frameworks, boosting performance in deep settings and offering an explainable approach to choosing the optimal network depth. The source code is available at https://github.com/bingreeky/HeteroSnoH.
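
The idea of letting a proxy label predictor decide which neighbors to aggregate can be illustrated with a small sketch. This is not the authors' implementation: it assumes, purely for illustration, that the "homophily strength" of an edge is the dot product of the two endpoints' predicted label distributions (the probability that both would be assigned the same class), and that an edge is kept when this score reaches a hypothetical threshold of 0.5. The function names and toy values are invented for this example.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def edge_homophily_strength(
    probs: Dict[int, List[float]], edges: List[Edge]
) -> Dict[Edge, float]:
    """Score each edge by the chance that both endpoints share a label.

    Assumption (not from the paper): the score is the dot product of the
    two proxy prediction distributions.
    """
    return {
        (u, v): sum(pu * pv for pu, pv in zip(probs[u], probs[v]))
        for u, v in edges
    }


def keep_mask(strength: Dict[Edge, float], threshold: float = 0.5) -> Dict[Edge, bool]:
    """Mark an edge as kept when its score reaches the (hypothetical) threshold."""
    return {edge: score >= threshold for edge, score in strength.items()}


# Toy proxy predictions over two classes (hypothetical values).
probs = {0: [0.9, 0.1], 1: [0.8, 0.2], 2: [0.1, 0.9]}
edges = [(0, 1), (0, 2)]

strength = edge_homophily_strength(probs, edges)
mask = keep_mask(strength)
print(strength)
print(mask)
```

In this toy graph, node 0 agrees with node 1 and disagrees with node 2, so only the edge (0, 1) survives the mask. A learned mask, as described in the TL;DR, would replace the fixed threshold used here.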

Paper Structure (29 sections, 2 theorems, 20 equations, 13 figures, 7 tables, 1 algorithm)

Key Result

Lemma 1

Assume that $\mathcal{NH}^k_i$ for $v_i$ decreases with respect to $k$ in proportion to $\zeta$ ($\zeta>1$), meaning that as the receptive field expands, $v_i$ aggregates more heterophilic information. Under such circumstances, when employing receptive field stopping, there exists $k \geq 2$ satisfying the ...

Figures (13)

  • Figure 1: The algorithm workflow of Heterophilic Snowflake Hypothesis (Hetero-S) and Heterophily-aware Early Stopping (HES).
  • Figure 2: The algorithm workflow of Heterophilic Snowflake Hypothesis (Hetero-S) and Heterophily-aware Early Stopping (HES).
  • Figure 3: The pipeline of our HES framework. For each node, we utilize a proxy model to evaluate the homophily strength of its edges, which is further used to estimate its multi-hop homophily ratio. Based on the homophily strength at each hop, we perform receptive field-level early stopping to determine a unique receptive field for each node.
  • Figure 4: Results of the original baselines and their +❄ variants on three benchmarks (Cornell, Squirrel, and Wisconsin) in the 8-layer setting.
  • Figure 5: Results of JKNet and its +❄ variant on four benchmarks (CS, DBLP, Actor, and Chameleon) in the 2-, 4-, 8-, 16-, and 32-layer settings.
  • ...and 8 more figures
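
The pipeline described in the Figure 3 caption (per-hop homophily estimates followed by receptive-field-level early stopping) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's HES algorithm: the hop-k homophily ratio of a node is taken here to be the mean dot-product agreement between its proxy prediction and those of the nodes exactly k hops away, and a node stops expanding at the first hop whose ratio falls below a hypothetical threshold `tau`. All names and toy values are invented for this example.

```python
from collections import deque
from typing import Dict, List


def hop_layers(adj: Dict[int, List[int]], source: int, max_hops: int) -> List[List[int]]:
    """Breadth-first search returning the nodes at exactly 1..max_hops hops."""
    dist = {source: 0}
    layers: List[List[int]] = [[] for _ in range(max_hops)]
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == max_hops:
            continue  # do not expand beyond the maximum depth
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                layers[dist[v] - 1].append(v)
                queue.append(v)
    return layers


def agreement(p: List[float], q: List[float]) -> float:
    """Assumed homophily score: probability that two nodes share a label."""
    return sum(a * b for a, b in zip(p, q))


def stopping_hop(adj, probs, node: int, max_hops: int, tau: float = 0.5) -> int:
    """Number of hops a node keeps before its hop-level ratio drops below tau."""
    kept = 0
    for layer in hop_layers(adj, node, max_hops):
        if not layer:
            break  # no nodes at this distance
        ratio = sum(agreement(probs[node], probs[v]) for v in layer) / len(layer)
        if ratio < tau:
            break  # early stop: this hop is judged too heterophilic
        kept += 1
    return kept


# Toy path graph 0 - 1 - 2 - 3 with hypothetical proxy predictions.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
probs = {0: [0.9, 0.1], 1: [0.9, 0.1], 2: [0.8, 0.2], 3: [0.2, 0.8]}

depths = {v: stopping_hop(adj, probs, v, max_hops=3) for v in adj}
print(depths)
```

On this toy graph the four nodes end up with different depths, which is the "one node, one receptive field" behavior the hypothesis describes: node 3, whose only neighbor is predicted to belong to the other class, keeps zero hops, while nodes 0 and 2 keep two.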

Theorems & Definitions (2)

  • Lemma 1
  • Lemma 2