Leveraging Invariant Principle for Heterophilic Graph Structure Distribution Shifts

Jinluan Yang, Zhengyu Chen, Teng Xiao, Wenqiao Zhang, Yong Lin, Kun Kuang

TL;DR

This work identifies heterophilic graph structure distribution shift (HGSS), a setting in which train and test nodes exhibit different neighbor patterns and which challenges augmentation-based invariant learning. It introduces HEI, a framework that infers latent environments from node heterophily via neighbor-pattern similarity and an environment classifier, then enforces invariance across these environments with a penalty while sharing a common encoder. The approach comes with theoretical guarantees and shows consistent improvements over state-of-the-art baselines on multiple heterophilic benchmarks and backbones, under both standard and severe distribution shifts. By avoiding explicit augmentation and relying on intrinsic heterophily signals, HEI offers a practical and robust solution for invariant learning on heterophilic graphs with distribution shifts.
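
The environment-inference step can be illustrated with a minimal sketch. It assumes the neighbor pattern of a node is approximated by the mean cosine similarity between its features and those of its neighbors, and it replaces the learned environment classifier with plain rank-based binning into $k$ groups. Both choices, and the names `neighbor_similarity` and `assign_environments`, are illustrative assumptions rather than the authors' implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def neighbor_similarity(features, neighbors):
    """Label-free neighbor-pattern estimate: mean similarity to the node's neighbors."""
    scores = {}
    for node, nbrs in neighbors.items():
        if not nbrs:
            scores[node] = 0.0  # assumption: isolated nodes get the lowest score
            continue
        total = sum(cosine(features[node], features[n]) for n in nbrs)
        scores[node] = total / len(nbrs)
    return scores

def assign_environments(scores, k):
    """Assumed stand-in for the environment classifier: rank nodes by score
    and cut the ranking into k near-equal groups (environment ids 0..k-1)."""
    ranked = sorted(scores, key=scores.get)
    return {node: rank * k // len(ranked) for rank, node in enumerate(ranked)}

# Toy graph: nodes 0 and 1 resemble their neighbors, nodes 2 and 3 mostly do not.
features = {0: [1.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 1.0], 3: [0.0, 1.0]}
neighbors = {0: [1], 1: [0], 2: [0, 1], 3: [2, 0]}
environments = assign_environments(neighbor_similarity(features, neighbors), k=2)
```

In this toy graph, the nodes whose neighbors look unlike them land in environment 0 and the others in environment 1, without using any label or augmented graph.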

Abstract

Heterophilic Graph Neural Networks (HGNNs) have shown promising results for semi-supervised learning tasks on graphs. Notably, most real-world heterophilic graphs are composed of a mixture of nodes with different neighbor patterns, exhibiting local node-level homophilic and heterophilic structures. However, existing works focus only on designing better HGNN backbones or architectures for node classification on heterophilic and homophilic graph benchmarks simultaneously, and their analyses of HGNN performance with respect to nodes assume a fixed data distribution, without exploring the effect of this structural difference between training and testing nodes. How to learn invariant node representations on heterophilic graphs that handle this structural difference, or distribution shift, remains unexplored. In this paper, we first discuss the limitations of previous graph-based invariant learning methods from the perspective of data augmentation. Then, we propose HEI, a framework that generates invariant node representations by incorporating heterophily information to infer latent environments without augmentation; these environments are then used for invariant prediction under heterophilic graph structure distribution shifts. We theoretically show that our proposed method achieves guaranteed performance under heterophilic graph structure distribution shifts. Extensive experiments on various benchmarks and backbones also demonstrate the effectiveness of our method compared with existing state-of-the-art baselines. The code is available at https://github.com/Yangjinluan/HEI
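
To make "neighbor pattern" concrete, the sketch below computes node homophily in its usual sense: the fraction of a node's neighbors that share its label. The function name `node_homophily` and the adjacency-dictionary input are assumptions chosen for illustration. Isolated nodes are skipped because the ratio is undefined for them.

```python
def node_homophily(labels, neighbors):
    """Per-node homophily: share of neighbors with the same label as the node."""
    ratios = {}
    for node, nbrs in neighbors.items():
        if not nbrs:
            continue  # undefined for nodes without neighbors
        same = sum(1 for n in nbrs if labels[n] == labels[node])
        ratios[node] = same / len(nbrs)
    return ratios

# Toy graph mixing homophilic (node 1) and heterophilic (node 2) neighborhoods.
labels = {0: "a", 1: "a", 2: "b", 3: "b"}
neighbors = {0: [1, 2], 1: [0], 2: [0, 1, 3], 3: []}
homophily = node_homophily(labels, neighbors)
```

A structure shift of the kind discussed here corresponds to the train and test nodes having different distributions of these per-node ratios.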

Paper Structure (13 sections, 11 equations, 8 figures, 6 tables, 1 algorithm)

This paper contains 13 sections, 11 equations, 8 figures, 6 tables, 1 algorithm.

Figures (8)

  • Figure 1: (a) illustrates heterophilic graph structure distribution shifts (HGSS): the figure and histogram show the HGSS and the difference in neighbor patterns (measured by node homophily) between train and test nodes on the Squirrel dataset; (b) compares the environment construction strategies of previous invariant learning works with ours from the perspective of augmentation; (c) shows that the environment construction of previous methods may be ineffective in addressing HGSS because the neighbor pattern distribution remains unchanged. The experimental comparison of traditional and graph-based invariant learning methods supports our analysis and verifies the superiority of our proposed HEI.
  • Figure 2: Illustration of our framework HEI. (a) The neighbor pattern of each train node is first estimated by similarity and then used to infer environments without augmentation; (b) Based on the train nodes belonging to the different inferred environments, we train a set of environment-independent GNN classifiers that share an encoder with the base GNN. The shared encoder outputs the representations of the nodes in each environment and forwards them to the base GNN classifier and the environment-independent classifier, respectively. By calculating the loss gap between these two classifiers, an invariance penalty is introduced to improve model generalization.
  • Figure 3: Comparison experiments under Simulation Settings, where a severe distribution shift exists between train and test, including $Train_{High}$ on $Test_{Low}$ and $Train_{Low}$ on $Test_{High}$. We adopt LINKX as the backbone.
  • Figure 4: Parameter sensitivity to the number of environments $k$ under Standard Settings.
  • Figure 5: Comparison between our work and previous graph-based invariant learning works from the causal perspective. Notably, the basic HGNN directly aggregates the selected neighbors' full features without further separating them, unlike the above two types of invariant learning methods.
  • ...and 3 more figures
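
The loss gap described in the Figure 2 caption can be sketched as follows. The exact penalty used by HEI is not given in this summary, so the form below is an assumption: for each inferred environment, take the mean cross-entropy of the shared base classifier minus that of a classifier fitted to that environment, and sum the gaps. The `predict` callables stand in for "shared encoder plus classifier head", and all function names are hypothetical.

```python
import math

def cross_entropy(probs, label):
    """Negative log-likelihood of the true class under predicted probabilities."""
    return -math.log(probs[label])

def mean_loss(predict, nodes, labels):
    """Average cross-entropy of one classifier over a set of nodes."""
    return sum(cross_entropy(predict(n), labels[n]) for n in nodes) / len(nodes)

def invariance_penalty(base_predict, env_predicts, env_nodes, labels):
    """Assumed penalty: sum over environments of (base loss - per-environment loss).

    A large gap means a classifier specialized to that environment beats the
    shared one, i.e. the shared representation still carries
    environment-dependent signal.
    """
    penalty = 0.0
    for env, nodes in env_nodes.items():
        base = mean_loss(base_predict, nodes, labels)
        specialized = mean_loss(env_predicts[env], nodes, labels)
        penalty += base - specialized
    return penalty

# Toy example: one environment, two nodes, both of class 0.
labels = {0: 0, 1: 0}
env_nodes = {0: [0, 1]}
base_predict = lambda node: [0.5, 0.5]          # shared classifier, uninformative
env_predicts = {0: lambda node: [0.8, 0.2]}     # classifier fitted to environment 0
gap = invariance_penalty(base_predict, env_predicts, env_nodes, labels)
```

In a training loop this quantity would be added to the standard classification loss with a weighting coefficient. When the base and per-environment classifiers predict identically, the gap is zero.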