Are Heterogeneous Graph Neural Networks Truly Effective? A Causal Perspective

Xiao Yang, Xuejiao Zhao, Zhiqi Shen

TL;DR

This work asks whether heterogeneous graph neural networks are inherently effective for node classification. It conducts a large-scale replication of 20 HGNN baselines on 21 heterogeneous datasets and tunes a simple architecture (RGCN) to isolate architecture effects from heterogeneous information. A causal-effect framework is developed to evaluate factors under factual and counterfactual analyses, using minimal sufficient adjustment sets and cross-method robustness checks. The results show no causal impact from model architecture or complexity, while heterogeneous information exerts a positive causal effect by increasing homophily and local–global distribution discrepancy, thereby enhancing class separability. The study provides rigorous evidence that HGNN benefits stem from heterogeneous structural signals rather than architectural sophistication, guiding future evaluation and design of HGNNs.
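The idea of estimating a causal effect with an adjustment set can be illustrated with a minimal sketch. This is not the authors' framework: it is a plain stratification (backdoor adjustment) estimator on invented toy records, where `z` stands for a hypothetical binary confounder, `t` for a hypothetical binary treatment (for example, whether heterogeneous information is used), and `y` for an outcome such as accuracy. All names and numbers are assumptions made for illustration.

```python
from collections import defaultdict


def mean(values):
    return sum(values) / len(values)


def naive_difference(records):
    """Unadjusted difference in mean outcome between treated and control."""
    treated = [y for _, t, y in records if t == 1]
    control = [y for _, t, y in records if t == 0]
    return mean(treated) - mean(control)


def backdoor_adjusted_effect(records):
    """Stratify on the confounder z and average the per-stratum effects,
    weighting each stratum by its share of the records."""
    strata = defaultdict(lambda: {0: [], 1: []})
    for z, t, y in records:
        strata[z][t].append(y)

    total = len(records)
    effect = 0.0
    for z, groups in strata.items():
        if not groups[0] or not groups[1]:
            # Positivity is violated: this stratum lacks one treatment arm.
            raise ValueError(f"stratum z={z} lacks treated or control records")
        weight = (len(groups[0]) + len(groups[1])) / total
        effect += weight * (mean(groups[1]) - mean(groups[0]))
    return effect


# Invented toy records (z, t, y); the within-stratum effect is 0.2 by construction.
records = [
    (0, 0, 0.5), (0, 0, 0.5), (0, 1, 0.7),
    (1, 0, 0.6), (1, 1, 0.8), (1, 1, 0.8),
]

naive = naive_difference(records)
adjusted = backdoor_adjusted_effect(records)
```

In this toy data the treated group is over-represented in the stratum with higher outcomes, so the naive difference overstates the effect, while the stratified estimate recovers the value built into the data.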

Abstract

Graph neural networks (GNNs) have achieved remarkable success in node classification. Building on this progress, heterogeneous graph neural networks (HGNNs) integrate relation types and node and edge semantics to leverage heterogeneous information. Causal analysis for HGNNs is advancing rapidly, aiming to separate genuine causal effects from spurious correlations. However, whether HGNNs are intrinsically effective remains underexamined, and most studies implicitly assume rather than establish this effectiveness. In this work, we examine HGNNs from two perspectives: model architecture and heterogeneous information. We conduct a systematic reproduction across 21 datasets and 20 baselines, complemented by comprehensive hyperparameter retuning. To further disentangle the source of performance gains, we develop a causal effect estimation framework that constructs and evaluates candidate factors under standard assumptions through factual and counterfactual analyses, with robustness validated via minimal sufficient adjustment sets, cross-method consistency checks, and sensitivity analyses. Our results lead to two conclusions. First, model architecture and complexity have no causal effect on performance. Second, heterogeneous information exerts a positive causal effect by increasing homophily and local-global distribution discrepancy, which makes node classes more distinguishable. The implementation is publicly available at https://github.com/YXNTU/CausalHGNN.
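Homophily, which the abstract names as one mechanism behind the gains, is commonly measured as the fraction of edges whose endpoints share a class label. The sketch below uses that common edge-homophily definition, which may differ from the exact measure used in the paper. The toy graph, the two relation edge lists, and the function name `edge_homophily` are hypothetical.

```python
def edge_homophily(edges, labels):
    """Fraction of edges (u, v) whose endpoints carry the same class label."""
    if not edges:
        raise ValueError("edge list is empty")
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


# Hypothetical toy graph: four target nodes with two classes.
labels = [0, 0, 1, 1]

# Edges induced by one relation (hypothetical): 1 of 3 is intra-class.
relation_a = [(0, 2), (1, 3), (0, 1)]

# Edges induced by a second relation (hypothetical): all intra-class.
relation_b = [(2, 3)]

h_single = edge_homophily(relation_a, labels)
h_combined = edge_homophily(relation_a + relation_b, labels)
```

Here adding the second relation raises edge homophily from 1/3 to 1/2. This shows only how the quantity is computed, not a result from the paper.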

Paper Structure

This paper contains 27 sections, 29 equations, 1 figure, 7 tables.

Figures (1)

  • Figure 1: Overview of our research roadmap for disentangling the causal effects of heterogeneous information.