Is Homophily a Necessity for Graph Neural Networks?

Yao Ma, Xiaorui Liu, Neil Shah, Jiliang Tang

TL;DR

The paper questions the necessity of homophily for graph neural networks by examining the standard GCN under heterophily through a CSBM-based theoretical lens and controlled experiments. It shows that when same-label nodes share similar neighborhood patterns and different classes are distinguishable by neighbor distributions, GCNs can achieve strong semi-supervised node classification even on heterophilous graphs. The study couples formal results with extensive empirical analyses on synthetic graphs and real benchmarks, uncovering a nuanced, degree- and distribution-dependent picture (including a V-shaped performance trend as heterophily is varied). Overall, it reframes the narrative around homophily, illustrating that it is not universally necessary, but certain structural conditions must hold for GCNs to excel.
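
To make the central claim concrete, here is a minimal sketch in Python (NumPy); it is not the authors' code. It builds a small bipartite toy graph in which every edge joins nodes of different classes, computes the edge homophily ratio, and applies one round of mean aggregation over neighbors. The graph, the features, and the helper names `edge_homophily` and `mean_aggregate` are illustrative assumptions, and plain neighbor averaging without self-loops stands in for a GCN layer, which simplifies the actual operator.

```python
import numpy as np

def edge_homophily(edges, labels):
    """Fraction of edges whose two endpoints share a label."""
    same = sum(int(labels[u] == labels[v]) for u, v in edges)
    return same / len(edges)

def mean_aggregate(edges, features):
    """Average each node's neighbor features (no self-loops; a simplification)."""
    n = features.shape[0]
    adj = np.zeros((n, n))
    for u, v in edges:
        adj[u, v] = adj[v, u] = 1.0
    deg = adj.sum(axis=1, keepdims=True)
    return adj @ features / deg

# Hypothetical toy data: nodes 0-2 are class 0, nodes 3-5 are class 1,
# and every edge connects the two classes (a fully heterophilous graph).
labels = np.array([0, 0, 0, 1, 1, 1])
edges = [(0, 3), (0, 4), (1, 4), (1, 5), (2, 5), (2, 3)]
features = np.array([[1.0, 0.1], [0.9, 0.0], [1.1, 0.2],
                     [0.0, 1.0], [0.1, 0.9], [0.2, 1.1]])

h = edge_homophily(edges, labels)
agg = mean_aggregate(edges, features)

# After aggregation each node carries the *other* class's feature pattern,
# so flipping the argmax recovers the labels.
pred = 1 - agg.argmax(axis=1)
print("homophily ratio:", h)
print("labels recovered:", bool((pred == labels).all()))
```

In this toy setting the homophily ratio is 0, yet every class-0 node aggregates class-1 features and vice versa, so the two classes remain linearly separable after aggregation, with the roles of the feature dimensions swapped.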

Abstract

Graph neural networks (GNNs) have shown great prowess in learning representations suitable for numerous graph-based machine learning tasks. When applied to semi-supervised node classification, GNNs are widely believed to work well due to the homophily assumption ("like attracts like"), and fail to generalize to heterophilous graphs where dissimilar nodes connect. Recent works design new architectures to overcome such heterophily-related limitations, citing poor baseline performance and new architecture improvements on a few heterophilous graph benchmark datasets as evidence for this notion. In our experiments, we empirically find that standard graph convolutional networks (GCNs) can actually achieve better performance than such carefully designed methods on some commonly used heterophilous graphs. This motivates us to reconsider whether homophily is truly necessary for good GNN performance. We find that this claim is not quite true, and in fact, GCNs can achieve strong performance on heterophilous graphs under certain conditions. Our work carefully characterizes these conditions, and provides supporting theoretical understanding and empirical observations. Finally, we examine existing heterophilous graph benchmarks and reconcile how the GCN (under)performs on them based on this understanding.

Paper Structure

This paper contains 39 sections, 6 theorems, 33 equations, 21 figures, 8 tables.

Key Result

Theorem 1

Consider a graph $\mathcal{G} = \{\mathcal{V}, \mathcal{E}, \{\mathcal{F}_{c}, c\in \mathcal{C}\}, \{\mathcal{D}_{c}, c\in \mathcal{C}\}\}$, which follows Assumptions (1)-(4). For any node $i\in \mathcal{V}$, the expectation of the pre-activation output of a single GCN operation is given by ... and for any $t>0$, the probability that the distance between the observation ${\bf h}_i$ and its expectation ...

Figures (21)

  • Figure 1: A heterophilous graph on which GCN achieves perfect performance.
  • Figure 2: Two nodes share the same neighborhood distribution; GCN learns equivalent embeddings for $a$ and $b$.
  • Figure 3: Hetero. Edge Addition
  • Figure 4: SSNC accuracy of GCN on synthetic graphs with various homophily ratios.
  • Figure 5: Cross-class neighborhood similarity on synthetic graphs generated from Cora; all graphs have $h=0.25$, but with varying neighborhood distributions as per the noise parameter $\gamma$.
  • ...and 16 more figures

Theorems & Definitions (10)

  • Definition 1: Homophily
  • Theorem 1
  • Proposition 1
  • Theorem 2
  • Definition 2: Cross-Class Neighborhood Similarity (CCNS)
  • Lemma 1: Hoeffding's Inequality
  • Theorem 1
  • proof
  • Theorem 2
  • proof
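
Definition 2 in the list above names Cross-Class Neighborhood Similarity (CCNS), but this page does not reproduce its formula. The sketch below therefore rests on an assumption: that CCNS between classes $c$ and $c'$ is the average cosine similarity between the neighbor-label histograms of all node pairs drawn from the two classes. The function names and the toy graph are hypothetical and chosen only for illustration.

```python
import numpy as np

def neighbor_label_histograms(edges, labels, num_classes):
    """Count, for every node, how many neighbors carry each label."""
    hist = np.zeros((len(labels), num_classes))
    for u, v in edges:
        hist[u, labels[v]] += 1
        hist[v, labels[u]] += 1
    return hist

def ccns(edges, labels, num_classes):
    """Assumed CCNS: mean pairwise cosine similarity of neighbor-label histograms."""
    hist = neighbor_label_histograms(edges, labels, num_classes)
    norms = np.linalg.norm(hist, axis=1, keepdims=True)
    unit = hist / np.where(norms == 0, 1.0, norms)  # isolated nodes stay all-zero
    cos = unit @ unit.T
    sim = np.zeros((num_classes, num_classes))
    for a in range(num_classes):
        for b in range(num_classes):
            sim[a, b] = cos[np.ix_(labels == a, labels == b)].mean()
    return sim

# Hypothetical fully heterophilous toy graph: classes only link to each other.
labels = np.array([0, 0, 0, 1, 1, 1])
edges = [(0, 3), (0, 4), (1, 4), (1, 5), (2, 5), (2, 3)]

sim = ccns(edges, labels, num_classes=2)
print(sim)
```

Under this assumed definition, high diagonal and low off-diagonal entries indicate that same-class nodes share neighborhood patterns while different classes do not, which is the condition the TL;DR associates with strong GCN performance even when the homophily ratio is low.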