What Is Missing In Homophily? Disentangling Graph Homophily For Graph Neural Networks

Yilun Zheng, Sitao Luan, Lihui Chen

TL;DR

Tri-Hom is a new composite metric that considers all $3$ aspects of homophily (label, structural, and feature) and overcomes the limitations of conventional homophily metrics. Its correlation with model performance is significantly higher than that of existing metrics that focus on a single homophily aspect, demonstrating its superiority and the importance of homophily synergy.

Abstract

Graph homophily refers to the phenomenon that connected nodes tend to share similar characteristics. Understanding this concept and its related metrics is crucial for designing effective Graph Neural Networks (GNNs). The most widely used homophily metrics, such as edge or node homophily, quantify such "similarity" as label consistency across the graph topology. These metrics are believed to be able to reflect the performance of GNNs, especially on node-level tasks. However, many recent studies have empirically demonstrated that the performance of GNNs does not always align with homophily metrics, and how homophily influences GNNs still remains unclear and controversial. Then, a crucial question arises: What is missing in our current understanding of homophily? To figure out the missing part, in this paper, we disentangle the graph homophily into $3$ aspects: label, structural, and feature homophily, providing a more comprehensive understanding of GNN performance. To investigate their synergy, we propose a Contextual Stochastic Block Model with $3$ types of Homophily (CSBM-3H), where the topology and feature generation are controlled by the $3$ metrics. Based on the theoretical analysis of CSBM-3H, we derive a new composite metric, named Tri-Hom, that considers all $3$ aspects and overcomes the limitations of conventional homophily metrics. The theoretical conclusions and the effectiveness of Tri-Hom have been verified through synthetic experiments on CSBM-3H. In addition, we conduct experiments on $31$ real-world benchmark datasets and calculate the correlations between homophily metrics and model performance. Tri-Hom has significantly higher correlation values than $17$ existing metrics that only focus on a single homophily aspect, demonstrating its superiority and the importance of homophily synergy. Our code is available at https://github.com/zylMozart/Disentangle_GraphHom.
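
To make the "label consistency" idea behind the conventional metrics concrete, here is a minimal sketch of edge homophily and node homophily under their commonly used definitions: the fraction of edges joining same-label nodes, and the per-node fraction of same-label neighbors averaged over nodes. The function names and the toy graph are hypothetical and are not taken from the authors' repository; the sketch does not implement Tri-Hom or the CSBM-3H model.

```python
from collections import defaultdict


def edge_homophily(edges, labels):
    """Fraction of undirected edges whose two endpoints share a label."""
    if not edges:
        raise ValueError("graph has no edges")
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


def node_homophily(edges, labels):
    """Mean, over nodes with at least one neighbor, of the fraction
    of that node's neighbors carrying the same label as the node."""
    neighbors = defaultdict(list)
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    if not neighbors:
        raise ValueError("graph has no edges")
    ratios = [
        sum(labels[n] == labels[u] for n in nbrs) / len(nbrs)
        for u, nbrs in neighbors.items()
    ]
    return sum(ratios) / len(ratios)


# Hypothetical toy graph: a path 0 - 1 - 2 - 3 with two classes.
labels = [0, 0, 1, 1]
edges = [(0, 1), (1, 2), (2, 3)]

h_edge = edge_homophily(edges, labels)  # 2 of 3 edges are intra-class
h_node = node_homophily(edges, labels)  # per-node ratios: 1.0, 0.5, 0.5, 1.0
print(f"edge homophily = {h_edge:.3f}, node homophily = {h_node:.3f}")
```

Both quantities depend only on labels and topology, which is why they capture label homophily but say nothing about the structural or feature aspects discussed in the abstract.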

Paper Structure

This paper contains 53 sections, 83 equations, 7 figures, 6 tables, 1 algorithm.

Figures (7)

  • Figure 1: We measure the impact of label homophily $h_L$, feature homophily $h_F$, and structural homophily $h_S$ through numerical results of Tri-Hom $\mathcal{J}_h^{\mathcal{G}}$ and simulation results of the node classification accuracy with GCN on synthetic datasets.
  • Figure 2: We measure the impact of label homophily $h_L$, feature homophily $h_F$, and structural homophily $h_S$ through numerical results of Tri-Hom $\mathcal{J}_h^{\neg\mathcal{G}}$ and simulation results of the node classification accuracy with MLP on synthetic datasets.
  • Figure 3: The influence of label homophily $h_L$, structural homophily $h_S$, and feature homophily $h_F$ on graph-agnostic and graph-aware models.
  • Figure 4: The impact of label homophily $h_L$, feature homophily $h_F$, and structural homophily $h_S$ on the accuracy of node classification using MLP and GCN.
  • Figure 5: Label, feature, and structural homophily metrics on real-world datasets are shown as the x-axis, y-axis, and the size of the scatter respectively. The classification performance of GCN is denoted by the color of the scatters.
  • ...and 2 more figures

Theorems & Definitions (3)

  • Definition 1
  • Definition 2
  • Definition 3