Breaking the Entanglement of Homophily and Heterophily in Semi-supervised Node Classification

Henan Sun, Xunkai Li, Zhengyu Wu, Daohan Su, Rong-Hua Li, Guoren Wang

TL;DR

AMUD is introduced, which quantifies the relationship between node profiles and topology from a statistical perspective, offering valuable insights for Adaptively Modeling the natural directed graphs as the Undirected or Directed graph to maximize the benefits from subsequent graph learning.

Abstract

Recently, graph neural networks (GNNs) have shown prominent performance in semi-supervised node classification by leveraging knowledge from the graph database. However, most existing GNNs follow the homophily assumption, where connected nodes are more likely to exhibit similar feature distributions and the same labels, and such an assumption has proven to be vulnerable in a growing number of practical applications. As a supplement, heterophily reflects dissimilarity in connected nodes, which has gained significant attention in graph learning. To this end, data engineers aim to develop a powerful GNN model that can ensure performance under both homophily and heterophily. Despite numerous attempts, most existing GNNs struggle to achieve optimal node representations due to the constraints of undirected graphs. The neglect of directed edges results in sub-optimal graph representations, thereby hindering the capacity of GNNs. To address this issue, we introduce AMUD, which quantifies the relationship between node profiles and topology from a statistical perspective, offering valuable insights for Adaptively Modeling the natural directed graphs as the Undirected or Directed graph to maximize the benefits from subsequent graph learning. Furthermore, we propose Adaptive Directed Pattern Aggregation (ADPA) as a new directed graph learning paradigm for AMUD. Empirical studies have demonstrated that AMUD guides efficient graph learning. Meanwhile, extensive experiments on 16 benchmark datasets substantiate the impressive performance of ADPA, outperforming baselines by significant margins of 3.96%.
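
The homophily notion used in the abstract is often quantified with the edge homophily ratio, i.e. the fraction of edges whose two endpoints share a label. The sketch below illustrates that common measure on a toy directed edge list. It is not the AMUD statistic, and the function name `edge_homophily` and the toy graphs are assumptions made only for this example.

```python
from typing import Hashable, Iterable, Mapping, Tuple

Edge = Tuple[Hashable, Hashable]


def edge_homophily(edges: Iterable[Edge], labels: Mapping[Hashable, Hashable]) -> float:
    """Fraction of (directed) edges whose source and target share a label.

    Illustrative helper only; this is the standard edge homophily ratio,
    not the statistic computed by AMUD.
    """
    edge_list = list(edges)
    if not edge_list:
        raise ValueError("graph has no edges")
    same = sum(1 for u, v in edge_list if labels[u] == labels[v])
    return same / len(edge_list)


# Toy data (hypothetical): four nodes, two classes.
labels = {0: "a", 1: "a", 2: "b", 3: "b"}
homophilous_edges = [(0, 1), (1, 0), (2, 3)]             # every edge stays within a class
heterophilous_edges = [(0, 2), (1, 3), (2, 1), (0, 1)]   # only (0, 1) stays within a class

print(edge_homophily(homophilous_edges, labels))    # 1.0
print(edge_homophily(heterophilous_edges, labels))  # 0.25
```

A ratio near 1 indicates a homophilous graph, while a ratio near 0 indicates a heterophilous one.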

Paper Structure

This paper contains 22 sections, 2 theorems, 11 equations, 7 figures, 7 tables, 1 algorithm.

Key Result

Proposition 1

Undirected GNNs are more suitable for handling homophilous undirected graphs, while directed GNNs exhibit a significant advantage in dealing with heterophilous digraphs.

Figures (7)

  • Figure 1: Workflow with our proposal. Paradigm I/II represents the dichotomy of the learning process determined by the output of AMUD.
  • Figure 2: The two observations mentioned in the Introduction. AMD is the directed output of AMUD. CoraML, Chameleon, CiteSeer, and Squirrel are four natural digraphs, where Chameleon and Squirrel are the filtered versions from Platonov et al. (2023). GCN, GRP-GNN, and AEROGNN are three undirected GNNs. DiGCN, NSTE, and DirGNN are three directed GNNs. U- and D- denote undirected and directed graph inputs, respectively.
  • Figure 3: A toy example for our proposed AMUD.
  • Figure 4: Overview of our proposed ADPA, including (a) discover DPs and achieve multi-scale directed feature propagation; (b) combine the $K$-step propagated features within $k$-order DPs to obtain multi-granularity node representations by node-adaptive attention.
  • Figure 5: Convergence curves on the AMUndirected (upper) and AMDirected (lower).
  • ...and 2 more figures
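
The multi-scale directed feature propagation described in the Figure 4 caption can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes that a directed pattern (DP) can be written as a sequence of out-neighbour (`"o"`) and in-neighbour (`"i"`) mean aggregations, and that isolated nodes keep their own features. The node-adaptive attention step is omitted, and all function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Vector = List[float]
Features = Dict[int, Vector]


def build_neighbours(edges: Iterable[Tuple[int, int]], nodes: Iterable[int]):
    """Return out-neighbour and in-neighbour lists for a directed edge list."""
    node_list = list(nodes)
    out_nbrs: Dict[int, List[int]] = {n: [] for n in node_list}
    in_nbrs: Dict[int, List[int]] = {n: [] for n in node_list}
    for u, v in edges:  # edge u -> v
        out_nbrs[u].append(v)
        in_nbrs[v].append(u)
    return out_nbrs, in_nbrs


def mean_aggregate(nbrs: Dict[int, List[int]], feats: Features) -> Features:
    """Replace each node's features by the mean over its neighbours.

    Assumption for this sketch: a node without neighbours keeps its features.
    """
    result: Features = {}
    for node, x in feats.items():
        ns = nbrs.get(node, [])
        if not ns:
            result[node] = list(x)
        else:
            result[node] = [sum(feats[n][d] for n in ns) / len(ns) for d in range(len(x))]
    return result


def propagate(edges, feats: Features, pattern: str, steps: int) -> List[Features]:
    """Apply a pattern (string over 'o'/'i') for `steps` rounds.

    Returns the features after each round, i.e. one entry per propagation step.
    """
    out_nbrs, in_nbrs = build_neighbours(edges, feats.keys())
    ops = {"o": out_nbrs, "i": in_nbrs}
    history: List[Features] = []
    h = feats
    for _ in range(steps):
        for symbol in pattern:
            h = mean_aggregate(ops[symbol], h)
        history.append(h)
    return history


# Toy path graph 0 -> 1 -> 2 with one-dimensional features (hypothetical data).
edges = [(0, 1), (1, 2)]
feats = {0: [1.0], 1: [2.0], 2: [4.0]}

print(propagate(edges, feats, "o", 1)[-1])   # {0: [2.0], 1: [4.0], 2: [4.0]}
print(propagate(edges, feats, "i", 1)[-1])   # {0: [1.0], 1: [1.0], 2: [2.0]}
print(propagate(edges, feats, "oi", 1)[-1])  # {0: [2.0], 1: [2.0], 2: [4.0]}
```

Because the out- and in-neighbour patterns give different results on the same graph, the sketch shows the directional information that is lost when edges are symmetrized. A full model would combine the per-step, per-pattern outputs with learned attention weights.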

Theorems & Definitions (2)

  • Proposition 1
  • Proposition 2