HC-GST: Heterophily-aware Distribution Consistency based Graph Self-training

Fali Wang, Tianxiang Zhao, Junjie Xu, Suhang Wang

TL;DR

The paper tackles training bias in graph self-training (GST) caused by shifts in homophily ratios on heterophilic graphs. It introduces HC-GST, a framework that estimates homophily via soft labels, selects pseudo-nodes to match the global homophily distribution, assigns labels with multi-hop neighbors for heterophilic nodes, and uses a dual-head GNN to leverage all high-quality pseudo-nodes without harming the main classifier. Empirical results across homophilic and heterophilic graphs show HC-GST reduces distribution shifts, lowers bias across homophily bins, and improves self-training performance, especially at low label rates. The approach demonstrates robust gains and provides a principled way to adapt GST to heterophily, with implications for more reliable semi-supervised learning on diverse graphs.

Abstract

Graph self-training (GST), which selects and assigns pseudo-labels to unlabeled nodes, is popular for tackling label sparsity in graphs. However, recent studies on homophilic graphs show that GST methods can introduce and amplify distribution shift between training and test nodes, because they tend to assign pseudo-labels to nodes they already handle well. As GNNs typically perform better on homophilic nodes, the selected pseudo-nodes may shift towards homophilic ones, an effect that is underexplored. Our preliminary experiments on heterophilic graphs verify that these methods can cause shifts in homophily ratio distributions, leading to training bias that improves performance on homophilic nodes while degrading it on heterophilic ones. Therefore, we study a novel problem of reducing homophily ratio distribution shifts during self-training on heterophilic graphs. A key challenge is the accurate calculation of homophily ratios and their distributions without extensive labeled data. To tackle this challenge, we propose a novel Heterophily-aware Distribution Consistency-based Graph Self-Training (HC-GST) framework, which estimates homophily ratios using soft labels and optimizes a selection vector to align pseudo-nodes with the global homophily ratio distribution. Extensive experiments on both homophilic and heterophilic graphs show that HC-GST effectively reduces training bias and enhances self-training performance.
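
The abstract states that homophily ratios are estimated from soft labels but does not give the formula. The sketch below shows one plausible estimator, not necessarily the one used in the paper: it assumes the chance that two nodes share a class can be approximated by the dot product of their predicted class-probability vectors, averaged over a node's neighbors. The function name `soft_homophily`, the adjacency-dict input format, and the handling of isolated nodes are all illustrative assumptions.

```python
from typing import Dict, List


def soft_homophily(
    neighbors: Dict[int, List[int]],
    soft_labels: Dict[int, List[float]],
) -> Dict[int, float]:
    """Estimate per-node homophily ratios from predicted class probabilities.

    Assumption (not taken from the paper): the probability that nodes v and u
    share a class is approximated by the dot product of their soft labels.
    """
    estimates: Dict[int, float] = {}
    for v, nbrs in neighbors.items():
        if not nbrs:
            # Assumption: an isolated node gets ratio 0.0.
            estimates[v] = 0.0
            continue
        p_v = soft_labels[v]
        agreement = [
            sum(a * b for a, b in zip(p_v, soft_labels[u])) for u in nbrs
        ]
        estimates[v] = sum(agreement) / len(nbrs)
    return estimates


# Toy graph: node 0 is linked to node 1 (similar prediction)
# and node 2 (dissimilar prediction).
toy_neighbors = {0: [1, 2], 1: [0], 2: [0]}
toy_soft_labels = {0: [0.9, 0.1], 1: [0.8, 0.2], 2: [0.1, 0.9]}
toy_estimates = soft_homophily(toy_neighbors, toy_soft_labels)
```

In this toy graph, node 1 receives a higher estimate than node 2 because its prediction agrees with its only neighbor, and node 0 falls in between.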

Paper Structure (26 sections, 15 equations, 5 figures, 5 tables, 1 algorithm)

Figures (5)

  • Figure 1: Preliminary experiments with BMGCN [he2022block] on the Chameleon graph [Rozemberczki2021]. (a) Performance across crafted training sets biased towards homophily, aligned with the true graph, and biased towards heterophily. (b) Variation in mean homophily ratio across self-training stages for the BMGCN backbone, ST, and DCGST. (c) Training biases in traditional graph self-training (ST) shown through accuracies across different homophily bins. (d) Training biases in DCGST [wang2024DCGST].
  • Figure 2: The heterophily-aware distribution consistency-based graph self-training framework. Red arrows indicate the loop.
  • Figure 3: Sensitivity analysis of hyper-parameters w.r.t. $\lambda_{\text{S}}$, $\lambda_{\text{D}}$, and $\delta_{h}$ on Chameleon with $1\%$ label rate.
  • Figure 4: Left: mean homophily ratio of pseudo-nodes during self-training stages on Squirrel. Right: KL divergence between local and global homophily ratio distributions.
  • Figure 5: Training bias across various homophily bins on Chameleon graph with $1\%$ label rate.
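
Figure 4 reports a KL divergence between local and global homophily ratio distributions. A minimal sketch of how such a quantity could be computed is given here, assuming the ratios are binned into equal-width histograms over [0, 1]. The bin count, the smoothing constant `eps`, the direction KL(local || global), and both function names are assumptions made for illustration, not details from the paper.

```python
import math
from typing import List, Sequence


def homophily_histogram(ratios: Sequence[float], num_bins: int = 5) -> List[float]:
    """Bin homophily ratios in [0, 1] into a normalized histogram."""
    if not ratios:
        raise ValueError("ratios must be non-empty")
    counts = [0] * num_bins
    for h in ratios:
        # A ratio of exactly 1.0 is placed in the last bin.
        idx = min(int(h * num_bins), num_bins - 1)
        counts[idx] += 1
    return [c / len(ratios) for c in counts]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-8) -> float:
    """KL(p || q), with additive smoothing so empty bins do not yield log(0)."""
    p_s = [x + eps for x in p]
    q_s = [x + eps for x in q]
    z_p, z_q = sum(p_s), sum(q_s)
    return sum(
        (a / z_p) * math.log((a / z_p) / (b / z_q)) for a, b in zip(p_s, q_s)
    )


# Hypothetical ratios: all nodes versus a pseudo-node set skewed to homophily.
global_ratios = [0.1, 0.3, 0.5, 0.7, 0.9]
pseudo_ratios = [0.7, 0.9, 0.9, 0.9, 0.7]

global_hist = homophily_histogram(global_ratios)
pseudo_hist = homophily_histogram(pseudo_ratios)
shift = kl_divergence(pseudo_hist, global_hist)
```

A pseudo-node set whose histogram matches the global one yields a divergence of zero, while a set concentrated in the high-homophily bins yields a positive value.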

Theorems & Definitions (3)

  • Definition 1: Node Homophily Ratio $h(v_i)$ [pei2020geom]
  • Definition 2: Graph Homophily Ratio $h(\mathcal{G})$ [pei2020geom]
  • Definition 3: Distribution Shift on Heterophily
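
The definitions themselves are not reproduced in this listing. The sketch below assumes they follow the commonly used node-level form: $h(v_i)$ is the fraction of a node's neighbors that share its label, and $h(\mathcal{G})$ is the mean of $h(v_i)$ over all nodes. The function names, the adjacency-dict input, and the value assigned to isolated nodes are illustrative assumptions.

```python
from typing import Dict, Hashable, List


def node_homophily(
    neighbors: Dict[int, List[int]],
    labels: Dict[int, Hashable],
) -> Dict[int, float]:
    """Fraction of each node's neighbors that carry the same label."""
    ratios: Dict[int, float] = {}
    for v, nbrs in neighbors.items():
        if not nbrs:
            # Assumption: an isolated node gets ratio 0.0.
            ratios[v] = 0.0
            continue
        same = sum(1 for u in nbrs if labels[u] == labels[v])
        ratios[v] = same / len(nbrs)
    return ratios


def graph_homophily(
    neighbors: Dict[int, List[int]],
    labels: Dict[int, Hashable],
) -> float:
    """Mean node homophily ratio over all nodes in the graph."""
    ratios = node_homophily(neighbors, labels)
    return sum(ratios.values()) / len(ratios)


# Path graph 0 - 1 - 2 - 3 with labels A, A, B, B.
path_neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
path_labels = {0: "A", 1: "A", 2: "B", 3: "B"}
path_node_ratios = node_homophily(path_neighbors, path_labels)
path_graph_ratio = graph_homophily(path_neighbors, path_labels)
```

On this path graph the two end nodes have ratio 1.0, the two middle nodes have ratio 0.5, and the graph-level ratio is 0.75.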