Unraveling the Impact of Heterophilic Structures on Graph Positive-Unlabeled Learning

Yuhao Wu, Jiangchao Yao, Bo Han, Lina Yao, Tongliang Liu

TL;DR

This work tackles graph-based Positive-Unlabeled learning under edge heterophily, showing that heterophilic structure breaks class-prior estimation and latent-label inference. It introduces GPL, a bilevel framework that jointly reduces heterophily via a Label Propagation Loss (LPL) and learns a PU classifier, with the inner loop refining the graph and the outer loop training the classifier using an updated class prior. Theoretical analysis connects LPL to a reduced influence of heterophily and tighter class-prior estimation bounds, while experiments across diverse datasets demonstrate consistent improvements over strong baselines, particularly on heterophilic graphs. The approach offers a principled, end-to-end method for robust graph PU learning without assuming homophily or a known class prior, with practical implications for fraud detection, epidemiology, and other graph-centric domains.
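The TL;DR says the outer loop trains the PU classifier with an updated class prior, but it does not name the loss. To show how a class prior enters PU training, here is a minimal sketch of the non-negative PU risk (nnPU, Kiryo et al., 2017), a standard choice in PU learning. Whether GPL's outer loop uses exactly this risk is an assumption, and the function names are hypothetical.

```python
import math
from statistics import fmean
from typing import Sequence


def sigmoid_loss(score: float, label: int) -> float:
    """Sigmoid surrogate loss l(z, y) = 1 / (1 + exp(y * z)) for y in {+1, -1}."""
    z = label * score
    # Numerically stable evaluation that avoids overflow in exp for large |z|.
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def nnpu_risk(pos_scores: Sequence[float], unl_scores: Sequence[float], prior: float) -> float:
    """Non-negative PU risk for real-valued classifier outputs g(x).

    pos_scores: outputs on labelled positive nodes.
    unl_scores: outputs on unlabelled nodes.
    prior: class prior pi = P(y = +1); in this sketch, the estimate refreshed between
        outer-loop steps (an assumption, not taken from the paper).
    """
    # Positive part: pi * E_p[l(g(x), +1)].
    risk_pos = prior * fmean(sigmoid_loss(s, +1) for s in pos_scores)
    # Negative part, estimated without any negative labels:
    # E_u[l(g(x), -1)] - pi * E_p[l(g(x), -1)].
    risk_neg = fmean(sigmoid_loss(s, -1) for s in unl_scores) - prior * fmean(
        sigmoid_loss(s, -1) for s in pos_scores
    )
    # Clamping at zero keeps a flexible model from driving the estimate negative.
    return risk_pos + max(0.0, risk_neg)
```

With every output at 0, each sigmoid loss equals 0.5, so the risk is 0.5 for any prior. The clamp matters when positives score high and unlabelled nodes score low, which would otherwise push the negative-part estimate below zero.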

Abstract

While Positive-Unlabeled (PU) learning is vital in many real-world scenarios, its application to graph data remains under-explored. We reveal that a critical challenge for PU learning on graphs lies in edge heterophily, which directly violates the irreducibility assumption for Class-Prior Estimation (the class prior is essential for building PU learning algorithms) and degrades latent-label inference on unlabeled nodes during classifier training. In response to this challenge, we introduce a new method, named Graph PU Learning with Label Propagation Loss (GPL). Specifically, GPL learns from PU nodes while performing an intermediate heterophily reduction, which helps mitigate the negative impact of the heterophilic structure. We formulate this procedure as a bilevel optimization that reduces heterophily in the inner loop and efficiently learns a classifier in the outer loop. Extensive experiments across a variety of datasets show that GPL significantly outperforms baseline methods, confirming its effectiveness and superiority.
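The figures vary a "heterophily ratio" of the graph, but the summary does not define it. A common definition, assumed here, is the fraction of edges whose endpoints belong to different classes (one minus the edge homophily ratio). Computing it requires ground-truth labels for all nodes, so in a PU setting it is an analysis quantity rather than something a learner can observe. A minimal sketch, with an illustrative function name:

```python
from typing import Hashable, Iterable, Mapping, Tuple


def edge_heterophily_ratio(
    edges: Iterable[Tuple[Hashable, Hashable]], labels: Mapping[Hashable, int]
) -> float:
    """Fraction of edges whose two endpoints carry different class labels.

    Equals 1 minus the edge homophily ratio; 0.0 means a fully homophilic graph.
    """
    edge_list = list(edges)
    if not edge_list:
        raise ValueError("the graph has no edges")
    cross = sum(labels[u] != labels[v] for u, v in edge_list)
    return cross / len(edge_list)


# Toy 4-cycle: nodes 0, 1 are positive (1) and nodes 2, 3 are negative (0).
labels = {0: 1, 1: 1, 2: 0, 3: 0}
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(edge_heterophily_ratio(edges, labels))  # 0.5: edges (1, 2) and (3, 0) cross classes
```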

Paper Structure (41 sections, 11 theorems, 33 equations, 5 figures, 5 tables, 1 algorithm)

Key Result

Theorem 2.1

In a graph, $\mathbbm{P}_\mathrm{n}$ is irreducible with respect to $\mathbbm{P}_\mathrm{p}$ iff it satisfies the following condition: where $\mathop{\mathrm{ess\,sup}}\limits$ is the essential supremum.
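Irreducibility matters because mixture-proportion estimators read the class prior off regions where negatives carry (almost) no mass. Assume the standard PU mixture model, in which the unlabeled distribution is $\alpha\,\mathbbm{P}_\mathrm{p} + (1-\alpha)\,\mathbbm{P}_\mathrm{n}$. For any score threshold $c$, the fraction of unlabeled scores at or above $c$ satisfies $q_\mathrm{u}(c) = \alpha\, q_\mathrm{p}(c) + (1-\alpha)\, q_\mathrm{n}(c) \ge \alpha\, q_\mathrm{p}(c)$. Every ratio $q_\mathrm{u}(c)/q_\mathrm{p}(c)$ therefore upper-bounds $\alpha$, with equality where $q_\mathrm{n}(c) = 0$. The sketch below takes the minimum ratio over thresholds. It is a simplified version of the best-bin idea from garg2021mixture (the work Lemma 4.3 cites), without its finite-sample confidence term, and it is not GPL's estimator. The toy scores are hand-picked for illustration, and the function name is hypothetical.

```python
from typing import Sequence


def min_ratio_prior_estimate(pos_scores: Sequence[float], unl_scores: Sequence[float]) -> float:
    """Estimate alpha in P_u = alpha * P_p + (1 - alpha) * P_n from classifier scores.

    For a threshold c, q(c) is the fraction of scores >= c. Since
    q_u(c) = alpha * q_p(c) + (1 - alpha) * q_n(c) >= alpha * q_p(c),
    every ratio q_u(c) / q_p(c) upper-bounds alpha; return the smallest one.
    """
    best = float("inf")
    # Thresholds are taken from the positive scores so that q_p(c) > 0.
    for c in sorted(set(pos_scores)):
        q_p = sum(s >= c for s in pos_scores) / len(pos_scores)
        q_u = sum(s >= c for s in unl_scores) / len(unl_scores)
        best = min(best, q_u / q_p)
    return best


pos = [0.9, 0.8, 0.7, 0.6]
# Irreducible case: no negative scores as high as the top positives; true alpha = 0.5.
unl_ok = pos + [0.1, 0.2, 0.3, 0.4]
# Violated case: one negative out-scores every positive.
unl_bad = pos + [0.95, 0.2, 0.3, 0.4]
print(min_ratio_prior_estimate(pos, unl_ok))   # 0.5
print(min_ratio_prior_estimate(pos, unl_bad))  # 0.625, an overestimate
```

When some negatives score like the most confident positives, no threshold isolates a negative-free region, so the estimate stays above the true prior. This is the failure mode the abstract attributes to heterophily.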

Figures (5)

  • Figure 1: Estimation error of our method and the baseline estimator for Class-Prior Estimation under varying heterophily ratios. Graphs with different heterophily ratios are constructed following ma2021homophily.
  • Figure 2: Distributions of the predicted probabilities (of being positive) for unlabeled nodes when a classifier is trained on the positive-unlabeled Cora dataset under varying heterophily ratios. Graphs with different heterophily ratios are constructed following ma2021homophily. As the heterophily ratio increases, the predicted-probability distributions of positive and negative nodes become progressively less separable.
  • Figure 3: Distribution of edge weights for heterophilic and homophilic edges on the Cornell dataset over the course of GPL training.
  • Figure 4: F1 scores of GPL and the baseline approaches for PU learning as the ratio $r_{\mathrm{p}}$ of observed positive nodes among all positive nodes varies.
  • Figure 5: The F1 score of GPL with increasing $K$ on Cora, WIKI-CS, Texas and Wisconsin.
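Figure 3 tracks how the edge weights GPL assigns to heterophilic and homophilic edges evolve during training. To illustrate why down-weighting heterophilic edges helps latent-label inference, the sketch below runs plain iterative label propagation from labelled positives on a toy graph. It runs once with uniform weights and once with the heterophilic edges set to zero by hand. This is not the paper's Label Propagation Loss. The weights are hypothetical rather than learned, and the function name is illustrative.

```python
from typing import List, Sequence, Set, Tuple

Edge = Tuple[int, int, float]  # (u, v, weight) for an undirected edge


def propagate_positive_scores(
    n_nodes: int, edges: Sequence[Edge], seeds: Set[int], steps: int = 50
) -> List[float]:
    """Iterative label propagation of a 'positive' score over a weighted graph.

    Each non-seed node takes the weighted average of its neighbours' scores;
    labelled positive seeds are clamped to 1.0 after every step.
    """
    nbrs: List[List[Tuple[int, float]]] = [[] for _ in range(n_nodes)]
    for u, v, w in edges:
        nbrs[u].append((v, w))
        nbrs[v].append((u, w))
    score = [1.0 if i in seeds else 0.0 for i in range(n_nodes)]
    for _ in range(steps):
        new = []
        for i in range(n_nodes):
            total = sum(w for _, w in nbrs[i])
            # Nodes with no non-zero incident weight keep their current score.
            new.append(sum(w * score[j] for j, w in nbrs[i]) / total if total > 0 else score[i])
        score = [1.0 if i in seeds else s for i, s in enumerate(new)]
    return score


# Nodes 0-2 are positive, 3-5 negative; only node 0 is labelled.
homophilic = [(0, 1), (1, 2), (3, 4), (4, 5)]
heterophilic = [(1, 4), (2, 5)]
uniform = [(u, v, 1.0) for u, v in homophilic + heterophilic]
reweighted = [(u, v, 1.0) for u, v in homophilic] + [(u, v, 0.0) for u, v in heterophilic]
print(propagate_positive_scores(6, uniform, {0}))     # negatives pick up positive mass
print(propagate_positive_scores(6, reweighted, {0}))  # negatives stay at 0.0
```

With uniform weights and only positive seeds, every score drifts toward 1, so negatives become hard to tell apart from positives. Zeroing the heterophilic edges keeps the negatives at exactly 0.0 while the positives still converge toward 1.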

Theorems & Definitions (17)

  • Theorem 2.1: Graph Irreducible Condition
  • Definition 4.1
  • Theorem 4.2
  • Lemma 4.3: Theorem 1 of garg2021mixture
  • Theorem 4.4: Estimation Error
  • Theorem 4.5
  • Definition 1.1: Irreducibility
  • Lemma 1.2
  • Proposition 1.3
  • Lemma 2.1
  • ...and 7 more