Unraveling the Impact of Heterophilic Structures on Graph Positive-Unlabeled Learning
Yuhao Wu, Jiangchao Yao, Bo Han, Lina Yao, Tongliang Liu
TL;DR
This work tackles graph-based Positive-Unlabeled (PU) learning under edge heterophily, showing that heterophilic structure breaks class-prior estimation and degrades latent-label inference. It introduces GPL, a bilevel framework that jointly reduces heterophily via a Label Propagation Loss (LPL) and learns a PU classifier: the inner loop refines the graph, and the outer loop trains the classifier using an updated class prior. Theoretical analysis connects LPL to a reduced influence of heterophily and tighter class-prior estimation bounds, while experiments across diverse datasets show consistent improvements over strong baselines, particularly on heterophilic graphs. The approach offers a principled, end-to-end method for robust graph PU learning that assumes neither homophily nor a known class prior, with practical implications for fraud detection, epidemiology, and other graph-centric domains.
Abstract
While Positive-Unlabeled (PU) learning is vital in many real-world scenarios, its application to graph data remains under-explored. We show that a critical challenge for PU learning on graphs lies in edge heterophily, which directly violates the irreducibility assumption required for Class-Prior Estimation (the class prior is essential for building PU learning algorithms) and degrades latent label inference on unlabeled nodes during classifier training. In response to this challenge, we introduce a new method, Graph PU Learning with Label Propagation Loss (GPL). Specifically, GPL learns from PU nodes together with an intermediate heterophily-reduction step, which mitigates the negative impact of the heterophilic structure. We formulate this procedure as a bilevel optimization that reduces heterophily in the inner loop and efficiently learns a classifier in the outer loop. Extensive experiments across a variety of datasets show that GPL significantly outperforms baseline methods, confirming its effectiveness.
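To make the notion of edge heterophily concrete, the following minimal Python sketch computes the fraction of edges that connect nodes of different classes. It is an illustration only and not part of GPL: the function name `edge_heterophily`, the edge-list representation, and the toy graph are assumptions made for this example. It also assumes that true labels are available for every node, which holds only on fully labeled benchmark graphs and not in an actual PU setting, where the labels of unlabeled nodes are unknown.

```python
def edge_heterophily(edges, labels):
    """Return the fraction of edges whose two endpoints have different labels.

    edges  : list of (u, v) node-index pairs (hypothetical representation)
    labels : sequence mapping node index -> class label (assumed fully known)
    """
    if not edges:
        raise ValueError("graph has no edges")
    # Count edges that cross a class boundary.
    cross = sum(1 for u, v in edges if labels[u] != labels[v])
    return cross / len(edges)


# Toy graph: nodes 0 and 1 are positive (1), nodes 2 and 3 are negative (0).
labels = [1, 1, 0, 0]
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]

# Edges (1, 2) and (0, 3) join different classes, so half the edges are heterophilic.
h = edge_heterophily(edges, labels)
print(h)
```

A value near 0 corresponds to a homophilic graph, where neighbors mostly share a class; a value near 1 corresponds to a strongly heterophilic graph, the regime in which the abstract argues that class-prior estimation and latent label inference break down.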
