UPL: Uncertainty-aware Pseudo-labeling for Imbalance Transductive Node Classification
Mohammad T. Teimuri, Zahra Dehghanian, Gholamali Aminian, Hamid R. Rabiee
TL;DR
This work tackles class imbalance in transductive node classification on graphs by deriving a population-risk upper bound that highlights the minority class as the primary driver of error and by proposing Uncertainty-aware Pseudo-labeling (UPL). UPL combines uncertainty estimation and thresholded pseudo-labeling with a minority-focused augmentation strategy and a balanced Softmax loss to improve minority-class performance while preserving graph structure. Theoretical guarantees based on transductive Rademacher complexity are complemented by extensive empirical results showing state-of-the-art performance on both homophilic and heterophilic graphs, with reduced performance variance. The approach offers a practical, scalable path to robust imbalanced graph learning and opens avenues for extension to multi-class and inductive settings and to heterophily-aware designs.
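To make the balanced Softmax loss mentioned above concrete, here is a minimal sketch of the commonly used formulation, in which each logit is shifted by the log of its class's training count before the usual cross-entropy is taken. The function name `balanced_softmax_loss` and the argument `class_counts` are illustrative assumptions and are not taken from the paper, which may use a different variant.

```python
import math


def balanced_softmax_loss(logits, label, class_counts):
    """Cross-entropy on logits shifted by log class counts (illustrative sketch).

    logits:       list of raw scores, one per class
    label:        index of the true class
    class_counts: number of labeled training nodes per class (assumed known, all > 0)
    """
    # Shift each logit by the log frequency of its class, so frequent classes
    # need a larger raw score to win during training.
    adjusted = [z + math.log(n) for z, n in zip(logits, class_counts)]
    # Numerically stable log-sum-exp over the adjusted logits.
    m = max(adjusted)
    log_norm = m + math.log(sum(math.exp(a - m) for a in adjusted))
    # Negative log-probability of the true class.
    return log_norm - adjusted[label]
```

With equal class counts this reduces to the standard Softmax cross-entropy. With imbalanced counts and identical logits, a minority-class node incurs a larger loss than a majority-class node, which pushes the model to separate the minority class more strongly.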
Abstract
Graph-structured datasets often suffer from class imbalance, which complicates node classification. In this work, we address this issue by first providing an upper bound on the population risk for imbalanced transductive node classification. We then propose a simple and novel algorithm, Uncertainty-aware Pseudo-labeling (UPL). Our approach leverages pseudo-labels assigned to unlabeled nodes to mitigate the adverse effects of imbalance on classification accuracy. Furthermore, UPL improves the accuracy of pseudo-labeling by using a novel uncertainty-aware approach to reduce the noise that pseudo-labels introduce during training. We comprehensively evaluate UPL on a range of benchmark datasets, demonstrating superior performance over existing state-of-the-art methods.
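The general idea of uncertainty-filtered pseudo-labeling can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the abstract does not specify the uncertainty estimator, so normalized predictive entropy is used here as a stand-in, and the function name `select_pseudo_labels` and the threshold `max_uncertainty` are hypothetical.

```python
import math


def select_pseudo_labels(probs, max_uncertainty=0.3):
    """Return {node_id: pseudo_label} for unlabeled nodes with low uncertainty.

    probs:           dict mapping node id -> list of class probabilities
                     (at least two classes, summing to 1)
    max_uncertainty: hypothetical cutoff on normalized entropy in [0, 1]
    """
    selected = {}
    for node, p in probs.items():
        num_classes = len(p)
        # Predictive entropy, normalized so 0 = certain and 1 = uniform.
        # (Assumed uncertainty measure; the paper's estimator may differ.)
        entropy = -sum(q * math.log(q) for q in p if q > 0.0)
        uncertainty = entropy / math.log(num_classes)
        if uncertainty <= max_uncertainty:
            # Pseudo-label is the most probable class.
            selected[node] = max(range(num_classes), key=p.__getitem__)
    return selected


# Example: node 1 is too uncertain and is left unlabeled.
predictions = {0: [0.98, 0.02], 1: [0.55, 0.45], 2: [0.03, 0.97]}
pseudo = select_pseudo_labels(predictions)
```

Only the nodes that pass the filter would be added to the training set, which limits the label noise that confidently wrong or ambiguous predictions would otherwise inject.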
