Class-Imbalanced Graph Learning without Class Rebalancing

Zhining Liu; Ruizhong Qiu; Zhichen Zeng; Hyunsik Yoo; David Zhou; Zhe Xu; Yada Zhu; Kommy Weldemariam; Jingrui He; Hanghang Tong

TL;DR

This work tackles class-imbalanced graph learning by revealing topological causes of minority bias: ambivalent message passing (AMP) and distant message passing (DMP). It introduces BAT, a lightweight, model-agnostic topological augmentation that identifies high-risk nodes through uncertainty and posterior likelihoods and expands their context with virtual class nodes, independent of class rebalancing. The authors provide theoretical results showing the minority class is more susceptible to AMP/DMP, with biases that grow with the imbalance ratio $\rho$, and demonstrate that BAT can markedly improve performance and reduce bias across diverse graphs and GNN backbones. Empirical results show BAT delivers consistent gains (up to 46.27\% in accuracy and up to 72.74\% in bias reduction) while maintaining efficiency, validating its practical utility as a complementary tool to CR techniques.

Abstract

Class imbalance is prevalent in real-world node classification tasks and poses great challenges for graph learning models. Most existing studies are rooted in a class-rebalancing (CR) perspective and address class imbalance with class-wise reweighting or resampling. In this work, we approach the root cause of class-imbalance bias from a topological paradigm. Specifically, we theoretically reveal two fundamental phenomena in the graph topology that greatly exacerbate the predictive bias stemming from class imbalance. On this basis, we devise a lightweight topological augmentation framework, BAT, to mitigate the class-imbalance bias without class rebalancing. Being orthogonal to CR, BAT can function as an efficient plug-and-play module that can be seamlessly combined with and significantly boost existing CR techniques. Systematic experiments on real-world imbalanced graph learning tasks show that BAT can deliver up to 46.27% performance gain and up to 72.74% bias reduction over existing techniques. Code, examples, and documentation are available at https://github.com/ZhiningLiu1998/BAT.
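
To make the idea of topological augmentation with virtual class nodes more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a dense adjacency matrix and a matrix of predicted class probabilities, uses `1 - max probability` as a stand-in uncertainty score, and links each high-risk node deterministically to the virtual node of its most likely class. The function name `augment_with_virtual_class_nodes` and the parameter `risk_threshold` are hypothetical; the paper's actual risk and posterior likelihood estimators may differ, and it samples virtual edges rather than thresholding.

```python
import numpy as np

def augment_with_virtual_class_nodes(adj, probs, risk_threshold=0.5):
    """Hypothetical sketch: add one virtual node per class and link high-risk nodes to it."""
    n, c = probs.shape
    # Assumption: risk is approximated by prediction uncertainty, 1 - max probability.
    risk = 1.0 - probs.max(axis=1)
    high_risk = np.flatnonzero(risk > risk_threshold)
    # Enlarged adjacency: the last c rows/columns correspond to the virtual class nodes.
    aug = np.zeros((n + c, n + c), dtype=adj.dtype)
    aug[:n, :n] = adj
    for v in high_risk:
        # Simplification: connect only to the virtual node of the most likely class.
        target = n + int(probs[v].argmax())
        aug[v, target] = aug[target, v] = 1
    return aug, risk

# Toy example: a 4-node path graph with 2 classes.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]])
probs = np.array([[0.90, 0.10],
                  [0.55, 0.45],
                  [0.40, 0.60],
                  [0.05, 0.95]])
aug, risk = augment_with_virtual_class_nodes(adj, probs, risk_threshold=0.3)
```

In this toy example, only the two uncertain middle nodes receive a virtual edge, and the original edges are left untouched. No class-wise reweighting or resampling is involved, which is the sense in which such an augmentation is independent of class rebalancing.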

Paper Structure (26 sections, 2 theorems, 22 equations, 9 figures, 12 tables, 1 algorithm)

Introduction
Class Imbalance and Local Topology
Handling Class Imbalance from a Topological Perspective
Node Misclassification Risk Estimation
Posterior Likelihood Estimation
Virtual Topology Augmentation
Experiments
Related Works
Conclusion
Proofs of Theoretical Results
Limiting Distributions of $H^k_{ij}$
Proof of Theorem 2.1
Proof of Theorem 2.2
Reproducibility
Data Statistics
...and 11 more sections

Key Result

Theorem 2.1

For a large $n$, the ratio of AMP coefficients $\alpha$ for the minority class to the majority class grows polynomially with the imbalance ratio $\rho$ and exponentially with $k$.

Figures (9)

Figure 1: Concepts of ambivalent message-passing (AMP) and distant message-passing (DMP) and their impact in real-world imbalanced node classification tasks (Park et al., 2021). Both factors lead to a substantial increase in prediction errors and, further, to a larger performance disparity/bias (i.e., the gap between the blue and orange curves) between the majority and minority classes.
Figure 2: Node-level distribution of AMP and DMP coefficients and their impact on learning.
Figure 3: The proposed BAT (BAlanced Topological augmentation) framework, best viewed in color.
Figure 4: The negative correlation between the estimated node risk (x-axis) and the prediction accuracy (y-axis). We apply 10 sliding windows to compute the mean and deviation of the accuracy.
Figure 5: The minority-class accuracy of the model prediction $\hat{y}_v = F(\bm{A}, \bm{X}; \Theta)$ and of max-likelihood-based candidate selection $\hat{y}^s_v = \arg\max(\hat{\bm{s}}_v)$ on the PubMed dataset. Note that this is just an illustrative example using $\arg\max(\hat{\bm{s}}_v)$. In practice, we consider the whole $\hat{\bm{s}}_v$ when sampling virtual edges, as described in the Virtual Topology Augmentation section.
...and 4 more figures

Theorems & Definitions (6)

Theorem 2.1: AMP-sourced bias
proof
Theorem 2.2: DMP-sourced bias
proof
proof
proof
