Rethinking Semi-Supervised Imbalanced Node Classification from Bias-Variance Decomposition

Liang Yan, Gengchen Wei, Chen Yang, Shengzhong Zhang, Zengfeng Huang

TL;DR

This work tackles imbalanced node classification on graphs by reinterpreting the problem through Bias-Variance Decomposition and linking imbalance to increased model variance. It proposes ReVar, a variance-regularized semi-supervised framework that uses graph augmentations to estimate variance and a class-center-based regularization to compensate for minority classes, optimized via a two-view objective. The approach combines variance regularization with intra-class aggregation to produce robust representations, achieving state-of-the-art results on both public-split class-imbalanced and naturally imbalanced graph benchmarks. The work provides a theoretical lens and practical algorithmic tools that improve minority-class performance and offer a path toward principled design of GNNs under data imbalance.

Abstract

This paper introduces a new approach to address class imbalance in graph neural networks (GNNs) for learning on graph-structured data. Our approach integrates imbalanced node classification and Bias-Variance Decomposition, establishing a theoretical framework that closely relates data imbalance to model variance. We also leverage graph augmentation techniques to estimate the variance, and design a regularization term to alleviate the impact of imbalance. Extensive experiments are conducted on multiple benchmarks, including naturally imbalanced datasets and public-split class-imbalanced datasets, demonstrating that our approach outperforms state-of-the-art methods in various imbalanced scenarios. This work provides a novel theoretical perspective for addressing the problem of imbalanced node classification in GNNs.

Paper Structure (69 sections, 4 theorems, 25 equations, 7 figures, 8 tables, 1 algorithm)

Key Result

Theorem 1

Under the condition that $\sum_i n_i$ is a constant, the variance $\sum_{i=1}^c \mathbb{E}_x \left[\frac{1}{n_i} h^T(x) \Lambda^i h(x) \right]$ reaches its minimum when all $n_i$ are equal.
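
To see why equal class sizes minimize such a sum, the following is a minimal worked sketch under a simplifying assumption that is not stated in this excerpt: the per-class term $a_i = \mathbb{E}_x\left[h^T(x) \Lambda^i h(x)\right]$ is taken to be the same positive constant $a$ for every class, and $N = \sum_i n_i$ denotes the fixed total. The paper's actual conditions and proof may differ.

```latex
% Assumption (illustrative only): a_i = a > 0 for all i, and N = \sum_{i=1}^{c} n_i is fixed.
\begin{align*}
\sum_{i=1}^{c} \mathbb{E}_x\!\left[\frac{1}{n_i}\, h^T(x)\, \Lambda^i h(x)\right]
  &= a \sum_{i=1}^{c} \frac{1}{n_i}, \\
\left(\sum_{i=1}^{c} n_i\right)\!\left(\sum_{i=1}^{c} \frac{1}{n_i}\right)
  &\ge \left(\sum_{i=1}^{c} \sqrt{n_i}\cdot\frac{1}{\sqrt{n_i}}\right)^{2} = c^{2}
  && \text{(Cauchy--Schwarz)}, \\
\Longrightarrow\quad a \sum_{i=1}^{c} \frac{1}{n_i}
  &\ge \frac{a\, c^{2}}{N},
  && \text{with equality iff } n_1 = \dots = n_c = \frac{N}{c}.
\end{align*}
```

For instance, with $c = 2$ and $N = 100$, the balanced split $(50, 50)$ gives $\sum_i 1/n_i = 0.04$, while the imbalanced split $(90, 10)$ gives roughly $0.111$, almost three times larger.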

Figures (7)

  • Figure 1: We examine how the variance of node classification changes as the imbalance ratio increases, and plot regression curves of variance against imbalance ratio. To rule out the effect of training-set size on variance, each run uses the same number of training nodes but a different selection of nodes. The detailed experimental setup is in Appendix \ref{More experiments for variance imbalance}.
  • Figure 2: Overall pipeline of ReVar. (a) Two different views of the graph, $\tilde{\mathbf{G}}$ and $\tilde{\mathbf{G}}^{\prime}$, are obtained by the graph augmentation $transform$ and then fed into the GNN encoder $f_{\theta}$. (b) Intra-class and inter-class representations are aggregated: for a labeled node, the positive samples are the nodes of the same class in its own view and also those in the other view. (c) Variance is estimated by Equation \ref{new-reg-loss}. Specifically, a label probability distribution is computed for each node in each of the two views from its similarity to each class center. The difference between the two distributions approximates the model's variance and is optimized as one term of the loss function.
  • Figure 3: Analysis of ReVar.
  • Figure 4: More Ablation Analysis for the Loss Function.
  • Figure 5: More experiments on the correlation between variance and imbalance ratio in Theorem \ref{variance and imbalance theorem}.
  • ...and 2 more figures
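
The variance estimate described in the Figure 2 caption (step c) can be sketched roughly as follows. This is a minimal NumPy illustration, not the paper's implementation: the function names, the cosine similarity, the `temperature` value, the per-view class centers, and the symmetric KL divergence used as the "difference" between the two distributions are all assumptions, and random vectors stand in for the encoder outputs $f_{\theta}(\tilde{\mathbf{G}})$ and $f_{\theta}(\tilde{\mathbf{G}}^{\prime})$.

```python
import numpy as np


def class_centers(z, labels, labeled_idx, num_classes):
    """Mean embedding of the labeled nodes of each class.

    Assumes every class has at least one labeled node.
    """
    centers = np.zeros((num_classes, z.shape[1]))
    for c in range(num_classes):
        members = labeled_idx[labels[labeled_idx] == c]
        centers[c] = z[members].mean(axis=0)
    return centers


def center_probs(z, centers, temperature=0.5):
    """Softmax over cosine similarity between each node and each class center."""
    z_n = z / np.linalg.norm(z, axis=1, keepdims=True)
    c_n = centers / np.linalg.norm(centers, axis=1, keepdims=True)
    logits = (z_n @ c_n.T) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)


def variance_proxy(p1, p2, eps=1e-12):
    """Symmetric KL divergence between two views, averaged over nodes.

    The choice of divergence is an assumption made for illustration.
    """
    log_ratio = np.log(p1 + eps) - np.log(p2 + eps)
    kl_12 = np.sum(p1 * log_ratio, axis=1)
    kl_21 = np.sum(p2 * -log_ratio, axis=1)
    return float(np.mean(0.5 * (kl_12 + kl_21)))


# Toy data: random embeddings stand in for the two augmented views.
rng = np.random.default_rng(0)
num_nodes, dim, num_classes = 60, 8, 3
labels = np.arange(num_nodes) % num_classes   # every class is represented
labeled_idx = np.arange(30)                   # first 30 nodes are labeled
z1 = rng.normal(size=(num_nodes, dim))        # view 1 embeddings
z2 = z1 + 0.1 * rng.normal(size=(num_nodes, dim))  # perturbed view 2

p1 = center_probs(z1, class_centers(z1, labels, labeled_idx, num_classes))
p2 = center_probs(z2, class_centers(z2, labels, labeled_idx, num_classes))
print("variance proxy:", variance_proxy(p1, p2))
```

In a training loop, a quantity like `variance_proxy` would be added to the supervised loss as a regularization term; here it is only evaluated once on fixed arrays to show the shape of the computation.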

Theorems & Definitions (5)

  • Definition 1
  • Theorem 1
  • Lemma 1
  • Theorem 1
  • Lemma 1