IGL-Bench: Establishing the Comprehensive Benchmark for Imbalanced Graph Learning

Jiawen Qin, Haonan Yuan, Qingyun Sun, Lyujin Xu, Jiaqi Yuan, Pengfeng Huang, Zhaonan Wang, Xingcheng Fu, Hao Peng, Jianxin Li, Philip S. Yu

TL;DR

IGL-Bench tackles the lack of standardized evaluation in imbalanced graph learning by introducing an open-source, unified benchmarking suite that measures 24 IGL algorithms across 17 real-world datasets for both node- and graph-level tasks under class- and topology-imbalance. The framework emphasizes consistency in data processing, splits, and evaluation metrics to enable fair comparisons of effectiveness, robustness, and efficiency, and it provides a reusable package to facilitate reproducible research. Across extensive experiments, the study reveals nuanced gains for node-level methods, highlights the persistent difficulty of topology-imbalance, and shows notable variability in graph-level results with some methods struggling on large-scale data. The findings underscore the need for unified, scalable IGL approaches and pave the way for robust, high-impact methods and broad adoption of reproducible benchmarking in imbalanced graph learning.

Abstract

Deep graph learning has gained great popularity in recent years owing to its versatility and success in representing graph data across a wide range of domains. However, the pervasive issue of imbalanced graph data distributions, where certain parts exhibit disproportionately abundant data while others remain sparse, undermines the efficacy of conventional graph learning algorithms and leads to biased outcomes. To address this challenge, Imbalanced Graph Learning (IGL) has garnered substantial attention, enabling more balanced data distributions and better task performance. Despite the proliferation of IGL algorithms, the absence of consistent experimental protocols and fair performance comparisons poses a significant barrier to understanding advances in this field. To bridge this gap, we introduce IGL-Bench, a foundational and comprehensive benchmark for imbalanced graph learning, covering 16 diverse graph datasets and 24 distinct IGL algorithms with uniform data processing and splitting strategies. Specifically, IGL-Bench systematically investigates state-of-the-art IGL algorithms in terms of effectiveness, robustness, and efficiency on node-level and graph-level tasks, within the scope of class-imbalance and topology-imbalance. Extensive experiments demonstrate the potential benefits of IGL algorithms under various imbalanced conditions, offering insights and opportunities for the IGL field. Further, we have developed an open-source, unified package to facilitate reproducible evaluation and inspire further innovative research, which is available at https://github.com/RingBDStack/IGL-Bench.

Paper Structure (68 sections, 4 equations, 30 figures, 47 tables)

Figures (30)

  • Figure 1: Overview of the established IGL-Bench. Both IGL algorithms and datasets are categorized into node-level and graph-level, where the algorithms are further divided into class-imbalance, topology-imbalance, or both.
  • Figure 2: The research scope of the proposed IGL-Bench. Definitions of the imbalance ratio ($\rho$) corresponding to each imbalance issue are summarized in Table \ref{tab:imb_definition}.
  • Figure 3: Robustness analysis of node-level algorithms under different class-imbalance levels on Cora (homophilic). Results are Accuracy and its relative decrease compared to the balanced split.
  • Figure 4: Robustness analysis of the node-level and graph-level algorithms under different imbalance levels. Results report algorithm performance (Accuracy) with standard deviation.
  • Figure 5: Visualization of node- and graph-level IGL algorithms in varying imbalanced scenarios.
  • ...and 25 more figures

Theorems & Definitions (2)

  • Definition 1: Class-Imbalance
  • Definition 2: Topology-Imbalance
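
As a rough illustration of what a class-imbalance ratio measures, the sketch below computes one on a toy label list. It assumes a common convention (largest class size divided by smallest class size); the paper's own definition of $\rho$ is given in its imbalance-definition table and may differ. The function name `class_imbalance_ratio` and the toy labels are hypothetical and are not part of the IGL-Bench package.

```python
from collections import Counter


def class_imbalance_ratio(labels):
    """Return (largest class size) / (smallest class size).

    Assumed convention for illustration only; the paper's own
    definition of the imbalance ratio may differ.
    """
    counts = Counter(labels)
    if not counts:
        raise ValueError("labels must be non-empty")
    return max(counts.values()) / min(counts.values())


# Hypothetical toy labels: 20 nodes in class 0, 4 in class 1, 1 in class 2.
toy_labels = [0] * 20 + [1] * 4 + [2] * 1
rho = class_imbalance_ratio(toy_labels)
print(f"class-imbalance ratio: {rho}")
```

Under this convention a perfectly balanced label set gives a ratio of 1, and larger values indicate a more skewed class distribution.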