CLIMB: Class-imbalanced Learning Benchmark on Tabular Data

Zhining Liu, Zihao Li, Ze Yang, Tianxin Wei, Jian Kang, Yada Zhu, Hendrik Hamann, Jingrui He, Hanghang Tong

TL;DR

CLIMB addresses class-imbalanced learning (CIL) on tabular data with a comprehensive benchmark of 73 real-world datasets and 29 representative CIL algorithms implemented behind a unified API. Its evaluation protocol uses standardized preprocessing, 5-fold stratified splits, 100 hyperparameter trials per method, and three metrics ($AUPRC$, macro-$F_1$, and $BAC$), enabling fair, large-scale comparisons. Three findings stand out: naive rebalancing often hurts performance; ensemble approaches, especially undersampling ensembles, yield robust gains; and data-quality factors such as label noise and missing values can matter more than the imbalance itself. The study also shows that the choice of metric affects how results are interpreted, analyzes runtime trade-offs, and finds that data quality and safe data cleaning can be as critical as balancing, or more so, which offers practical guidance for deploying CIL methods in industry. The open-source CLIMB package, its datasets, and these empirical insights are intended to guide future research and real-world applications on imbalanced tabular tasks.
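
To make the three metrics concrete, here is a minimal pure-Python sketch. It is not taken from the CLIMB package: the function names and the toy labels and scores are illustrative assumptions, AUPRC is approximated by average precision (one common estimator, which may differ from the one the benchmark uses), and tied scores are assumed absent.

```python
def balanced_accuracy(y_true, y_pred):
    """BAC: mean of per-class recall over the classes present in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


def macro_f1(y_true, y_pred):
    """Macro-F1: unweighted mean of per-class F1 scores."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def average_precision(y_true, y_score, positive=1):
    """Average precision: mean precision at the rank of each positive sample.

    Assumes no tied scores (ties would need threshold-based handling).
    """
    ranked = sorted(zip(y_score, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == positive:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Hypothetical toy data: 8 majority samples (0) and 2 minority samples (1).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
y_score = [0.10, 0.20, 0.05, 0.30, 0.15, 0.25, 0.35, 0.80, 0.90, 0.40]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print("accuracy:", accuracy)
print("BAC:", balanced_accuracy(y_true, y_pred))
print("macro-F1:", macro_f1(y_true, y_pred))
print("AP (AUPRC estimate):", average_precision(y_true, y_score))
```

On this toy data plain accuracy is 0.8, while BAC and macro-F1 are both 0.6875, because only one of the two minority samples is recovered. This is the kind of gap that makes metric choice matter on imbalanced data.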

Abstract

Class-imbalanced learning (CIL) on tabular data is important in many real-world applications where the minority class holds the critical but rare outcomes. In this paper, we present CLIMB, a comprehensive benchmark for class-imbalanced learning on tabular data. CLIMB includes 73 real-world datasets across diverse domains and imbalance levels, along with unified implementations of 29 representative CIL algorithms. Built on a high-quality open-source Python package with unified API designs, detailed documentation, and rigorous code quality controls, CLIMB supports easy implementation and comparison between different CIL algorithms. Through extensive experiments, we provide practical insights on method accuracy and efficiency, highlighting the limitations of naive rebalancing, the effectiveness of ensembles, and the importance of data quality. Our code, documentation, and examples are available at https://github.com/ZhiningLiu1998/imbalanced-ensemble.

Paper Structure

This paper contains 48 sections, 7 figures, 9 tables.

Figures (7)

  • Figure 1: Overview of the proposed CLIMB benchmark. Best viewed in color.
  • Figure 2: Statistics summary of the imbalanced tabular datasets included in CLIMB.
  • Figure 3: Performance versus runtime analysis, following the dataset grouping in Table \ref{tab:main}. The x-axis shows the average runtime of each CIL algorithm, and the y-axis shows the average AUPRC. Desirable methods lie closer to the upper-left corner, with high accuracy and low computational cost. Different markers indicate different CIL method categories; the dashed lines denote the performance and runtime of the base model (no balancing). More results with other metrics are in Appendix \ref{sec:ap-fullres}.
  • Figure 4: Robustness analysis. From top to bottom, the rows correspond to the noise, missing values, and additional class imbalance settings. Each column represents a branch of CIL methods.
  • Figure 5: Pairwise win ratio (by AUPRC) comparison between all CIL algorithms. Each number is the fraction of datasets on which the row method outperforms the column method, so a blue/red row means the row method consistently outperforms/underperforms the others.
  • ...and 2 more figures
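
The pairwise win ratio described in the Figure 5 caption can be sketched as follows. This is an illustrative reading of the caption, not code from the paper: the method names and AUPRC values are made up, and counting ties as non-wins is an assumption.

```python
def pairwise_win_ratio(results):
    """Fraction of datasets on which the row method beats the column method.

    `results` maps a method name to its per-dataset scores, with datasets
    in the same order for every method. Ties count as non-wins (assumption).
    """
    methods = list(results)
    n_datasets = len(results[methods[0]])
    table = {}
    for row in methods:
        for col in methods:
            if row == col:
                continue
            wins = sum(1 for a, b in zip(results[row], results[col]) if a > b)
            table[(row, col)] = wins / n_datasets
    return table


# Hypothetical AUPRC scores on four datasets (values are invented).
results = {
    "base": [0.40, 0.55, 0.30, 0.70],
    "method_a": [0.45, 0.50, 0.35, 0.75],
    "method_b": [0.38, 0.60, 0.28, 0.72],
}

win = pairwise_win_ratio(results)
for (row, col), ratio in sorted(win.items()):
    print(f"{row} beats {col} on {ratio:.0%} of datasets")
```

A row whose ratios are all well above 0.5 corresponds to a consistently blue row in the figure; with no ties, the ratios for a pair (row, col) and (col, row) sum to 1.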