Tree Boosting Methods for Balanced and Imbalanced Classification and their Robustness Over Time in Risk Assessment

Gissel Velarde, Michael Weichert, Anuj Deshmunkh, Sanjay Deshmane, Anindya Sudhir, Khushboo Sharma, Vaibhav Joshi

TL;DR

The study investigates how tree-boosting methods, centered on XGBoost, perform on balanced and highly imbalanced binary classification tasks relevant to risk assessment. It compares vanilla XGBoost with Random-Search-tuned variants and scale_pos_weight optimization on private datasets of 1K, 10K, and 100K samples at four positive-class proportions (50, 45, 25, and 5 percent). Performance improves with larger and more balanced data, and the F1 score declines as imbalance increases. Resampling to balance the training set offers no consistent benefit, while hyper-parameter tuning, especially of scale_pos_weight, provides meaningful improvements, particularly for smaller datasets. The approach is robust to data variation over time up to a point, and retraining is recommended once performance starts to drift, which guides practical deployment of scalable risk-scoring systems.
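To make the scale_pos_weight setting more concrete, here is a minimal sketch of the common starting heuristic suggested in the XGBoost documentation: the number of negative samples divided by the number of positive samples. The function name `heuristic_scale_pos_weight` is hypothetical, and the values it prints are only starting points for the four proportions studied, not the tuned values reported by the authors.

```python
def heuristic_scale_pos_weight(n_samples: int, positive_fraction: float) -> float:
    """Return negatives / positives, a common starting value for scale_pos_weight.

    This is an illustrative heuristic, not the optimized value from the paper.
    """
    n_pos = round(n_samples * positive_fraction)
    n_neg = n_samples - n_pos
    if n_pos == 0:
        raise ValueError("at least one positive sample is required")
    return n_neg / n_pos


# The four positive-class proportions evaluated in the study.
for fraction in (0.50, 0.45, 0.25, 0.05):
    weight = heuristic_scale_pos_weight(1000, fraction)
    print(f"{fraction:.0%} positives -> scale_pos_weight ~ {weight:.2f}")
```

With 5 percent positives the heuristic weights each positive sample 19 times as heavily as a negative one, which is why the setting matters most for the highly imbalanced case.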

Abstract

Most real-world classification problems deal with imbalanced datasets, posing a challenge for Artificial Intelligence (AI), i.e., machine learning algorithms, because the minority class, which is of extreme interest, often proves difficult to detect. This paper empirically evaluates the performance of tree boosting methods across different dataset sizes and class distributions, from perfectly balanced to highly imbalanced. For tabular data, tree-based methods such as XGBoost stand out in several benchmarks for their detection performance and speed. Therefore, XGBoost and Imbalance-XGBoost are evaluated. After introducing the motivation to address risk assessment with machine learning, the paper reviews evaluation metrics for detection systems or binary classifiers. It proposes a method for data preparation followed by tree boosting methods including hyper-parameter optimization. The method is evaluated on private datasets of 1 thousand (K), 10K and 100K samples on distributions with 50, 45, 25, and 5 percent positive samples. As expected, the developed method increases its recognition performance as more data is given for training, and the F1 score decreases as the data distribution becomes more imbalanced, but it remains significantly superior to the precision-recall baseline, which is the ratio of positives to all samples (positives plus negatives). Sampling to balance the training set does not provide consistent improvement and deteriorates detection. In contrast, classifier hyper-parameter optimization improves recognition, but should be applied carefully depending on data volume and distribution. Finally, the developed method is robust to data variation over time up to some point. Retraining can be used when performance starts deteriorating.
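The precision-recall baseline mentioned in the abstract can be sketched as follows. The baseline is the ratio of positives to all samples, as stated above. The F1 figure for a classifier that labels every sample positive is an added illustration (an assumption, not a result from the paper), and the function names are hypothetical.

```python
def pr_baseline(n_pos: int, n_neg: int) -> float:
    """Precision-recall baseline: positives divided by positives plus negatives."""
    return n_pos / (n_pos + n_neg)


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustration on a 10K-sample dataset for the four studied distributions.
n_samples = 10_000
for percent_pos in (50, 45, 25, 5):
    n_pos = n_samples * percent_pos // 100
    n_neg = n_samples - n_pos
    baseline = pr_baseline(n_pos, n_neg)
    # Assumption for illustration: an "always positive" classifier has
    # precision equal to the baseline and recall equal to 1.
    trivial_f1 = f1_score(baseline, 1.0)
    print(f"{percent_pos}% positives: baseline={baseline:.2f}, all-positive F1={trivial_f1:.3f}")
```

The baseline shrinks with the share of positives, so a fixed F1 score represents a larger gain over chance on the 5 percent distribution than on the balanced one.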

Paper Structure

This paper contains 21 sections, 5 equations, 14 figures, 6 tables.

Figures (14)

  • Figure 1: AI can simulate human decisions in a shorter time, helping human inspectors save time and focus on critical cases. From Scaling_Velarde.
  • Figure 2: Suppose each decision takes a human expert 5 minutes. Then 1000 requests take 5000 minutes, 2000 requests take 10000 minutes, and so on. AI machines execute each request in milliseconds and can therefore save time and increase productivity. In addition, AI scales easily as the number of requests increases. From Scaling_Velarde.
  • Figure 3: Examples of possible distributions for balanced and imbalanced datasets.
  • Figure 4: Example of a tree ensemble model with two trees. Decision nodes are oval and leaf nodes are rectangular. The numbers inside leaf nodes are scores that contribute to the final prediction. For instance, given an example where $x_1>A$ and $x_3>B$, the final prediction is the sum of the two leaf scores, -1.1 + 1 = -0.1. A convex loss function compares the final prediction with the target in order to learn the set of functions by minimizing a regularized objective (chen2016xgboost).
  • Figure 5: Illustration of the datasets' size.
  • ...and 9 more figures
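The additive prediction described in the Figure 4 caption can be sketched as below. Only the leaf scores -1.1 and 1 and the conditions $x_1>A$ and $x_3>B$ come from the caption. The threshold values for A and B, the other leaf scores, and the assignment of each condition to a particular tree are hypothetical, chosen purely for illustration.

```python
# Hypothetical thresholds standing in for A and B from the caption.
A = 0.5
B = 2.0


def tree_1(x: dict) -> float:
    # Leaf score -1.1 is from the caption; 0.4 is a made-up value.
    return -1.1 if x["x1"] > A else 0.4


def tree_2(x: dict) -> float:
    # Leaf score 1.0 is from the caption; -0.3 is a made-up value.
    return 1.0 if x["x3"] > B else -0.3


def ensemble_score(x: dict) -> float:
    """Final prediction: the sum of the leaf scores reached in each tree."""
    return sum(tree(x) for tree in (tree_1, tree_2))


# An example satisfying x1 > A and x3 > B, as in the caption.
example = {"x1": 0.9, "x3": 3.5}
print(round(ensemble_score(example), 2))  # -0.1
```

Each tree contributes exactly one leaf score per example, so adding more trees to the ensemble only adds more terms to the sum.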