FILM: Framework for Imbalanced Learning Machines based on a new unbiased performance measure and a new ensemble-based technique

Antonio Guillén-Teruel, Marcos Caracena, Jose A. Pardo, Fernando de-la-Gándara, José Palma, Juan A. Botía

TL;DR

This work tackles the pervasive bias of standard performance metrics on imbalanced binary classification tasks by introducing the Unbiased Integration Coefficients (UIC), a bias-resistant metric that weights traditional measures according to their correlation with the minority proportion $p_{min}$. It also proposes IPIP, an ensemble method that builds balanced resamples without synthetic data to improve minority-class coverage and predictive stability, evaluated across seven datasets with logistic regression and random forest as base learners. Empirical results show that UIC reduces $p_{min}$-related bias ($p<10^{-4}$) and that IPIP achieves top UIC scores on three datasets, with performance tied to dataset dimensionality. The FILM R package implements these approaches, offering a practical tool for robust model selection and imbalanced learning in real-world settings.

Abstract

This research addresses the challenges of handling unbalanced datasets for binary classification tasks. In such scenarios, standard evaluation metrics are often biased by the disproportionate representation of the minority class. Conducting experiments across seven datasets, we uncovered inconsistencies in evaluation metrics when determining the model that outperforms others for each binary classification problem. This justifies the need for a metric that provides a more consistent and unbiased evaluation across unbalanced datasets, thereby supporting robust model selection. To mitigate this problem, we propose a novel metric, the Unbiased Integration Coefficients (UIC), which exhibits significantly reduced bias ($p < 10^{-4}$) towards the minority class compared to conventional metrics. The UIC is constructed by aggregating existing metrics while penalising those more prone to imbalance. In addition, we introduce the Identical Partitions for Imbalance Problems (IPIP) algorithm for imbalanced ML problems, an ensemble-based approach. Our experimental results show that IPIP outperforms other baseline imbalance-aware approaches using Random Forest and Logistic Regression models in three out of seven datasets as assessed by the UIC metric, demonstrating its effectiveness in addressing imbalanced data challenges in binary classification tasks. This new framework for dealing with imbalanced datasets is materialized in the FILM (Framework for Imbalanced Learning Machines) R Package, accessible at https://github.com/antoniogt/FILM.
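The idea of building balanced resamples without synthetic data can be illustrated with a minimal Python sketch. This is not the IPIP algorithm itself, whose details are not given here: the function name `balanced_resamples`, the `minority_fraction` parameter, and the toy labels are all hypothetical, and the sketch only shows how several class-balanced subsets can be drawn from existing rows.

```python
import random
from collections import Counter


def balanced_resamples(labels, n_subsets, minority_fraction=0.75, seed=0):
    """Return n_subsets lists of row indices, each with equal class counts.

    Illustrative only (hypothetical helper, not the IPIP procedure):
    every index refers to an existing row, so no synthetic samples are made.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")

    # The minority class is the label with the fewest rows.
    minority_label = min(counts, key=counts.get)
    minority_idx = [i for i, y in enumerate(labels) if y == minority_label]
    majority_idx = [i for i, y in enumerate(labels) if y != minority_label]

    # Each subset takes the same number of rows from both classes.
    per_class = max(1, int(len(minority_idx) * minority_fraction))

    subsets = []
    for _ in range(n_subsets):
        chosen = rng.sample(minority_idx, per_class)   # without replacement
        chosen += rng.sample(majority_idx, per_class)  # undersample majority
        rng.shuffle(chosen)
        subsets.append(chosen)
    return subsets


# Toy data: 8 minority rows (label 1) and 92 majority rows (label 0).
labels = [1] * 8 + [0] * 92
subsets = balanced_resamples(labels, n_subsets=5)
```

In an ensemble setting, one base model (for example a logistic regression or a random forest) would be trained on each subset and the predictions combined, for instance by majority vote. How IPIP forms and combines its partitions is defined in the paper, not in this sketch.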

Paper Structure

This paper contains 18 sections, 11 equations, 7 figures, 5 tables, 1 algorithm.

Figures (7)

  • Figure 1: Values of different metrics when training Random Forest with upsampling on the SMS dataset.
  • Figure 2: Taxonomy of the imbalance learning techniques studied in this section.
  • Figure 3: Three different Gaussian functions. Red curve with parameters $(1,0,0.35)$, blue curve with parameters $(1,0,0.25)$ and green curve with parameters $(1,0,0.15)$.
  • Figure 4: FILM methodology. First, $n$ datasets are obtained from the original dataset $d$ through sampling with several imbalance ratios. A set of algorithms is then trained on $d$ and its resamples, resulting in $k$ metrics for each algorithm on each dataset. These values are stored in a three-dimensional matrix. We then aggregate the metric results for each algorithm across all $n+1$ datasets and retain the $k$ metric values of all algorithms on $d$. This provides weights for each metric and algorithm. The weights and the metric values are finally aggregated over the original dataset to obtain the UIC metric, which allows us to determine which algorithm obtained the maximum value.
  • Figure 5: A) Concordance plot of the SMS dataset using Random Forest. B) Concordance plot of the Phoneme dataset using Logistic Regression models.
  • ...and 2 more figures
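
The weighting idea described in the Figure 3 and Figure 4 captions can be sketched in Python under several loudly stated assumptions, since the exact UIC formula is not given in this summary. The sketch assumes that (i) the three Gaussian parameters are amplitude, centre and width, i.e. $g(x) = a\,e^{-(x-b)^2/(2c^2)}$, (ii) each metric's weight is that Gaussian evaluated at the Pearson correlation between the metric and $p_{min}$ across the resampled datasets, and (iii) the final score is a weighted average of the metric values on the original dataset. All function names and numbers below are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def gaussian(x, a=1.0, b=0.0, c=0.25):
    """Assumed form: amplitude a, centre b, width c."""
    return a * math.exp(-((x - b) ** 2) / (2 * c ** 2))


def weighted_score(metric_curves, p_min_values, metric_on_original, c=0.25):
    """Down-weight metrics whose values track p_min (assumed scheme)."""
    weights = {
        name: gaussian(pearson(p_min_values, values), c=c)
        for name, values in metric_curves.items()
    }
    total = sum(weights.values())
    score = sum(weights[m] * metric_on_original[m] for m in weights) / total
    return score, weights


# Made-up numbers: accuracy drifts with p_min, balanced accuracy does not.
p_min_values = [0.05, 0.10, 0.20, 0.30, 0.50]
metric_curves = {
    "accuracy": [0.97, 0.94, 0.90, 0.86, 0.80],
    "balanced_accuracy": [0.80, 0.82, 0.79, 0.81, 0.80],
}
metric_on_original = {"accuracy": 0.95, "balanced_accuracy": 0.80}

score, weights = weighted_score(metric_curves, p_min_values, metric_on_original)
```

With these toy values the metric that is strongly correlated with $p_{min}$ receives a weight close to zero, so the aggregated score is dominated by the metric that stays stable across imbalance ratios. A smaller width $c$ makes the penalty sharper, which matches the narrower curves in Figure 3 under the assumed parameterisation.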