Evaluation of Multi- and Single-objective Learning Algorithms for Imbalanced Data

Szymon Wojciechowski, Michał Woźniak

TL;DR

This paper addresses the challenge of evaluating classifiers trained on imbalanced data when methods return either a single solution or a Pareto front of multiple solutions. It argues that aggregate metrics can be misleading and proposes a framework based on two new metrics, Strict Dominance Ratio ($SDR$) and Non-dominated Ratio ($NDR$), to compare multi-objective solutions against a reference single-solution model, supplemented by the $F_{\beta}$-plot to capture user preferences. Using a case study with MEUS across ten datasets and a reference pool of resampling methods, the authors demonstrate how these metrics illuminate diversity and dominance patterns, while also highlighting instability in some metrics and the practical value of visual interpretation for end users. The work advances classifier evaluation methodology by enabling fair comparisons between heterogeneous output forms and guiding the selection of solutions tailored to user needs. Overall, it provides a principled way to assess multi-objective learning in imbalanced settings, with implications for more reliable model selection and customization.
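
To make the role of $\beta$ concrete, here is a minimal Python sketch of how a user preference can pick different solutions from the same front. It uses only the standard $F_{\beta}$ definition, $F_{\beta} = (1+\beta^2)\,PR / (\beta^2 P + R)$. This summary does not describe how the paper builds its $F_{\beta}$-plot, so the helper `best_for_beta` and the (precision, recall) pairs are hypothetical illustrations, not the authors' procedure or results.

```python
from typing import List, Tuple

Solution = Tuple[float, float]  # (precision, recall), both in [0, 1]


def f_beta(precision: float, recall: float, beta: float) -> float:
    """Standard F-beta score; beta > 1 weights recall more, beta < 1 precision."""
    denom = beta ** 2 * precision + recall
    if denom == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / denom


def best_for_beta(solutions: List[Solution], beta: float) -> Solution:
    """Hypothetical helper: the solution with the highest F-beta for this beta."""
    return max(solutions, key=lambda s: f_beta(s[0], s[1], beta))


# Made-up non-dominated (precision, recall) pairs, for illustration only.
front = [(0.90, 0.40), (0.70, 0.70), (0.50, 0.85)]

for beta in (0.5, 1.0, 2.0):
    print(beta, best_for_beta(front, beta))
```

With these made-up points, a precision-leaning user ($\beta = 0.5$) is steered to the first solution, a neutral user ($\beta = 1$) to the second, and a recall-leaning user ($\beta = 2$) to the third, which is why a single aggregate score cannot stand in for the whole front.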

Abstract

Many machine learning tasks aim to find models that perform well not on a single criterion but on a group of criteria, often conflicting ones. One example is imbalanced data classification, where we want the best possible classification quality for the minority class without degrading the classification quality for the majority class. One solution is to propose an aggregate learning criterion and reduce the multi-objective learning task to a single-criterion optimization problem. Unfortunately, such an approach is ambiguous to interpret, since the value of the aggregated criterion does not reveal the values of the component criteria. Hence, a growing number of proposed algorithms are based on multi-objective optimization (MOO), which can optimize multiple criteria simultaneously. However, such an approach returns a set of non-dominated solutions (a Pareto front). Selecting a single solution from the Pareto front is a challenge in itself, and much attention has been paid to how to select it given user preferences, as well as to how to compare the solutions returned by different MOO algorithms. This leaves a significant gap in the classifier evaluation methodology: how to reliably compare methods that return single solutions with algorithms that return Pareto fronts. To fill this gap, this article proposes a new, reliable way of comparing algorithms based on multi-objective optimization with methods that return single solutions, while pointing out solutions from the Pareto front tailored to the user's preferences. This work focuses only on comparing algorithms, not on how they are trained. The algorithms selected for this study are illustrative and serve to help explain the proposed approach.
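
The notion of a non-dominated set can be sketched in a few lines of Python. The sketch assumes every criterion is to be maximized (for example precision and recall). The names `dominates` and `pareto_front` and the sample points are hypothetical and only illustrate the general definition of Pareto dominance, not the metrics proposed in the paper.

```python
from typing import List, Sequence

Point = Sequence[float]  # one value per criterion; larger is better (assumption)


def dominates(a: Point, b: Point) -> bool:
    """True if a is at least as good as b on every criterion and better on one."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep only the points that no other point dominates (O(n^2) scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Made-up (precision, recall) pairs, for illustration only.
candidates = [(0.90, 0.40), (0.70, 0.70), (0.50, 0.85), (0.60, 0.60), (0.45, 0.80)]
front = pareto_front(candidates)
print(front)
```

Here `(0.60, 0.60)` is dominated by `(0.70, 0.70)` and `(0.45, 0.80)` by `(0.50, 0.85)`, so three points remain. None of the three dominates another, which is the situation that makes comparison against a method returning a single point non-trivial.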

Paper Structure

This paper contains 11 sections, 7 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: G-mean and $F_1$ curves in the recall-precision space.
  • Figure 3: Comparison of graphical representation of metrics.
  • Figure 4: $F_{\beta}$-plot results.