
$F_β$-plot -- a visual tool for evaluating imbalanced data classifiers

Szymon Wojciechowski, Michał Woźniak

TL;DR

This paper tackles the problem of evaluating imbalanced data classifiers when the loss function is not known by proposing the $F_{\beta}$-plot, a visualization that tracks $F_{\beta}$ across a range of $\beta$ values to reveal how classifier rankings change with different precision-recall trade-offs. By connecting $\beta$ to the relative importance of $PPV$ and $TPR$, the method helps end-users select models aligned with their cost preferences rather than relying on a single aggregated metric. The authors demonstrate the approach on multiple datasets, showing that no single model dominates across all $\beta$ and that the plots can guide method choice under varying requirements. They also acknowledge limitations, such as the metric's neglect of true negatives ($TN$) and potential metric ambiguity. Overall, the $F_{\beta}$-plot provides a practical, end-user-driven framework for fair, multi-criteria evaluation of imbalanced classifiers, with code available for reproduction.
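The idea of tracking $F_\beta$ over a range of $\beta$ values can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the authors' released code: it assumes the standard definition $F_\beta = (1+\beta^2)\,PPV \cdot TPR / (\beta^2 PPV + TPR)$, a log-spaced grid of $\beta$ values, and a hypothetical pool of three classifiers whose PPV and TPR values are invented for the example.

```python
import numpy as np

# Hypothetical pool: (PPV, TPR) values are invented for illustration,
# not results reported in the paper.
POOL = {
    "clf_A": (0.90, 0.40),  # precise, but misses many positives
    "clf_B": (0.60, 0.70),  # balanced
    "clf_C": (0.30, 0.95),  # finds almost all positives, many false alarms
}


def f_beta(ppv: float, tpr: float, betas: np.ndarray) -> np.ndarray:
    """Standard F_beta from PPV (precision) and TPR (recall), vectorised over beta."""
    b2 = np.asarray(betas, dtype=float) ** 2
    if ppv == 0.0 or tpr == 0.0:
        # F_beta is 0 whenever precision or recall is 0 (also avoids 0/0).
        return np.zeros_like(b2)
    return (1.0 + b2) * ppv * tpr / (b2 * ppv + tpr)


def f_beta_curves(pool: dict, betas: np.ndarray) -> dict:
    """One F_beta curve per classifier: the data behind an F_beta-plot."""
    return {name: f_beta(ppv, tpr, betas) for name, (ppv, tpr) in pool.items()}


def best_per_beta(curves: dict, betas: np.ndarray) -> list:
    """For each beta, the name of the classifier with the highest F_beta."""
    names = list(curves)
    scores = np.vstack([curves[n] for n in names])  # shape: (n_models, n_betas)
    winners = scores.argmax(axis=0)
    return [(float(b), names[i]) for b, i in zip(betas, winners)]


if __name__ == "__main__":
    # Log-spaced grid from 0.01 to 100, symmetric around beta = 1.
    betas = np.logspace(-2, 2, 9)
    for beta, name in best_per_beta(f_beta_curves(POOL, betas), betas):
        print(f"beta = {beta:8.3f}  ->  best: {name}")
```

With these invented numbers, the precision-oriented `clf_A` wins for small $\beta$, the balanced `clf_B` at $\beta = 1$, and the recall-oriented `clf_C` for large $\beta$, which is the kind of rank change an $F_\beta$-plot is meant to expose. Plotting each curve against $\beta$ on a logarithmic axis (e.g., with matplotlib) would give the plot itself.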

Abstract

One of the significant problems associated with imbalanced data classification is the lack of reliable metrics. This stems primarily from the fact that for most real-life (as well as commonly used benchmark) problems, we do not have information from the user on the actual form of the loss function that should be minimized. Although metrics indicating the classification quality within each class are quite common, the end user must then analyze several such metrics, which in practice makes it difficult to interpret the usefulness of a given classifier. Hence, many aggregate metrics have been proposed or adopted for the imbalanced data classification problem, but there is still no consensus on which should be used. An additional disadvantage is their ambiguity and systematic bias toward one class. Moreover, using them to analyze experimental results, i.e., to identify the classification models that perform well under a chosen aggregate metric, inherits the drawbacks mentioned above. Therefore, the paper proposes a simple approach to analyzing the popular parametric metric $F_β$. We point out that, for a given pool of analyzed classifiers, it is possible to indicate when a given model should be preferred depending on user requirements.
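To make the link between $\beta$ and user requirements concrete, the standard definition of $F_\beta$ can be rewritten as a weighted harmonic mean of $PPV$ and $TPR$, from which the value of $\beta$ at which two classifiers swap ranks follows directly. This is a sketch derived from the standard definition, not a derivation taken from the paper; $A$ and $B$ denote any two classifiers in the analyzed pool.

```latex
F_\beta = \frac{(1+\beta^2)\,PPV \cdot TPR}{\beta^2\,PPV + TPR},
\qquad
\frac{1}{F_\beta} = \frac{1}{1+\beta^2}\cdot\frac{1}{PPV} + \frac{\beta^2}{1+\beta^2}\cdot\frac{1}{TPR}.

% Rank swap between classifiers A and B: set 1/F_beta^A = 1/F_beta^B
% and multiply both sides by (1 + beta^2):
\left(\frac{1}{PPV_A} - \frac{1}{PPV_B}\right)
  + \beta^2 \left(\frac{1}{TPR_A} - \frac{1}{TPR_B}\right) = 0
\quad\Longrightarrow\quad
\beta^{*2} = \frac{1/PPV_B - 1/PPV_A}{1/TPR_A - 1/TPR_B}.
```

The weight on $TPR$ is $\beta^2/(1+\beta^2)$, so $\beta > 1$ favors recall and $\beta < 1$ favors precision. A positive $\beta^{*}$ exists only when one classifier has the higher $PPV$ and the other the higher $TPR$; otherwise one of them is preferred for every $\beta$. For hypothetical values $(PPV, TPR) = (0.9, 0.4)$ for $A$ and $(0.6, 0.7)$ for $B$, this gives $\beta^{*2} = 14/27$, i.e. $\beta^{*} \approx 0.72$.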

Paper Structure

This paper contains 6 sections, 5 equations, 5 figures, and 2 tables.

Figures (5)

  • Figure 1: Example of relations between $F_\beta$ and $\beta$
  • Figure 2: $F_\beta$-plot for Thyroid Disease with hold-out evaluation.
  • Figure 3: $F_\beta$-plot for Thyroid Disease with cross-validation.
  • Figure 4: $F_\beta$-plots of selected datasets
  • Figure 5: Example of different $F_1$ values in PPV-TPR space.
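Figure 5 concerns how different points in PPV-TPR space relate to $F_1$. One consequence, related to the metric ambiguity noted in the abstract, is that very different precision-recall trade-offs can receive exactly the same $F_1$. The minimal sketch below assumes only the standard definition $F_1 = 2\,PPV \cdot TPR/(PPV+TPR)$ and solves it for $TPR$ to walk along an $F_1$ isoline; the helper names `f1` and `tpr_for_f1` are hypothetical and not taken from the paper's code.

```python
def f1(ppv: float, tpr: float) -> float:
    """Standard F1: harmonic mean of PPV and TPR (0 when both are 0)."""
    return 0.0 if ppv + tpr == 0 else 2 * ppv * tpr / (ppv + tpr)


def tpr_for_f1(f: float, ppv: float) -> float:
    """TPR giving F1 == f at the given PPV, solved from f = 2*P*T / (P + T).

    A valid TPR in [0, 1] exists only when f / (2 - f) <= ppv <= 1.
    """
    if not 0 < f <= 1:
        raise ValueError("f must lie in (0, 1]")
    if not f / (2 - f) <= ppv <= 1:
        raise ValueError("no TPR in [0, 1] gives this F1 at such a PPV")
    return f * ppv / (2 * ppv - f)


if __name__ == "__main__":
    # Several (PPV, TPR) points lying on the same F1 = 0.5 isoline.
    for ppv in (0.4, 0.5, 0.75, 1.0):
        tpr = tpr_for_f1(0.5, ppv)
        print(f"PPV = {ppv:.3f}  TPR = {tpr:.3f}  F1 = {f1(ppv, tpr):.3f}")
```

For $F_1 = 0.5$, for instance, a classifier with $PPV = 1.0$ and $TPR = 1/3$ scores the same as one with $PPV = TPR = 0.5$, even though a user who cares mostly about recall would clearly prefer the latter.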