Extending F1 metric, probabilistic approach

Mikolaj Sitarz

TL;DR

This work introduces $P_4$, a probabilistic extension of the $F_1$ metric for binary classification. It is defined as the harmonic mean of four conditional probabilities, $P(+|C+)$, $P(C+|+)$, $P(C-|-)$, and $P(-|C-)$ (precision, recall, specificity, and negative predictive value, respectively), which ensures $P_4 \in [0,1]$ and symmetry under label swapping: $P_4 = \frac{4}{\frac{1}{PREC}+\frac{1}{REC}+\frac{1}{SPEC}+\frac{1}{NPV}} = \frac{4\,TP\,TN}{4\,TP\,TN+(TP+TN)(FP+FN)}$. The authors demonstrate that $P_4$ goes to zero if any of the four conditional probabilities tends to zero and approaches one only when all four converge to one, offering a probabilistic, labeling-symmetric alternative to $F_1$. Through edge-case analyses, simulations, and a real-world study on the Breast Cancer Wisconsin dataset, $P_4$ is compared with MCC, $F_1$, Youden's J, and MK, showing that $P_4$ often parallels MCC in practical scenarios while providing a more interpretable probabilistic basis and a clear zeroing condition. The work also presents ROC-like curves using $P_4$ alongside MCC-based analyses, illustrating threshold choices and practical applicability in imbalanced settings. Overall, $P_4$ broadens the toolkit for binary classifier evaluation, with potential for weighted extensions that emphasize clinical or other domain-specific priorities.
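The two forms of $P_4$ quoted above can be checked numerically. The following is a minimal sketch rather than the authors' code: the function names `p4_score` and `p4_harmonic`, the example counts, and the choice to return NaN for a degenerate confusion matrix are all assumptions made for illustration.

```python
import math
from statistics import harmonic_mean


def p4_score(tp, fp, fn, tn):
    """P4 from confusion-matrix counts, using the closed form
    4*TP*TN / (4*TP*TN + (TP + TN)*(FP + FN))."""
    numerator = 4 * tp * tn
    denominator = numerator + (tp + tn) * (fp + fn)
    if denominator == 0:
        # Degenerate matrix (e.g. TP = TN = 0): the ratio is 0/0.
        # Returning NaN is a choice of this sketch, not of the paper.
        return math.nan
    return numerator / denominator


def p4_harmonic(tp, fp, fn, tn):
    """P4 as the harmonic mean of the four conditional probabilities.
    Assumes every row and column sum of the matrix is positive."""
    precision = tp / (tp + fp)    # P(+|C+)
    recall = tp / (tp + fn)       # P(C+|+)
    specificity = tn / (tn + fp)  # P(C-|-)
    npv = tn / (tn + fn)          # P(-|C-)
    return harmonic_mean([precision, recall, specificity, npv])


# Hypothetical counts, chosen only to exercise the formulas.
tp, fp, fn, tn = 90, 10, 5, 95

closed = p4_score(tp, fp, fn, tn)
hmean = p4_harmonic(tp, fp, fn, tn)
# Swapping the labels exchanges TP with TN and FP with FN.
swapped = p4_score(tn, fn, fp, tp)

print(f"closed form:   {closed:.6f}")
print(f"harmonic mean: {hmean:.6f}")
print(f"labels swapped: {swapped:.6f}")
```

For counts with positive row and column sums, the first two printed values should agree up to floating-point rounding, and the third should match the first because the closed form is unchanged when TP is exchanged with TN and FP with FN.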

Abstract

This article explores an extension of the well-known F1 score used for assessing the performance of binary classifiers. We propose a new metric based on a probabilistic interpretation of precision, recall, specificity, and negative predictive value. We describe its properties and compare it to common metrics. We then demonstrate its behavior in edge cases of the confusion matrix. Finally, the properties of the metric are tested on a binary classifier trained on a real dataset.

Paper Structure (22 sections, 22 equations)

Background
Common metrics
Basic and composite metrics
Matthews correlation coefficient
Probabilistic approach -- focusing on conditional probabilities
Edge cases
Confusion matrix
Case 1 - "alarming precision"
Case 2 - "alarming negative predictive value"
Case 3 - "alarming recall"
Case 4 - "alarming specificity"
Summary
$\mathrm{P}_4$ compared to other metrics
Metrics vs population balance
Metrics vs true positive rate
...and 7 more sections
