Extending F1 metric, probabilistic approach
Mikolaj Sitarz
TL;DR
This work introduces $P_4$, a probabilistic extension of the $F_1$ metric for binary classification. It is defined as the harmonic mean of four conditional probabilities, $P(+|C+)$, $P(C+|+)$, $P(C-|-)$, and $P(-|C-)$, which correspond to precision, recall, specificity, and negative predictive value. This guarantees $P_4 \in [0,1]$ and symmetry under label swapping: $P_4 = \frac{4}{\frac{1}{PREC}+\frac{1}{REC}+\frac{1}{SPEC}+\frac{1}{NPV}} = \frac{4\,TP\,TN}{4\,TP\,TN+(TP+TN)(FP+FN)}$. The paper shows that $P_4$ tends to zero if any of the four conditional probabilities tends to zero, and approaches one only when all four converge to one, offering a probabilistic, labeling-symmetric alternative to $F_1$. Through edge-case analyses, simulations, and a real-world study on the Breast Cancer Wisconsin dataset, $P_4$ is compared with MCC, $F_1$, Youden’s J, and MK, showing that $P_4$ often parallels MCC in practical scenarios while providing a more interpretable probabilistic basis and a clear zeroing condition. The work also presents ROC-like curves using $P_4$ alongside MCC-based analyses, illustrating threshold choices and practical applicability in imbalanced settings. Overall, $P_4$ broadens the toolkit for binary classifier evaluation, with potential for weighted extensions that emphasize clinical or domain-specific priorities.
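As an illustration of the two equivalent forms of $P_4$ given above, the following is a minimal Python sketch that computes the metric from confusion-matrix counts. The function names `p4_score` and `p4_harmonic` are hypothetical and not taken from the paper, and returning NaN when the denominator vanishes is an assumed convention for the undefined case, not something the source specifies.

```python
import math


def p4_score(tp: int, tn: int, fp: int, fn: int) -> float:
    """P4 via the closed form 4*TP*TN / (4*TP*TN + (TP+TN)*(FP+FN))."""
    numerator = 4 * tp * tn
    denominator = numerator + (tp + tn) * (fp + fn)
    if denominator == 0:
        # Assumption: the metric is undefined here, so NaN is returned.
        return math.nan
    return numerator / denominator


def p4_harmonic(tp: int, tn: int, fp: int, fn: int) -> float:
    """P4 as the harmonic mean of precision, recall, specificity and NPV.

    Requires all four quantities to be defined and nonzero.
    """
    prec = tp / (tp + fp)  # P(+|C+)
    rec = tp / (tp + fn)   # P(C+|+)
    spec = tn / (tn + fp)  # P(C-|-)
    npv = tn / (tn + fn)   # P(-|C-)
    return 4 / (1 / prec + 1 / rec + 1 / spec + 1 / npv)


if __name__ == "__main__":
    # Illustrative counts only, not data from the paper.
    tp, tn, fp, fn = 90, 80, 20, 10
    print(round(p4_score(tp, tn, fp, fn), 4))     # 0.8496
    print(round(p4_harmonic(tp, tn, fp, fn), 4))  # 0.8496
    # Swapping the class labels exchanges TP with TN and FP with FN.
    print(round(p4_score(tn, tp, fn, fp), 4))     # 0.8496
```

The closed form is preferable in practice because it stays well defined when one of the four conditional probabilities is zero, in which case it returns 0, whereas the harmonic-mean form would divide by zero.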
Abstract
This article explores an extension of the well-known F1 score used for assessing the performance of binary classifiers. We propose a new metric based on a probabilistic interpretation of precision, recall, specificity, and negative predictive value. We describe its properties and compare it to common metrics. Then we demonstrate its behavior in edge cases of the confusion matrix. Finally, the properties of the metric are tested on a binary classifier trained on a real dataset.
