The Certainty Ratio $C_ρ$: a novel metric for assessing the reliability of classifier predictions
Jesus S. Aguilar-Ruiz
TL;DR
This paper addresses a shortcoming of traditional classifier evaluation metrics: they ignore prediction uncertainty, which matters in high-stakes settings. It introduces the Probabilistic Confusion Matrix $CM^\star$, built from the classifier's probability outputs $Q$, and decomposes it into a Certainty matrix $V$ and an Uncertainty matrix $U$. This decomposition enables the Certainty Ratio $\mathcal{C}_\rho$, which quantifies the share of performance derived from certain predictions. The framework generalizes standard measures to a probabilistic setting (e.g., $Acc^\star$, $Acc^\star_v$, $Acc^\star_u$) and includes a divergence metric $d(CM,CM^\star)$ for assessing discrepancies between the discrete and probabilistic views. Experiments on 21 datasets with four classifiers show that high traditional accuracy can be driven by uncertain predictions and that $\mathcal{C}_\rho$ gives a more nuanced view of reliability: Decision Trees often achieve high certainty, while Random Forests expose higher uncertainty. Overall, the Certainty Ratio is an interpretable tool, applicable to any classification performance measure, for improving model trustworthiness and guiding reliability-focused model selection and deployment. It could potentially be integrated with calibration and explainability methods for real-time decision support.
Abstract
Evaluating the performance of classifiers is critical in machine learning, particularly in high-stakes applications where the reliability of predictions can significantly affect decision-making. Traditional performance measures, such as accuracy and F-score, often fail to account for the uncertainty inherent in classifier predictions, which can lead to misleading assessments. This paper introduces the Certainty Ratio ($C_ρ$), a novel metric designed to quantify the contribution of confident (certain) versus uncertain predictions to any classification performance measure. By building on the Probabilistic Confusion Matrix ($CM^\star$) and decomposing predictions into certainty and uncertainty components, $C_ρ$ provides a more comprehensive evaluation of classifier reliability. Experimental results across 21 datasets and four classifiers (Decision Trees, Naive Bayes, 3-Nearest Neighbors, and Random Forests) show that $C_ρ$ reveals insights that conventional metrics often overlook, such as high accuracy being driven largely by uncertain predictions. These findings emphasize the importance of incorporating probabilistic information into classifier evaluation and offer a robust tool for researchers and practitioners seeking to improve model trustworthiness in complex environments.
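The construction summarized above can be illustrated with a short NumPy sketch. It is hypothetical and not the paper's definition: it assumes $CM^\star$ adds each instance's probability vector from $Q$ to the row of its true class, and it assumes an entropy-based certainty weight (one minus the normalized Shannon entropy of each probability vector) to split $CM^\star$ into $V$ and $U = CM^\star - V$. The paper's actual decomposition may differ, and all function names are illustrative. Because $V + U = CM^\star$, accuracy splits as $Acc^\star = Acc^\star_v + Acc^\star_u$, and the sketch takes $C_ρ = Acc^\star_v / Acc^\star$ as the share of accuracy contributed by certain predictions.

```python
import numpy as np


def probabilistic_confusion_matrix(y_true, Q, n_classes):
    """Assumed CM*: row i accumulates the probability vectors of instances whose true class is i."""
    cm = np.zeros((n_classes, n_classes))
    for label, q in zip(y_true, Q):
        cm[label] += q
    return cm


def certainty_weights(Q):
    """HYPOTHETICAL certainty per instance: 1 - (Shannon entropy / log k), in [0, 1].

    A one-hot vector gets weight 1; a uniform vector gets weight 0.
    Assumes k >= 2 classes.
    """
    k = Q.shape[1]
    with np.errstate(divide="ignore"):
        logs = np.where(Q > 0, np.log(Q), 0.0)  # treat 0 * log(0) as 0
    entropy = -(Q * logs).sum(axis=1)
    return 1.0 - entropy / np.log(k)


def certainty_ratio_accuracy(y_true, Q, n_classes):
    """Return (Acc*, Acc*_v, Acc*_u, C_rho) under the assumed entropy-based split."""
    cm_star = probabilistic_confusion_matrix(y_true, Q, n_classes)
    w = certainty_weights(Q)
    # Assumed certainty matrix V: each probability vector scaled by its certainty weight.
    V = probabilistic_confusion_matrix(y_true, Q * w[:, None], n_classes)
    U = cm_star - V  # uncertainty matrix, so that V + U = CM*
    n = len(y_true)
    acc_star = np.trace(cm_star) / n
    acc_v = np.trace(V) / n
    acc_u = np.trace(U) / n
    c_rho = acc_v / acc_star if acc_star > 0 else 0.0
    return acc_star, acc_v, acc_u, c_rho


# Toy usage: two classes, four instances with varying confidence.
y = np.array([0, 1, 1, 0])
Q_toy = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5], [1.0, 0.0]])
print(certainty_ratio_accuracy(y, Q_toy, 2))
```

Only accuracy is shown here. Since the abstract states that $C_ρ$ applies to any performance measure, other measures would presumably be handled by computing them on $V$ and on $CM^\star$ rather than by taking traces.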
