Impossibility results for equating the Youden Index with average scoring rules and Tjur $R^2$-like metrics
Linard Hoessly
TL;DR
This work investigates whether the Youden index, a classic diagnostic accuracy measure, can be represented either as the average of a real-valued scoring rule over predicted probabilities or as a Tjur $R^2$-like metric derived from probabilistic predictions. By formalizing the data setup with a binary outcome model, a $2\times2$ contingency framework, and general scoring rules $S$, the paper establishes impossibility results: no continuous scoring rule $S$ yields $Youden=Av_S$ or $Youden=Ev_S$ for all feasible contingency configurations. The proofs, which proceed by contradiction and case analysis on $(a,b,c,d)$, reveal fundamental obstructions to such equivalences and underscore the distinct roles of these metrics in diagnostic assessment. The findings invite further exploration of alternative links between classification evaluation and probabilistic prediction measures, suggesting that new or different metrics may be required to bridge these perspectives.
Abstract
We consider the Youden index as well as measures evaluating predicted probabilities for the maximum-likelihood estimate of a logistic regression model with the classifier as predictor. We give impossibility results showing that the Youden index cannot equal any average of a real-valued scoring rule, nor any metric averaging over binary outcomes (0s and 1s), for any continuous real-valued scoring rule. This exhibits the obstructions to such potential equivalences and highlights the distinct roles these metrics play in diagnostic assessment.
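To make the two kinds of quantity concrete, the following is a minimal sketch, not taken from the paper, that computes the Youden index and the Tjur $R^2$ from a single $2\times2$ table. The cell convention is an assumption made here for illustration: $a$ counts (classifier 1, outcome 1), $b$ counts (classifier 1, outcome 0), $c$ counts (classifier 0, outcome 1), and $d$ counts (classifier 0, outcome 0). The function names are hypothetical. The sketch relies on the standard fact that, for a logistic regression with an intercept and one binary predictor, the maximum-likelihood fitted probabilities equal the empirical conditional frequencies of the outcome given the predictor.

```python
# Minimal sketch under an ASSUMED cell convention (not confirmed by the source):
#   a = #(X=1, Y=1), b = #(X=1, Y=0), c = #(X=0, Y=1), d = #(X=0, Y=0)
# where X is the binary classifier and Y is the binary outcome.


def fitted_probabilities(a, b, c, d):
    """Fitted P(Y=1 | X=1) and P(Y=1 | X=0) of the logistic MLE.

    With an intercept and one binary predictor the model is saturated,
    so the fitted probabilities are the empirical conditional frequencies.
    All cells are required to be positive so that the MLE is finite.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    p1 = a / (a + b)  # fitted probability when the classifier says 1
    p0 = c / (c + d)  # fitted probability when the classifier says 0
    return p1, p0


def youden_index(a, b, c, d):
    """Youden index: sensitivity + specificity - 1."""
    sensitivity = a / (a + c)
    specificity = d / (b + d)
    return sensitivity + specificity - 1.0


def tjur_r2(a, b, c, d):
    """Tjur R^2: mean fitted probability among Y=1 minus that among Y=0."""
    p1, p0 = fitted_probabilities(a, b, c, d)
    mean_among_ones = (a * p1 + c * p0) / (a + c)
    mean_among_zeros = (b * p1 + d * p0) / (b + d)
    return mean_among_ones - mean_among_zeros


if __name__ == "__main__":
    table = (40, 10, 20, 30)  # illustrative counts, not data from the paper
    print("Youden index:", youden_index(*table))
    print("Tjur R^2:    ", tjur_r2(*table))
```

For the illustrative table $(a,b,c,d)=(40,10,20,30)$ the Youden index is $5/12 \approx 0.417$ while the Tjur $R^2$ is $1/6 \approx 0.167$. Under this convention the Tjur $R^2$ factors as the Youden index times the difference of the two fitted probabilities, so the two generally disagree on a given table. This is only a numerical illustration of the objects involved; it does not reproduce the paper's impossibility argument, which concerns all continuous scoring rules.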
