Foundations of the Theory of Performance-Based Ranking
Sébastien Piérard, Anaïs Halin, Anthony Cioppa, Adrien Deliège, Marc Van Droogenbroeck
TL;DR
The paper addresses the lack of theoretical grounding in performance-based ranking by introducing a universal six-element mathematical framework built on probability theory and order theory. It defines a performance as a probability measure, models tasks through a satisfaction variable, accounts for properties of the evaluation, and captures application-specific preferences through an importance variable. On top of this framework, the authors present three axioms and three sufficient-condition theorems, and introduce a parametric family of scores, the ranking scores, whose induced orderings satisfy the axioms for any task. For two-class classification, the family recovers classic scores such as accuracy, recall, and precision, while some other commonly used scores are shown to be unsuitable. The framework thus gives a principled basis for preference-aware rankings of classifiers, detectors, and related entities, with practical implications for designing challenges, selecting appropriate scores, and avoiding misleading or unstable rankings.
Abstract
Ranking entities such as algorithms, devices, methods, or models based on their performances, while accounting for application-specific preferences, is a challenge. To address this challenge, we establish the foundations of a universal theory for performance-based ranking. First, we introduce a rigorous framework built on top of both the probability and order theories. Our new framework encompasses the elements necessary to (1) manipulate performances as mathematical objects, (2) express which performances are worse than or equivalent to others, (3) model tasks through a variable called satisfaction, (4) consider properties of the evaluation, (5) define scores, and (6) specify application-specific preferences through a variable called importance. On top of this framework, we propose the first axiomatic definition of performance orderings and performance-based rankings. Then, we introduce a universal parametric family of scores, called ranking scores, that can be used to establish rankings satisfying our axioms, while considering application-specific preferences. Finally, we show, in the case of two-class classification, that the family of ranking scores encompasses well-known performance scores, including the accuracy, the true positive rate (recall, sensitivity), the true negative rate (specificity), the positive predictive value (precision), and F1. However, we also show that some other scores commonly used to compare classifiers are unsuitable to derive performance orderings satisfying the axioms.
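The claim that one parametric family covers several familiar scores can be illustrated with a small sketch. The text above does not state the formula of the ranking scores, so the code assumes one plausible form: the importance-weighted expected satisfaction, E[I·S] / E[I], with satisfaction S equal to 1 for correct decisions and 0 for incorrect ones. The function name `ranking_score`, the outcome labels, the importance values, and the example performance are all hypothetical choices made for illustration.

```python
from fractions import Fraction

# Assumption: satisfaction is 1 for correct decisions, 0 for incorrect ones.
SATISFACTION = {"tn": 1, "fp": 0, "fn": 0, "tp": 1}


def ranking_score(performance, importance, satisfaction=SATISFACTION):
    """Importance-weighted expected satisfaction (assumed form, see lead-in).

    performance: probability of each outcome (tn, fp, fn, tp), summing to 1.
    importance:  non-negative weight given to each outcome.
    """
    denominator = sum(importance[o] * performance[o] for o in performance)
    if denominator == 0:
        raise ValueError("score undefined: no probability mass on important outcomes")
    numerator = sum(
        importance[o] * satisfaction[o] * performance[o] for o in performance
    )
    return numerator / denominator


# Hypothetical importance choices that reduce to well-known scores
# under the assumed formula.
IMPORTANCE = {
    "accuracy": {"tn": 1, "fp": 1, "fn": 1, "tp": 1},
    "tpr": {"tn": 0, "fp": 0, "fn": 1, "tp": 1},  # recall, sensitivity
    "tnr": {"tn": 1, "fp": 1, "fn": 0, "tp": 0},  # specificity
    "ppv": {"tn": 0, "fp": 1, "fn": 0, "tp": 1},  # precision
    "f1": {"tn": 0, "fp": 1, "fn": 1, "tp": 2},   # 2TP / (2TP + FP + FN)
}

# Illustrative performance of a two-class classifier (made-up numbers).
PERF = {
    "tn": Fraction(5, 10),
    "fp": Fraction(1, 10),
    "fn": Fraction(1, 10),
    "tp": Fraction(3, 10),
}

if __name__ == "__main__":
    for name, weights in IMPORTANCE.items():
        print(name, ranking_score(PERF, weights))
```

Under this assumed form, changing only the importance values moves between accuracy, the true positive rate, the true negative rate, the positive predictive value, and F1, which is the sense in which a single family can encode application-specific preferences.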
