The Tile: A 2D Map of Ranking Scores for Two-Class Classification
Sébastien Piérard, Anaïs Halin, Anthony Cioppa, Adrien Deliège, Marc Van Droogenbroeck
TL;DR
The Tile addresses the challenge of ranking two-class classifiers under diverse, application-specific preferences by organizing an infinite family of ranking scores into a two-dimensional map. It builds the canonical ranking scores $R_{I_{a,b}} = \frac{(1-a)PTN + aPTP}{(1-a)PTN + (1-b)PFP + bPFN + aPTP}$, where $PTN$, $PFP$, $PFN$, and $PTP$ are the probabilities of a true negative, false positive, false negative, and true positive, and the parameters $a$ and $b$ index the position on the map. Familiar scores such as $A$, $TPR$, $TNR$, $PPV$, $NPV$, and $F_\beta$ are instances of this family, which allows a unified interpretation through iso-performance lines in ROC space. The Tile supports reading, comparing, and ranking classifiers, analyzes the impact of priors, investigates no-skill performances with the curves $\gamma_\pi$ and $\gamma_\tau$, and links to existing evaluation spaces while revealing the geometry of score-induced orderings. The framework provides a practical and rigorous tool for application-aware model selection, robustness assessment, and a deeper understanding of ranking properties than single-score or two-score plots allow. The approach could inform benchmarking, model comparison, and the design of evaluation metrics by emphasizing continuous, prior-aware rankings on a single, visually intuitive surface.
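To make the canonical ranking score formula concrete, the following Python sketch evaluates it for a few choices of $(a, b)$ and checks, by direct substitution, that they reduce to familiar scores. The function names, the `perf` dictionary, and the numbers in it are illustrative assumptions, not taken from the paper. The $(a, b)$ coordinates listed are those obtained by plugging values into the formula above (for example, $a = 1$ and $b = \beta^2 / (1 + \beta^2)$ yields $F_\beta$).

```python
from fractions import Fraction


def canonical_ranking_score(a, b, ptn, pfp, pfn, ptp):
    """Canonical ranking score for parameters (a, b) in [0, 1]^2.

    ptn, pfp, pfn, ptp are the probabilities of the four outcomes
    (true negative, false positive, false negative, true positive).
    """
    numerator = (1 - a) * ptn + a * ptp
    denominator = numerator + (1 - b) * pfp + b * pfn
    if denominator == 0:
        raise ZeroDivisionError("score undefined for this performance and (a, b)")
    return numerator / denominator


def f_beta_coordinates(beta):
    """(a, b) at which the formula reduces to F-beta: a = 1, b = beta^2 / (1 + beta^2)."""
    beta_sq = Fraction(beta) ** 2
    return Fraction(1), beta_sq / (1 + beta_sq)


# Hypothetical performance (illustrative numbers only, not from the paper).
perf = dict(ptn=Fraction(50, 100), pfp=Fraction(10, 100),
            pfn=Fraction(5, 100), ptp=Fraction(35, 100))

# (a, b) coordinates of familiar scores, derived by substitution into the formula.
coords = {
    "A": (Fraction(1, 2), Fraction(1, 2)),
    "TPR": (1, 1),
    "TNR": (0, 0),
    "PPV": (1, 0),
    "NPV": (0, 1),
    "F1": f_beta_coordinates(1),
}

scores = {name: canonical_ranking_score(a, b, **perf)
          for name, (a, b) in coords.items()}
for name, value in scores.items():
    print(f"{name}: {value} ~ {float(value):.4f}")
```

Exact fractions are used here only so that the identities can be checked without floating-point noise; ordinary floats work equally well in the same function.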
Abstract
In the computer vision and machine learning communities, as well as in many other research domains, rigorous evaluation of any new method, including classifiers, is essential. One key component of the evaluation process is the ability to compare and rank methods. However, ranking classifiers and accurately comparing their performances, especially when taking application-specific preferences into account, remains challenging. For instance, commonly used evaluation tools like the Receiver Operating Characteristic (ROC) and Precision/Recall (PR) spaces display performances based on two scores. Hence, they are inherently limited in their ability to compare classifiers across a broader range of scores, and they cannot establish a clear ranking among classifiers. In this paper, we present a novel versatile tool, named the Tile, that organizes infinitely many ranking scores in a single 2D map for two-class classifiers, including common evaluation scores such as the accuracy, the true positive rate, the positive predictive value, Jaccard's coefficient, and all F-beta scores. Furthermore, we study the properties of the underlying ranking scores, such as the influence of the priors or the correspondences with the ROC space, and show how to characterize any other score by comparing it to the Tile. Overall, we demonstrate that the Tile is a powerful tool that effectively captures all the rankings in a single visualization and allows interpreting them.
