On Rank Graduation Metrics for High Dimensional Ordinal Data

Gennaro Auricchio, Adelaide Emma Bernardelli, Paolo Giudici, Giuseppe Toscani

TL;DR

The paper addresses the evaluation of reliability for ordinal targets by introducing the RGX_p metrics, a unifying, rank-based framework that quantifies the portion of variability explained by predictions. It establishes theoretical connections between RGX_p, Cramér-von Mises (CvM) divergences, and Lorenz/Gini concepts, and extends the framework to multivariate settings via a whitening approach. Through extensive experiments on ESG scores with linear and neural network models, it assesses accuracy, robustness, and explainability under the RGX_p framework, supported by Shapley-based feature attribution and Spearman rank correlations between feature-importance rankings. The work provides a principled pathway toward trustworthy, SAFE AI in domains with ordinal outcomes and complex multivariate structure.
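
The whitening step used for the multivariate extension can be illustrated generically. The summary does not specify which whitening transform the authors use or how pillar weights are derived from it, so the following is a minimal sketch assuming ZCA (Mahalanobis) whitening; the function name `zca_whiten` and the synthetic three-dimensional target standing in for E, S and G scores are hypothetical.

```python
import numpy as np

def zca_whiten(Y: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """ZCA-whiten Y (rows = observations, columns = target dimensions)."""
    centered = Y - Y.mean(axis=0)
    cov = np.cov(centered, rowvar=False)        # sample covariance matrix (ddof=1)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigendecomposition of a symmetric matrix
    # Sigma^{-1/2} = V diag(1/sqrt(lambda)) V^T; eps guards against division by zero
    inv_sqrt = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return centered @ inv_sqrt                  # inv_sqrt is symmetric, so no transpose needed

# Hypothetical correlated 3-dimensional target (stand-in for E, S, G scores)
rng = np.random.default_rng(0)
mix = np.array([[1.0, 0.0, 0.0],
                [0.6, 0.8, 0.0],
                [0.3, 0.4, 0.9]])
Y = rng.normal(size=(500, 3)) @ mix.T
W = zca_whiten(Y)
print(np.round(np.cov(W, rowvar=False), 3))  # identity matrix up to rounding
```

ZCA is only one option: PCA or Cholesky whitening also yield uncorrelated, unit-variance components but rotate them differently, which matters if the whitened components are later interpreted dimension by dimension.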

Abstract

Evaluating the reliability of machine learning classifications remains a fundamental challenge in Artificial Intelligence (AI), particularly when the target variable is multidimensional. Classification variables are expressed on a categorical scale that is, at best, ordinal. Because ordinal data lack a natural metric structure on their underlying space, most conventional distance measures used to assess the accuracy of machine learning classifications cannot be applied directly or meaningfully. In this paper, we develop a mathematical framework for comparing ordinal data based on a family of Rank Graduation $(\mathrm{RGX}_p)$ \emph{metrics}. We demonstrate that these metrics can quantify the proportion of the response's variability explained by the predictions, much as the predictive $R^2$ does for continuous response variables. After establishing theoretical connections between the $\mathrm{RGX}_p$ family and other prominent metrics in AI, we conduct extensive experiments across diverse datasets and learning tasks to evaluate their empirical performance. The results underscore the versatility, interpretability, and robustness of the $\mathrm{RGX}_p$ metrics as a principled foundation for developing trustworthy and SAFE AI systems.
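
The abstract does not state the $\mathrm{RGX}_p$ formula. As a hypothetical illustration of the rank-graduation idea only (not the authors' $\mathrm{RGX}_p$), the sketch below computes a normalized concordance score: the responses are re-ordered by the ranks of the predictions, and the resulting position-weighted sum is compared with the best arrangement (responses sorted by themselves) and the worst (reverse order). Like an $R^2$, it equals 1 for a perfectly concordant ranking; it equals 0 for a fully reversed one. The function name is illustrative, and ties among predictions are simply broken by input order.

```python
import numpy as np

def rank_graduation_accuracy(y, y_hat) -> float:
    """Hypothetical normalized rank-concordance score in [0, 1]."""
    y = np.asarray(y, dtype=float)
    i = np.arange(1, y.size + 1)                                # positions 1..n
    conc = np.sum(i * y[np.argsort(y_hat, kind="stable")])     # y ordered by predictions
    best = np.sum(i * np.sort(y))                               # y sorted ascending (maximum)
    worst = np.sum(i * np.sort(y)[::-1])                        # y sorted descending (minimum)
    if best == worst:                                           # constant response: undefined
        return float("nan")
    return float((conc - worst) / (best - worst))

# Ordinal grades 1..5 and scores that swap the 2nd and 3rd observations
print(rank_graduation_accuracy([1, 2, 3, 4, 5], [0.1, 0.4, 0.35, 0.8, 0.9]))
```

Here the weighted sums are 54 (predicted order), 55 (best) and 35 (worst), so the score is 19/20 = 0.95. Because only the ordering of `y_hat` enters, any strictly increasing transformation of the predictions leaves the score unchanged, which is the property that makes rank-based measures suitable for ordinal targets.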

Paper Structure

This paper contains 28 sections, 12 theorems, 88 equations, 4 figures, 4 tables.

Key Result

Proposition 1

Let $X\sim\mu$ and $Y\sim\nu$ be two absolutely continuous random variables on $\mathbb{R}$, with associated probability distributions $\mu$ and $\nu$. Let us denote by $\mathcal{C}_{X,Y}$ the random variable defined as $(C_{X,Y})_{\#}\mathcal{U}$, where $\mathcal{U}$ is the uniform distribution over $[0,1]$ and $\mathcal{C}_{X,Y}$ is a distribution whose c.d.f. is

Figures (4)

  • Figure 5.1: Weights of the individual ESG pillars obtained through the whitening process. The Environmental pillar receives the highest weight, followed by the Governance and Social pillars.
  • Figure 5.2: Shapley-based feature importance for each pillar, comparing Linear Model ($LM$) and Neural Network ($NN$) under 5-fold cross-validation. Bars represent mean feature contributions to model predictions (in %), with error bars indicating standard deviations across folds.
  • Figure 5.3: Shapley-based feature importance for multivariate models with whitened ESG components, comparing the Linear Model ($LM$) and Neural Network ($NN$). Feature importances (in %) are averaged across folds and weighted by $\lambda$, with error bars representing standard deviations across folds.
  • Figure 5.4: Spearman correlation matrix of feature importance rankings derived from Shapley values across univariate ($E.Sc$, $S.Sc$, $G.Sc$) and multivariate whitened models ($Multi$), for both Linear Model ($LM$) and Neural Network ($NN$) specifications.
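
Figure 5.4 compares models through the Spearman correlation of their Shapley-based feature-importance rankings. A minimal sketch of that comparison, assuming the importances of two models are already available as vectors over the same features; the helper names are illustrative and the numbers below are made up, not results from the paper. Spearman's $\rho$ is the Pearson correlation of the ranks, with tied values receiving their average rank.

```python
import numpy as np

def average_ranks(x) -> np.ndarray:
    """Ranks 1..n, with tied values receiving the mean of their positions."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="stable")
    ranks = np.empty(x.size, dtype=float)
    ranks[order] = np.arange(1, x.size + 1)
    for v in np.unique(x):                 # average the ranks within each tie group
        mask = x == v
        ranks[mask] = ranks[mask].mean()
    return ranks

def spearman(a, b) -> float:
    """Spearman's rho: Pearson correlation of the average ranks."""
    return float(np.corrcoef(average_ranks(a), average_ranks(b))[0, 1])

# Hypothetical feature importances (%) for five features under two models
importance_lm = [40, 25, 20, 10, 5]
importance_nn = [35, 20, 30, 10, 5]
print(spearman(importance_lm, importance_nn))
```

The two hypothetical models disagree only on the order of the second and third features, giving $\rho = 1 - 6\cdot 2/(5\cdot 24) = 0.9$ (up to floating-point rounding). In practice `scipy.stats.spearmanr` computes the same statistic.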

Theorems & Definitions (31)

  • Definition 1
  • Definition 2: Scale Stability
  • Definition 3
  • Proposition 1
  • proof
  • Theorem 1
  • proof
  • Lemma 1
  • proof
  • Theorem 2
  • ...and 21 more