Beyond RMSE and MAE: Introducing EAUC to unmask hidden bias and unfairness in dyadic regression models

Jorge Paz-Ruza, Amparo Alonso-Betanzos, Bertha Guijarro-Berdiñas, Brais Cancela, Carlos Eiras-Franco

TL;DR

This work reveals a pervasive eccentricity bias in dyadic regression: when the observed values of individual entities are non-uniformly distributed, models overfit toward the dyad average, which undermines fairness and safety in critical domains. To address this, the authors define Eccentricity-Area Under the Curve (EAUC), a differentiable, bias-aware metric that ranks prediction error by dyad eccentricity. With $DMV_{ui} = \tfrac{1}{2}(\bar{r}_u+\bar{r}_i)$, the eccentricity of a dyad is $Ecc_{ui} = |r_{ui}-DMV_{ui}|$ and its prediction error is $\epsilon_{ui}=(\hat{r}_{ui}-r_{ui})^2$; with the $n$ test examples $x_1,\dots,x_n$ sorted by increasing eccentricity, $EAUC = \frac{\sum_{i=2}^{n} (Ecc_{x_i}-Ecc_{x_{i-1}})(\epsilon_{x_i}+\epsilon_{x_{i-1}})/2}{(\max(r_{ui})-\min(r_{ui}))^2}$. Across six real-world datasets (Netflix Prize, Movielens, GDSC1, CTRPv2, IMF DOTS, Kiva) and multiple dyadic regression models, RMSE and MAE fail to reveal these biases, while EAUC uncovers substantial eccentricity bias and shows that it correlates with dataset non-uniformity, measured by $D_{KS_{\mathcal{D}}}$. The paper also shows that naive post-training bias corrections can reduce EAUC, illustrating its potential to guide fairness-aware training and model selection. Together, these results advocate bias-aware evaluation and learning strategies in dyadic regression to prevent unfair outcomes in high-stakes applications. The quantities $DMV_{ui}$, $Ecc_{ui}$, $EAUC$, and $D_{KS_{\mathcal{D}}}$ are central to the methodology.
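The EAUC definition above can be sketched in plain Python as follows. This is a minimal, non-authoritative sketch, not the authors' code: the function names `entity_means` and `eauc` are hypothetical, $\bar{r}_u$ and $\bar{r}_i$ are assumed to be the mean observed values of each entity in the training data, every test user and item is assumed to appear in training, and $\max(r_{ui})$ and $\min(r_{ui})$ are assumed to be passed in explicitly (for example, as the bounds of the rating scale). The toy data is illustrative only.

```python
from collections import defaultdict
from statistics import mean


def entity_means(train):
    """Mean observed value of each user and each item, from (user, item, value) triples."""
    by_user, by_item = defaultdict(list), defaultdict(list)
    for u, i, r in train:
        by_user[u].append(r)
        by_item[i].append(r)
    return ({u: mean(vals) for u, vals in by_user.items()},
            {i: mean(vals) for i, vals in by_item.items()})


def eauc(train, test, predictions, r_min, r_max):
    """Trapezoidal EAUC over test examples sorted by eccentricity (hypothetical helper).

    r_min and r_max stand in for min(r_ui) and max(r_ui); passing them explicitly
    is an assumption. Every test user and item must also appear in train
    (no cold-start handling).
    """
    user_mean, item_mean = entity_means(train)
    points = []
    for (u, i, r), r_hat in zip(test, predictions):
        dmv = 0.5 * (user_mean[u] + item_mean[i])  # DMV_ui
        ecc = abs(r - dmv)                          # Ecc_ui
        err = (r_hat - r) ** 2                      # epsilon_ui (squared error)
        points.append((ecc, err))
    points.sort()  # rank examples by increasing eccentricity
    area = 0.0
    for (ecc_prev, err_prev), (ecc_cur, err_cur) in zip(points, points[1:]):
        area += (ecc_cur - ecc_prev) * (err_cur + err_prev) / 2  # one trapezoid
    return area / (r_max - r_min) ** 2


# Toy data on a 1-5 scale (illustrative only, not from the paper).
train = [("u1", "i1", 1.0), ("u1", "i2", 5.0), ("u2", "i1", 3.0), ("u2", "i3", 3.0)]
test = [("u1", "i3", 2.0), ("u2", "i2", 1.0)]
# Predicting DMV_ui for every test pair gives 3.0 and 4.0.
print(eauc(train, test, [3.0, 4.0], r_min=1.0, r_max=5.0))  # 0.625
```

With error-free predictions (here `[2.0, 1.0]`) every $\epsilon_{ui}$ is 0 and so is EAUC. Always predicting $DMV_{ui}$ gives a positive value because the error grows with eccentricity, which is exactly the behavior the metric is meant to expose.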

Abstract

Dyadic regression models, which output real-valued predictions for pairs of entities, are fundamental in many domains (e.g., obtaining user-product ratings in Recommender Systems) and promising but still under exploration in others (e.g., tuning patient-drug dosages in precision pharmacology). In this work, we prove that non-uniform observed value distributions of individual entities lead to severe biases in state-of-the-art models, skewing predictions towards the average of the entity's observed past values and providing worse-than-random predictive power in eccentric yet crucial cases; we name this phenomenon eccentricity bias. We show that global error metrics like Root Mean Squared Error (RMSE) are insufficient to capture this bias, and we introduce Eccentricity-Area Under the Curve (EAUC) as a novel metric that can quantify it in all studied domains and models. We demonstrate the intuitive interpretation of EAUC by experimenting with naive post-training bias corrections, and propose other ways in which EAUC could guide the construction of fair models. This work contributes a bias-aware evaluation of dyadic regression to prevent unfairness in critical real-world applications of such systems.

Paper Structure

This paper contains 18 sections, 7 equations, 9 figures, 7 tables, and 1 algorithm.

Figures (9)

  • Figure 1: Main topics and contributions of this research work, including the definition of eccentricity bias in dyadic regression, the limitations of global error metrics like RMSE or MAE for comprehensively evaluating said tasks, and the proposal of EAUC as a novel metric that can quantify the degree of eccentricity bias (and its subsequent unfairness) a dyadic regression model suffers.
  • Figure 2: Eccentricity $Ecc_{ui}$ vs. prediction error $|\hat{r}_{ui} - r_{ui}|$ of an MF model and a uniform random predictor in Netflix Prize test examples (avg. and std. dev. of five runs). The $y=x$ line corresponds to always predicting the $DMV_{ui}$ of each example.
  • Figure 3: Observed vs. predicted values $r_{ui}$ and $\hat{r}_{ui}$ (avg. and std. dev.) of an MF model and an ideal dyadic regressor in Netflix Prize test examples, in two different scenarios of user-item pairs with low and high average ratings (low and high $DMV_{ui}$).
  • Figure 4: Usage of EAUC for the evaluation of dyadic regression tasks by (a) characterizing the EAUC vs. RMSE trade-off to incorporate the fairness component into model evaluation, and (b) analyzing the computed Eccentricity vs. Prediction Error curve. The colors in Subfigure b) correspond to the model characteristics shown in Subfigure a). While a dyadic regressor with high RMSE and low EAUC is plausible in theory, it is unlikely in practice, as models minimize global errors during training.
  • Figure 5: Relationship between the eccentricity of observed values and prediction error (lower is better) for each model and dataset (avg. and std. dev. of 5 runs). On Netflix Prize, ML models other than MF could not be run due to the dataset's size and the models' computational complexity.
  • ...and 4 more figures
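Figures 2 and 5 plot eccentricity against prediction error. The sketch below shows one hypothetical way to compute such a curve: test examples are grouped into equal-width eccentricity bins and the absolute error is averaged per bin. The binning scheme, the function names `dyad_mean_values` and `ecc_error_curve`, and the toy data are assumptions for illustration; the paper's exact aggregation is not described here. It also illustrates the $y=x$ reference line of Figure 2: since $Ecc_{ui} = |r_{ui} - \tfrac{1}{2}(\bar{r}_u+\bar{r}_i)|$, a predictor that always outputs $DMV_{ui} = \tfrac{1}{2}(\bar{r}_u+\bar{r}_i)$ has an absolute error exactly equal to $Ecc_{ui}$.

```python
from collections import defaultdict
from statistics import mean


def dyad_mean_values(train, test):
    """DMV_ui = (mean of u's observed values + mean of i's observed values) / 2.

    Means come from train; every test user and item must appear there (assumption).
    """
    by_user, by_item = defaultdict(list), defaultdict(list)
    for u, i, r in train:
        by_user[u].append(r)
        by_item[i].append(r)
    return [0.5 * (mean(by_user[u]) + mean(by_item[i])) for u, i, _ in test]


def ecc_error_curve(train, test, predictions, r_min, r_max, n_bins=4):
    """(mean Ecc_ui, mean |r_hat - r|) for each non-empty equal-width eccentricity bin."""
    width = (r_max - r_min) / n_bins  # Ecc_ui lies in [0, r_max - r_min]
    bins = defaultdict(list)
    for (_, _, r), dmv, r_hat in zip(test, dyad_mean_values(train, test), predictions):
        ecc = abs(r - dmv)
        b = min(int(ecc / width), n_bins - 1)  # put Ecc_ui == r_max - r_min in the last bin
        bins[b].append((ecc, abs(r_hat - r)))
    return [(mean(e for e, _ in pts), mean(a for _, a in pts))
            for _, pts in sorted(bins.items())]


# Toy data on a 1-5 scale (illustrative only, not from the paper).
train = [("u1", "i1", 1.0), ("u1", "i2", 5.0), ("u2", "i1", 3.0), ("u2", "i3", 3.0)]
test = [("u1", "i3", 2.0), ("u2", "i2", 1.0)]
dmv_predictions = dyad_mean_values(train, test)  # [3.0, 4.0]
print(ecc_error_curve(train, test, dmv_predictions, r_min=1.0, r_max=5.0))
# [(1.0, 1.0), (3.0, 3.0)]  -> every point lies on y = x
```

Averaging eccentricity (rather than using bin centers) on the x-axis keeps the DMV predictor exactly on $y=x$, which makes deviations of a real model from that line easy to read.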