Beyond RMSE and MAE: Introducing EAUC to unmask hidden bias and unfairness in dyadic regression models
Jorge Paz-Ruza, Amparo Alonso-Betanzos, Bertha Guijarro-Berdiñas, Brais Cancela, Carlos Eiras-Franco
TL;DR
This work reveals a pervasive eccentricity bias in dyadic regression: non-uniform local value distributions skew model predictions toward the dyad average, undermining fairness and safety in critical domains. To address this, the authors define Eccentricity-Area Under the Curve (EAUC), a differentiable bias-aware metric that orders examples by dyad eccentricity $Ecc_{ui}$ and measures the area under the resulting eccentricity-versus-error curve. With the $n$ examples $x_1, \dots, x_n$ sorted by increasing eccentricity, it is computed as $EAUC = \frac{\sum_{i=2}^{n} (Ecc_{x_i}-Ecc_{x_{i-1}})(\epsilon_{x_i}+\epsilon_{x_{i-1}})/2}{(\max(r_{ui})-\min(r_{ui}))^2}$, with $Ecc_{ui} = |r_{ui}-\tfrac{1}{2}(\bar{r}_u+\bar{r}_i)|$ and $\epsilon_{ui}=(\hat{r}_{ui}-r_{ui})^2$. Across six real-world datasets (Netflix Prize, MovieLens, GDSC1, CTRPv2, IMF DOTS, Kiva) and multiple dyadic regression models, RMSE and MAE fail to reveal this bias, while EAUC uncovers substantial eccentricity bias and shows that it correlates with dataset non-uniformity as measured by $D_{KS_{\mathcal{D}}}$. The paper also shows that naive post-training bias corrections can reduce EAUC, illustrating EAUC's potential to guide fairness-aware training and model selection. Together, these results advocate for bias-aware evaluation and learning strategies in dyadic regression to prevent unfair outcomes in high-stakes applications. The quantities $DMV_{ui}$, $Ecc_{ui}$, $EAUC$, and $D_{KS_{\mathcal{D}}}$ are central to the methodology.
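To make the formulas in the TL;DR concrete, the following is a minimal sketch of how EAUC could be computed with NumPy, following the definitions exactly as stated there (squared error, eccentricity as distance from the mean of the two entity averages, trapezoidal area normalised by the squared value range). The function names `entity_means` and `eauc`, their parameters, and the toy data are hypothetical and not taken from the paper; the sketch also assumes that the entity averages $\bar{r}_u$ and $\bar{r}_i$ are computed from previously observed (training) values.

```python
import numpy as np


def entity_means(ids, values):
    """Mean observed value per entity id, returned as a dict (hypothetical helper)."""
    ids = np.asarray(ids)
    values = np.asarray(values, dtype=float)
    uniq, inverse = np.unique(ids, return_inverse=True)
    sums = np.bincount(inverse, weights=values)
    counts = np.bincount(inverse)
    return dict(zip(uniq.tolist(), (sums / counts).tolist()))


def eauc(users, items, r_true, r_pred, user_mean, item_mean, r_min=None, r_max=None):
    """EAUC as written in the TL;DR formula (illustrative sketch, not the authors' code)."""
    r_true = np.asarray(r_true, dtype=float)
    r_pred = np.asarray(r_pred, dtype=float)
    # Assumption: the value range defaults to the range of the evaluated targets.
    r_min = r_true.min() if r_min is None else r_min
    r_max = r_true.max() if r_max is None else r_max

    # Mean of the two entity averages for each dyad (u, i).
    dyad_mean = np.array(
        [(user_mean[u] + item_mean[i]) / 2 for u, i in zip(users, items)]
    )
    ecc = np.abs(r_true - dyad_mean)   # Ecc_ui
    err = (r_pred - r_true) ** 2       # epsilon_ui, squared error as in the TL;DR

    # Sort examples by increasing eccentricity, then apply the trapezoidal rule.
    order = np.argsort(ecc, kind="stable")
    ecc_s, err_s = ecc[order], err[order]
    area = np.sum(np.diff(ecc_s) * (err_s[1:] + err_s[:-1]) / 2)
    return area / (r_max - r_min) ** 2


# Toy data: two users, two items, ratings on a 1-5 scale.
users = [0, 0, 1, 1]
items = [0, 1, 0, 1]
ratings = [1.0, 5.0, 3.0, 3.0]

u_mean = entity_means(users, ratings)
i_mean = entity_means(items, ratings)

# A predictor that always outputs the mean of the two entity averages.
dyad_mean_pred = [(u_mean[u] + i_mean[i]) / 2 for u, i in zip(users, items)]

eauc_perfect = eauc(users, items, ratings, ratings, u_mean, i_mean)
eauc_biased = eauc(users, items, ratings, dyad_mean_pred, u_mean, i_mean)
print(eauc_perfect, eauc_biased)
```

In this toy setting a perfect predictor yields an EAUC of zero, while the predictor that always returns the dyad average accumulates error as eccentricity grows, which is the behaviour the metric is meant to expose. When several examples share the same eccentricity, their relative order is arbitrary, so a production implementation would need to decide how to handle ties.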
Abstract
Dyadic regression models, which output real-valued predictions for pairs of entities, are fundamental in many domains (e.g., obtaining user-product ratings in Recommender Systems) and are promising but still under exploration in others (e.g., tuning patient-drug dosages in precision pharmacology). In this work, we demonstrate that non-uniform observed value distributions of individual entities lead to severe biases in state-of-the-art models, skewing predictions towards the average of observed past values for the entity and providing worse-than-random predictive power in eccentric yet crucial cases; we name this phenomenon eccentricity bias. We show that global error metrics like Root Mean Squared Error (RMSE) are insufficient to capture this bias, and we introduce Eccentricity-Area Under the Curve (EAUC) as a novel metric that can quantify it in all studied domains and models. We validate the intuitive interpretation of EAUC by experimenting with naive post-training bias corrections, and we theorize other ways of using EAUC to guide the construction of fair models. This work contributes a bias-aware evaluation of dyadic regression to prevent unfairness in critical real-world applications of such systems.
