Measuring the Driving Forces of Predictive Performance: Application to Credit Scoring
Hué Sullivan, Hurlin Christophe, Pérignon Christophe, Saurin Sébastien
TL;DR
XPER introduces a model- and metric-agnostic, Shapley-value-based decomposition of predictive performance metrics, such as AUC and $R^2$, into feature contributions. It defines a benchmark $\phi_0$ (performance under independence between the target and the features) and allocates the difference $PM - \phi_0$, where $PM$ denotes the performance metric, across features via Shapley weights, which guarantees the efficiency, symmetry, linearity, and null-effect properties. The framework supports both global and individual-level analyses, with exact or Kernel SHAP-style estimation, and it handles heterogeneity explicitly by clustering individual XPER values to form group-specific models. An empirical application to auto loans shows that a small subset of features drives most of the performance and that clustering on XPER can substantially boost out-of-sample accuracy (e.g., AUC rising from 0.752 to 0.912). The work provides a flexible explainability tool focused on performance drivers, with practical implications for credit scoring, model validation, and fairness considerations.
Abstract
As they play an increasingly important role in determining access to credit, credit scoring models are under growing scrutiny from banking supervisors and internal model validators. These authorities need to monitor model performance and identify its key drivers. To facilitate this, we introduce the XPER methodology to decompose a performance metric (e.g., AUC, $R^2$) into specific contributions associated with the various features of a forecasting model. XPER is theoretically grounded in Shapley values and is both model-agnostic and performance metric-agnostic. Furthermore, it can be implemented either at the model level or at the individual level. Using a novel dataset of car loans, we decompose the AUC of a machine-learning model trained to forecast the default probability of loan applicants. We show that a small number of features can explain a surprisingly large part of the model performance. Notably, the features that contribute the most to the predictive performance of the model may not be the ones that contribute the most to individual forecasts (SHAP). Finally, we show how XPER can be used to deal with heterogeneity issues and improve performance.
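To make the idea of decomposing a performance metric concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy linear model, uses $R^2$ as the metric, and approximates the value of a feature coalition by jointly shuffling the columns outside the coalition so that they lose their link to the target. The names `r2`, `coalition_value`, and `decompose_performance` are hypothetical, and the shuffling scheme is an illustrative assumption; only the Shapley weighting itself is standard.

```python
import itertools
import math

import numpy as np


def r2(y, y_hat):
    """Coefficient of determination, used here as the performance metric."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


def coalition_value(predict, metric, X, y, coalition, perms):
    """Average metric when only the features in `coalition` keep their link to y.

    Assumption: columns outside the coalition are shuffled jointly across rows,
    which breaks their dependence on y and on the coalition features.
    """
    out = np.array([j for j in range(X.shape[1]) if j not in coalition], dtype=int)
    values = []
    for perm in perms:
        X_mix = X.copy()
        X_mix[:, out] = X[perm][:, out]
        values.append(metric(y, predict(X_mix)))
    return float(np.mean(values))


def decompose_performance(predict, metric, X, y, n_draws=20, seed=0):
    """Return (benchmark, contributions) with benchmark + sum(contributions) = metric."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # The same shuffles are reused for every coalition so values are comparable.
    perms = [rng.permutation(n) for _ in range(n_draws)]
    value = {
        S: coalition_value(predict, metric, X, y, S, perms)
        for k in range(p + 1)
        for S in itertools.combinations(range(p), k)
    }
    phi0 = value[()]  # benchmark: no feature keeps its link to the target
    phi = np.zeros(p)
    for j in range(p):
        others = [i for i in range(p) if i != j]
        for k in range(p):
            # Standard Shapley weight for a coalition of size k among p players.
            weight = math.factorial(k) * math.factorial(p - k - 1) / math.factorial(p)
            for S in itertools.combinations(others, k):
                with_j = tuple(sorted(S + (j,)))
                phi[j] += weight * (value[with_j] - value[S])
    return phi0, phi


# Toy data (assumed for illustration): feature 2 is ignored by the model.
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(500, 3))
beta = np.array([2.0, 0.5, 0.0])
y = X @ beta + data_rng.normal(scale=0.5, size=500)


def predict(X_new):
    return X_new @ beta


phi0, phi = decompose_performance(predict, r2, X, y)
pm = r2(y, predict(X))
print("benchmark:", phi0, "contributions:", phi, "metric:", pm)
```

In this sketch the benchmark plus the contributions add up to the full-sample metric (efficiency), and the feature the model ignores receives a zero contribution (null effect). Because every coalition is enumerated, the cost grows exponentially with the number of features, which is why a sampling-based approximation is needed for larger models.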
