Measuring the Driving Forces of Predictive Performance: Application to Credit Scoring

Hué Sullivan; Hurlin Christophe; Pérignon Christophe; Saurin Sébastien

TL;DR

XPER introduces a Shapley-value-based decomposition of predictive performance metrics, such as AUC and $R^2$, into feature contributions. The decomposition is both model-agnostic and metric-agnostic. It defines a benchmark $\phi_0$ (performance under independence) and allocates the difference between the performance metric and the benchmark, $PM - \phi_0$, across features via Shapley weights, which guarantees efficiency, symmetry, linearity, and null effects. The framework supports both global and individual-level analyses, with exact or Kernel SHAP-style estimation, and treats heterogeneity explicitly: individual XPER values are clustered to form groups, and a separate model is fitted to each group. An empirical application to auto loans shows that a small subset of features drives most of the performance and that clustering on XPER values can substantially boost out-of-sample accuracy (e.g., AUC rising from 0.752 to 0.912). The work provides a flexible explainability tool focused on performance drivers, with practical implications for credit scoring, model validation, and fairness considerations.

Abstract

As they play an increasingly important role in determining access to credit, credit scoring models are under growing scrutiny from banking supervisors and internal model validators. These authorities need to monitor the model performance and identify its key drivers. To facilitate this, we introduce the XPER methodology to decompose a performance metric (e.g., AUC, $R^2$) into specific contributions associated with the various features of a forecasting model. XPER is theoretically grounded on Shapley values and is both model-agnostic and performance metric-agnostic. Furthermore, it can be implemented either at the model level or at the individual level. Using a novel dataset of car loans, we decompose the AUC of a machine-learning model trained to forecast the default probability of loan applicants. We show that a small number of features can explain a surprisingly large part of the model performance. Notably, the features that contribute the most to the predictive performance of the model may not be the ones that contribute the most to individual forecasts (SHAP). Finally, we show how XPER can be used to deal with heterogeneity issues and improve performance.
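The decomposition summarized above can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation: the function names (`auc`, `coalition_value`, `performance_shapley`), the toy data, the linear score, and the row-permutation scheme used to neutralize features outside a coalition are all assumptions made for illustration, and the paper's own estimator may differ. The sketch computes exact Shapley weights over all coalitions, so it is only practical for a handful of features.

```python
import itertools
import math
import random


def auc(y, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count one half."""
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def coalition_value(model, X, y, S, metric, perms):
    """Average metric when only the features in S keep their observed values.

    Features outside S are taken from another row (one shared row permutation
    per draw), which breaks their link with y and with the features in S.
    This permutation scheme is an assumption of the sketch.
    """
    q = len(X[0])
    total = 0.0
    for perm in perms:
        Xs = [[X[i][j] if j in S else X[perm[i]][j] for j in range(q)]
              for i in range(len(X))]
        total += metric(y, [model(row) for row in Xs])
    return total / len(perms)


def performance_shapley(model, X, y, metric, n_draws=10, seed=0):
    """Return (benchmark phi0, list of feature contributions, metric with all features)."""
    rng = random.Random(seed)
    n, q = len(X), len(X[0])
    # Common random numbers: the same permutations are reused for every coalition.
    perms = [rng.sample(range(n), n) for _ in range(n_draws)]
    v = {frozenset(S): coalition_value(model, X, y, set(S), metric, perms)
         for r in range(q + 1) for S in itertools.combinations(range(q), r)}
    phi = []
    for j in range(q):
        others = [k for k in range(q) if k != j]
        total = 0.0
        for r in range(q):
            # Shapley weight for coalitions of size r that exclude feature j.
            w = math.factorial(r) * math.factorial(q - r - 1) / math.factorial(q)
            for S in itertools.combinations(others, r):
                total += w * (v[frozenset(S) | {j}] - v[frozenset(S)])
        phi.append(total)
    return v[frozenset()], phi, v[frozenset(range(q))]


# Hypothetical toy data: feature 0 is strong, feature 1 is weak, feature 2 is unused.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
y = [1 if 2.0 * x[0] + 0.5 * x[1] + data_rng.gauss(0, 1) > 0 else 0 for x in X]


def model(x):
    """Fixed linear score that ignores the third feature."""
    return 2.0 * x[0] + 0.5 * x[1]


phi0, phi, pm = performance_shapley(model, X, y, auc)
print("benchmark:", phi0, "contributions:", phi, "AUC:", pm)
```

Two of the properties listed in the TL;DR can be checked directly on this toy example: the benchmark plus the contributions add up to the AUC (efficiency), and the feature the model ignores receives a zero contribution (null effect). Reusing the same permutations for every coalition is a design choice of the sketch that makes the null effect hold exactly rather than only up to simulation noise.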

Paper Structure (45 sections, 14 theorems, 145 equations, 16 figures, 11 tables, 1 algorithm)

Introduction
A primer on XPER
Framework and performance metrics
XPER values
Definition
Axioms
Individual XPER values
Definition
Dealing with model heterogeneity
Estimation
Simulations
Empirical application
Data and model
XPER decomposition
Using XPER to boost model performance
...and 30 more sections

Key Result

Proposition 1

SHAP is a particular case of XPER where the individual contribution to the performance metric is equal to the predicted value of the model, $G\left(y_i;\mathbf{x}_i;\delta_0\right) = \hat{f}(\mathbf{x}_i)$.
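A short worked step may help show what Proposition 1 implies. It assumes that individual XPER values are additive at the observation level, with $\phi_{i,0}$ denoting an individual benchmark and $q$ the number of features (both symbols are assumptions of this sketch, chosen to match the $\phi_{i,j}$ notation used in the figure captions). Substituting $G = \hat{f}$ into that additive form gives:

$$
G\left(y_i;\mathbf{x}_i;\delta_0\right) = \phi_{i,0} + \sum_{j=1}^{q} \phi_{i,j}
\quad\Longrightarrow\quad
\hat{f}(\mathbf{x}_i) = \phi_{i,0} + \sum_{j=1}^{q} \phi_{i,j}
\quad \text{when } G\left(y_i;\mathbf{x}_i;\delta_0\right) = \hat{f}(\mathbf{x}_i).
$$

Under these assumptions, the right-hand identity has the same shape as the additive decomposition of a single prediction used by SHAP, which is the sense in which SHAP appears as a special case.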

Figures (16)

Figure 1: Empirical distributions of AUC and XPER values
Figure 2: XPER decomposition and Permutation Importance
Figure 3: XPER vs. other feature contribution methodologies
Figure 4: Features distribution by group based on XPER values
Figure A1: $R^2$ XPER values $\phi_{i,1}$ in a three-fold model
...and 11 more figures

Theorems & Definitions (31)

Definition 1
Definition 2: Shapley (1953)
Definition 3: XPER
Definition 4
Proposition 1
Lemma 1
proof
Lemma 2
proof
Lemma 3
...and 21 more
