Machine and Deep Learning for Credit Scoring: A compliant approach

Abdollah Rida

TL;DR

The paper tackles the regulatory challenge of applying advanced machine learning to credit scoring by evaluating Basel II/III-compliant gradient-boosting models (notably XGBoost) on BANK A's auto loan data, using SHAP values for explainability. It develops a framework built on cross-validation, loss reweighting for class imbalance, and Weight of Evidence encoding, and compares performance against a real bank model, reporting stronger discrimination and robust out-of-time results. A key contribution is demonstrating that SHAP-based explanations can satisfy transparency expectations in a regulatory context, with detailed model reports, swap-set analyses, and process flows to accompany deployment. The work aims to bridge high-performance credit scoring and regulatory rigor, offering banks practical pathways to adopt advanced ML while maintaining interpretability and accountability in risk management.
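As a rough illustration of the Weight of Evidence encoding mentioned above, the following minimal Python sketch maps each category of a feature to the log ratio of its share of non-defaults to its share of defaults. The function name `weight_of_evidence`, the additive `smoothing` term, and the sign convention (good over bad) are assumptions made for this example and are not taken from the paper.

```python
import math
from collections import defaultdict


def weight_of_evidence(categories, defaults, smoothing=0.5):
    """Return {category: WoE}, with WoE = ln(share of non-defaults / share of defaults).

    `defaults` holds 1 for a defaulted loan and 0 otherwise.
    `smoothing` is a hypothetical additive constant that avoids log(0)
    when a category contains only defaults or only non-defaults.
    """
    goods = defaultdict(float)  # non-default counts per category
    bads = defaultdict(float)   # default counts per category
    for cat, y in zip(categories, defaults):
        if y == 1:
            bads[cat] += 1
        else:
            goods[cat] += 1

    cats = set(goods) | set(bads)
    # Smoothed totals, so that the smoothed shares still sum to one.
    total_good = sum(goods.values()) + smoothing * len(cats)
    total_bad = sum(bads.values()) + smoothing * len(cats)

    woe = {}
    for cat in cats:
        good_share = (goods[cat] + smoothing) / total_good
        bad_share = (bads[cat] + smoothing) / total_bad
        woe[cat] = math.log(good_share / bad_share)
    return woe


if __name__ == "__main__":
    # Toy data: employment type against a default flag (made up for illustration).
    employment = ["salaried", "salaried", "self", "self", "self", "salaried"]
    default_flag = [0, 0, 1, 1, 0, 0]
    for category, value in sorted(weight_of_evidence(employment, default_flag).items()):
        print(category, round(value, 3))
```

Under this convention a positive value marks a category with relatively more non-defaults than defaults. Some references use the inverse ratio, which only flips the sign.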

Abstract

Credit Scoring is one of the problems banks and financial institutions have to solve on a daily basis. Although state-of-the-art research in Machine and Deep Learning for finance has reached interesting results on Credit Scoring models, such models have so far not been used in a heavily regulated context such as banking. Our work is thus an attempt to challenge the current regulatory status quo and introduce new Basel II and III compliant techniques, while still meeting the requirements of the Federal Reserve Bank and the European Central Bank. With the help of Gradient Boosting Machines (mainly XGBoost), we challenge an actual model used by BANK A for scoring through-the-door Auto Loan applicants. We show that using such algorithms for Credit Scoring models drastically improves performance and default capture rate. Furthermore, we leverage Shapley Values to show that these relatively simple models are not as black-box as the current regulatory system assumes, and we attempt to explain the model outputs and Credit Scores within the BANK A Model Design and Validation framework.

Paper Structure (46 sections, 2 theorems, 41 equations, 24 figures, 2 tables, 2 algorithms)

This paper contains 46 sections, 2 theorems, 41 equations, 24 figures, 2 tables, 2 algorithms.

Introduction
Model Framework and Theory
Mathematical notions
Cross-validation, Class weights and overview of the algorithm
Cross-validation
Loss Reweighting
Prediction Model
Decision Trees
Boosting
Gradient Boosting
XGBoost
Model Training, Calibration and validation
Model Specifications and Estimation
Model Specifications
Target Variable
...and 31 more sections

Key Result

Theorem 1

The Bayesian classifier $h_*$ defined for all $x \in \chi$ by: is such that:
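For reference, the textbook form of this kind of result can be sketched as follows. The notation is assumed for this sketch and is not taken from the paper: binary labels $Y \in \{0, 1\}$, regression function $\eta$, misclassification risk $R$, and an infimum taken over all measurable classifiers $h$.

```latex
\[
  \eta(x) = \mathbb{P}(Y = 1 \mid X = x), \qquad
  h_*(x) = \mathbb{1}\{\eta(x) \ge 1/2\},
\]
\[
  R(h_*) = \inf_{h} R(h), \qquad
  R(h) = \mathbb{P}\bigl(h(X) \neq Y\bigr).
\]
```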

Figures (24)

Figure 1: A CART Decision Tree, from Chen and Guestrin (2016)
Figure 2: Two XGBoost trees with score as output (instead of prediction), from Chen and Guestrin (2016)
Figure 3: Boosting process for a decision tree
Figure 4: Data Preparation Pipeline
Figure 5: Learning Curve and Reliability curve for the in-time dataset for the calibrated model
...and 19 more figures

Theorems & Definitions (5)

Theorem 1
proof
Definition 1
Theorem 2
proof
