The Fairness of Credit Scoring Models

Christophe Hurlin, Christophe Pérignon, Sébastien Saurin

TL;DR

This paper tackles the problem of algorithmic fairness in credit scoring by proposing a formal framework to test fairness, diagnose the drivers of bias, and mitigate disparities while preserving predictive accuracy. It combines three components: a likelihood-ratio test for fairness inference; a novel interpretability method, the Fairness Partial Dependence Plot (FPDP), which identifies candidate variables; and a post-processing mitigation approach that neutralizes the selected features and uses Pareto-front optimization to balance fairness and performance. Empirical validation on the German and Taiwan credit datasets shows that removing or neutralizing a small set of proxy variables can restore fairness with modest losses in accuracy, and that hyperparameter choices can strongly affect outcomes, which highlights operational and regulatory risks. The work provides practical tools for lenders and regulators to monitor, diagnose, and improve fair lending practices in high-stakes credit decisions, and it may also apply to other automated decision processes.
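The summary names the FPDP but does not spell out its definition. The following is a minimal sketch that assumes it works like a standard partial dependence plot, except that at each grid value it records a fairness measure instead of the average prediction. The absolute statistical-parity gap stands in for the paper's test statistic. The names `fairness_pdp`, `parity_gap`, and `predict_proba`, the 0.5 decision threshold, and the toy model are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, float]  # one applicant: feature name -> value


def parity_gap(y_hat: Sequence[int], s: Sequence[int]) -> float:
    """Absolute statistical-parity gap |P(y_hat=1 | s=1) - P(y_hat=1 | s=0)|."""
    g1 = [y for y, g in zip(y_hat, s) if g == 1]
    g0 = [y for y, g in zip(y_hat, s) if g == 0]
    return abs(sum(g1) / len(g1) - sum(g0) / len(g0))


def fairness_pdp(
    predict_proba: Callable[[Row], float],
    rows: Sequence[Row],
    s: Sequence[int],
    feature: str,
    grid: Sequence[float],
    threshold: float = 0.5,
) -> List[Tuple[float, float]]:
    """Hypothetical FPDP: for each grid value v, force `feature` to v for
    every applicant, re-score, and record the resulting parity gap."""
    curve = []
    for v in grid:
        # Copy each row with the feature overwritten; the originals stay untouched.
        modified = [{**r, feature: v} for r in rows]
        y_hat = [int(predict_proba(r) >= threshold) for r in modified]
        curve.append((v, parity_gap(y_hat, s)))
    return curve


# Toy usage: a hypothetical model that only looks at income.
def model(r: Row) -> float:
    return 0.9 if r["income"] > 50 else 0.1


rows = [{"income": 70.0, "age": 30.0}] * 3 + [{"income": 30.0, "age": 30.0}] * 3
s = [1, 1, 1, 0, 0, 0]
print(fairness_pdp(model, rows, s, "income", [30.0, 70.0]))  # [(30.0, 0.0), (70.0, 0.0)]
print(fairness_pdp(model, rows, s, "age", [20.0, 60.0]))     # [(20.0, 1.0), (60.0, 1.0)]
```

In this toy case the baseline gap is 1.0. Forcing `income` to any common value drives the gap to 0.0, while forcing `age` leaves it unchanged. A curve that sits well below the baseline therefore suggests that the disparity runs through that feature and makes it a candidate for neutralization.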

Abstract

In credit markets, screening algorithms aim to discriminate between good-type and bad-type borrowers. However, when doing so, they can also discriminate between individuals sharing a protected attribute (e.g. gender, age, racial origin) and the rest of the population. This can be unintentional and originate from the training dataset or from the model itself. We show how to formally test the algorithmic fairness of scoring models and how to identify the variables responsible for any lack of fairness. We then use these variables to optimize the fairness-performance trade-off. Our framework provides guidance on how algorithmic fairness can be monitored by lenders, controlled by their regulators, improved for the benefit of protected groups, while still maintaining a high level of forecasting accuracy.
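The abstract's final step, using the flagged variables to optimize the fairness-performance trade-off, can be sketched as a search over which candidate features to neutralize. This is a minimal illustration under stated assumptions, not the authors' procedure. Here, "neutralizing" a feature is taken to mean replacing it with its sample mean for every applicant, and fairness is measured by the absolute statistical-parity gap rather than the paper's test statistic. All function names are hypothetical. Each neutralized subset yields one (accuracy, gap) point, and only non-dominated points are kept.

```python
from itertools import combinations
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, float]
Point = Tuple[Tuple[str, ...], float, float]  # (neutralized features, accuracy, parity gap)


def neutralize(rows: Sequence[Row], features: Sequence[str]) -> List[Row]:
    """Assumed neutralization rule: overwrite each selected feature with its sample mean."""
    means = {f: mean(r[f] for r in rows) for f in features}
    return [{**r, **means} for r in rows]


def evaluate(predict_proba: Callable[[Row], float], rows: Sequence[Row],
             y: Sequence[int], s: Sequence[int], threshold: float = 0.5) -> Tuple[float, float]:
    """Return (accuracy, absolute statistical-parity gap) of thresholded scores."""
    y_hat = [int(predict_proba(r) >= threshold) for r in rows]
    accuracy = sum(int(a == b) for a, b in zip(y_hat, y)) / len(y)
    g1 = [d for d, g in zip(y_hat, s) if g == 1]
    g0 = [d for d, g in zip(y_hat, s) if g == 0]
    return accuracy, abs(sum(g1) / len(g1) - sum(g0) / len(g0))


def pareto_front(points: Sequence[Point]) -> List[Point]:
    """Keep points that no other point beats on both axes (higher accuracy, lower gap)."""
    front = []
    for label, acc, gap in points:
        dominated = any(
            a2 >= acc and g2 <= gap and (a2 > acc or g2 < gap) for _, a2, g2 in points
        )
        if not dominated:
            front.append((label, acc, gap))
    return front


def mitigation_front(predict_proba: Callable[[Row], float], rows: Sequence[Row],
                     y: Sequence[int], s: Sequence[int], candidates: Sequence[str],
                     max_size: int = 2) -> List[Point]:
    """Neutralize every subset of candidate features (up to max_size) and keep the Pareto front."""
    points = []
    for k in range(max_size + 1):
        for subset in combinations(candidates, k):
            acc, gap = evaluate(predict_proba, neutralize(rows, subset), y, s)
            points.append((subset, acc, gap))
    return pareto_front(points)


# Toy usage with a hypothetical income-only model.
def model(r: Row) -> float:
    return 0.9 if r["income"] > 50 else 0.1


rows = [{"income": 70.0, "tenure": 1.0}] * 2 + [{"income": 30.0, "tenure": 5.0}] * 2
y = [1, 1, 0, 0]
s = [1, 1, 0, 0]
for subset, acc, gap in mitigation_front(model, rows, y, s, ["income", "tenure"], max_size=1):
    print(subset, acc, gap)
```

In the toy data, neutralizing `income` cuts the parity gap from 1.0 to 0.0 but halves accuracy. Both that point and the untouched model stay on the front because neither dominates the other.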

Paper Structure (32 sections, 1 theorem, 16 equations, 15 figures, 12 tables)

Key Result

Theorem 1

Under the null hypothesis of fairness $H_{0,i}$, the test statistic $F_{H_{0,i}}$ converges in distribution to a chi-squared distribution as the sample size $n$ tends to infinity.
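The extracted text does not give the exact form of $F_{H_{0,i}}$ or its degrees of freedom. The following is a minimal sketch of one concrete case. It assumes a binary protected attribute $S$ and the statistical-parity hypothesis $H_0: P(\hat{Y}=1 \mid S=1) = P(\hat{Y}=1 \mid S=0)$. The unrestricted model gives each group its own approval rate, and the restricted model imposes one pooled rate. With a single restriction, the standard Wilks result gives a $\chi^2(1)$ limit for this particular sketch. The function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def _bernoulli_loglik(k: int, n: int, p: float) -> float:
    """Log-likelihood of k ones among n Bernoulli(p) draws (0 * log 0 taken as 0)."""
    ll = 0.0
    if k > 0:
        ll += k * math.log(p)
    if n - k > 0:
        ll += (n - k) * math.log(1.0 - p)
    return ll


def statistical_parity_lr_test(y_hat: Sequence[int], s: Sequence[int]) -> Tuple[float, float]:
    """LR test of H0: P(y_hat=1 | s=1) == P(y_hat=1 | s=0) for binary y_hat and s.

    Returns (statistic, p_value); under H0 the statistic is asymptotically chi2(1).
    """
    n1 = sum(1 for g in s if g == 1)
    n0 = sum(1 for g in s if g == 0)
    if n1 == 0 or n0 == 0:
        raise ValueError("both groups must be non-empty")
    k1 = sum(y for y, g in zip(y_hat, s) if g == 1)
    k0 = sum(y for y, g in zip(y_hat, s) if g == 0)
    # Unrestricted model: each group has its own approval rate (MLEs k/n).
    ll_u = _bernoulli_loglik(k1, n1, k1 / n1) + _bernoulli_loglik(k0, n0, k0 / n0)
    # Restricted model (H0): one pooled approval rate for both groups.
    p = (k1 + k0) / (n1 + n0)
    ll_r = _bernoulli_loglik(k1, n1, p) + _bernoulli_loglik(k0, n0, p)
    stat = max(2.0 * (ll_u - ll_r), 0.0)  # clip tiny negative rounding errors
    # Survival function of chi2(1): P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


s = [1] * 100 + [0] * 100
y_hat = [1] * 80 + [0] * 20 + [1] * 50 + [0] * 50  # approval rates 0.8 vs 0.5
print(statistical_parity_lr_test(y_hat, s))
```

With approval rates of 0.8 and 0.5 in two groups of 100, the statistic is about 20.3 and the p-value is far below 0.01, so parity is rejected. Identical rates give a statistic of exactly 0 and a p-value of 1.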

Figures (15)

  • Figure 1: Measures of association between features, target variables, and gender
  • Figure 2: Fairness PDP for statistical parity in the TREE-prime model
  • Figure 3: Accuracy-fairness trade-off
  • Figure A1: Feature distributions
  • Figure A2: Feature distribution by class of risk
  • ...and 10 more figures

Theorems & Definitions (8)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Theorem 1: Fairness test
  • Definition 5
  • Definition 6
  • Proof