The Fairness of Credit Scoring Models
Christophe Hurlin, Christophe Pérignon, Sébastien Saurin
TL;DR
This paper addresses algorithmic fairness in credit scoring by proposing a formal framework to test fairness, diagnose the drivers of bias, and mitigate disparities while preserving predictive accuracy. The framework combines three components: likelihood-ratio-based inference for fairness, a novel interpretability method, the Fairness Partial Dependence Plot (FPDP), that identifies the candidate variables behind a fairness violation, and a post-processing mitigation step that neutralizes selected features and uses Pareto-front optimization to balance fairness against performance. Empirical validation on the German and Taiwan credit datasets shows that removing or neutralizing a small set of proxy variables can restore fairness with only modest losses in accuracy. It also shows that hyperparameter choices can strongly affect fairness outcomes, which creates operational and regulatory risk. The work gives lenders and regulators practical tools to monitor, diagnose, and improve fair lending practices in high-stakes credit decisions, and the approach may apply to other automated decision processes.
Abstract
In credit markets, screening algorithms aim to discriminate between good-type and bad-type borrowers. However, when doing so, they can also discriminate between individuals sharing a protected attribute (e.g., gender, age, racial origin) and the rest of the population. This can be unintentional and originate from the training dataset or from the model itself. We show how to formally test the algorithmic fairness of scoring models and how to identify the variables responsible for any lack of fairness. We then use these variables to optimize the fairness-performance trade-off. Our framework provides guidance on how algorithmic fairness can be monitored by lenders, controlled by their regulators, and improved for the benefit of protected groups, while still maintaining a high level of forecasting accuracy.
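To make the idea of formally testing fairness concrete, the following is a minimal sketch rather than the authors' actual test statistic. It assumes one common fairness definition, statistical parity (the lending decision is independent of the protected attribute), and checks it with a standard likelihood-ratio (G) test of independence on a 2x2 table. The function name `g_test_independence` and the counts in the usage lines are hypothetical and chosen only for illustration.

```python
import math


def g_test_independence(table):
    """Likelihood-ratio (G) test of independence on a 2x2 contingency table.

    table[i][j] is the number of applicants in group i
    (0 = protected group, 1 = rest of the population) who received
    decision j (0 = rejected, 1 = approved).
    Returns (g_statistic, p_value), using a chi-squared reference
    distribution with 1 degree of freedom.
    """
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    n = sum(row_totals)

    g = 0.0
    for i in range(2):
        for j in range(2):
            observed = table[i][j]
            if observed > 0:  # empty cells contribute nothing to G
                expected = row_totals[i] * col_totals[j] / n
                g += 2.0 * observed * math.log(observed / expected)

    g = max(g, 0.0)  # guard against tiny negative rounding error
    # Survival function of a chi-squared variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(g / 2.0))
    return g, p_value


# Hypothetical counts: [rejected, approved] for each group.
decisions = [[20, 80],   # protected group
             [60, 40]]   # rest of the population
g_stat, p = g_test_independence(decisions)
print(f"G = {g_stat:.2f}, p-value = {p:.4g}")
```

A small p-value rejects the null hypothesis that approval is independent of group membership, which under this definition signals a lack of fairness. Other fairness definitions (for example, ones that condition on the true default outcome) would need a different table or a conditional version of the test.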
