An experimental study on fairness-aware machine learning for credit scoring problem
Huyen Giang Thi Thu, Thang Viet Doan, Tai Le Quy
TL;DR
The paper investigates fairness in credit scoring through a comprehensive empirical study of fairness-aware ML methods (pre-, in-, and post-processing) across six public financial datasets. It finds that the datasets themselves are biased and that fairness-aware techniques can improve equitable outcomes while maintaining competitive accuracy, with AdaFair often delivering strong performance across multiple datasets. The findings offer practical guidance for lenders and regulators on selecting fairness approaches and metrics, highlighting trade-offs among statistical parity (SP), equal opportunity (EO), equalized odds (EOd), predictive parity (PP), predictive equality (PE), treatment equality (TE), and Absolute Between-ROC Area (ABROCA). The work also points to future directions, including multi-attribute fairness, synthetic data generation for finance, and interpretable fair models that can diagnose and mitigate sources of bias.
Abstract
Digitalization of credit scoring is an essential requirement for financial organizations and commercial banks, especially in the context of digital transformation. Machine learning techniques are commonly used to evaluate customers' creditworthiness. However, the predicted outcomes of machine learning models can be biased with respect to protected attributes, such as race or gender. Numerous fairness-aware machine learning models and fairness measures have been proposed, but their performance in the context of credit scoring has not been thoroughly investigated. In this paper, we present a comprehensive experimental study of fairness-aware machine learning in credit scoring. The study explores key aspects of credit scoring, including financial datasets, predictive models, and fairness measures. We also provide a detailed evaluation of fairness-aware predictive models and fairness measures on widely used financial datasets.
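
To make the idea of a group fairness measure concrete, the following is a minimal sketch of two common ones, statistical parity and equal opportunity, computed on a toy set of credit decisions. It is an illustration only, not the paper's code: the function names, the toy data, the encoding (1 = loan approved, 1 = member of the protected group), and the sign convention (non-protected minus protected) are all assumptions made here, and the paper's exact definitions may differ.

```python
from typing import Sequence


def _rate(values: Sequence[int]) -> float:
    """Fraction of entries equal to 1; raises if the group is empty."""
    if not values:
        raise ValueError("group is empty, rate is undefined")
    return sum(values) / len(values)


def statistical_parity(y_pred: Sequence[int], protected: Sequence[int]) -> float:
    """Approval rate of the non-protected group minus that of the protected group.

    Assumed convention: 0 means parity, positive values favour the
    non-protected group.
    """
    prot = [p for p, s in zip(y_pred, protected) if s == 1]
    non_prot = [p for p, s in zip(y_pred, protected) if s == 0]
    return _rate(non_prot) - _rate(prot)


def equal_opportunity(
    y_true: Sequence[int], y_pred: Sequence[int], protected: Sequence[int]
) -> float:
    """Absolute gap in false negative rates between the two groups.

    Only truly creditworthy applicants (y_true == 1) are considered.
    """
    # Among creditworthy applicants, a "miss" is a rejected application.
    prot_miss = [1 - p for t, p, s in zip(y_true, y_pred, protected) if t == 1 and s == 1]
    non_prot_miss = [1 - p for t, p, s in zip(y_true, y_pred, protected) if t == 1 and s == 0]
    return abs(_rate(non_prot_miss) - _rate(prot_miss))


# Hypothetical toy data: eight applicants, four in each group.
y_true = [1, 1, 0, 1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]
protected = [1, 1, 1, 1, 0, 0, 0, 0]

sp = statistical_parity(y_pred, protected)
eo = equal_opportunity(y_true, y_pred, protected)
print(f"statistical parity: {sp:.3f}")
print(f"equal opportunity:  {eo:.3f}")
```

On this toy data the two measures disagree in magnitude, which hints at why a study of this kind reports several measures side by side rather than relying on a single one.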
