Best Practices for Responsible Machine Learning in Credit Scoring
Giovani Valdrighi, Athyrson M. Ribeiro, Jansen S. B. Pereira, Vitoria Guardieiro, Arthur Hendricks, Décio Miranda Filho, Juan David Nieto Garcia, Felipe F. Bocca, Thalita B. Veronese, Lucas Wanner, Marcos Medeiros Raimundo
TL;DR
This paper investigates responsible machine learning for credit scoring by exploring three core areas: fairness, reject inference, and explainability. It surveys definitions and metrics for fairness, presents pre-, in-, and post-processing mitigation techniques, and experimentally compares their impact on performance and bias across multiple datasets. It also addresses sample bias via reject inference, detailing augmentation, extrapolation, and label spreading methods with an empirical evaluation. Finally, it discusses explainability approaches, including global/local explanations and counterfactuals, to enable model auditing and actionable guidance for applicants, highlighting trade-offs and practical considerations for real-world lending. Overall, the work offers a practical framework of best practices to deploy fair, transparent, and inclusive credit scoring systems while acknowledging remaining challenges and future directions.
Abstract
The widespread use of machine learning in credit scoring has brought significant advancements in risk assessment and decision-making. However, it has also raised concerns about potential biases, discrimination, and lack of transparency in these automated systems. This tutorial paper presents a non-systematic literature review to guide best practices for developing responsible machine learning models in credit scoring, focusing on fairness, reject inference, and explainability. We discuss definitions, metrics, and techniques for mitigating biases and ensuring equitable outcomes across different groups. Additionally, we address the issue of limited data representativeness by exploring reject inference methods that incorporate information from rejected loan applications. Finally, we emphasize the importance of transparency and explainability in credit models, discussing techniques that provide insights into the decision-making process and enable individuals to understand and potentially improve their creditworthiness. By adopting these best practices, financial institutions can harness the power of machine learning while upholding ethical and responsible lending practices.
