De-Biasing Models of Biased Decisions: A Comparison of Methods Using Mortgage Application Data

Nicholas Tenev

TL;DR

This study shows that models of mortgage underwriting decisions trained on historical data can reproduce bias against protected groups even when ethnicity is not used as a predictive feature. By injecting counterfactual (simulated) bias into HMDA data and training an XGBoost predictor on standard underwriting variables, the author demonstrates that the model replicates the bias through correlations between ethnicity and the remaining variables. Four de-biasing strategies are compared: excluding prohibited variables, FairXGBoost regularization, averaging over prohibited variables, and a novel max-over-groups approach. Averaging and max-over-groups can recover some of the original decision patterns (those in the data before bias was added), though performance depends on whether the bias operates directly or through proxies. The work highlights the importance of context in fair lending: a bias mitigation method must match the bias mechanism (explicit, proxy, or model-selection related) to avoid unintended consequences.

Abstract

Prediction models can improve efficiency by automating decisions such as the approval of loan applications. However, they may inherit bias against protected groups from the data they are trained on. This paper adds counterfactual (simulated) ethnic bias to real data on mortgage application decisions, and shows that this bias is replicated by a machine learning model (XGBoost) even when ethnicity is not used as a predictive variable. Next, several other de-biasing methods are compared: averaging over prohibited variables, taking the most favorable prediction over prohibited variables (a novel method), and jointly minimizing errors as well as the association between predictions and prohibited variables. De-biasing can recover some of the original decisions, but the results are sensitive to whether the bias is effected through a proxy.
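The two prediction-time methods named in the abstract, averaging over the prohibited variable and taking the most favorable prediction over it, can be illustrated with a minimal sketch. This is not the paper's implementation: `toy_denial_model`, the group labels, the `debt_to_income` input, and the group shares used as averaging weights are all hypothetical stand-ins for a fitted model (such as XGBoost) that was trained with ethnicity included.

```python
from typing import Callable, Dict, Iterable

# A model maps (applicant feature, group label) to a denial probability.
DenialModel = Callable[[float, str], float]


def toy_denial_model(debt_to_income: float, group: str) -> float:
    """Hypothetical stand-in for a fitted model trained WITH the prohibited
    variable. It adds a fixed penalty for group "B" to mimic learned bias."""
    base = 0.5 * min(max(debt_to_income, 0.0), 1.0)
    penalty = 0.2 if group == "B" else 0.0  # simulated bias, an assumption
    return min(base + penalty, 1.0)


def average_over_groups(
    model: DenialModel, x: float, group_shares: Dict[str, float]
) -> float:
    """Predict for every group value and average the results.
    Weighting by population share is an assumption of this sketch."""
    total = sum(group_shares.values())
    return sum(share / total * model(x, g) for g, share in group_shares.items())


def most_favorable_over_groups(
    model: DenialModel, x: float, groups: Iterable[str]
) -> float:
    """Predict for every group value and keep the most favorable outcome.
    The target here is denial, so the most favorable value is the minimum."""
    return min(model(x, g) for g in groups)


shares = {"A": 0.75, "B": 0.25}  # hypothetical group shares
applicant_dti = 0.4              # hypothetical applicant feature

averaged = average_over_groups(toy_denial_model, applicant_dti, shares)
favorable = most_favorable_over_groups(toy_denial_model, applicant_dti, shares)
print(f"averaged denial probability:       {averaged:.3f}")
print(f"most favorable denial probability: {favorable:.3f}")
```

In both functions the applicant's actual group never enters the final prediction, so two applicants with identical remaining features receive the same score. Neither step addresses bias that reaches the model through proxies among those remaining features, which is the sensitivity the abstract notes.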

Paper Structure (20 sections, 3 equations, 2 figures, 3 tables)

Introduction
Context and Data
Sources of disparities in model predictions
Explicit use of prohibited factors
Group differences in predictors
Proxies for prohibited factors
Model selection bias
De-biasing methods
Exclusion of prohibited variables
Jointly optimizing accuracy and disparity between groups
Averaging over prohibited variables
Maximum prediction over prohibited variable
Empirical Methods and Results
Random bias
Excluding prohibited variables
...and 5 more sections

Figures (2)

Figure 1: Predicted denial rates by actual disposition, model, and ethnicity
Figure 2: Predicted denial rates by actual disposition, model, and ethnicity
