Observing Context Improves Disparity Estimation when Race is Unobserved
Kweku Kwegyir-Aggrey, Naveen Durvasula, Jennifer Wang, Suresh Venkatasubramanian
TL;DR
The paper tackles the challenge of estimating racial disparities when individual-level race data are unavailable and highlights the bias of standard proxy methods such as BISG (Bayesian Improved Surname Geocoding). It introduces two contextual proxy approaches, $cBISG$ and $MICSG$, together with a Bayes estimator for disparity that is unbiased under a mean-consistency condition. In large-scale experiments on HMDA mortgage data and North Carolina voter data, the authors show that contextual proxies yield more accurate race predictions and disparity estimates, with smaller mean-consistency violations for minority groups. The work offers a practical path to more reliable disparity estimation in settings where race data are difficult to obtain or legally restricted, by combining contextual information with calibration-based guarantees.
Abstract
In many domains, it is difficult to obtain the race data required to estimate racial disparity. To address this problem, practitioners have adopted proxy methods that predict race from non-protected covariates. However, these proxies often yield biased estimates, especially for minority groups, limiting their real-world utility. In this paper, we introduce two new contextual proxy models that advance existing methods by incorporating contextual features to improve race estimates. We show that these algorithms substantially improve disparity estimates on real-world home loan and voter data. We establish that achieving unbiased disparity estimates with contextual proxies relies on mean-consistency, a calibration-like condition.
