Testing the Fairness-Accuracy Improvability of Algorithms
Eric Auerbach, Annie Liang, Kyohei Okumura, Max Tabord-Meehan
TL;DR
The paper formalizes and tests the possibility of improving an algorithm's fairness without sacrificing accuracy by introducing an econometric framework for $(\Delta_r,\Delta_b,\Delta_f)$-improvability. It defines a flexible, legally cognizant objective space using group-specific accuracy utilities $U_A^g(a)$ and a two-sided fairness measure $|U_F^r(a)-U_F^b(a)|$, then proposes a data-splitting, bootstrap-based procedure to test whether a status-quo algorithm is improvable within a chosen algorithm class $\mathcal{A}$. The authors prove asymptotic validity and, under an improvement-convergence condition, consistency; they also show that repeated sample-splitting is more robust to manipulation than a single split. The empirical application to a healthcare algorithm (Obermeyer et al.) demonstrates that substantial fairness improvements are possible without reducing predictive accuracy, illustrating the approach’s practical relevance for Title VI regulation of federally funded programs. Overall, the framework provides regulators and practitioners with a transparent, flexible tool to substantiate or refute the necessity defense by testing for simultaneous improvements along fairness and accuracy criteria.
Abstract
Many organizations use algorithms that have a disparate impact, i.e., the benefits or harms of the algorithm fall disproportionately on certain social groups. Addressing an algorithm's disparate impact can be challenging, however, because it is often unclear whether it is possible to reduce this impact without sacrificing other objectives of the organization, such as accuracy or profit. Establishing the improvability of algorithms with respect to multiple criteria is of both conceptual and practical interest: in many settings, disparate impact that would otherwise be prohibited under US federal law is permissible if it is necessary to achieve a legitimate business interest. The question is how a policymaker can formally substantiate, or refute, this "necessity" defense. In this paper, we provide an econometric framework for testing the hypothesis that it is possible to improve on the fairness of an algorithm without compromising on other pre-specified objectives. Our proposed test is simple to implement and can be applied under any exogenous constraint on the algorithm space. We establish the large-sample validity and consistency of our test, and microfound the test's robustness to manipulation based on a game between a policymaker and the analyst. Finally, we apply our approach to evaluate a healthcare algorithm originally considered by Obermeyer et al. (2019), and quantify the extent to which the algorithm's disparate impact can be reduced without compromising the accuracy of its predictions.
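To make the data-splitting, bootstrap-based idea concrete, the following is a minimal Python sketch and not the authors' procedure. Everything specific in it is an assumption made for illustration: the function names (`gains`, `bootstrap_pvalue`, `split_sample_test`), negative mean squared error as the group accuracy utility, the absolute gap in mean scores as the fairness measure, least squares as the candidate-selection step, zero improvement thresholds, and a simple rule that rejects only when all three gains are significantly positive. The paper's actual test statistic and bootstrap may differ.

```python
from math import erf, sqrt
import numpy as np

def gains(y, g, a0, a1):
    """Gains of candidate a1 over status quo a0; positive means a1 is better."""
    out = []
    for grp in (0, 1):
        m = g == grp
        # Assumed accuracy utility: negative mean squared error within the group.
        out.append(np.mean((y[m] - a0[m]) ** 2) - np.mean((y[m] - a1[m]) ** 2))
    # Assumed fairness measure: absolute gap in mean score between the groups.
    gap = lambda a: abs(a[g == 0].mean() - a[g == 1].mean())
    out.append(gap(a0) - gap(a1))
    return np.array(out)

def bootstrap_pvalue(y, g, a0, a1, n_boot=500, seed=0):
    """One-sided p-value for 'a1 improves on a0 in all three criteria'."""
    rng = np.random.default_rng(seed)
    n = len(y)
    boot = np.empty((n_boot, 3))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample observations with replacement
        boot[b] = gains(y[idx], g[idx], a0[idx], a1[idx])
    # Studentize each gain by its bootstrap standard error.
    t = gains(y, g, a0, a1) / boot.std(axis=0, ddof=1)
    # Illustrative rule: the smallest t-statistic drives the p-value,
    # so rejection requires every gain to be significantly positive.
    return 1.0 - 0.5 * (1.0 + erf(t.min() / sqrt(2.0)))

def split_sample_test(x, y, g, a0, seed=0):
    """Pick a candidate on one half of the data, test it on the other half."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(y))
    train, test = perm[: len(y) // 2], perm[len(y) // 2 :]
    X = np.column_stack([np.ones(len(y)), x])
    # Assumed candidate search: ordinary least squares fitted on the training half.
    beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
    a1 = X @ beta
    return bootstrap_pvalue(y[test], g[test], a0[test], a1[test], seed=seed)
```

Choosing the candidate on one half and evaluating it on the other keeps the search step from contaminating the test. A small p-value is evidence that the status-quo scores `a0` can be improved on accuracy for both groups and on fairness at once, within the (here very narrow) class of linear predictors.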
