Binary Choice under Asymmetric Loss in a Data-Rich Environment: Theory and an Application to Algorithmic Fairness
Andrii Babii, Xi Chen, Eric Ghysels, Rohit Kumar
TL;DR
The paper studies binary decision problems in data-rich settings where the losses are asymmetric and may depend on covariates. It introduces a loss-based reweighting strategy that reduces the problem to convexified empirical risk minimization. Replacing the nonconvex indicator loss with a convex surrogate $\phi$, the authors derive a general excess-risk bound that controls the binary decision risk by the convexified risk. This allows standard ML methods (logistic regression, boosting, deep nets, SVM) to be used with observation weights determined by the losses. The authors provide finite-sample rates for parametric models, high-dimensional LASSO variants, and deep neural networks under a Tsybakov-type margin condition, and show that the approach can be minimax-optimal in certain regimes. They illustrate the method with Monte Carlo simulations and an empirical application to fairness in pretrial detention using the Broward County COMPAS dataset, showing how covariate-driven loss functions can reduce disparities and align classifier performance with welfare-inspired objectives. Overall, the work offers a distribution-free, conceptually transparent framework for cost-sensitive binary decisions in high dimensions, with direct implications for algorithmic fairness and policy design.
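To make the reweighting idea concrete, here is a minimal worked sketch in LaTeX. It is not the paper's exact formulation. It assumes outcomes $Y \in \{-1, 1\}$, decisions given by the sign of a score $f(X)$, zero loss for correct decisions, and hypothetical covariate-dependent costs $\ell_{FN}(x)$ and $\ell_{FP}(x)$ for false negatives and false positives. The symbols $\omega$, $\eta$, and the margin convention $\phi(Y f(X))$ are notational assumptions introduced here for illustration.

```latex
% Weight attached to each observation (assumption: correct decisions cost zero)
\omega(Y, X) =
  \begin{cases}
    \ell_{FN}(X), & Y = 1, \\
    \ell_{FP}(X), & Y = -1.
  \end{cases}

% Decision risk with the nonconvex indicator loss
R(f) = \mathbb{E}\big[\, \omega(Y, X)\, \mathbf{1}\{ Y f(X) < 0 \} \,\big]

% Convexified risk with a convex surrogate, e.g. logistic \phi(z) = \log(1 + e^{-z})
R_\phi(f) = \mathbb{E}\big[\, \omega(Y, X)\, \phi\big( Y f(X) \big) \,\big]

% With \eta(x) = \Pr(Y = 1 \mid X = x) and the logistic surrogate,
% the pointwise minimizer of R_\phi is
f_\phi^*(x) = \log \frac{\ell_{FN}(x)\, \eta(x)}{\ell_{FP}(x)\, \big(1 - \eta(x)\big)},

% so that f_\phi^*(x) \ge 0 exactly when
\eta(x) \ge \frac{\ell_{FP}(x)}{\ell_{FP}(x) + \ell_{FN}(x)}.
```

Under these assumptions, the sign of the weighted-logistic minimizer coincides with the cost-sensitive optimal decision, which is why reweighting a standard convex method can recover the right rule.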
Abstract
We study the binary choice problem in a data-rich environment with asymmetric loss functions. The econometrics literature covers nonparametric binary choice problems but does not offer computationally attractive solutions in data-rich environments. The machine learning literature has many algorithms but is focused mostly on loss functions that are independent of covariates. We show that theoretically valid decisions on binary outcomes with general loss functions can be achieved via a very simple loss-based reweighting of logistic regression or state-of-the-art machine learning techniques. We apply our analysis to algorithmic fairness in pretrial detentions.
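The loss-based reweighting of logistic regression mentioned in the abstract can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the cost functions `cost_fn` and `cost_fp`, the simulated data, and the helper `fit_weighted_logit` are all hypothetical, and correct decisions are assumed to cost nothing. Each observation is weighted by the cost of misclassifying it, and a plain weighted logistic regression is fit by gradient descent.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def cost_fn(x):
    # Hypothetical covariate-dependent cost of a false negative.
    return 2.0 * math.exp(0.5 * x)


def cost_fp(x):
    # Hypothetical (constant) cost of a false positive.
    return 1.0


def fit_weighted_logit(xs, ys, ws, steps=500, lr=0.5):
    # Gradient descent on the weighted logistic loss
    #   sum_i w_i * log(1 + exp(-y_i * f(x_i))),  f(x) = b + a * x,
    # with labels y_i in {-1, +1}.
    a, b = 0.0, 0.0
    total = sum(ws)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y, w in zip(xs, ys, ws):
            g = -w * y * sigmoid(-y * (b + a * x))
            grad_a += g * x
            grad_b += g
        a -= lr * grad_a / total
        b -= lr * grad_b / total
    return a, b


# Simulated data (assumption): P(Y = 1 | X = x) = sigmoid(x).
random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(3000)]
ys = [1 if random.random() < sigmoid(x) else -1 for x in xs]

# Loss-based weights: the cost of getting each observation wrong.
ws = [cost_fn(x) if y == 1 else cost_fp(x) for x, y in zip(xs, ys)]

a, b = fit_weighted_logit(xs, ys, ws)

# Decide +1 when the fitted score is nonnegative, i.e. x >= threshold.
threshold = -b / a

# In this design the cost-sensitive optimal rule decides +1 when
# 1.5 * x + log(2) >= 0, which gives the oracle cutoff below.
oracle = -math.log(2.0) / 1.5
```

In this simulated design the fitted cutoff should land near the oracle cutoff, and well below the cutoff of zero that an unweighted fit would target. In practice the same weights could be passed to a library estimator instead, for example through the `sample_weight` argument of scikit-learn's `LogisticRegression.fit`.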
