Unpacking the Black Box: Regulating Algorithmic Decisions
Laura Blattner, Scott Nelson, Jann Spiess
TL;DR
This paper develops a principal–agent framework for regulating high-stakes, complex predictive algorithms when the regulator can only observe simple explanations of the model. It analyzes the welfare trade-offs between ex-ante restrictions to simple, fully transparent predictors and ex-post explanations of complex predictors, showing that targeted explanations aligned with the source of misalignment often outperform both agnostic explanations and full transparency. The authors derive theoretical results under linear-quadratic assumptions and validate them empirically in consumer lending, demonstrating that complex credit-scoring models paired with context-specific explanations can outperform simple, transparent rules for both fairness (disparate impact) and risk-management objectives. The findings offer practical guidance for regulators: design explanation tools that target the source of misalignment and tailor them to the application context, which can yield Pareto-improving regulation while preserving predictive performance.
Abstract
What should regulators of complex algorithms regulate? We propose a model of oversight over 'black-box' algorithms used in high-stakes applications such as lending, medical testing, or hiring. In our model, a regulator is limited in how much she can learn about a black-box model deployed by an agent with misaligned preferences. The regulator faces two choices: first, whether to allow the use of complex algorithms; and second, which key properties of algorithms to regulate. We show that limiting agents to algorithms that are simple enough to be fully transparent is inefficient as long as the misalignment is limited and complex algorithms have sufficiently better performance than simple ones. Allowing for complex algorithms can improve welfare, but the gains depend on how the regulator regulates them. Regulation that focuses on the overall average behavior of algorithms, for example based on standard explainer tools, will generally be inefficient. Targeted regulation that focuses on the source of incentive misalignment, e.g., excess false positives or racial disparities, can provide second-best solutions. We provide empirical support for our theoretical findings using an application in consumer lending, where we document that complex models regulated based on context-specific explanation tools outperform simple, fully transparent models. This gain from complex models represents a Pareto improvement across our empirical applications: it is preferred by both the lender and the financial regulator.
