Unpacking the Black Box: Regulating Algorithmic Decisions

Laura Blattner, Scott Nelson, Jann Spiess

TL;DR

This paper develops a principal–agent framework for regulating high-stakes, complex predictive algorithms when the regulator can only observe simple explanations of the model. It analyzes the welfare trade-offs between ex-ante restrictions to simple, fully transparent predictors and ex-post explanations of complex predictors, showing that targeted explanations aligned with the source of misalignment often outperform agnostic explanations or full transparency. The authors derive theoretical results under linear-quadratic assumptions and validate them empirically in consumer lending, demonstrating that complex credit-scoring models paired with context-specific explanations can outperform simple, transparent rules for both fairness (disparate impact) and risk-management objectives. The findings offer practical guidance for regulators: design explanation tools that focus on the misalignment source and tailor explanations to application context to achieve Pareto-improving regulation while preserving predictive performance.

Abstract

What should regulators of complex algorithms regulate? We propose a model of oversight over 'black-box' algorithms used in high-stakes applications such as lending, medical testing, or hiring. In our model, a regulator is limited in how much she can learn about a black-box model deployed by an agent with misaligned preferences. The regulator faces two choices: first, whether to allow for the use of complex algorithms; and second, which key properties of algorithms to regulate. We show that limiting agents to algorithms that are simple enough to be fully transparent is inefficient as long as the misalignment is limited and complex algorithms have sufficiently better performance than simple ones. Allowing for complex algorithms can improve welfare, but the gains depend on how the regulator regulates them. Regulation that focuses on the overall average behavior of algorithms, for example based on standard explainer tools, will generally be inefficient. Targeted regulation that focuses on the source of incentive misalignment, e.g., excess false positives or racial disparities, can provide second-best solutions. We provide empirical support for our theoretical findings using an application in consumer lending, where we document that complex models regulated based on context-specific explanation tools outperform simple, fully transparent models. This gain from complex models represents a Pareto improvement across our empirical applications that is preferred both by the lender and from the perspective of the financial regulator.
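The abstract names excess false positives and racial disparities as examples of misalignment sources that targeted regulation could focus on. As a rough illustration only, the sketch below computes two such summary statistics from a vector of predicted scores. The function names, the approval threshold, and the toy data are all hypothetical and are not taken from the paper; the paper's own explainer definitions may differ.

```python
import numpy as np

def group_gap(scores, group):
    """Mean score of group 1 minus mean score of group 0 (hypothetical disparity summary)."""
    scores = np.asarray(scores, dtype=float)
    group = np.asarray(group)
    return scores[group == 1].mean() - scores[group == 0].mean()

def false_positive_rate(scores, y, threshold):
    """Share of true negatives (y == 0) whose score is at or above the threshold."""
    scores = np.asarray(scores, dtype=float)
    y = np.asarray(y)
    negatives = scores[y == 0]
    return float((negatives >= threshold).mean())

# Toy, made-up data: six applicants, two groups, y = 1 means the loan was repaid.
scores = [0.9, 0.8, 0.4, 0.3, 0.7, 0.2]
group = [0, 0, 0, 1, 1, 1]
y = [1, 0, 1, 0, 0, 1]

gap = group_gap(scores, group)
fpr = false_positive_rate(scores, y, threshold=0.5)
print(f"group gap: {gap:.3f}, false positive rate: {fpr:.3f}")
```

Each statistic collapses the whole predictor into a single number tied to one specific concern, which is the sense in which such a summary is "targeted" rather than a description of average behavior.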

Paper Structure

This paper contains 23 sections, 6 theorems, 18 equations, 4 figures, 4 tables.

Key Result

Proposition 1

Write $f^A_\theta = \mathop{\mathrm{arg\,min}}\limits_{f \in \mathcal{F}} \mathop{}\!\textnormal{E}_\theta[(y - f(x))^2]$ for the agent's first-best choice from $\mathcal{F}$, $\beta^A_\theta = \textsc{e} f^A_\theta$ for its projection onto $x_S$, and $r^A_\theta(x) = f^A_\theta(x) - x_S' \beta^A_\t
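The statement above splits a predictor into its projection onto the simple covariates $x_S$ and a residual. A minimal numerical sketch of that decomposition follows, assuming the explainer is an ordinary least-squares projection onto a constant plus $x_S$. The helper name `linear_explainer` and the simulated predictor are hypothetical and only illustrate the idea.

```python
import numpy as np

def linear_explainer(f_values, x_s):
    """Least-squares projection of predictions onto the columns of x_s.

    Returns the coefficient vector beta and the residual f - x_s @ beta.
    """
    beta, *_ = np.linalg.lstsq(x_s, f_values, rcond=None)
    residual = f_values - x_s @ beta
    return beta, residual

rng = np.random.default_rng(0)
n = 1000
x1 = rng.integers(0, 2, n)
x2 = rng.integers(0, 2, n)

# Assumed design: constant plus the two binary covariates.
x_s = np.column_stack([np.ones(n), x1, x2])

# Hypothetical "complex" predictor with an interaction the linear explainer cannot capture.
f_hat = 0.2 + 0.3 * x1 + 0.1 * x2 + 0.4 * x1 * x2

beta, r = linear_explainer(f_hat, x_s)
print("explainer coefficients:", beta)
print("largest absolute residual:", np.abs(r).max())
```

By construction the residual is orthogonal to the explainer covariates, so any behavior of the predictor that lives only in the residual (here, the interaction term) is invisible to a regulator who sees the coefficients alone.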

Figures (4)

  • Figure 1: Illustration of the structure of a complex function $\hat{f}$ (left panel) as well as the information retained in simple explainers $\textsc{e} \hat{f}$ (center and right panels) from the fully interacted linear regression example. Each cell corresponds to a combination of the values of the two binary covariates $x_1$ and $x_2$, and the values in the cells or across two cells represent the information retained in each case.
  • Figure 2: Out-of-sample performance of unconstrained and constrained prediction models across key objectives for the disparate impact (left) and risk (right) applications. Complex models are XGBoost models on all 518 covariates, while simple models are linear regression on five covariates chosen by the LASSO. The frontiers vary the relative weight put on each objective, where the regulator and lender marks correspond to the solutions maximizing the empirical analog to the objectives (eq:DI-P), (eq:DI-A) and (eq:R-P), (eq:R-A), respectively. The colored lines represent prediction models subject to constraints imposed by the agnostic and targeted explainers, respectively. The gray line represents the points at which the regulator would be indifferent to its preferred simple model.
  • Figure 3: Out-of-sample performance of unconstrained and constrained prediction models across key objectives for the disparate impact (top) and risk (bottom) applications as in Figure 2, but with varying explainer complexity. The left panels use ten variables for the simple (linear) model and the explainer (in addition to a constant for the intercept), while the right panels use twenty. This illustrates the robustness of the results reported in the results table and Figure 2, which are based on five explainer variables.
  • Figure 4: Performance of models of varying complexity in the training sample (left panels) and on the hold-out set (right panels) for the disparate impact (top panels) and risk (bottom panels) examples, following Figure 2. The simple linear model is a linear-regression model on five covariates chosen by the LASSO. The complex linear and complex XGBoost models use all 518 covariates to predict repayment. The regulator and lender marks correspond to the solutions maximizing the empirical analog to the objectives (eq:DI-P), (eq:DI-A) and (eq:R-P), (eq:R-A), respectively.
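The captions describe frontiers obtained by varying the relative weight placed on each of two objectives. One simple way to trace such a frontier, sketched here under the assumption of a finite set of candidate models with known scores on both objectives, is to pick the best candidate for each weight. The candidate names and scores are made up for illustration and are not results from the paper.

```python
def trace_frontier(candidates, weights):
    """For each weight w, select the candidate maximizing w * a + (1 - w) * b.

    candidates: list of (name, a, b) tuples, where a and b are the two objective scores.
    Returns the distinct selected candidates in order of first selection.
    """
    frontier = []
    seen = set()
    for w in weights:
        best = max(candidates, key=lambda c: w * c[1] + (1 - w) * c[2])
        if best[0] not in seen:
            seen.add(best[0])
            frontier.append(best)
    return frontier

# Hypothetical candidates: (name, objective A score, objective B score).
candidates = [
    ("m1", 0.90, 0.10),
    ("m2", 0.80, 0.50),
    ("m3", 0.50, 0.80),
    ("m4", 0.10, 0.90),
    ("m5", 0.40, 0.40),  # worse than m2 on both objectives
]
weights = [i / 10 for i in range(11)]

frontier = trace_frontier(candidates, weights)
for name, a, b in frontier:
    print(name, a, b)
```

A candidate that is worse than another on both objectives, such as `m5` here, is never selected at any weight, so it does not appear on the traced frontier.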

Theorems & Definitions (15)

  • Proposition 1: Constrained agent choice
  • Example: Fully interacted linear regression
  • Definition 1: Covariate shift
  • Definition 2: Model shift
  • Theorem 1: Alignment through simplicity vs. alignment despite complexity
  • Proposition 2: Distributional preferences
  • Example: Fully interacted linear regression (continued)
  • Theorem 2: Targeted explainer
  • Corollary 1: Targeted explainer for disparate impact
  • Corollary 2: Optimal regulation
  • ...and 5 more