Safe Policy Learning through Extrapolation: Application to Pre-trial Risk Assessment

Eli Ben-Michael, D. James Greiner, Kosuke Imai, Zhichao Jiang

TL;DR

The paper tackles learning improvements to deterministic pre-trial risk assessments by addressing identifiability gaps arising from lack of overlap with a fixed baseline policy. It introduces a maximin robust optimization framework that partially identifies policy value and guarantees safety relative to the status quo under plausible outcome models. Applying this to the PSA-DMF system with FTA, NCA, and NVCA scores, the authors find safe improvements are possible for NVCA threshold adjustments but largely cannot justify changes to FTA/NCA scores or broader DMF matrices given the data constraints. The approach provides a principled extrapolation-based policy-learning tool for high-stakes, rule-based systems, and highlights when and where data can support policy modifications. The work also underscores the importance of model class selection, confidence band construction, and the trade-off between safety and potential gains in deterministic-policy settings.

Abstract

Algorithmic recommendations and decisions have become ubiquitous in today's society. Many of these data-driven policies, especially in the realm of public policy, are based on known, deterministic rules to ensure their transparency and interpretability. We examine a particular case of algorithmic pre-trial risk assessments in the US criminal justice system, which provide deterministic classification scores and recommendations to help judges make release decisions. Our goal is to analyze data from a unique field experiment on an algorithmic pre-trial risk assessment to investigate whether the scores and recommendations can be improved. Unfortunately, prior methods for policy learning are not applicable because they require existing policies to be stochastic. We develop a maximin robust optimization approach that partially identifies the expected utility of a policy, and then finds a policy that maximizes the worst-case expected utility. The resulting policy has a statistical safety property, limiting the probability of producing a worse policy than the existing one, under structural assumptions about the outcomes. Our analysis of data from the field experiment shows that we can safely improve certain components of the risk assessment instrument by classifying arrestees as lower risk under a wide range of utility specifications, though the analysis is not informative about several components of the instrument.

Paper Structure

This paper contains 60 sections, 11 theorems, 80 equations, 20 figures, 1 table.

Key Result

Proposition 1

Let $\pi^{\inf}$ be a solution to Eqn \ref{eq:maximin}. If $m^\ast \in \mathcal{M}$ and $\tilde{\pi} \in \Pi$, then $V(\tilde{\pi}, m^\ast) \leq V(\pi^{\inf}, m^\ast)$.
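
The safety claim can be checked numerically on a toy problem. The sketch below is illustrative only and is not the authors' implementation: it assumes the maximin problem has the form "maximize over $\pi \in \Pi$ the minimum over $m \in \mathcal{M}$ of $V(\pi, m)$", and that the baseline's value does not depend on $m$ because outcomes under the baseline action are observed. Under those assumptions, $V(\tilde{\pi}, m^\ast) = \min_m V(\tilde{\pi}, m) \leq \min_m V(\pi^{\inf}, m) \leq V(\pi^{\inf}, m^\ast)$. All names and numbers (`BASELINE`, `OBSERVED`, `BOUNDS`) are hypothetical.

```python
from itertools import product

# Hypothetical toy problem; every number here is made up for illustration.
X = (0, 1, 2)               # three equally likely covariate values
BASELINE = (0, 0, 1)        # deterministic baseline action at each x
OBSERVED = (0.6, 0.5, 0.4)  # identified mean utility of the baseline action
# Assumed bounds on the unidentified mean utility of the other action at each x.
BOUNDS = ((0.7, 0.9), (0.3, 0.8), (0.1, 0.3))

# Finite stand-in for the model class M: all endpoint combinations.
MODELS = list(product(*BOUNDS))
# Policy class Pi: every deterministic binary policy, including the baseline.
POLICIES = list(product((0, 1), repeat=len(X)))


def value(policy, model):
    """V(policy, model): observed utility where the policy agrees with the
    baseline, model-implied utility where it deviates."""
    total = 0.0
    for x in X:
        total += OBSERVED[x] if policy[x] == BASELINE[x] else model[x]
    return total / len(X)


def worst_case_value(policy):
    """Minimum of V(policy, m) over the model class."""
    return min(value(policy, m) for m in MODELS)


# Maximin policy: best worst-case value over the policy class.
pi_inf = max(POLICIES, key=worst_case_value)

# Safety check: pi_inf is at least as good as the baseline under every model
# in the class, hence under the true model if it lies in the class.
is_safe = all(value(pi_inf, m) >= value(BASELINE, m) for m in MODELS)
print(pi_inf, is_safe)
```

In this toy setup the maximin policy deviates from the baseline only at the covariate value where even the pessimistic bound beats the observed utility, which mirrors the conservative behavior the proposition describes.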

Figures (20)

  • Figure 1: Decision Making Framework (DMF) matrix for cases where the current charge is not a serious violent offense, the NVCA flag is not triggered, and the defendant was not extradited. If the FTA score and the NCA score are both less than 5, then the recommendation is to only require a signature bond. Otherwise the recommendation is to require some amount of cash bail. The dashed line indicates this boundary. Unshaded areas indicate impossible combinations of FTA and NCA scores.
  • Figure 2: Learning a new NVCA flag threshold. (a) Empirical restricted model class and maximin threshold with a Lipschitz multiplicative factor of $C = 3$. The points and thin lines around them are point estimates and a simultaneous 80% confidence interval for the partial CATE function $\tau(\tilde{\pi}(x_\text{nvca}), x_\text{nvca})$ when the NVCA flag is not triggered ($\tilde{\pi}(x_\text{nvca}) = 0$, in orange) and is triggered ($\tilde{\pi}(x_\text{nvca}) = 1$, in blue). The thick solid lines represent the partial identification set for the unobservable components of the CATE, $\tau(1, x_\text{nvca})$ for $x_\text{nvca} < 4$ and $\tau(0, x_\text{nvca})$ for $x_\text{nvca} \geq 4$. The purple dashed line represents the baseline policy of triggering the flag when $x_\text{nvca} \geq 4$, and the pink dashed line is the empirical safe policy that only triggers the flag when $x_\text{nvca} \geq 6$. (b) Maximin threshold values solving Eqn \ref{eq:maximin_emp_bnd} for the NVCA flag threshold rule with a level of $1 - \alpha = 80\%$ as the cost of an NVCA increases from 1 to 20 times the cost of triggering the NVCA flag, and the multiplicative factor on the estimated Lipschitz constant varies from 1 to 10.
  • Figure 3: The size (as a percentage of its maximum value) of two different model classes with respect to the linear threshold policy class versus the confidence level $1-\alpha$ for the FTA (green), NCA (orange), and NVCA (purple) scoring rules. The dashed purple line shows the size for the NVCA model class when the threshold is included as a decision variable and learned in addition to the weights.
  • Figure 4: (a) The percentage point difference in the proportion of arrestees flagged for NVCA risk between the maximin policy and the original NVCA score as the cost of an NVCA increases from 1 to 15 times the cost of triggering the NVCA flag and the confidence level varies between 0% and 100%. (b) Change in maximin NVCA flag weights $\theta$ (in Eqn \ref{eq:integer_weight_policy}) as the cost of an NVCA increases from 1 to 15 times the cost of triggering the NVCA flag, at a confidence level of $1 - \alpha = 80\%$.
  • Figure 5: (a) The size (as a percentage of the maximum value) of the additive model class with respect to the monotone policy class as the confidence level varies for cash bail recommendation policies, collapsing together successively more gradations on bail. The coarsest policy---Signature Bond vs any Cash Bail---has the most information available. (b) Maximin monotone cash bail recommendations under an additive model for the treatment effects, as the cost of an NVCA and the confidence level vary. The dashed black line indicates the original decision boundary between a signature bond (above and to the left) and cash bail (below and to the right). The original decision boundary is modified only when the cost and confidence are low.
  • ...and 15 more figures
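
The deterministic rules described in the captions of Figures 1 and 2 are simple enough to write out. The sketch below is a hedged illustration, not the PSA-DMF implementation: `dmf_recommendation` covers only the cases named in the Figure 1 caption (not a serious violent offense, NVCA flag not triggered, no extradition) and does not check for impossible score combinations; `nvca_flag` encodes the threshold rule with the baseline cutoff of 4 and the learned cutoff of 6. The function `lipschitz_bounds` shows one generic way a Lipschitz assumption yields a partial identification interval at an unobserved score. It ignores the confidence bands used in the paper, and its inputs are hypothetical.

```python
def dmf_recommendation(fta_score: int, nca_score: int) -> str:
    """Figure 1 rule: signature bond only if both scores are below 5."""
    if fta_score < 5 and nca_score < 5:
        return "signature bond"
    return "cash bail"


def nvca_flag(x_nvca: int, threshold: int = 4) -> int:
    """Threshold rule from Figure 2: trigger the flag when x_nvca >= threshold.
    The baseline uses threshold 4; the empirical safe policy uses 6."""
    return int(x_nvca >= threshold)


def lipschitz_bounds(x_new, observed, lipschitz):
    """Interval for an unobserved value tau(x_new), assuming tau is Lipschitz
    in x with the given constant and is known at the points in `observed`
    (a dict mapping x to tau(x)). Each known point restricts tau(x_new) to
    tau(x) +/- lipschitz * |x_new - x|; the result is the intersection."""
    lower = max(t - lipschitz * abs(x_new - x) for x, t in observed.items())
    upper = min(t + lipschitz * abs(x_new - x) for x, t in observed.items())
    return lower, upper


# Hypothetical inputs, chosen only to exercise the functions.
print(dmf_recommendation(4, 4), dmf_recommendation(5, 2))
print(nvca_flag(5), nvca_flag(5, threshold=6))
print(lipschitz_bounds(3, {4: 0.25, 5: 0.5}, 0.25))
```

A larger Lipschitz constant widens the interval, which is why the maximin threshold in Figure 2(b) is reported across a range of multiplicative factors.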

Theorems & Definitions (19)

  • Proposition 1: Population safety
  • Theorem 1: Statistical safety
  • Theorem 2: Optimality gap
  • Theorem A.1: Population optimality gap
  • Theorem A.2: Statistical safety (with $\alpha = 1$)
  • Corollary A.1: Statistical safety (with $\alpha = 1$)
  • Theorem A.3: Optimality gap (with $\alpha = 1$)
  • Corollary A.2: Optimality gap (with $\alpha = 1$)
  • Proposition A.1
  • Proof of Proposition \ref{prop:pop_safety}
  • ...and 9 more