Bayesian Safe Policy Learning with Chance Constrained Optimization: Application to Military Security Assessment during the Vietnam War

Zeyang Jia, Eli Ben-Michael, Kosuke Imai

TL;DR

The paper tackles high-stakes policy learning from historical data by introducing the Average Conditional Risk (ACRisk), which quantifies subgroup harm, and a Bayesian safe policy learning framework that maximizes the posterior expected value while constraining the posterior expected ACRisk (PACRisk). It shows that the resulting chance-constrained problem reduces to optimization under a linear constraint, which makes it tractable with Bayesian nonparametric CATE estimators. In simulations, controlling the PACRisk reduces the true subgroup risk and, in some settings, also improves the average policy value through regularization, especially with limited data or a low signal-to-noise ratio. In the empirical application to the Vietnam War-era Hamlet Evaluation System, the learned policies shift weight away from military factors toward economic and political factors and assess most regions as more secure, while preserving interpretability through decision tables.

Abstract

Algorithmic decisions and recommendations are used in many high-stakes decision-making settings such as criminal justice, medicine, and public policy. We investigate whether it would have been possible to improve a security assessment algorithm employed during the Vietnam War, using outcomes measured immediately after its introduction in late 1969. This empirical application raises several methodological challenges that frequently arise in high-stakes algorithmic decision-making. First, before implementing a new algorithm, it is essential to characterize and control the risk of yielding worse outcomes than the existing algorithm. Second, the existing algorithm is deterministic, and learning a new algorithm requires transparent extrapolation. Third, the existing algorithm involves discrete decision tables that are difficult to optimize over. To address these challenges, we introduce the Average Conditional Risk (ACRisk), which first quantifies the risk that a new algorithmic policy leads to worse outcomes for subgroups of individual units and then averages this over the distribution of subgroups. We also propose a Bayesian policy learning framework that maximizes the posterior expected value while controlling the posterior expected ACRisk. This framework separates the estimation of heterogeneous treatment effects from policy optimization, enabling flexible estimation of effects and optimization over complex policy classes. We characterize the resulting chance-constrained optimization problem as a constrained linear programming problem. Our analysis shows that compared to the actual algorithm used during the Vietnam War, the learned algorithm assesses most regions as more secure and emphasizes economic and political factors over military factors.
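
The separation between effect estimation and policy optimization described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the array layout `draws[s][i][k]`, the function names, the toy numbers, and the assumed form of the posterior risk (the share of posterior draws in which a decision is worse than the existing one) are all hypothetical, and an exhaustive search over a tiny unrestricted policy class stands in for the paper's optimization over decision tables.

```python
from itertools import product

def posterior_summaries(draws):
    """draws[s][i][k]: posterior draw s of the effect of decision k, relative to
    the existing policy, for subgroup i (hypothetical layout)."""
    n_draws, n_units, n_dec = len(draws), len(draws[0]), len(draws[0][0])
    benefit = [[sum(d[i][k] for d in draws) / n_draws for k in range(n_dec)]
               for i in range(n_units)]
    # Assumed form of the posterior risk: share of draws with a negative effect.
    risk = [[sum(d[i][k] < 0 for d in draws) / n_draws for k in range(n_dec)]
            for i in range(n_units)]
    return benefit, risk

def safe_policy(benefit, risk, eps):
    """Brute-force search: maximize average benefit subject to average risk <= eps."""
    n_units, n_dec = len(benefit), len(benefit[0])
    best, best_value = None, float("-inf")
    for policy in product(range(n_dec), repeat=n_units):
        avg_risk = sum(risk[i][k] for i, k in enumerate(policy)) / n_units
        if avg_risk > eps:
            continue  # violates the safety constraint
        value = sum(benefit[i][k] for i, k in enumerate(policy)) / n_units
        if value > best_value:
            best, best_value = policy, value
    return best, best_value

# Toy posterior draws: 2 subgroups, decision 0 = keep the existing decision.
draws = [
    [[0.0, 1.0], [0.0, 2.0]],
    [[0.0, 1.0], [0.0, -1.0]],
    [[0.0, 1.0], [0.0, 2.0]],
    [[0.0, 1.0], [0.0, -1.0]],
]
benefit, risk = posterior_summaries(draws)
strict = safe_policy(benefit, risk, eps=0.1)
loose = safe_policy(benefit, risk, eps=0.3)
print(strict, loose)
```

In this toy example the tighter constraint changes the decision only for the subgroup whose effect is positive in every draw, while the looser constraint also changes the decision for the subgroup whose effect is negative in half of the draws. Keeping the existing decision always has zero risk here, so the existing policy is always feasible.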

Paper Structure

This paper contains 24 sections, 1 theorem, 19 equations, 7 figures, 1 table, 1 algorithm.

Key Result

Theorem 1

Define the posterior conditional benefit and risk of decision $k$ relative to the existing policy $\tilde{\delta}$ as [...], where $\tau_k(\bm{x},\boldsymbol{\theta}):=\mathbb{E}\left[u(k,Y(k))-u(\tilde{\delta}(\bm{x}),Y(\tilde{\delta}(\bm{x})))\mid \bm{X}=\bm{x}, \boldsymbol{\Theta}=\boldsymbol{\theta} \right]$. Then, the chance-constrained optimization defined in Equation eq:safe_bayes is equivalent [...]
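
To see why a posterior risk constraint of this kind can be linear in the policy, consider the following sketch. It is not a reproduction of the theorem: the indicator encoding $\pi_k(\bm{x}) := \mathbb{1}\{\delta(\bm{x}) = k\}$, the data symbol $\mathcal{D}$, the sample points $\bm{x}_1,\dots,\bm{x}_n$, and the assumed form of the risk $r_k(\bm{x}) := \Pr\left(\tau_k(\bm{x},\boldsymbol{\Theta}) < 0 \mid \mathcal{D}\right)$ are notation introduced here for illustration only. Since a policy $\delta$ selects exactly one decision at each $\bm{x}$,

$$\mathbb{1}\{\tau_{\delta(\bm{x})}(\bm{x},\boldsymbol{\theta}) < 0\} = \sum_{k} \pi_k(\bm{x})\, \mathbb{1}\{\tau_k(\bm{x},\boldsymbol{\theta}) < 0\}, \qquad \sum_{k} \pi_k(\bm{x}) = 1.$$

Taking the posterior expectation over $\boldsymbol{\Theta}$ and averaging over the sample turns a bound $\epsilon$ on the risk into

$$\frac{1}{n}\sum_{i=1}^{n}\sum_{k} \pi_k(\bm{x}_i)\, r_k(\bm{x}_i) \le \epsilon,$$

which is linear in the indicators $\pi_k(\bm{x}_i)$ because each $r_k(\bm{x}_i)$ depends only on the posterior and not on the policy being optimized.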

Figures (7)

  • Figure 1: Aggregation of 20 sub-model scores. The Hamlet Evaluation System (HES) uses 20 sub-model scores as inputs, and aggregates them using two-way and three-way decision tables. Each circle corresponds to one aggregation based on the two-way or three-way decision table, and the decision tables used in different circles are the same.
  • Figure 2: The average value (left panel) and ACRisk (right panel) of the learned policies using the data with covariate overlap, varying the safety constraint $\epsilon$ and sample size $n$.
  • Figure 3: The average value (left panel) and ACRisk (right panel) of the learned policies using the data without covariate overlap, varying the safety constraint $\epsilon$ and the prior smoothness $l$ for the CATE (a greater value corresponds to a greater degree of prior smoothness).
  • Figure 4: The posterior expected utility of the learned policy (left panel) and the proportion of elements in the three-way table changed by the learned policy (right panel) under different values of $\epsilon$, when regional safety development is the outcome. A weaker safety constraint (i.e., a greater value of $\epsilon$) leads to a greater difference between the baseline and learned policies. The posterior expected utility also becomes greater.
  • Figure 5: The relative change of the PD function from the baseline policy to the learned policy for $\epsilon = 0.1$. Each block corresponds to a different input of the PD function, and different colors correspond to different level-3 scores. For example, the first blue bar in the first block corresponds to $\{I_{military}(1;T_3)-I_{military}(1;\tilde{T}_3)\}/I_{military}(1;\tilde{T}_3)$, where $T_3$ and $\tilde{T}_3$ are the learned and baseline policies, respectively.
  • ...and 2 more figures
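
The decision-table aggregation described in the caption of Figure 1 can be sketched as follows. Everything in this example is hypothetical: the 3-level score scale, the entries of `TWO_WAY`, and the pairwise merge order are illustrative assumptions, not the actual HES tables or hierarchy. The sketch only shows how a single shared table is reused at every aggregation step, and why changing one cell of the table yields a different policy.

```python
# Hypothetical two-way decision table on scores 1..3 (NOT the actual HES table).
# Row = first input score, column = second input score, entry = combined score.
TWO_WAY = [
    [1, 1, 2],
    [1, 2, 3],
    [2, 3, 3],
]

def combine2(a, b, table=TWO_WAY):
    """Look up the combined score for inputs a and b (both in 1..3)."""
    return table[a - 1][b - 1]

def aggregate(scores, table=TWO_WAY):
    """Merge adjacent pairs with the same table until one score remains.
    The pairwise merge order is an assumption made for illustration."""
    level = list(scores)
    while len(level) > 1:
        merged = [combine2(level[i], level[i + 1], table)
                  for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # carry an unpaired score to the next level
            merged.append(level[-1])
        level = merged
    return level[0]

# A candidate policy is just a different table: here one cell is changed.
ALTERNATIVE = [row[:] for row in TWO_WAY]
ALTERNATIVE[1][1] = 1

baseline_score = aggregate([1, 3, 2, 2])
alternative_score = aggregate([1, 3, 2, 2], table=ALTERNATIVE)
print(baseline_score, alternative_score)
```

Because the policy is fully described by the table entries, the learned policy stays as interpretable as the baseline; the cost is that the search space is discrete, which is the optimization difficulty the abstract mentions.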

Theorems & Definitions (4)

  • Definition 1: Average Conditional Risk (ACRisk)
  • Definition 2: Average Individual Risk (AIRisk)
  • Definition 3: Posterior Average Conditional Risk (PACRisk)
  • Theorem 1: Control of the PACRisk as a Linear Constraint