Bayesian Safe Policy Learning with Chance Constrained Optimization: Application to Military Security Assessment during the Vietnam War
Zeyang Jia, Eli Ben-Michael, Kosuke Imai
TL;DR
The paper tackles high-stakes policy learning from historical data. It introduces the Average Conditional Risk (ACRisk), which quantifies the risk of harm to subgroups, and a Bayesian safe policy learning framework that maximizes the posterior expected value while constraining the posterior expected ACRisk (PACRisk). It shows that the resulting chance-constrained problem reduces to optimization under a linear constraint, which makes it tractable when combined with Bayesian nonparametric estimators of the conditional average treatment effect (CATE). Simulations show that controlling PACRisk reduces the true subgroup risk and, in some settings, also improves the average policy value through regularization, especially when data are limited or the signal-to-noise ratio is low. In the empirical application to the Vietnam War-era Hamlet Evaluation System, the learned policies shift weight away from military factors toward economic and political factors and assess most regions as more secure, while the use of decision tables preserves interpretability.
Abstract
Algorithmic decisions and recommendations are used in many high-stakes decision-making settings such as criminal justice, medicine, and public policy. We investigate whether it would have been possible to improve a security assessment algorithm employed during the Vietnam War, using outcomes measured immediately after its introduction in late 1969. This empirical application raises several methodological challenges that frequently arise in high-stakes algorithmic decision-making. First, before implementing a new algorithm, it is essential to characterize and control the risk of yielding worse outcomes than the existing algorithm. Second, the existing algorithm is deterministic, so learning a new algorithm requires transparent extrapolation. Third, the existing algorithm involves discrete decision tables that are difficult to optimize over. To address these challenges, we introduce the Average Conditional Risk (ACRisk), which first quantifies the risk that a new algorithmic policy leads to worse outcomes for subgroups of individual units and then averages this risk over the distribution of subgroups. We also propose a Bayesian policy learning framework that maximizes the posterior expected value while controlling the posterior expected ACRisk. This framework separates the estimation of heterogeneous treatment effects from policy optimization, enabling flexible estimation of effects and optimization over complex policy classes. We characterize the resulting chance-constrained optimization problem as a constrained linear programming problem. Our analysis shows that, compared to the actual algorithm used during the Vietnam War, the learned algorithm assesses most regions as more secure and emphasizes economic and political factors over military factors.
