Policy Learning with Confidence
Victor Chernozhukov, Sokbae Lee, Adam M. Rosen, Liyang Sun
TL;DR
The paper tackles policy selection when welfare estimates are noisy by introducing risk-aware policy learning, which trades off estimated welfare $\widehat{V}(\pi)$ against estimation risk $\widehat{s}(\pi)$ along an efficient frontier. The PoLeCe (Policy Learning with Confidence) rule selects $\widehat{\pi}_{PoLeCe}$ by maximizing the one-sided lower confidence bound on welfare, $LV_{1-\alpha}(\pi) = \widehat{V}(\pi) - \widehat{q}_{1-\alpha,\Pi}\widehat{s}(\pi)$, where $\widehat{q}_{1-\alpha,\Pi}$ is a bootstrap-based quantile chosen so that the bound holds with high probability. The framework provides regret bounds for risk-aware rules. It also shows that this data-driven penalty places the chosen policy on the efficient frontier while delivering a reporting guarantee, and that the selection problem can be computed via second-order cone programs. An empirical illustration uses marginal value of public funds (MVPF) estimates from the Policy Impacts Library. It shows how PoLeCe diversifies allocations across programs in light of estimation uncertainty, with allocations differing between adult and youth groups. The work situates PoLeCe within the framework of P-certified decisions, highlights its relevance for transparent, uncertainty-aware budgeting and policy design, and suggests extensions to ambiguity and partial identification.
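The selection rule can be sketched for a finite set of candidate policies. This is a minimal illustration under stated assumptions, not the authors' implementation. The function name `polece_select` is hypothetical. The critical value is assumed to be the $(1-\alpha)$ quantile of the bootstrap maximum over policies of the studentized deviations $(\widehat{V}^*_b(\pi) - \widehat{V}(\pi))/\widehat{s}(\pi)$, a standard way to make the bound hold uniformly over the class; the paper's exact bootstrap may differ. The welfare estimates are invented, and the bootstrap replicates are simulated Gaussian draws.

```python
import numpy as np


def polece_select(v_hat, s_hat, v_boot, alpha=0.05):
    """Pick the policy with the largest one-sided lower confidence bound.

    v_hat  : (K,) welfare estimates V_hat(pi) for K candidate policies
    s_hat  : (K,) their standard errors s_hat(pi)
    v_boot : (B, K) bootstrap replicates of V_hat(pi)
    Assumption: q_hat is the (1 - alpha) quantile, across bootstrap draws, of
    the largest studentized deviation (V*_b(pi) - V_hat(pi)) / s_hat(pi).
    """
    v_hat = np.asarray(v_hat, dtype=float)
    s_hat = np.asarray(s_hat, dtype=float)
    v_boot = np.asarray(v_boot, dtype=float)
    t_max = np.max((v_boot - v_hat) / s_hat, axis=1)   # one maximum per draw
    q_hat = float(np.quantile(t_max, 1 - alpha))       # same critical value for every policy
    lower = v_hat - q_hat * s_hat                       # LV_{1-alpha}(pi)
    return int(np.argmax(lower)), q_hat, lower


# Toy inputs (invented): policy 1 has the best estimate but a large standard error.
v_hat = np.array([1.00, 1.10, 0.95])
s_hat = np.array([0.05, 0.40, 0.02])

# Stand-in bootstrap: Gaussian draws centred at the estimates.
rng = np.random.default_rng(0)
v_boot = v_hat + s_hat * rng.standard_normal((20_000, 3))

choice, q_hat, lower = polece_select(v_hat, s_hat, v_boot, alpha=0.05)
print("plug-in choice:", int(np.argmax(v_hat)))
print("PoLeCe choice: ", choice, "q_hat =", round(q_hat, 2))
print("lower bounds:  ", lower.round(3))
```

With these toy inputs, the plug-in rule picks the policy with the highest point estimate (index 1). Its large standard error makes its lower bound weak, so the risk-aware rule picks index 2, accepting a slightly lower estimate in exchange for much less estimation risk.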
Abstract
This paper introduces a rule for policy selection in the presence of estimation uncertainty, explicitly accounting for estimation risk. The rule belongs to the class of risk-aware rules on the efficient decision frontier, characterized as policies offering maximal estimated welfare for a given level of estimation risk. Among this class, the proposed rule is chosen to provide a reporting guarantee, ensuring that the welfare delivered exceeds a threshold with a pre-specified confidence level. We apply this approach to the allocation of a limited budget among social programs using estimates of their marginal value of public funds and associated standard errors.
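The budget-allocation application can be illustrated with a toy version. Here a policy is a vector of budget shares $w$ across programs, estimated welfare is $w^\top \widehat{\mu}$ for MVPF estimates $\widehat{\mu}$, and estimation risk is $\sqrt{\sum_k w_k^2 \widehat{se}_k^2}$. This is a hedged sketch rather than the paper's procedure, and it rests on several assumptions:

- the MVPF values and standard errors are invented;
- the estimates are assumed independent;
- the fixed penalty `lam` stands in for the data-driven $\widehat{q}_{1-\alpha,\Pi}$;
- a brute-force grid over the simplex replaces the second-order cone program, so no solver is needed.

`simplex_grid` and `risk_aware_allocation` are hypothetical helpers.

```python
import itertools

import numpy as np


def simplex_grid(k, steps):
    """All budget-share vectors of length k with entries in {0, 1/steps, ..., 1} summing to 1."""
    rows = []
    for head in itertools.product(range(steps + 1), repeat=k - 1):
        rest = steps - sum(head)
        if rest >= 0:  # the last share takes whatever budget is left
            rows.append(head + (rest,))
    return np.array(rows, dtype=float) / steps


def risk_aware_allocation(mu, se, lam, steps=100):
    """Maximize estimated welfare minus lam * estimation risk over the share grid.

    mu  : hypothetical MVPF point estimates, one per program
    se  : their standard errors (estimates assumed independent)
    lam : penalty on estimation risk; lam = 0 gives the plug-in rule
    """
    mu = np.asarray(mu, dtype=float)
    se = np.asarray(se, dtype=float)
    shares = simplex_grid(len(mu), steps)        # (M, K) candidate allocations
    value = shares @ mu                          # estimated welfare of each allocation
    risk = np.sqrt((shares ** 2) @ (se ** 2))    # standard error of that estimate
    best = int(np.argmax(value - lam * risk))
    return shares[best], value[best], risk[best]


# Invented inputs for illustration only (not Policy Impacts Library data).
mu = [2.0, 1.5, 1.2]
se = [1.0, 0.3, 0.2]

w_plugin, _, _ = risk_aware_allocation(mu, se, lam=0.0)
w_risk, v_risk, s_risk = risk_aware_allocation(mu, se, lam=1.645)
print("plug-in shares:   ", w_plugin)
print("risk-aware shares:", w_risk.round(2), "value", round(v_risk, 3), "risk", round(s_risk, 3))

# Sweep the penalty to trace points on the estimated efficient frontier.
lams = [0.0, 0.5, 1.0, 1.645, 3.0]
frontier = [risk_aware_allocation(mu, se, lam)[1:] for lam in lams]
for lam, (v, s) in zip(lams, frontier):
    print(f"lam={lam:5.3f}  value={v:.3f}  risk={s:.3f}")
```

Setting `lam = 0` recovers the plug-in rule, which puts the whole budget on the program with the largest point estimate. A positive penalty spreads the budget across all three programs. Sweeping `lam` traces points on the estimated efficient frontier: as the penalty grows, both estimated welfare and estimation risk weakly decrease.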
