Evaluating the Effectiveness of Index-Based Treatment Allocation
Niclas Boehmer, Yash Nair, Sanket Shah, Lucas Janson, Aparna Taneja, Milind Tambe
TL;DR
This work tackles the challenge of evaluating index-based treatment allocations under resource scarcity by introducing a subgroup estimator that isolates the causal effect on policy-selected individuals. It establishes asymptotically valid inference procedures, including confidence intervals and p-values, for the subgroup estimator and provides a complementary analysis of the base estimator via empirical process theory. Through synthetic and real-world simulations, the authors show that the subgroup approach achieves substantially higher statistical power and tighter confidence intervals than the traditional base estimator, enabling earlier and more reliable conclusions about policy effectiveness. The methodology is extended to handle covariate adjustment, sequential (multi-step) allocation, and post-hoc reevaluation of past RCTs. These extensions are validated empirically on mHealth deployments (e.g., mMitra) and in restless-bandit-like settings, demonstrating practical value for policy evaluation and decision-making in resource-constrained environments.
Abstract
When resources are scarce, an allocation policy is needed to decide who receives a resource. This problem occurs, for instance, when allocating scarce medical resources and is often solved using modern ML methods. This paper introduces methods to evaluate index-based allocation policies, which allocate a fixed number of resources to those who need them the most, using data from a randomized controlled trial. Such policies create dependencies between agents, which render the assumptions behind standard statistical tests invalid and limit the effectiveness of estimators. Addressing these challenges, we translate and extend recent ideas from the statistics literature to present an efficient estimator and methods for computing asymptotically correct confidence intervals. This enables us to effectively draw valid statistical conclusions, a critical gap in previous work. Our extensive experiments validate our methodology in practical settings, while also showcasing its statistical power. We conclude by proposing and empirically verifying extensions of our methodology that enable us to reevaluate a past randomized controlled trial in order to assess different ML allocation policies in the context of an mHealth program, drawing previously invisible conclusions.
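
To make the contrast between the two estimators concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a two-arm trial in which the policy arm treats the k individuals with the highest index and the control arm treats no one. The function names (`top_k_mask`, `base_estimate`, `subgroup_estimate`) and the simulated data are hypothetical, and the exact normalization used in the paper may differ. The base estimate compares whole arms, while the subgroup estimate compares the selected individuals in the policy arm with those the policy would have selected in the control arm.

```python
import numpy as np

def top_k_mask(index_scores, k):
    """Boolean mask marking the k individuals with the highest index."""
    scores = np.asarray(index_scores, dtype=float)
    order = np.argsort(-scores, kind="stable")  # descending by index
    mask = np.zeros(scores.shape[0], dtype=bool)
    mask[order[:k]] = True
    return mask

def base_estimate(y_policy, y_control):
    """Difference in mean outcome between the two full arms."""
    return float(np.mean(y_policy) - np.mean(y_control))

def subgroup_estimate(y_policy, idx_policy, y_control, idx_control, k):
    """Difference in mean outcome between the individuals the policy selected
    in the policy arm and those it would have selected in the control arm.
    Assumption: per-selected-individual scaling, chosen for illustration."""
    sel_p = top_k_mask(idx_policy, k)
    sel_c = top_k_mask(idx_control, k)
    y_p = np.asarray(y_policy, dtype=float)
    y_c = np.asarray(y_control, dtype=float)
    return float(y_p[sel_p].mean() - y_c[sel_c].mean())

if __name__ == "__main__":
    # Hypothetical simulation: the index is a noisy predictor of the outcome,
    # and treatment adds a constant effect for the k selected individuals.
    rng = np.random.default_rng(0)
    n, k, effect = 2000, 200, 1.0
    idx_p, idx_c = rng.normal(size=n), rng.normal(size=n)
    y_c = idx_c + rng.normal(size=n)
    y_p = idx_p + rng.normal(size=n) + effect * top_k_mask(idx_p, k)
    print("base (per arm member):     ", base_estimate(y_p, y_c))
    print("subgroup (per selected):   ", subgroup_estimate(y_p, idx_p, y_c, idx_c, k))
```

Note that the two quantities are on different scales in this sketch: the base estimate averages over all n arm members, whereas the subgroup estimate averages over the k selected individuals only. Outcomes of unselected individuals contribute noise but no signal to the base estimate, which is the intuition behind the subgroup estimator's higher power. Because top-k selection makes the selected sets depend on everyone's index, a naive two-sample test on the subgroups is not automatically valid; the paper's inference procedures address that point.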
