Evaluating the Effectiveness of Index-Based Treatment Allocation

Niclas Boehmer, Yash Nair, Sanket Shah, Lucas Janson, Aparna Taneja, Milind Tambe

TL;DR

This work tackles the challenge of evaluating index-based treatment allocations under resource scarcity by introducing a subgroup estimator that isolates the causal effect on policy-selected individuals. It establishes asymptotically valid inference procedures, including confidence intervals and p-values, for the subgroup estimator and provides a complementary base-estimator analysis via empirical process theory. Through synthetic and real-world simulations, the authors show that the subgroup approach achieves substantially higher statistical power and tighter confidence intervals than the traditional base estimator, enabling earlier and more reliable conclusions about policy effectiveness. The methodology is extended to handle covariate adjustment, sequential (multi-step) allocation, and post-hoc reevaluation of past RCTs, with empirical validation on mHealth deployments (e.g., mMitra) and restless bandit-like settings, demonstrating practical impact for policy evaluation and decision-making in resource-constrained environments.

Abstract

When resources are scarce, an allocation policy is needed to decide who receives a resource. This problem occurs, for instance, when allocating scarce medical resources and is often solved using modern ML methods. This paper introduces methods to evaluate index-based allocation policies, which allocate a fixed number of resources to those who need them the most, using data from a randomized controlled trial. Such policies create dependencies between agents, which invalidate the assumptions behind standard statistical tests and limit the effectiveness of estimators. Addressing these challenges, we translate and extend recent ideas from the statistics literature to present an efficient estimator and methods for computing asymptotically correct confidence intervals. This enables us to draw valid statistical conclusions, filling a critical gap in previous work. Our extensive experiments validate our methodology in practical settings and showcase its statistical power. We conclude by proposing and empirically verifying extensions of our methodology that enable us to reevaluate a past randomized controlled trial to assess different ML allocation policies in the context of an mHealth program, drawing previously invisible conclusions.
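To make the setting concrete, the following is a minimal sketch of an index-based allocation and of the two kinds of estimators the summary contrasts. It is not the authors' implementation. The data-generating process, the two-arm design (a control arm where nobody is treated and a policy arm where the top fraction by index is treated), the scaling of the subgroup estimate by the selected fraction, and all function names (`top_k_mask`, `simulate_rct`, `base_estimate`, `subgroup_estimate`) are assumptions made for illustration.

```python
import numpy as np


def top_k_mask(index_values, k):
    """Boolean mask marking the k agents with the highest index value."""
    order = np.argsort(-index_values, kind="stable")
    mask = np.zeros(index_values.shape[0], dtype=bool)
    mask[order[:k]] = True
    return mask


def simulate_rct(n_per_arm, alpha, effect, rng):
    """Simulate one two-arm RCT under a hypothetical outcome model.

    In both arms we record which agents the index policy selects (top
    alpha fraction), but only the policy arm actually treats them.
    """
    k = int(alpha * n_per_arm)
    arms = {}
    for arm in ("control", "policy"):
        index_values = rng.normal(size=n_per_arm)
        selected = top_k_mask(index_values, k)
        treated = selected if arm == "policy" else np.zeros(n_per_arm, dtype=bool)
        # Assumed outcome model: unit-variance noise plus a constant effect.
        outcomes = rng.normal(size=n_per_arm) + effect * treated
        arms[arm] = (outcomes, selected)
    return arms


def base_estimate(arms):
    """Difference in mean outcome over ALL agents of the two arms."""
    return arms["policy"][0].mean() - arms["control"][0].mean()


def subgroup_estimate(arms):
    """Difference in mean outcome over policy-selected agents only,
    rescaled by the selected fraction to a per-agent average effect."""
    y_p, s_p = arms["policy"]
    y_c, s_c = arms["control"]
    return s_p.mean() * (y_p[s_p].mean() - y_c[s_c].mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    runs = [simulate_rct(1000, 0.1, 1.0, rng) for _ in range(300)]
    base = np.array([base_estimate(a) for a in runs])
    sub = np.array([subgroup_estimate(a) for a in runs])
    print("base:     mean %.3f, std %.3f" % (base.mean(), base.std()))
    print("subgroup: mean %.3f, std %.3f" % (sub.mean(), sub.std()))
```

In this toy model both estimates target the same per-agent average effect, but the subgroup estimate discards the outcome noise of agents the policy never selects, so its spread across simulated trials is smaller. This only illustrates the intuition behind the reported gain in power under the stated assumptions.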

Paper Structure

This paper contains 70 sections, 24 theorems, 116 equations, 8 figures, 3 tables.

Key Result

Lemma 3.1

Under very mild assumptions, $\lim_{n\to \infty}\sqrt{n}\left( \tau^{\mathrm{q}}_{\alpha}(\Upsilon)- \mathbb{E}[\theta^{\mathrm{SG}}_{n,\alpha}(\pi^{\Upsilon})] \right) = 0$.

Figures (8)

  • Figure 1: A representative example of the size of confidence intervals. We compare different estimators for the effectiveness of the Whittle policy (blue) and the random policy (orange). The $x$-axis shows the average effect of a treatment. Vertical lines show the estimand and a zero treatment effect. For each estimator, we show their point estimate as a dot and their confidence interval as a line.
  • Figure 2: Evaluation of the RCT from verma2023restless. We show estimators' point estimates as dots and $95\%$ confidence intervals as lines for different evaluation horizons, with and without correcting for covariates. "Subgroup (First $x$ weeks)" refers to our subgroup estimator applied to all agents that (would) have been allocated a treatment up until week $x$.
  • Figure 3: Distribution of the value of different estimators for $100{,}000$ RCTs. The estimand is $1$ by construction. Horizontal lines indicate one standard deviation below and above the mean.
  • Figure 4: Confidence intervals produced by the subgroup estimator for 100 different simulations.
  • Figure 5: Empirical comparison of the confidence intervals produced by different estimators when varying the intervention effect for the synthetic and TB domain, where we generate intervention effects randomly. In particular, for both domains, we adjust the sampling so that the maximum intervention effect, i.e., the difference between the transition probability under passive and active action, is at most the value depicted on the $x$-axis. On the left, we analyze validity by showing the fraction of times the estimand falls in an estimator's $95\%$ confidence interval (the closer to $95\%$ the better). On the right, we analyze the power of estimators by depicting the half-width of computed confidence intervals (the smaller the better).
  • ...and 3 more figures
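
The two quantities reported in Figure 5, empirical coverage and confidence-interval half-width, can be computed from repeated simulations roughly as follows. This is a sketch under assumptions: the helper name `coverage_and_half_width` is hypothetical, and the demo uses a plain normal-approximation interval for a sample mean rather than the paper's estimators.

```python
import numpy as np


def coverage_and_half_width(lower, upper, estimand):
    """Return (fraction of intervals containing the estimand,
    average half-width of the intervals)."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    covered = (lower <= estimand) & (estimand <= upper)
    return covered.mean(), ((upper - lower) / 2).mean()


if __name__ == "__main__":
    # Hypothetical demo: 2000 simulated studies, 200 observations each,
    # true mean 1.0, standard deviation 2.0.
    rng = np.random.default_rng(0)
    samples = rng.normal(loc=1.0, scale=2.0, size=(2000, 200))
    means = samples.mean(axis=1)
    std_errs = samples.std(axis=1, ddof=1) / np.sqrt(samples.shape[1])
    z = 1.96  # two-sided 95% normal quantile
    cov, hw = coverage_and_half_width(means - z * std_errs,
                                      means + z * std_errs, 1.0)
    print("coverage:", cov, "mean half-width:", hw)
```

A valid procedure should yield coverage close to the nominal $95\%$; among valid procedures, a smaller half-width indicates higher power.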

Theorems & Definitions (33)

  • Lemma 3.1: informal corollary of Lemma S2 in imai2023statistical
  • Theorem 3.2: informal corollary of Theorem 2 in imai2023statistical
  • Theorem 4.1
  • Theorem 4.2: informal
  • Theorem A.1: informal
  • Theorem E.3
  • Lemma E.4
  • Lemma E.4
  • Definition E.5: VC dimension
  • Definition E.6: Subgraph of a function; cf. page 141 of vaart2023empirical
  • ...and 23 more