Shrinkage Methods for Treatment Choice

Takuya Ishihara, Daisuke Kurisu

TL;DR

The paper addresses treatment choice with covariates by proposing a shrinkage rule that shrinks CATE estimates toward their mean and selects shrinkage factors by minimizing an upper bound on the maximum regret under a bounded CATE space $\Theta(\kappa)$. The approach unifies the conditional empirical success (CES) rule and pooling as special cases, and it yields computationally tractable shrinkage factors $w_k^{*}(\kappa)$ through a per-coordinate optimization of $\psi_k(w_k;\kappa)$ involving the function $\eta(\cdot)$. Theoretical results show the shrinkage rule can outperform CES and pooling when $\kappa$ is moderately large or small (depending on variance heterogeneity), with bounds close to optimal and robustness to misspecification; numerical experiments and an empirical JTPA application illustrate when and how shrinkage changes treatment decisions relative to CES. Overall, the shrinkage framework provides a flexible, data-driven means to improve worst-case welfare guarantees in heterogeneous populations and remains practical for larger problem sizes.

Abstract

This study examines the problem of determining whether to treat individuals based on observed covariates. The most common decision rule is the conditional empirical success (CES) rule proposed by Manski (2004), which assigns individuals to treatments that yield the best experimental outcomes conditional on the observed covariates. Conversely, using shrinkage estimators, which shrink unbiased but noisy preliminary estimates toward the average of these estimates, is a common approach in statistical estimation problems because it is well-known that shrinkage estimators may have smaller mean squared errors than unshrunk estimators. Inspired by this idea, we propose a computationally tractable shrinkage rule that selects the shrinkage factor by minimizing an upper bound of the maximum regret. Then, we compare the maximum regret of the proposed shrinkage rule with those of the CES and pooling rules when the space of conditional average treatment effects (CATEs) is correctly specified or misspecified. Our theoretical results demonstrate that the shrinkage rule performs well in many cases and these findings are further supported by numerical experiments. Specifically, we show that the maximum regret of the shrinkage rule can be strictly smaller than those of the CES and pooling rules in certain cases when the space of CATEs is correctly specified. In addition, we find that the shrinkage rule is robust against misspecification of the space of CATEs. Finally, we apply our method to experimental data from the National Job Training Partnership Act Study.
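As a minimal sketch of the idea described above, the snippet below shrinks each CATE estimate toward a weighted average of all estimates and treats a group when the shrunk estimate is non-negative. The function name `shrinkage_rule`, the weights `p` (taken here as group shares, uniform by default), and the tie-breaking at zero are illustrative assumptions rather than the authors' implementation; the shrinkage factor `w` is supplied by hand instead of being chosen by minimizing the regret bound.

```python
import numpy as np


def shrinkage_rule(theta_hat, w, p=None):
    """Return 0/1 treatment decisions from shrunk CATE estimates.

    theta_hat : preliminary CATE estimates, one per covariate group.
    w         : shrinkage factor(s) in [0, 1]; scalar or one per group.
    p         : group weights used for the average (assumed: group shares).
    """
    theta_hat = np.asarray(theta_hat, dtype=float)
    K = theta_hat.size
    p = np.full(K, 1.0 / K) if p is None else np.asarray(p, dtype=float)
    w = np.broadcast_to(np.asarray(w, dtype=float), theta_hat.shape)

    pooled = np.sum(p * theta_hat)                # weighted average of the estimates
    shrunk = w * theta_hat + (1.0 - w) * pooled   # shrink toward the average
    return (shrunk >= 0).astype(int)              # treat when the shrunk estimate is >= 0 (assumed tie rule)


theta_hat = np.array([-0.4, 0.1, 0.9, 0.6])
ces = shrinkage_rule(theta_hat, w=1.0)       # no shrinkage: decisions follow each group's own estimate
pooling = shrinkage_rule(theta_hat, w=0.0)   # full shrinkage: every group gets the same decision
partial = shrinkage_rule(theta_hat, w=0.25)  # intermediate shrinkage
print(ces, pooling, partial)
```

With `w = 1` the sketch reduces to a CES-style rule and with `w = 0` to a pooling-style rule, which mirrors how the two rules arise as special cases of shrinkage.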

Paper Structure (16 sections, 12 theorems, 130 equations, 15 figures, 1 table)


Key Result

Proposition 1

Suppose that $\sigma_1 = \cdots = \sigma_K$ and $p_1 = \cdots = p_K$. For all $w \in [0,1]$, we obtain $L_{\mathrm{true}}(w) \leq \overline{R}_{\mathrm{true}}(w) \leq \overline{R}_{\mathrm{upper}}(w)$, where $L_{\mathrm{true}}(w)$ is defined in Lemma \ref{lem:prop1} of Appendix A. Furthermore, $L_{\mathrm{true}}(w) = \overline{R}_{\mathrm{upper}}(w)$ holds when $\kappa = 0$; that is, $\overline{R}_{\mathrm{upper}}(w)$ is equal to $\overline{R}_{\mathrm{true}}(w)$ when $\kappa = 0$.
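To make the regret quantities in Proposition 1 more concrete, here is a hedged Monte Carlo sketch of the regret of a shrinkage rule at a single fixed CATE vector in the equal-variance, equal-weight setting. It assumes normally distributed estimates $\hat{\theta}_k \sim N(\theta_k, \sigma^2)$ and the usual welfare-regret definition (oracle welfare minus expected welfare of the rule); the function name `mc_regret` and its parameters are hypothetical. It does not compute the maximum over the CATE space, and it is not the paper's $\overline{R}_{\mathrm{true}}(w)$ or $\overline{R}_{\mathrm{upper}}(w)$.

```python
import numpy as np


def mc_regret(theta, w, sigma=1.0, n_draws=20000, seed=0):
    """Monte Carlo regret of a shrinkage rule at a fixed CATE vector theta.

    Assumes equal group weights (1/K), a common standard deviation sigma,
    and independent normal estimates theta_hat_k ~ N(theta_k, sigma^2).
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta, dtype=float)
    K = theta.size

    # Simulate preliminary estimates: one row per Monte Carlo draw.
    theta_hat = theta + sigma * rng.standard_normal((n_draws, K))

    # Shrink each estimate toward the within-draw average.
    pooled = theta_hat.mean(axis=1, keepdims=True)
    shrunk = w * theta_hat + (1.0 - w) * pooled

    treat_prob = (shrunk >= 0).mean(axis=0)   # estimated P(group k is treated)
    oracle = (theta >= 0).astype(float)       # decisions with theta known

    # Regret = oracle welfare minus expected welfare of the rule.
    return float(np.mean(theta * (oracle - treat_prob)))


K, kappa = 4, 1.0
theta = kappa * (-1.0) ** np.arange(1, K + 1)   # alternating signs: -kappa, +kappa, ...
regret_ces = mc_regret(theta, w=1.0)
regret_pooling = mc_regret(theta, w=0.0)
print(regret_ces, regret_pooling)
```

In this alternating-sign configuration the groups disagree strongly, so full pooling does worse than no shrinkage; when the CATEs are nearly homogeneous the comparison can go the other way, which is the trade-off the choice of $w$ is meant to balance.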

Figures (15)

  • Figure 1: Functional form of $\eta(a)$. The function $\eta(a)$ is strictly increasing and convex and $\eta(0)$ is approximately equal to 0.17.
  • Figure 2: The solid and dashed lines denote $\overline{R}_{\mathrm{true}}(w)$ and $\overline{R}_{\mathrm{upper}}(w)$, respectively.
  • Figure 7: The solid, dashed, dotted, and dot-dashed lines denote $\overline{R}_{\mathrm{upper}}(w) / \overline{R}_{\mathrm{true}}(w)$ when $K=20$ and $\kappa = 0, \, 0.25, \, 0.5$, and $0.75$, respectively.
  • Figure 8: The dashed and dotted lines denote $t^{\ast}(0) \left( 1 - \frac{1}{K} \right)$ and $K^{-1/2} \eta^{-1} \left( 2 \eta(0) \sqrt{K} \right)$, respectively. The solid lines denote the ranges (\ref{range_kappa}) for $K=20, \, 50, \, 100$. When $K=20, \, 50, \, 100$, the ranges (\ref{range_kappa}) are $(0.602, 0.714)$, $(0.544, 0.737)$, and $(0.503, 0.744)$, respectively.
  • Figure 9: The relationship between $\kappa$ and $w^{\ast}(\kappa)$ when $K=2, 5, 100$. The solid, dashed, and dotted lines denote the shrinkage factors when $K=2, 5, 100$, respectively. The dot-dashed line denotes the median of $\hat{w}_{\mathrm{JS}}$ in Remark \ref{rem:shrinkage_est} when $K=100$ and $\hat{\theta}_k \sim N((-1)^k \kappa, 1)$.
  • ...and 10 more figures

Theorems & Definitions (32)

  • Remark 1
  • Remark 2
  • Remark 3
  • Proposition 1
  • Proposition 2
  • Remark 4
  • Remark 5
  • Remark 6
  • Theorem 1
  • Theorem 2
  • ...and 22 more