Inference on Optimal Policy Values and Other Irregular Functionals via Softmax Smoothing

Justin Whitehouse, Qizhao Chen, Morgane Austern, Vasilis Syrgkanis

Abstract

Constructing confidence intervals for the value of an (unknown) optimal treatment policy is a fundamental problem in causal inference. Insight into the optimal policy value can guide the development of reward-maximizing, individualized treatment regimes. However, because the functional that defines the optimal value is non-differentiable, standard semi-parametric approaches for performing inference are not directly applicable. Many existing works circumvent non-differentiability by making the unrealistic assumption of zero probability of treatment non-response, i.e., that every unit responds (either positively or negatively) to an assigned treatment. Further, works that do not rely on this restriction must refit nuisance models a number of times proportional to the sample size. In this paper, we construct and analyze a simple, softmax smoothing-based estimator for the value of an optimal treatment policy. Our estimator applies in both static and dynamic treatment regimes, requires fitting only a constant number of nuisance models, and is statistically efficient when there is zero probability of non-response to treatment. Also, while our estimator does not require imposing semi-parametric restrictions, it can exploit them when they exist. We further show how our softmax smoothing approach can be used to estimate general parameters that are specified as a maximum of scores involving nuisance components, and study conditional Balke and Pearl bounds and $L^1$ calibration error as salient examples.

Paper Structure

This paper contains 73 sections, 24 theorems, 327 equations, 4 figures, 1 table.

Key Result

Lemma 2.1

Let $\beta > 0$ be a temperature and let $U_1, \dots, U_N$ be random variables. Define the random differences $\Delta_k := \max_\ell U_\ell - U_k \geq 0$. First, one always has … Second, if each $\Delta_k$ satisfies Assumption \ref{ass:margin} with constants $c, H, \delta > 0$, then there exist constants $C, \beta_\ast > 0$ depending only on $c$, $H$, and $\delta$ such that …
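The displays of Lemma 2.1 are not reproduced in this excerpt, so the following is only a minimal sketch under a stated assumption: that the smoothed maximum is the softmax-weighted average $\sum_k w_k U_k$ with $w_k = e^{\beta U_k} / \sum_\ell e^{\beta U_\ell}$. Under that assumption the bias is $\max_\ell U_\ell - \sum_k w_k U_k = \sum_k w_k \Delta_k \ge 0$, and since the denominator is at least $e^{\beta \max_\ell U_\ell}$, each weight satisfies $w_k \le e^{-\beta \Delta_k}$. Hence the bias is at most $\sum_k \Delta_k e^{-\beta \Delta_k} \le (N-1)/(e\beta)$, using $\sup_{x \ge 0} x e^{-\beta x} = 1/(e\beta)$ (compare the function plotted in Figure 3). This elementary bound is not claimed to be the exact statement of the lemma, and the function names below are hypothetical.

```python
import math
from typing import Sequence


def softmax_weights(u: Sequence[float], beta: float) -> list[float]:
    """Softmax weights exp(beta*u_k) / sum_l exp(beta*u_l), computed stably."""
    m = max(u)
    # Subtracting the max before exponentiating avoids overflow for large beta.
    exps = [math.exp(beta * (x - m)) for x in u]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_smoothed_max(u: Sequence[float], beta: float) -> float:
    """Assumed smoothing form: the softmax-weighted average sum_k w_k * u_k."""
    w = softmax_weights(u, beta)
    return sum(wk * uk for wk, uk in zip(w, u))


def elementary_bias_bound(u: Sequence[float], beta: float) -> float:
    """Upper bound sum_k Delta_k * exp(-beta * Delta_k), with Delta_k = max(u) - u_k."""
    m = max(u)
    return sum((m - x) * math.exp(-beta * (m - x)) for x in u)


if __name__ == "__main__":
    u = [1.0, 0.7, 0.2, 0.69]
    for beta in (1.0, 10.0, 100.0):
        bias = max(u) - softmax_smoothed_max(u, beta)
        # Columns: beta, actual bias, per-sample bound, crude (N-1)/(e*beta) bound.
        print(beta, bias, elementary_bias_bound(u, beta), (len(u) - 1) / (math.e * beta))
```

The bias shrinks as $\beta$ grows, but scores whose gaps $\Delta_k$ are of order $1/\beta$ still contribute; controlling how much probability mass the gaps place near zero is the role of the margin condition in the lemma's second part.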

Figures (4)

  • Figure 1: An illustration of the density condition outlined in Assumption \ref{ass:margin} alongside a comparison of softmax and softplus approximations of the map $x \mapsto \max\{0,x\}$. Panel (a) illustrates that, when $\delta < 1$, the density may diverge near zero. Panels (b) and (c) illustrate the bias of softplus smoothing at $x = 0$.
  • Figure 2: We plot the empirical coverage of our softmax smoothing estimator under each of the data-generating processes (DGPs) described above. The first three panels each correspond to a setting of $\delta^{\mathrm{true}}$ in the non-parametric DGP. The final panel corresponds to the semi-parametric DGP, in which $\delta^{\mathrm{true}} = 1.0$. The $x$-axis shows the values of $\delta^{\mathrm{ass}}$ assumed by the estimator, which is plugged into Equation \ref{eq:beta_trans}. The $y$-axis denotes coverage, and the dashed horizontal line marks the target coverage of $95\%$. We plot the mean coverage with $n = 5{,}000$ samples and include pointwise-valid 95% Wilson confidence intervals.
  • Figure 3: Behavior of the function $x \mapsto x \exp\{-x\}$.
  • Figure 4: Causal graphs describing the observed data and the counterfactual outcomes under an alternative policy $\pi$.
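The contrast drawn in Figure 1 can be reproduced numerically. This is a minimal sketch assuming the standard forms of the two approximations: softplus smoothing $\frac{1}{\beta}\log(1 + e^{\beta x})$, and softmax smoothing as the softmax-weighted average of the two candidates $\{0, x\}$, which simplifies to $x \cdot e^{\beta x}/(1 + e^{\beta x})$. Neither formula is quoted from the caption, and the function names and temperatures are illustrative only. At $x = 0$, where the two candidates tie, the softmax version is exact while softplus overestimates by $\log(2)/\beta$.

```python
import math


def softplus_max0(x: float, beta: float) -> float:
    """Softplus smoothing of max{0, x}: (1/beta) * log(1 + exp(beta * x)), computed stably."""
    z = beta * x
    # log(1 + e^z) = max(z, 0) + log1p(e^{-|z|}) avoids overflow for large |z|.
    return (max(z, 0.0) + math.log1p(math.exp(-abs(z)))) / beta


def softmax_max0(x: float, beta: float) -> float:
    """Softmax-weighted average of the candidates {0, x}, i.e. x * sigmoid(beta * x)."""
    z = beta * x
    # Numerically stable logistic sigmoid.
    if z >= 0:
        sig = 1.0 / (1.0 + math.exp(-z))
    else:
        ez = math.exp(z)
        sig = ez / (1.0 + ez)
    return x * sig


if __name__ == "__main__":
    for beta in (1.0, 10.0, 100.0):
        # At x = 0 the true value is 0: softmax is exact, softplus is off by log(2)/beta.
        print(beta, softmax_max0(0.0, beta), softplus_max0(0.0, beta), math.log(2) / beta)
```

Because softmax weighting is a convex combination of the candidates, it never exceeds the true maximum, whereas softplus always exceeds it. The softplus gap $\frac{1}{\beta}\log(1 + e^{-\beta |x|})$ is largest exactly at the tie $x = 0$, which corresponds to treatment non-response.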

Theorems & Definitions (50)

  • Lemma 2.1
  • Definition 3.1
  • Proposition 3.2
  • Theorem 3.3
  • Proof sketch for Theorem \ref{thm:normal_static}
  • Corollary 3.4
  • Remark 3.5
  • Proposition 3.6
  • Theorem 3.7: Normality of Structural Parameter
  • Corollary 3.8: Normality of Policy Value Estimate
  • ...and 40 more