Efficient and Sharp Off-Policy Learning under Unobserved Confounding

Konstantin Hess, Dennis Frauen, Valentyn Melnychuk, Stefan Feuerriegel

Abstract

We develop a novel method for personalized off-policy learning in scenarios with unobserved confounding. We thereby address a key limitation of standard policy learning, which assumes unconfoundedness, meaning that no unobserved factors influence both treatment assignment and outcomes. However, this assumption is often violated, in which case standard policy learning produces biased estimates and can lead to harmful policies. To address this limitation, we employ causal sensitivity analysis and derive a statistically efficient estimator for a sharp bound on the value function under unobserved confounding. Our estimator has three advantages: (1) Unlike existing works, our estimator avoids unstable minimax optimization based on inverse propensity weighted outcomes. (2) Our estimator is statistically efficient. (3) We prove that our estimator leads to the optimal confounding-robust policy. Finally, we extend our theory to the related task of policy improvement under unobserved confounding, i.e., when a baseline policy such as the standard of care is available. We show in experiments with synthetic and real-world data that our method outperforms simple plug-in approaches and existing baselines. Our method is highly relevant for decision-making where unobserved confounding can be problematic, such as in healthcare and public policy.

Paper Structure

This paper contains 24 sections, 12 theorems, 95 equations, 4 figures, 2 tables, 1 algorithm.

Key Result

Proposition 4.1

Let $Q^{+,*}(a,x) = \sup_{\tilde{p}\in \mathcal{P}(\Gamma)}Q(a,x)$ and $Q^{-,*}(a,x) = \inf_{\tilde{p}\in \mathcal{P}(\Gamma)}Q(a,x)$ be the sharp upper and lower bound for the conditional average potential outcome, respectively, given our sensitivity constraints $\mathcal{P}(\Gamma)$. Then, the sha
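As a rough illustration of what a sharp bound on the conditional average potential outcome can look like, the following sketch computes $Q^{-,*}(a,x)$ and $Q^{+,*}(a,x)$ for a discrete outcome distribution. It rests on assumptions that are not stated in the text above: that $\mathcal{P}(\Gamma)$ is the marginal sensitivity model (odds ratio between the observed and the full propensity bounded in $[\Gamma^{-1}, \Gamma]$), and that the observed outcome distribution and propensity at $(a,x)$ are known exactly. The function names `msm_weight_range` and `sharp_bounds` are hypothetical. This is a plug-in style illustration only, not the statistically efficient estimator the paper proposes.

```python
import numpy as np


def msm_weight_range(propensity, gamma):
    """Admissible range [c_minus, c_plus] of the density ratio between the
    interventional and the observed outcome distribution.
    ASSUMPTION: marginal sensitivity model with parameter gamma >= 1."""
    c_minus = (1.0 - 1.0 / gamma) * propensity + 1.0 / gamma
    c_plus = (1.0 - gamma) * propensity + gamma
    return c_minus, c_plus


def sharp_bounds(y, probs, propensity, gamma):
    """Sharp lower/upper bound on E[Y(a) | X = x] for a discrete outcome
    with support `y` and observed probabilities `probs` (summing to one)."""
    y = np.asarray(y, dtype=float)
    probs = np.asarray(probs, dtype=float)
    c_minus, c_plus = msm_weight_range(propensity, gamma)

    def upper(values):
        # Start every weight at c_minus, then raise the weights of the
        # largest outcomes towards c_plus until the weights integrate to one.
        weights = np.full(values.shape, c_minus)
        budget = 1.0 - c_minus
        for i in np.argsort(-values):  # largest outcome first
            if budget <= 0.0:
                break
            add = min(probs[i] * (c_plus - c_minus), budget)
            if probs[i] > 0.0:
                weights[i] += add / probs[i]
            budget -= add
        return float(np.sum(probs * weights * values))

    # The lower bound follows by symmetry: minimise Y = maximise -Y.
    return -upper(-y), upper(y)


if __name__ == "__main__":
    lower, upper_bound = sharp_bounds(
        y=[0.0, 1.0], probs=[0.5, 0.5], propensity=0.5, gamma=2.0
    )
    print(lower, upper_bound)
```

With $\Gamma = 1$ the two bounds collapse to the observed conditional mean, which corresponds to the unconfounded case; larger $\Gamma$ widens the interval.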

Figures (4)

  • Figure 1: We can only block backdoor paths for observed confounders $X$. Hence, under unobserved confounding $U$, we cannot point-identify the potential outcome $Y[a]$ and related quantities such as the value function $V(\pi)$.
  • Figure 2: Robustness analysis. We aim to understand how robust our method is against mis-specification in $\Gamma$. We thus set $\Gamma^*=7$ in the data-generating process but use different sensitivity parameters $\Gamma$ in our estimator (i.e., $\Gamma = 7$ is correctly specified, while $\Gamma \neq 7$ is mis-specified). We report the regret over a randomized policy (lower values are better). Our estimator significantly improves upon the standard DR estimator, even for a completely mis-specified $\Gamma$ (e.g., $\Gamma=100$).
  • Figure 3: Property of statistically efficient estimation. We compare our statistically efficient estimator with a simple plug-in estimator of our sharp upper bound from Proposition \ref{prop:sharp_value}. For both methods, we report the regret over a randomized policy (lower values are better). Our statistically efficient estimator leads to a lower regret and benefits from increasing sample size due to its optimal estimation properties.
  • Figure 4: Real-world medical data. We compare our statistically efficient estimator against the previous baselines based on data from the International Stroke Trial. Our method yields the best treatment policy and is robust over different $\Gamma$.

Theorems & Definitions (25)

  • Proposition 4.1
  • proof
  • Definition 4.2: Dorn.2022, Frauen.2023c
  • Theorem 4.3
  • proof
  • Theorem 4.4
  • proof
  • Corollary 4.5
  • proof
  • Corollary 4.6
  • ...and 15 more