Distributionally Robust Policy Evaluation and Learning for Continuous Treatment with Observational Data

Cheuk Hang Leung, Yiyan Huang, Yijun Li, Qi Wu

TL;DR

This work tackles offline policy evaluation and learning with continuous treatments under distribution shift by formulating a distributionally robust optimization (DRO) problem with a KL-divergence ambiguity set. The authors develop kernel-based inverse probability weighting (IPW) estimators and a dual reformulation that yield a tractable distributionally robust objective $Q_{DRO}^{h}(\pi)$, which converges to the true DRO value as the bandwidth $h\to0$. They establish finite-sample and asymptotic properties for the estimators, derive an algorithm to learn a distributionally robust policy $\hat{\pi}_{DRO}^{h}$, and provide regret bounds via Rademacher complexity. Empirical studies, including a Warfarin dosing case study, show robustness to distribution shifts and better worst-case performance than non-robust methods, which supports the method’s practical relevance for continuous interventions in real-world settings.
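The dual reformulation mentioned above can be illustrated with the standard KL-divergence duality for a worst-case mean, $\inf_{P:\,KL(P\|P_0)\le\delta}\mathbb{E}_P[Y]=\sup_{\alpha\ge 0}\{-\alpha\log\mathbb{E}_{P_0}[e^{-Y/\alpha}]-\alpha\delta\}$. The sketch below is a minimal illustration under stated assumptions, not the authors' algorithm: the function name `kl_worst_case_mean`, the radius name `delta`, the optional self-normalised `weights` (where kernel IPW weights could be plugged in), and the search range for `alpha` are all hypothetical choices.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import logsumexp


def kl_worst_case_mean(y, delta, weights=None):
    """Worst-case mean of y over distributions within KL radius delta.

    Evaluates the dual  sup_{alpha>0} -alpha*log E[exp(-y/alpha)] - alpha*delta
    on an empirical (optionally weighted) sample.
    """
    y = np.asarray(y, dtype=float)
    if weights is None:
        weights = np.ones_like(y)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # self-normalisation (an assumption)

    def negative_dual(log_alpha):
        alpha = np.exp(log_alpha)  # optimise over log(alpha) to keep alpha > 0
        log_mgf = logsumexp(-y / alpha, b=weights)  # stable log E[exp(-y/alpha)]
        return alpha * log_mgf + alpha * delta

    # Assumed search range: alpha in [exp(-10), exp(10)].
    result = minimize_scalar(negative_dual, bounds=(-10.0, 10.0), method="bounded")
    return -result.fun


# Synthetic rewards, roughly N(1, 1); larger delta should give a lower value.
rng = np.random.default_rng(0)
rewards = rng.normal(loc=1.0, scale=1.0, size=50_000)
robust_small = kl_worst_case_mean(rewards, delta=0.1)
robust_large = kl_worst_case_mean(rewards, delta=0.5)
```

For Gaussian rewards with mean $\mu$ and standard deviation $\sigma$, the population value of this dual is $\mu-\sigma\sqrt{2\delta}$, which gives a quick sanity check on the two numbers computed above.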

Abstract

Using offline observational data for policy evaluation and learning allows decision-makers to evaluate and learn a policy that maps individual characteristics to interventions. Most existing literature either focuses on discrete treatment spaces or assumes that the policy-learning and policy-deployment environments share the same distribution. These restrictions limit applications in many real-world scenarios where distribution shifts occur and treatments are continuous. To overcome these challenges, this paper develops a distributionally robust policy for the continuous treatment setting. The proposed distributionally robust estimators extend the Inverse Probability Weighting (IPW) method from discrete to continuous treatments for policy evaluation and learning. Specifically, we introduce a kernel function into the proposed IPW estimator to mitigate the exclusion of observations that occurs when the standard IPW method is applied to continuous treatments. We then provide a finite-sample analysis that guarantees the convergence of the proposed distributionally robust policy evaluation and learning estimators. Comprehensive experiments further verify the effectiveness of our approach when distribution shifts are present.
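To make the kernel idea concrete: with a continuous treatment, an indicator such as "observed treatment equals the policy's treatment" is almost never true, so a plain IPW estimator would discard nearly every observation. A common remedy is to replace the indicator with a kernel of bandwidth $h$. The sketch below shows that generic kernel-smoothed IPW value estimate (without the robust layer); the Gaussian kernel, the function names, and the synthetic data are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard Gaussian kernel K(u)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)


def kernel_ipw_value(y, t, policy_t, density, h):
    """Kernel-smoothed IPW estimate of a policy's value.

    y        : observed outcomes, shape (n,)
    t        : observed continuous treatments, shape (n,)
    policy_t : treatments the policy would assign, pi(x_i), shape (n,)
    density  : generalized propensity f(t_i | x_i), shape (n,)
    h        : kernel bandwidth (> 0)
    """
    weights = gaussian_kernel((policy_t - t) / h) / (h * density)
    return float(np.mean(weights * y))


# Synthetic check: treatment T | X ~ N(X, 1), outcome peaks at T = X.
rng = np.random.default_rng(0)
n, h = 20_000, 0.3
x = rng.uniform(0.0, 1.0, size=n)
t = x + rng.normal(size=n)
y = -(t - x) ** 2 + 0.1 * rng.normal(size=n)
density = gaussian_kernel(t - x)  # true f(t | x), known only because we simulated it

value_matched = kernel_ipw_value(y, t, policy_t=x, density=density, h=h)
value_shifted = kernel_ipw_value(y, t, policy_t=x + 1.0, density=density, h=h)
```

In this toy setup the policy $\pi(x)=x$ has true value 0 and $\pi(x)=x+1$ has true value $-1$; the kernel smoothing shifts both estimates by roughly $-h^2$, a bias that vanishes as $h\to0$. In practice the density $f(t\mid x)$ is unknown and would have to be estimated.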

Paper Structure

This paper contains 36 sections, 14 theorems, 147 equations, 4 tables, 2 algorithms.

Key Result

Lemma 1

Under the consistency through positivity assumptions (ass:consistency to ass:positivity), we have, for any $\alpha\geq 0$, [equation not preserved in the source], where $\delta(\cdot)$ is the Dirac delta function, which satisfies i) $\int_{\mathbb{R}}\delta(x)dx=1$ and ii) $\int_{\mathbb{R}}\delta(x)f(x)dx=f(0)$ for any function $f$ defined on $\mathbb{R}$.
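
The Dirac delta in Lemma 1 cannot be evaluated on a finite sample, which is presumably why the estimators described in the TL;DR use a kernel with bandwidth $h$. As background (a standard fact, not a statement quoted from the paper): for a kernel $K$ with $\int_{\mathbb{R}}K(u)du=1$ and a bounded function $f$ that is continuous at $0$, the substitution $x=hu$ gives

$$\int_{\mathbb{R}}\frac{1}{h}K\!\left(\frac{x}{h}\right)f(x)\,dx=\int_{\mathbb{R}}K(u)f(hu)\,du\;\longrightarrow\;f(0)\int_{\mathbb{R}}K(u)\,du=f(0)\quad\text{as }h\to0,$$

so the rescaled kernel $\frac{1}{h}K(\cdot/h)$ reproduces property ii) of $\delta(\cdot)$ in the limit, and property i) holds for every $h>0$.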

Theorems & Definitions (25)

  • Lemma 1
  • Theorem 1
  • Theorem 2
  • Definition 1
  • Definition 2
  • Theorem 3
  • Corollary 4
  • Proposition 1
  • Proof of Claim (result:convergence S_N_h)
  • Proposition 2
  • ...and 15 more