Distributionally Robust Policy Evaluation and Learning for Continuous Treatment with Observational Data
Cheuk Hang Leung, Yiyan Huang, Yijun Li, Qi Wu
TL;DR
This work tackles offline policy evaluation and learning with continuous treatments in the presence of distribution shifts by formulating a distributionally robust optimization (DRO) framework with a KL-divergence ambiguity set. The authors develop kernel-based inverse probability weighting (IPW) estimators and a dual reformulation to compute a tractable distributionally robust objective $Q_{DRO}^{h}(\pi)$ that converges to the true DRO value as the bandwidth $h\to0$. They establish finite-sample and asymptotic properties for the estimators, derive an algorithm to learn a distributionally robust policy $\hat{\pi}_{DRO}^{h}$, and provide regret bounds via Rademacher complexity. Empirical studies, including a Warfarin dosing case study, demonstrate robustness to distribution shifts and improved worst-case performance compared with non-robust methods, highlighting the method’s practical relevance for continuous interventions in real-world settings.
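The KL dual reformulation can be illustrated with a small sketch. It relies on the standard identity $\inf_{P:\,KL(P\|P_0)\le\delta} E_P[Y] = \sup_{\alpha>0}\{-\alpha\log E_{P_0}[e^{-Y/\alpha}] - \alpha\delta\}$ and assumes a Gaussian kernel, a known generalized propensity score $f(t\mid x)$, self-normalized kernel IPW weights playing the role of $P_0$, and a finite grid for the supremum over $\alpha$. The simulated data, the helper names (`simulate`, `kernel_weights`, `dro_value`, `ALPHAS`) and every parameter value are hypothetical. Treat the result as a rough plug-in analogue of $Q_{DRO}^{h}(\pi)$, not as the authors' estimator.

```python
import math
import random


def gaussian_kernel(u, h):
    """Gaussian kernel with bandwidth h: K_h(u) = K(u / h) / h (the N(0, h^2) density)."""
    return math.exp(-0.5 * (u / h) ** 2) / (math.sqrt(2.0 * math.pi) * h)


def propensity(t, x):
    """Hypothetical generalized propensity f(t | x): T | X ~ N(X, 0.5^2), assumed known."""
    return gaussian_kernel(t - x, 0.5)


def simulate(n, seed=0):
    """Hypothetical logged data: X ~ U(0, 1), T | X ~ N(X, 0.5^2), Y is largest at T = 2X."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(0.0, 1.0)
        t = rng.gauss(x, 0.5)
        y = -(t - 2.0 * x) ** 2 + rng.gauss(0.0, 0.1)
        data.append((x, t, y))
    return data


def kernel_weights(data, policy, h):
    """w_i = K_h(pi(X_i) - T_i) / f(T_i | X_i): a smoothed version of 1{T_i = pi(X_i)}."""
    return [gaussian_kernel(policy(x) - t, h) / propensity(t, x) for x, t, _ in data]


def kernel_ipw_mean(data, policy, h):
    """Self-normalized kernel IPW estimate of the nominal (non-robust) policy value."""
    w = kernel_weights(data, policy, h)
    return sum(wi * y for wi, (_, _, y) in zip(w, data)) / sum(w)


def dro_value(data, policy, h, delta, alphas):
    """Worst-case value over a KL ball of radius delta around the reweighted sample.

    Evaluates sup_{alpha > 0} { -alpha * log E_{P0}[exp(-Y / alpha)] - alpha * delta }
    on the finite grid `alphas`, with P0 the self-normalized kernel-IPW distribution.
    """
    w = kernel_weights(data, policy, h)
    pairs = [(wi, y) for wi, (_, _, y) in zip(w, data) if wi > 0.0]
    log_total = math.log(sum(wi for wi, _ in pairs))
    best = -math.inf
    for a in alphas:
        # log E_{P0}[exp(-Y / a)] via the log-sum-exp trick to avoid under/overflow
        terms = [math.log(wi) - y / a for wi, y in pairs]
        m = max(terms)
        log_mgf = m + math.log(sum(math.exp(s - m) for s in terms)) - log_total
        best = max(best, -a * log_mgf - a * delta)
    return best


# Log-spaced grid for the dual variable alpha, from 1e-3 to 1e3 (an arbitrary choice).
ALPHAS = [10.0 ** (k / 20.0) for k in range(-60, 61)]

if __name__ == "__main__":
    data = simulate(1000, seed=1)
    policy = lambda x: 2.0 * x  # hypothetical policy to evaluate
    print("nominal:", kernel_ipw_mean(data, policy, h=0.2))
    for delta in (0.05, 0.1, 0.5):
        print("delta", delta, "robust:", dro_value(data, policy, h=0.2, delta=delta, alphas=ALPHAS))
```

By Jensen's inequality every dual term is at most the reweighted mean, so the robust value never exceeds the nominal estimate, and for a fixed grid it can only decrease as the radius $\delta$ grows.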
Abstract
Using offline observational data for policy evaluation and learning allows decision-makers to evaluate and learn a policy that maps individual characteristics to interventions. Most existing literature has focused either on discrete treatment spaces or on settings that assume no difference between the data distributions of the policy-learning and policy-deployment environments. These assumptions restrict applications in many real-world scenarios that involve continuous treatments and distribution shifts. To overcome these challenges, this paper focuses on developing a distributionally robust policy under a continuous-treatment setting. The proposed distributionally robust estimators build on the Inverse Probability Weighting (IPW) method, extending it from discrete treatments to policy evaluation and learning under continuous treatments. Specifically, we introduce a kernel function into the proposed IPW estimator to mitigate the exclusion of observations that can occur when the standard IPW method is applied to continuous treatments. We then provide a finite-sample analysis that guarantees the convergence of the proposed distributionally robust policy evaluation and learning estimators. Comprehensive experiments further verify the effectiveness of our approach when distribution shifts are present.
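A minimal sketch of the observation-exclusion problem follows, assuming a deterministic policy, a known Gaussian generalized propensity score, and a Gaussian kernel; the data-generating process and all function names are hypothetical. With a continuous treatment, an observed $T_i$ almost never equals the recommended $\pi(X_i)$ exactly, so an indicator-based weight $\mathbb{1}\{T_i=\pi(X_i)\}$ keeps essentially no samples, while the kernel $K_h(\pi(X_i)-T_i)$ lets every nearby observation contribute. The kernel estimator shown is a common unnormalized form and not necessarily the exact estimator used in the paper.

```python
import math
import random


def gaussian_pdf(u, scale):
    """Density of N(0, scale^2) at u; also serves as the Gaussian kernel K_h with h = scale."""
    return math.exp(-0.5 * (u / scale) ** 2) / (math.sqrt(2.0 * math.pi) * scale)


def simulate(n, seed=0):
    """Hypothetical logged data: X ~ U(0, 1), T | X ~ N(X, 0.5^2), Y is largest at T = 2X."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(0.0, 1.0)
        t = rng.gauss(x, 0.5)
        y = -(t - 2.0 * x) ** 2 + rng.gauss(0.0, 0.1)
        data.append((x, t, y))
    return data


def propensity(t, x):
    """Generalized propensity score f(t | x) of the hypothetical logging policy (assumed known)."""
    return gaussian_pdf(t - x, 0.5)


def indicator_ipw(data, policy):
    """Discrete-style IPW: only samples whose T_i equals pi(X_i) exactly contribute."""
    kept = [(x, t, y) for x, t, y in data if t == policy(x)]
    value = sum(y / propensity(t, x) for x, t, y in kept) / len(data)
    return value, len(kept)


def kernel_ipw(data, policy, h):
    """Kernel IPW: (1/n) * sum_i K_h(pi(X_i) - T_i) * Y_i / f(T_i | X_i)."""
    weights = [gaussian_pdf(policy(x) - t, h) / propensity(t, x) for x, t, _ in data]
    value = sum(w * y for w, (_, _, y) in zip(weights, data)) / len(data)
    n_used = sum(1 for w in weights if w > 0.0)  # samples with a nonzero weight
    return value, n_used


if __name__ == "__main__":
    data = simulate(2000, seed=0)
    target = lambda x: 2.0 * x  # hypothetical policy matching the outcome peak
    naive = lambda x: x         # hypothetical policy that copies the logging mean
    print("indicator IPW (value, samples used):", indicator_ipw(data, target))
    print("kernel IPW, target policy:", kernel_ipw(data, target, h=0.2))
    print("kernel IPW, naive policy: ", kernel_ipw(data, naive, h=0.2))
```

In this setup the indicator version uses no samples, whereas every sample receives a positive (if tiny) kernel weight and the kernel estimate ranks the two policies correctly. The price of smoothing is a bias that shrinks as $h \to 0$, at the cost of higher variance.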
