Predictive Performance Comparison of Decision Policies Under Confounding
Luke Guerdan, Amanda Coston, Kenneth Holstein, Zhiwei Steven Wu
TL;DR
This paper addresses how to compare a predictive decision policy against a status quo policy when unobserved confounding is present, focusing on pre-deployment evaluation rather than post-hoc assessment. It develops a partial-identification framework that confines confounding-induced uncertainty to the region where the two policies disagree, and introduces a $\delta$-regret interval that bounds the difference in policy performance more tightly than traditional baselines. By linking a range of modern causal identification assumptions, such as instrumental variables (IV), marginal sensitivity models (MSM), and proximal variables, to pointwise bounding functions, the authors obtain a flexible method for estimating regret bounds from finite samples using plug-in and doubly robust estimators with cross-fitting. The approach is validated on synthetic data under MSM and IV settings and demonstrated on a real healthcare enrollment setting, where it can yield more decisive pre-deployment conclusions than existing non-comparative off-policy evaluation (OPE) methods. Overall, the framework advances confounding-robust, pre-deployment policy evaluation by providing informative regret bounds that can be tuned through the choice of identification assumptions and that concentrate on the region where the policies disagree.
Abstract
Predictive models are often introduced to decision-making tasks under the rationale that they improve performance over an existing decision-making policy. However, it is challenging to compare predictive performance against an existing decision-making policy that is generally under-specified and dependent on unobservable factors. These sources of uncertainty are often addressed in practice by making strong assumptions about the data-generating mechanism. In this work, we propose a method to compare the predictive performance of decision policies under a variety of modern identification approaches from the causal inference and off-policy evaluation literatures (e.g., instrumental variable, marginal sensitivity model, proximal variable). Key to our method is the insight that there are regions of uncertainty that we can safely ignore in the policy comparison. We develop a practical approach for finite-sample estimation of regret intervals under no assumptions on the parametric form of the status quo policy. We verify our framework theoretically and via synthetic data experiments. We conclude with a real-world application using our framework to support a pre-deployment evaluation of a proposed modification to a healthcare enrollment policy.
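The insight that some regions of uncertainty can be ignored in the comparison can be illustrated with a toy calculation. The sketch below is not the paper's estimator. It assumes binary decisions, a hypothetical unknown per-unit outcome mean that is only known to lie in a pointwise interval [lo, hi], and a policy value defined as the average of decision times outcome mean. All function names and numbers are made up for illustration. It compares a naive interval, which bounds each policy separately and then subtracts, with a comparative interval that only accumulates uncertainty on units where the two policies disagree.

```python
def naive_interval(pi_new, pi_old, lo, hi):
    """Bound each policy's value separately, then subtract the two intervals."""
    n = len(lo)
    new_lo = sum(a * l for a, l in zip(pi_new, lo)) / n
    new_hi = sum(a * h for a, h in zip(pi_new, hi)) / n
    old_lo = sum(a * l for a, l in zip(pi_old, lo)) / n
    old_hi = sum(a * h for a, h in zip(pi_old, hi)) / n
    # Worst case pairs the new policy's lower bound with the old policy's upper bound.
    return new_lo - old_hi, new_hi - old_lo


def comparative_interval(pi_new, pi_old, lo, hi):
    """Bound the value difference directly; units where the policies agree contribute zero."""
    lower = 0.0
    upper = 0.0
    for a_new, a_old, l, h in zip(pi_new, pi_old, lo, hi):
        diff = a_new - a_old  # 0 wherever the two policies make the same decision
        if diff > 0:
            lower += diff * l
            upper += diff * h
        elif diff < 0:
            lower += diff * h
            upper += diff * l
    n = len(lo)
    return lower / n, upper / n


# Hypothetical data: four units, policies disagree only on the second unit.
pi_new = [1, 1, 0, 1]
pi_old = [1, 0, 0, 1]
lo = [0.2, 0.6, 0.1, 0.3]  # assumed pointwise lower bounds on the outcome mean
hi = [0.8, 0.9, 0.5, 0.7]  # assumed pointwise upper bounds on the outcome mean

print("naive:      ", naive_interval(pi_new, pi_old, lo, hi))
print("comparative:", comparative_interval(pi_new, pi_old, lo, hi))
```

In this toy example the naive interval is roughly (-0.1, 0.475) and includes zero, whereas the comparative interval is roughly (0.15, 0.225) and lies strictly above zero, because the uncertainty on the three units where the policies agree cancels in the difference.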
