Uncertainty Quantification and Causal Considerations for Off-Policy Decision Making
Muhammad Faaiz Taufiq
TL;DR
This thesis tackles robust decision-making under uncertainty by addressing both statistical uncertainty and causal unidentifiability in off-policy evaluation (OPE). It introduces the Marginal Ratio (MR) estimator, a variance-reducing OPE estimator for contextual bandits that focuses on the shift in the marginal distribution of the outcome Y rather than the shift in the joint distribution of (X, A, Y), and provides theoretical and empirical evidence of reduced variance and improved data efficiency. It further proposes Conformal Off-Policy Prediction (COPP), a finite-sample uncertainty quantification method that yields adaptive predictive intervals for outcomes under a target policy, with guarantees under weighted exchangeability. Finally, it develops longitudinal causal bounds for sequential decision settings that remain valid under arbitrary unmeasured confounding, and demonstrates their use for falsifying digital twin models in a real-world case study. Collectively, the work advances robust, uncertainty-aware policy evaluation and validation in both static and dynamic decision-making contexts, with direct applicability to medical, recommender-system, and digital twin domains.
Abstract
Off-policy evaluation (OPE), which seeks to assess the performance of a new policy using data collected under a different policy, is a critical challenge in robust decision-making. However, existing OPE methods suffer from several limitations arising from statistical uncertainty as well as causal considerations. In this thesis, we address these limitations through three contributions. First, we consider the problem of high variance in importance-sampling-based OPE estimators. We introduce the Marginal Ratio (MR) estimator, a novel OPE method for contextual bandits that reduces variance by focusing on the shift in the marginal distribution of outcomes rather than on the shift between the policies themselves. Next, we propose Conformal Off-Policy Prediction (COPP), a principled approach to uncertainty quantification in OPE that provides finite-sample predictive intervals for outcomes under the target policy, supporting robust decision-making in risk-sensitive applications. Finally, we address causal unidentifiability in off-policy decision-making by developing novel bounds for sequential decision settings, which remain valid under arbitrary unmeasured confounding. We apply these bounds to assess the reliability of digital twin models, introducing a falsification framework to identify scenarios where model predictions diverge from real-world behaviour. Our contributions provide new insights into robust decision-making under uncertainty and establish principled methods for evaluating policies in both static and dynamic settings.
