Uncertainty Quantification and Causal Considerations for Off-Policy Decision Making

Muhammad Faaiz Taufiq

TL;DR

This thesis tackles robust decision-making under uncertainty by addressing both statistical uncertainty and causal unidentifiability in off-policy evaluation (OPE). It introduces the Marginal Ratio (MR) estimator, a variance-reducing OPE estimator for contextual bandits that weights by the shift in the marginal distribution of the outcome Y rather than the shift in the joint distribution of (X, A, Y), and provides theoretical and empirical evidence of reduced variance and improved data efficiency. It further proposes Conformal Off-Policy Prediction (COPP), a finite-sample uncertainty quantification method that yields adaptive predictive intervals for outcomes under a target policy, with guarantees under weighted exchangeability. Finally, it develops longitudinal causal bounds for sequential decision settings that remain valid under arbitrary unmeasured confounding and demonstrates their use for falsifying digital twin models in a real-world case study. Collectively, the work advances robust, uncertainty-aware policy evaluation and validation in both static and dynamic decision-making contexts, with direct applicability to medical, recommender-system, and digital twin domains.

Abstract

Off-policy evaluation (OPE), a central problem in robust decision-making, seeks to assess the performance of a new policy using data collected under a different policy. However, existing OPE methods suffer from several limitations arising from statistical uncertainty as well as causal considerations. In this thesis, we address these limitations through three contributions. Firstly, we consider the problem of high variance in importance-sampling-based OPE estimators. We introduce the Marginal Ratio (MR) estimator, a novel OPE method that reduces variance by focusing on the shift in the marginal distribution of outcomes rather than on the policy shift directly, improving robustness in contextual bandits. Next, we propose Conformal Off-Policy Prediction (COPP), a principled approach to uncertainty quantification in OPE that provides finite-sample predictive intervals, supporting robust decision-making in risk-sensitive applications. Finally, we address causal unidentifiability in off-policy decision-making by developing novel bounds for sequential decision settings, which remain valid under arbitrary unmeasured confounding. We apply these bounds to assess the reliability of digital twin models, introducing a falsification framework to identify scenarios where model predictions diverge from real-world behaviour. Our contributions provide new insights into robust decision-making under uncertainty and establish principled methods for evaluating policies in both static and dynamic settings.
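
The variance argument behind the MR estimator can be illustrated with a small simulation. The sketch below is not the thesis's implementation: the toy bandit (two contexts, three actions, binary outcome), all probabilities, and the names `simulate`, `ipw_estimate` and `mr_estimate` are hypothetical. It also assumes oracle marginal weights $w(y) = p_{\pi^\ast}(y) / p_{\pi_b}(y)$ computed analytically, whereas in practice these weights would have to be estimated from data. It compares the standard importance-sampling estimate, which weights each outcome by the policy ratio $\rho(a, x)$, with an estimate that weights by $w(y)$ only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy problem: 2 contexts, 3 actions, binary outcome Y.
p_x = np.array([0.5, 0.5])              # P(X = x)
pi_b = np.full((2, 3), 1 / 3)           # behaviour policy: uniform over actions
pi_t = np.array([[0.8, 0.1, 0.1],
                 [0.1, 0.1, 0.8]])      # target policy pi*(a | x)
p_y1 = np.array([[0.9, 0.5, 0.1],
                 [0.2, 0.5, 0.7]])      # P(Y = 1 | x, a)

rho = pi_t / pi_b                       # policy ratio, indexed as rho[x, a]

# Oracle marginal outcome probabilities under each policy (assumed known here).
py1_b = float((p_x[:, None] * pi_b * p_y1).sum())
py1_t = float((p_x[:, None] * pi_t * p_y1).sum())
# Oracle marginal ratio w(y) for y = 0 and y = 1.
w = np.array([(1 - py1_t) / (1 - py1_b), py1_t / py1_b])
true_value = py1_t                      # E[Y] under the target policy (Y is binary)


def simulate(n):
    """Draw n logged samples (x, a, y) under the behaviour policy."""
    x = rng.choice(2, size=n, p=p_x)
    a = rng.integers(0, 3, size=n)      # valid because pi_b is uniform
    y = (rng.random(n) < p_y1[x, a]).astype(int)
    return x, a, y


def ipw_estimate(x, a, y):
    """Importance-sampling estimate: weight outcomes by the policy ratio."""
    return float(np.mean(rho[x, a] * y))


def mr_estimate(y):
    """Marginal-ratio estimate: weight outcomes by w(y) only."""
    return float(np.mean(w[y] * y))


n, reps = 200, 2000
ipw_estimates = np.empty(reps)
mr_estimates = np.empty(reps)
for r in range(reps):
    x, a, y = simulate(n)
    ipw_estimates[r] = ipw_estimate(x, a, y)
    mr_estimates[r] = mr_estimate(y)

print("true value:", true_value)
print("IPW mean / variance:", ipw_estimates.mean(), ipw_estimates.var())
print("MR  mean / variance:", mr_estimates.mean(), mr_estimates.var())
```

Both estimators are unbiased in this setup, so the comparison of interest is the spread of the two sets of estimates across replications. Because $w(y)$ is the conditional expectation of $\rho(A, X)$ given $Y = y$ under the behaviour policy, the law of total variance implies that the oracle-weighted estimate cannot have larger variance than the importance-sampling one.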

Paper Structure

This paper contains 255 sections, 305 equations, 42 figures, 21 tables, 3 algorithms.

Figures (42)

  • Figure 1.3.1: The discrepancy between observational data and interventional behaviour in the presence of unmeasured confounding: the range of outcomes observed in the data for patients who were administered the drug (blue) differs from what would be observed if the drug were administered to the general population (red).
  • Figure 2.3.1: Bayesian network corresponding to Assumption \ref{assum:indep-mips}.
  • Figure 2.5.1: Results for synthetic data experiment. In \ref{fig:mse-vs-neval} we have $\alpha^\ast=0.8$ and in \ref{fig:mse-vs-betatar} we have $n = 800$.
  • Figure 2.5.2: Mean squared error of the estimated target policy value, with standard errors over 10 different seeds, for different classification datasets. Here, the number of evaluation samples is $n=1000$ and $\alpha^\ast=0.6$.
  • Figure 3.1.1: Left (a): Conformal Off-Policy Prediction against standard off-policy evaluation methods. Right (b): $90\%$ predictive intervals for $Y$ against $X$ for COPP, competing methods and the oracle.
  • ...and 37 more figures

Theorems & Definitions (25)

  • proof: Proof of Lemma \ref{lemma:weights-est}
  • proof: Proof of Proposition \ref{tv_prop}
  • proof: Proof of Proposition \ref{prop:var_mr}
  • proof: Proof of Proposition \ref{prop:var_dr}
  • proof: Proof of Theorem \ref{prop:mips_main_text}
  • proof: Proof of Proposition \ref{prop:bias-and-var-main}
  • proof: Proof of Proposition \ref{prop:var_dr_extensions}
  • proof: Proof of Theorem \ref{prop:bias-and-var-v3}
  • proof: Proof of Proposition \ref{prop:mips_var_reduction}
  • proof: Proof of Proposition \ref{prop:mips_generalised}
  • ...and 15 more