Externally Valid Policy Evaluation Combining Trial and Observational Data

Sofia Ek, Dave Zachariah

Abstract

Randomized trials are widely considered the gold standard for evaluating the effects of decision policies. Trial data is, however, drawn from a population that may differ from the intended target population, which raises a problem of external validity (a.k.a. generalizability). In this paper, we seek to use trial data to draw valid inferences about the outcome of a policy on the target population. Additional covariate data from the target population is used to model the sampling of individuals in the trial study. We develop a method that yields certifiably valid trial-based policy evaluations under any specified range of model miscalibrations. The method is nonparametric, and its validity is assured even with finite samples. The certified policy evaluations are illustrated using both simulated and real data.

Paper Structure (15 sections, 1 theorem, 38 equations, 12 figures, 4 tables, 1 algorithm)

Key Result

Theorem 4.1

For any odds miscalibration up to degree $\Gamma$, the limit $\ell_{\alpha}$ in equation \ref{eq:ell_alpha} is an externally valid limit on the out-of-sample loss $L_{n+1}$ of policy $\pi$. That is, the limit in equation \ref{eq:ell_alpha} is certified to satisfy equation \ref{eq:certificate}.
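
Read together with the caption of Figure 1, the certified property can be written out as follows. This is only a sketch of the form of the guarantee: the exact statement is the referenced certificate equation, which is not reproduced here, and the probability notation $\mathbb{P}$ is an assumption.

$$
\mathbb{P}\big( L_{n+1} \le \ell_{\alpha} \big) \;\ge\; 1 - \alpha
\quad \text{for any odds miscalibration up to degree } \Gamma .
$$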

Figures (12)

  • Figure 1: Inferring the out-of-sample losses of a policy $\pi$. (a) The loss $L$ is bounded by an upper limit $\ell_{\alpha}$ with a probability of at least $1-\alpha$. The RCT-based limit curve uses only trial data, whereas the other limit curves also use a sampling model trained on additional covariate data $X$ from the target population. Each limit curve in blue is certified to provide valid inferences for models miscalibrated up to a degree $\Gamma$ defined in equation \ref{eq:Gamma_divergence}. (b) Gap between the nominal probability of miscoverage $\alpha$ and the actual probability of exceeding the limit, $L > \ell_{\alpha}$. A negative gap means the inference $\ell_{\alpha}$ is invalid, while a positive gap implies it is valid but conservative. Details of the experiment are presented in Section \ref{sec:synthetic-data}.
  • Figure 2: Causal structure of the process (a) under policy $\pi$ and (b) in the trial study. The sampling indicator $S$ distinguishes between the two. In the important case of an RCT, the assignment of $A$ is not influenced by any covariates, so the path $X \rightarrow A$ is eliminated.
  • Figure 3: (a) Omitting measured selection factors to benchmark credible values for $\Gamma$ in equation \ref{eq:Gamma_divergence}. (b) Inferred blood mercury levels [$\mu$g/L] in a target population under 'high' and 'low' seafood consumption ($\pi_1$ and $\pi_0$, respectively). Limit curves for degrees of odds miscalibration $\Gamma \in [1,2]$.
  • Figure 4: Reliability diagram of the observed odds against the average predicted nominal odds obtained from models $\widehat{p}(S|X)$.
  • Figure 5: Odds $p(S=0|X)/p(S=1|X)$ compared with nominal odds obtained from logistic and xgboost models $\widehat{p}(S|X)$. The dots are a random subsample of the trial samples.
  • ...and 7 more figures
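
The miscoverage gap described in the caption of Figure 1(b) can be estimated from held-out losses. The following is a minimal sketch, not the authors' code: the function name `miscoverage_gap`, the simulated losses, and the quantile-based limit are all hypothetical choices made for illustration. The sign convention follows the caption, so a negative gap flags an invalid limit.

```python
import numpy as np

def miscoverage_gap(losses, limit, alpha):
    """Nominal miscoverage alpha minus the empirical probability that L > limit.

    Negative: the limit is exceeded more often than alpha (invalid).
    Positive: the limit is exceeded less often than alpha (valid but conservative).
    """
    losses = np.asarray(losses, dtype=float)
    exceed_prob = np.mean(losses > limit)  # empirical P(L > limit)
    return alpha - exceed_prob

# Hypothetical illustration: simulated losses stand in for target-population losses.
rng = np.random.default_rng(0)
losses = rng.normal(size=10_000)
alpha = 0.1
limit = np.quantile(losses, 1 - alpha)  # a naive empirical limit, used only for this demo
gap = miscoverage_gap(losses, limit, alpha)
```

Here the limit is set to the empirical quantile of the same sample, so the gap is close to zero by construction. In an actual evaluation, the limit would come from the trial data and the losses from independent target-population samples.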

Theorems & Definitions (1)

  • Theorem 4.1