Deconfounded Warm-Start Thompson Sampling with Applications to Precision Medicine

Prateek Jaiswal, Esmaeil Keyvanshokooh, Junyu Cao

TL;DR

DWTS addresses the problem of using confounded observational data to speed up adaptive decision-making in precision medicine. It combines a Doubly Debiased LASSO (DDL) for sparse covariate selection with warm-started LinTS priors on the selected features, while placing uninformative priors on hidden confounders. The method is grounded in a structural equation model (SEM) for the offline data and a shared-parameter online linear model with parameters $(\theta_a^*, \phi_a^*)$. It is evaluated in synthetic and NHANES-derived virtual environments, where it reduces cumulative regret compared to baselines. Its limitations are the assumptions that hidden confounders are observable online and that the model is linear; future work targets non-linear models, non-stationary data, and the integration of multiple offline datasets.

Abstract

Randomized clinical trials often require large patient cohorts before drawing definitive conclusions, yet abundant observational data from parallel studies remains underutilized due to confounding and hidden biases. To bridge this gap, we propose Deconfounded Warm-Start Thompson Sampling (DWTS), a practical approach that leverages a Doubly Debiased LASSO (DDL) procedure to identify a sparse set of reliable measured covariates and combines them with key hidden covariates to form a reduced context. By initializing Thompson Sampling (LinTS) priors with DDL-estimated means and variances on these measured features -- while keeping uninformative priors on hidden features -- DWTS effectively harnesses confounded observational data to kick-start adaptive clinical trials. Evaluated on both a purely synthetic environment and a virtual environment created using a real cardiovascular risk dataset, DWTS consistently achieves lower cumulative regret than standard LinTS, showing how offline causal insights from observational data can improve trial efficiency and support more personalized treatment decisions.
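
The warm-start idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the DDL step is not reproduced, and the offline means and variances are simply taken as inputs. The class `WarmStartLinTS`, the helper `build_prior`, and the parameters `noise_var` and `flat_var` are hypothetical names introduced here, and the sketch assumes independent Gaussian priors per arm with a known noise variance.

```python
import numpy as np


def build_prior(offline_mean, offline_var, n_hidden, flat_var=1e2):
    """Informative priors on measured features, flat priors on hidden ones.

    offline_mean, offline_var: arrays of shape (n_arms, n_measured), assumed to
    come from an offline estimator such as DDL (not implemented here).
    flat_var is a hypothetical large variance standing in for "uninformative".
    """
    offline_mean = np.asarray(offline_mean, dtype=float)
    offline_var = np.asarray(offline_var, dtype=float)
    n_arms = offline_mean.shape[0]
    means = np.hstack([offline_mean, np.zeros((n_arms, n_hidden))])
    variances = np.hstack([offline_var, np.full((n_arms, n_hidden), flat_var)])
    return means, variances


class WarmStartLinTS:
    """Linear Thompson Sampling with one Gaussian posterior per arm."""

    def __init__(self, prior_means, prior_vars, noise_var=1.0, seed=0):
        prior_means = np.asarray(prior_means, dtype=float)
        prior_vars = np.asarray(prior_vars, dtype=float)
        self.rng = np.random.default_rng(seed)
        self.noise_var = noise_var
        # Store each posterior as a precision matrix and a precision-weighted mean.
        self.precision = np.stack([np.diag(1.0 / v) for v in prior_vars])
        self.b = np.stack([P @ m for P, m in zip(self.precision, prior_means)])

    def posterior(self, arm):
        cov = np.linalg.inv(self.precision[arm])
        cov = (cov + cov.T) / 2.0  # symmetrize against round-off
        return cov @ self.b[arm], cov

    def select(self, x):
        # Sample one parameter vector per arm and play the arm with the best draw.
        scores = []
        for arm in range(len(self.b)):
            mean, cov = self.posterior(arm)
            theta = self.rng.multivariate_normal(mean, cov)
            scores.append(float(x @ theta))
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        # Conjugate Gaussian update for a linear model with known noise variance.
        self.precision[arm] += np.outer(x, x) / self.noise_var
        self.b[arm] += x * reward / self.noise_var
```

In use, `build_prior` would be called once with the offline estimates, and the agent would then alternate `select(x)` and `update(arm, x, reward)` on each online round, with the context `x` ordered as measured features followed by hidden features. Before any online data arrives, the posterior mean equals the offline prior mean, which is what lets the offline estimates shape early decisions.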

Paper Structure

This paper contains 13 sections, 5 equations, 2 figures, 1 table, 1 algorithm.

Figures (2)

  • Figure 1: Median cumulative regret (solid) with 10%–90% quantile bands (shaded) over $T=1000$ rounds and across $50$ replications. Red: LinTS on all $p+q$ dims. Cyan: LinTS on $p_{\text{true}}+q$ dims. Blue: DWTS warm‑start on $p+q$ dims. Black: OFUL with Partially Observable Offline data (Tennenholtz et al., 2021).
  • Figure 2: Median cumulative regret (solid) with 10%–90% quantile bands (shaded) over $T=50000$ rounds and across $10$ replications. Red: LinTS on all $p+q$ dims. Cyan: LinTS on $p_{\text{true}}+q$ dims. Blue: DWTS warm‑start on $p+q$ dims.

Theorems & Definitions (1)

  • Remark 3.1: The choice of $\kappa_o$