
Risk-inclusive Contextual Bandits for Early Phase Clinical Trials

Rohit Kanrar, Chunlin Li, Zara Ghodsi, Margaret Gamalo

TL;DR

This work tackles dose-ranging in early-phase clinical trials by casting dose allocation as a contextually informed multi-armed bandit problem that jointly accounts for efficacy and safety. It introduces RiTS, a risk-inclusive Thompson Sampling framework that runs two posterior samplers (one for efficacy and one for safety) and combines them through a weight to guide arm assignment for each participant. To enable valid sequential inference under adaptive data collection, the method employs AsympCS, a time-uniform confidence sequence built on augmented inverse propensity weighted (AIPW) pseudo-outcomes and cross-fitting, which remains valid under potential model mis-specification. In extensive simulations and on a real Phase IIb alopecia areata dataset, RiTS achieves safer, covariate-informed dose allocation that identifies the winning dose while controlling cumulative miscoverage, albeit with longer trial durations than fixed randomization. The approach offers a model-assisted, inference-robust pathway to accelerate dose finding and improve trial efficiency while prioritizing participant safety in early-phase development.
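To make the two-sampler allocation step concrete, here is a minimal Python sketch under explicitly hypothetical assumptions: each arm has one conjugate Bayesian linear model for efficacy and another for safety (toxicity), and a single posterior draw from each is combined through the illustrative utility w × efficacy − (1 − w) × toxicity. The names `BayesLinearArm` and `rits_select`, the Gaussian priors, and this particular weighting rule are our own choices for illustration, not the paper's specification of RiTS.

```python
import numpy as np


class BayesLinearArm:
    """Conjugate Bayesian linear regression for one arm (hypothetical model).

    Prior: coefficients ~ N(0, I / lam); noise variance sigma2 treated as known.
    """

    def __init__(self, dim: int, lam: float = 1.0, sigma2: float = 1.0):
        self.precision = lam * np.eye(dim)  # posterior precision matrix
        self.b = np.zeros(dim)              # running sum of x * y / sigma2
        self.sigma2 = sigma2

    def update(self, x: np.ndarray, y: float) -> None:
        """Add one (context, outcome) pair to the posterior."""
        self.precision += np.outer(x, x) / self.sigma2
        self.b += x * y / self.sigma2

    def posterior_mean(self) -> np.ndarray:
        return np.linalg.solve(self.precision, self.b)

    def sample(self, rng: np.random.Generator) -> np.ndarray:
        """Draw one coefficient vector from the Gaussian posterior."""
        cov = np.linalg.inv(self.precision)
        cov = (cov + cov.T) / 2  # symmetrize against round-off
        return rng.multivariate_normal(self.posterior_mean(), cov)


def rits_select(x, eff_arms, tox_arms, w, rng):
    """Return the index of the arm with the highest sampled utility.

    Hypothetical utility: w * sampled efficacy - (1 - w) * sampled toxicity.
    """
    scores = []
    for eff, tox in zip(eff_arms, tox_arms):
        efficacy = x @ eff.sample(rng)  # one draw from the efficacy posterior
        toxicity = x @ tox.sample(rng)  # one draw from the safety posterior
        scores.append(w * efficacy - (1.0 - w) * toxicity)
    return int(np.argmax(scores))
```

Setting `w = 1` recovers efficacy-only contextual Thompson Sampling, while smaller values of `w` push allocation toward arms with lower sampled risk.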

Abstract

Early-phase clinical trials face the challenge of selecting drug doses that balance safety and efficacy, a challenge that stems from uncertain dose-response relationships and heterogeneous participant characteristics. Traditional randomized dose allocation ignores individual covariates and therefore often exposes participants to sub-optimal doses, which necessitates larger sample sizes and prolongs drug development. This paper introduces a risk-inclusive contextual bandit algorithm that uses multi-armed bandit (MAB) strategies to optimize dosing by integrating participant-specific data. By combining two separate Thompson samplers, one for efficacy and one for safety, the algorithm better balances efficacy and safety in dose allocation. Effect sizes are estimated with a generalized version of asymptotic confidence sequences (AsympCS), which offer a time-uniform coverage guarantee for sequential causal inference. The validity of AsympCS is also established in the MAB setup under a possibly mis-specified model. Empirical results show the strengths of this method in optimizing dose allocation compared with randomized allocation and with traditional contextual bandits that focus solely on efficacy. Moreover, an application to real data from a recent Phase IIb study yields results consistent with the study's actual findings.
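The inference step can be sketched in the same hedged way. The Python code below assumes a contrast Δ(a) between an active arm and a reference (placebo) arm, known assignment probabilities recorded by the bandit, and cross-fitted outcome-model predictions supplied by the caller. It forms AIPW pseudo-outcomes and wraps their running mean in the standard Gaussian-mixture asymptotic confidence-sequence boundary with tuning parameter ρ. The function names and defaults are hypothetical, and the paper's generalized AsympCS may use a different variance estimator or boundary.

```python
import numpy as np


def aipw_pseudo_outcomes(y, arm, a, ref, prop_a, prop_ref, mu_a, mu_ref):
    """AIPW pseudo-outcomes for Delta(a) = E[Y(a)] - E[Y(ref)].

    prop_a / prop_ref: probabilities with which the bandit would assign arm a /
    arm ref at each step (assumed known and recorded).
    mu_a / mu_ref: outcome-model predictions for each participant, assumed
    cross-fitted (estimated without that participant's own data).
    """
    ind_a = (arm == a).astype(float)
    ind_ref = (arm == ref).astype(float)
    return (mu_a - mu_ref
            + ind_a * (y - mu_a) / prop_a
            - ind_ref * (y - mu_ref) / prop_ref)


def asymp_cs(phi, alpha=0.05, rho=1.0):
    """Running lower/upper bounds of a Gaussian-mixture asymptotic CS for mean(phi)."""
    phi = np.asarray(phi, dtype=float)
    t = np.arange(1, phi.size + 1)
    mean = np.cumsum(phi) / t
    # Running plug-in variance (divisor t) of the pseudo-outcomes.
    var = np.cumsum(phi ** 2) / t - mean ** 2
    sd = np.sqrt(np.maximum(var, 0.0))
    # Half-width: sd * sqrt(2(t rho^2 + 1) / (t^2 rho^2) * log(sqrt(t rho^2 + 1) / alpha)).
    half = sd * np.sqrt(2 * (t * rho ** 2 + 1) / (t ** 2 * rho ** 2)
                        * np.log(np.sqrt(t * rho ** 2 + 1) / alpha))
    return mean - half, mean + half
```

Because the bounds hold uniformly over time (asymptotically), they can be monitored after every participant, which is what makes them usable as stopping criteria in an adaptive trial.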

Paper Structure

This paper contains 53 sections, 7 theorems, 66 equations, 8 figures, 4 tables, and 1 algorithm.

Key Result

Theorem 1

For Algorithm 1 (the proposed algorithm), the following statements are true:

Figures (8)

  • Figure 1: Study design of a Phase II/III clinical trial. In this scenario, Dose 5 of a novel drug, demonstrating preliminary safety and early efficacy in Phase I and IIa, progresses to Phase IIb. Then, five doses and a placebo control are randomized in Phase IIb to facilitate dose-response curve characterization and optimal dose selection. Final analyses are performed at the end of Phase IIb to select the most suitable dose for the larger Phase III trial.
  • Figure 2: Box plots of cumulative regret based on three criteria (utility, efficacy, and safety) at different stages of the trial across $n_{sim}$ replications. Panels are arranged in columns by criterion, and the two rows correspond to the High-SNR and Low-SNR setups. The y-axis limits differ between the two rows so that cumulative regrets can be compared within each data-generating setup. Lower regret indicates better performance.
  • Figure 3: Left: Box plots of the frequency of arm allocations by different methods across $n_{sim}$ replications. Right: Panels in the first row show the proportion of replications in which the stopping criteria are met at different stages of the trial. Panels in the second row show the proportion of replications in which Arm 4 has the highest estimated effect size.
  • Figure 4: Left panel: Box plots of the frequency of arm allocations by different methods across $1000$ replications. Right panel: The left sub-figure shows the proportion of replications in which the stopping criteria are met at different stages of the trial, with the minimum clinically significant effect size set to 0.1. The right sub-figure shows the proportion of replications in which Arm 6 has the highest estimated effect size.
  • Figure 5: Box plots of confidence-interval widths at different stages of the trial across 1000 replications. The top and bottom rows correspond to the 'High-SNR' and 'Low-SNR' data-generating mechanisms, respectively. The three columns of panels correspond to the effect sizes of the three active doses versus placebo, i.e., $\{\Delta(a)\}_{a=2}^{4}$.

Theorems & Definitions (9)

  • Definition 1
  • Theorem 1
  • Definition 2
  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Lemma 4
  • Lemma 5
  • Lemma 6