Risk-inclusive Contextual Bandits for Early Phase Clinical Trials
Rohit Kanrar, Chunlin Li, Zara Ghodsi, Margaret Gamalo
TL;DR
This work tackles dose-ranging in early-phase clinical trials by casting dose allocation as a contextual multi-armed bandit problem that jointly accounts for efficacy and safety. It introduces RiTS, a risk-inclusive Thompson Sampling framework that runs two posterior samplers (one for efficacy and one for safety) and combines them with a weight to guide arm assignment for each participant. To enable valid sequential inference under adaptive data collection, the method employs AsympCS, a time-uniform confidence sequence built on augmented inverse propensity weighted (AIPW) pseudo-outcomes and cross-fitting, which remains valid under possible model misspecification. In extensive simulations and on a real Phase IIb alopecia areata dataset, RiTS delivers safer, covariate-informed dose allocation that identifies the winning dose while controlling cumulative miscoverage, albeit with longer trial durations than fixed randomization. The approach offers a model-assisted, inference-robust pathway to accelerate dose finding and improve trial efficiency while prioritizing participant safety in early-phase development.
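The two-sampler idea described above can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the conjugate linear-Gaussian posterior, the class and function names (`LinearGaussianPosterior`, `choose_arm`), and the scoring rule `weight * efficacy - (1 - weight) * risk`. The paper's actual models and combination rule may differ.

```python
import numpy as np


class LinearGaussianPosterior:
    """Conjugate Bayesian linear regression with known noise variance.

    Hypothetical stand-in for one arm's outcome model (efficacy or safety).
    Prior: coefficients ~ N(0, prior_var * I).
    """

    def __init__(self, dim, prior_var=1.0, noise_var=1.0):
        self.precision = np.eye(dim) / prior_var  # posterior precision matrix
        self.xty = np.zeros(dim)                  # accumulated X^T y / noise_var
        self.noise_var = noise_var

    def update(self, x, y):
        """Absorb one (covariate vector, outcome) observation."""
        x = np.asarray(x, dtype=float)
        self.precision += np.outer(x, x) / self.noise_var
        self.xty += x * y / self.noise_var

    def sample_mean(self, x, rng):
        """Draw coefficients from the posterior; return the implied mean at x."""
        cov = np.linalg.inv(self.precision)
        beta = rng.multivariate_normal(cov @ self.xty, cov)
        return float(np.asarray(x, dtype=float) @ beta)


def choose_arm(x, efficacy_models, safety_models, weight, rng):
    """Pick the arm maximizing a weighted efficacy-minus-risk score.

    Assumed scoring rule (illustrative only): higher sampled efficacy is
    better, higher sampled risk is worse, and `weight` in [0, 1] trades
    them off.
    """
    scores = []
    for eff, saf in zip(efficacy_models, safety_models):
        sampled_efficacy = eff.sample_mean(x, rng)
        sampled_risk = saf.sample_mean(x, rng)
        scores.append(weight * sampled_efficacy - (1.0 - weight) * sampled_risk)
    return int(np.argmax(scores))
```

In this sketch, `weight = 1` reduces to an efficacy-only Thompson sampler, while smaller values push allocation toward arms with lower sampled risk for the participant's covariates `x`.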
Abstract
Early-phase clinical trials must select drug doses that balance safety and efficacy, a task made difficult by uncertain dose-response relationships and varied participant characteristics. Traditional randomized dose allocation ignores individual covariates and therefore often exposes participants to sub-optimal doses, necessitating larger sample sizes and prolonging drug development. This paper introduces a risk-inclusive contextual bandit algorithm that uses multi-armed bandit (MAB) strategies to optimize dosing by integrating participant-specific data. By combining two separate Thompson samplers, one for efficacy and one for safety, the algorithm improves the balance between efficacy and safety in dose allocation. Effect sizes are estimated with a generalized version of asymptotic confidence sequences (AsympCS), which offers a time-uniform coverage guarantee for sequential causal inference. The validity of AsympCS is also established in the MAB setup under a possibly misspecified model. Empirical results demonstrate the strengths of this method in optimizing dose allocation compared with randomized allocation and with traditional contextual bandits that focus solely on efficacy. Moreover, an application to real data from a recent Phase IIb study yields results that align with the study's actual findings.
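To make the sequential-inference step more concrete, the sketch below computes AIPW pseudo-outcomes for one arm and wraps their running mean in an asymptotic confidence sequence. It assumes the standard Gaussian-mixture boundary from the AsympCS literature, not the generalized version this paper develops, which may differ. The function names, the tuning parameter `rho`, and the inputs `propensity` (the probability with which the bandit assigned the arm of interest at each step) and `mu_hat` (outcome-model predictions fitted on data excluding the current observation, as cross-fitting would provide) are all assumptions for illustration.

```python
import numpy as np


def aipw_pseudo_outcomes(y, a, arm, propensity, mu_hat):
    """AIPW pseudo-outcome for `arm` at each step t:

        phi_t = mu_hat_t + 1{A_t = arm} / pi_t * (Y_t - mu_hat_t)

    `propensity` (pi_t) and `mu_hat` are assumed to be supplied by the caller.
    """
    y = np.asarray(y, dtype=float)
    a = np.asarray(a)
    propensity = np.asarray(propensity, dtype=float)
    mu_hat = np.asarray(mu_hat, dtype=float)
    return mu_hat + (a == arm) / propensity * (y - mu_hat)


def asymp_cs(phi, alpha=0.05, rho=1.0):
    """Running (lower, upper) bounds using the standard AsympCS boundary.

    radius_t = sqrt( 2 (t s_t^2 rho^2 + 1) / (t^2 rho^2)
                     * log( sqrt(t s_t^2 rho^2 + 1) / alpha ) )

    where s_t^2 is the running variance of the pseudo-outcomes.
    """
    phi = np.asarray(phi, dtype=float)
    t = np.arange(1, len(phi) + 1)
    mean = np.cumsum(phi) / t
    # Running (population) variance, clamped at zero against round-off.
    var = np.maximum(np.cumsum(phi**2) / t - mean**2, 0.0)
    scaled = t * var * rho**2 + 1.0
    radius = np.sqrt(2.0 * scaled / (t**2 * rho**2) * np.log(np.sqrt(scaled) / alpha))
    return mean - radius, mean + radius
```

Because the bounds are intended to hold simultaneously over time (asymptotically), they can be monitored after every participant without a separate correction for repeated looks. In practice the value of `rho` would be chosen to tighten the sequence around the sample sizes of most interest.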
