Informed Burn-In Decisions in RAR: Harmonizing Adaptivity and Inferential Precision Based on Study Setting

Lukas Pin; Stef Baas; Gianmarco Caruso; David S. Robertson; Sofía S. Villar

TL;DR

This paper tackles the lack of systematic guidance for burn-in length in response-adaptive randomization (RAR) by introducing the first principled framework that combines problem difficulty, design reactiveness, and expected final allocation error into a single burn-in formula. The authors define two novel metrics, the reactiveness parameter $r$ and the expected final allocation error $\epsilon_\rho$, along with the standardized treatment effect $\delta$, to quantify exploration-exploitation trade-offs and stability. The core contribution is a practical rule, $b = \max\left\{ 2, \left\lfloor 0.5 \cdot \frac{n \cdot n_{1/2}}{n+n_{1/2}} \cdot (r+\epsilon_\rho)^{\delta} \right\rfloor \right\}$, that determines burn-in length and adapts to trial size and design risk. Simulation results on ARREST and CALISTO demonstrate that this framework stabilizes inference (reducing Type-I error inflation and MSE) while preserving or improving patient benefit and final power, outperforming fixed burn-in heuristics, and offering a data-driven tool for practitioners.
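As a rough illustration, the burn-in rule stated above can be evaluated directly once its inputs are known. The sketch below is a minimal transcription of that formula, not the authors' implementation. The function name `burn_in_length` and its argument names are hypothetical, and `n_half` (the quantity $n_{1/2}$) is treated as a given input because this summary does not define how it is obtained. The numeric inputs in the usage lines are arbitrary and chosen only to exercise the arithmetic.

```python
import math


def burn_in_length(n, n_half, r, eps_rho, delta):
    """Evaluate b = max{2, floor(0.5 * n*n_half/(n+n_half) * (r+eps_rho)**delta)}.

    All argument names are illustrative assumptions:
      n        total sample size
      n_half   the quantity n_{1/2} from the formula (not defined in this summary)
      r        reactiveness parameter
      eps_rho  expected final allocation error
      delta    standardized treatment effect
    """
    if n <= 0 or n_half <= 0:
        raise ValueError("n and n_half must be positive")
    if r + eps_rho < 0:
        raise ValueError("r + eps_rho must be non-negative")

    # Harmonic-type combination of n and n_{1/2}; it never exceeds min(n, n_half).
    size_term = n * n_half / (n + n_half)

    # Scale by the design-risk term raised to the standardized effect.
    raw = 0.5 * size_term * (r + eps_rho) ** delta

    # The rule floors the value and enforces a minimum burn-in of 2.
    return max(2, math.floor(raw))


# Arbitrary illustrative inputs (not taken from ARREST or CALISTO).
b_moderate = burn_in_length(n=100, n_half=100, r=0.5, eps_rho=0.5, delta=1.0)
b_floor = burn_in_length(n=10, n_half=10, r=0.1, eps_rho=0.1, delta=2.0)
print(b_moderate, b_floor)
```

With these inputs the first call gives 25, while the second falls below the minimum and is raised to 2, which shows the role of the outer $\max\{2, \cdot\}$ in the rule.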

Abstract

Response-Adaptive Randomization (RAR) is recognized for its potential to deliver improvements in patient benefit. However, the utility of RAR is contingent on regularization methods to mitigate early instability and preserve statistical integrity. A standard regularization approach is the "burn-in" period, an initial phase of equal randomization before treatment allocation adapts based on accrued data. The length of this burn-in is a critical design parameter, yet its selection remains unsystematic and improvised, as no established guideline exists. A poorly chosen length poses significant risks: one that is too short leads to high estimation bias and type-I error rate inflation, while one that is too long impedes the intended patient and power benefits of using adaptation. The challenge of selecting the burn-in generalizes to a fundamental question: what is the statistically appropriate timing for the first adaptation? We introduce the first systematic framework for determining burn-in length. This framework synthesizes core factors (total sample size, problem difficulty, and two novel metrics: reactivity and expected final allocation error) into a single, principled formula. Simulation studies, grounded in real-world designs, demonstrate that lengths derived from our formula successfully stabilize the trial. The formula identifies a "sweet spot" that mitigates type-I error rate inflation and mean-squared error, preserving the advantages of higher power and patient benefit. This framework moves researchers from conjecture toward a systematic, reliable approach.
