E-values for Adaptive Clinical Trials: Anytime-Valid Monitoring in Practice

Alexandra Sokolova; Vadim Sokolov

TL;DR

This paper develops the betting-martingale construction of e-processes for two-arm randomized controlled trials and shows how e-values naturally handle composite null hypotheses and support futility monitoring. It also gives guidance on when e-values are appropriate, when established alternatives are preferable, and how to integrate e-value monitoring with group sequential and Bayesian adaptive workflows.

Abstract

Adaptive clinical trials rely on interim analyses, flexible stopping, and data-dependent design modifications that complicate statistical guarantees when fixed-horizon test statistics are repeatedly inspected or reused after adaptations. E-values and e-processes provide anytime-valid tests and confidence sequences that remain valid under optional stopping and optional continuation without requiring a prespecified monitoring schedule. This paper is a methodology guide for practitioners. We develop the betting-martingale construction of e-processes for two-arm randomized controlled trials, show how e-values naturally handle composite null hypotheses and support futility monitoring, and provide guidance on when e-values are appropriate, when established alternatives are preferable, and how to integrate e-value monitoring with group sequential and Bayesian adaptive workflows. A numerical study compares five monitoring rules -- naive and calibrated versions of frequentist, Bayesian, and e-value approaches -- in a two-arm binary-endpoint trial. Naive repeated testing and naive posterior thresholds inflate Type I error substantially under frequent interim looks. Among the valid methods, the calibrated group sequential rule achieves the highest power, the e-value rule provides robust anytime-valid control with moderate power, and the calibrated Bayesian rule is the most conservative. Extended simulations show that the power gap between group sequential and e-value methods depends on the monitoring schedule and reverses under continuous monitoring. The methodology, including futility monitoring, platform trial multiplicity control, and hybrid strategies combining e-values with established methods, is implemented in the open-source R package `evalinger` and situated within the regulatory framework of the January 2026 FDA draft guidance on Bayesian methodology.

Paper Structure (38 sections, 11 equations, 8 figures)

Introduction
Practitioner guidance: when to use e-values in clinical trials
Futility monitoring and integration with Bayesian workflows
Software, regulatory landscape, and common pitfalls
Practical recipe: implementing an e-value monitoring plan
Background: group sequential monitoring in confirmatory trials
E-values and e-processes: a mini-tutorial
What is an e-value?
Finite-sample validity
Connecting to concepts you already use
How it works: the betting interpretation
A concrete example
The e-value as a sequential test statistic: e-processes
Choosing the betting fraction: the role of the design alternative
Relationship to p-values and Bayes factors
...and 23 more sections

Figures (8)

Figure 1: Sample paths of the log-e-process $\log E_n$ for 12 simulated trials under the null (left) and the alternative (right). The horizontal dashed line marks the rejection threshold $\log(1/\alpha)$. Under $H_0$, the bettor's log-wealth drifts downward; under $H_1$, most paths cross the threshold before the maximum sample size.
Figure 2: Expected log-growth rate $g(\lambda;\, p_T, p_C)$ as a function of the betting fraction $\lambda$ for three design alternatives ($\delta = p_T - p_C = 0.10, 0.15, 0.20$ with $p_C = 0.30$). The vertical dashed lines mark the GROW-optimal $\lambda^*$ for each alternative.
Figure 3: Hybrid monitoring on a single trial under $H_1$. Top panel: the log-e-process with its rejection threshold. Bottom panel: the $z$-statistic at each look with the O'Brien--Fleming boundary. Both streams cross their respective thresholds, but the timing and trajectory differ.
Figure 4: E-value vs group sequential power under four monitoring schedules. The e-value threshold is fixed; the GS boundary is recalibrated for each schedule. Under continuous monitoring, the GS power collapses while e-value power increases.
Figure 5: Futility monitoring for a trial with a subclinical effect ($p_T = 0.33$, $p_C = 0.30$, $\delta_{\min} = 0.10$). Left: the 95% confidence sequence with the MCID threshold; futility is declared when the upper bound falls below $\delta_{\min}$. Right: the reciprocal e-process testing $H_0'{:}\, \delta \ge \delta_{\min}$; futility is declared when the process crosses $\log(1/\alpha_f)$.
...and 3 more figures
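
The log-e-process, the rejection threshold $\log(1/\alpha)$, and the betting fraction $\lambda$ named in the captions for Figures 1 and 2 can be illustrated with a minimal Python sketch. It assumes a simple paired-difference bet: each treatment outcome is paired with a control outcome and wealth is multiplied by $1 + \lambda (x_T - x_C)$. Under $H_0{:}\, p_T \le p_C$ each factor has expectation $1 + \lambda (p_T - p_C) \le 1$ for $\lambda \in [0, 1)$, so the wealth is a nonnegative supermartingale. This is one standard valid construction, not necessarily the one used in the paper or in `evalinger`; all function names below are hypothetical.

```python
import math
import random


def grow_lambda(p_t, p_c):
    """Log-optimal betting fraction for the paired bet 1 + lam * (x_t - x_c).

    Assumption: this maximizes the expected log-growth of the paired bet in
    this sketch; the paper's g(lambda; p_T, p_C) may be defined differently.
    """
    a = p_t * (1 - p_c)  # P(treatment success, control failure)
    b = p_c * (1 - p_t)  # P(control success, treatment failure)
    return (a - b) / (a + b)


def log_e_process(pairs, lam):
    """Running log-wealth over a sequence of (x_t, x_c) binary outcome pairs."""
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must lie in [0, 1) so that wealth stays positive")
    log_e = 0.0
    path = []
    for x_t, x_c in pairs:
        log_e += math.log(1.0 + lam * (x_t - x_c))
        path.append(log_e)
    return path


def first_crossing(path, alpha):
    """1-based index of the first look with log E_n >= log(1/alpha), else None."""
    threshold = math.log(1.0 / alpha)
    for n, value in enumerate(path, start=1):
        if value >= threshold:
            return n
    return None


def simulate_pairs(p_t, p_c, n_pairs, seed):
    """Draw independent Bernoulli outcome pairs (hypothetical data generator)."""
    rng = random.Random(seed)
    return [(int(rng.random() < p_t), int(rng.random() < p_c))
            for _ in range(n_pairs)]


if __name__ == "__main__":
    lam = grow_lambda(0.45, 0.30)  # design alternative with delta = 0.15
    path = log_e_process(simulate_pairs(0.45, 0.30, 400, seed=1), lam)
    print("betting fraction:", lam)
    print("first crossing at pair:", first_crossing(path, alpha=0.025))
```

Because the process is checked after every pair and rejection happens at the first crossing, no monitoring schedule has to be fixed in advance. Under the null, Ville's inequality bounds the probability of ever crossing $\log(1/\alpha)$ by $\alpha$.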
