E-values for Adaptive Clinical Trials: Anytime-Valid Monitoring in Practice

Alexandra Sokolova, Vadim Sokolov

TL;DR

This paper develops the betting-martingale construction of e-processes for two-arm randomized controlled trials, shows how e-values naturally handle composite null hypotheses and support futility monitoring, and gives guidance on when e-values are appropriate, when established alternatives are preferable, and how to integrate e-value monitoring with group sequential and Bayesian adaptive workflows.

Abstract

Adaptive clinical trials rely on interim analyses, flexible stopping, and data-dependent design modifications that complicate statistical guarantees when fixed-horizon test statistics are repeatedly inspected or reused after adaptations. E-values and e-processes provide anytime-valid tests and confidence sequences that remain valid under optional stopping and optional continuation without requiring a prespecified monitoring schedule. This paper is a methodology guide for practitioners. We develop the betting-martingale construction of e-processes for two-arm randomized controlled trials, show how e-values naturally handle composite null hypotheses and support futility monitoring, and provide guidance on when e-values are appropriate, when established alternatives are preferable, and how to integrate e-value monitoring with group sequential and Bayesian adaptive workflows. A numerical study compares five monitoring rules -- naive and calibrated versions of frequentist, Bayesian, and e-value approaches -- in a two-arm binary-endpoint trial. Naive repeated testing and naive posterior thresholds inflate Type I error substantially under frequent interim looks. Among the valid methods, the calibrated group sequential rule achieves the highest power, the e-value rule provides robust anytime-valid control with moderate power, and the calibrated Bayesian rule is the most conservative. Extended simulations show that the power gap between group sequential and e-value methods depends on the monitoring schedule and reverses under continuous monitoring. The methodology, including futility monitoring, platform trial multiplicity control, and hybrid strategies combining e-values with established methods, is implemented in the open-source R package `evalinger` and situated within the regulatory framework of the January 2026 FDA draft guidance on Bayesian methodology.
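
As a reminder of what "anytime-valid" means here, the guarantee is usually stated through Ville's inequality, a standard result that this summary does not spell out. The notation below ($E_n$ for the e-process after $n$ observations, $\alpha$ for the significance level) follows the figure captions. The statement is a generic sketch, not a reproduction of the paper's own numbered equations. If $(E_n)_{n \ge 0}$ is a nonnegative supermartingale with $E_0 = 1$ under every distribution in the null hypothesis $H_0$, then

$$
\Pr_{H_0}\!\left(\exists\, n \ge 1 : E_n \ge \frac{1}{\alpha}\right) \le \alpha .
$$

Rejecting the first time $E_n$ reaches $1/\alpha$ (equivalently, $\log E_n \ge \log(1/\alpha)$) therefore controls Type I error at level $\alpha$, regardless of how often or at which sample sizes the data are inspected.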

Paper Structure (38 sections, 11 equations, 8 figures)

Figures (8)

  • Figure 1: Sample paths of the log-e-process $\log E_n$ for 12 simulated trials under the null (left) and the alternative (right). The horizontal dashed line marks the rejection threshold $\log(1/\alpha)$. Under $H_0$, the bettor's log-wealth drifts downward; under $H_1$, most paths cross the threshold before the maximum sample size.
  • Figure 2: Expected log-growth rate $g(\lambda;\, p_T, p_C)$ as a function of the betting fraction $\lambda$ for three design alternatives ($\delta = p_T - p_C = 0.10, 0.15, 0.20$ with $p_C = 0.30$). The vertical dashed lines mark the GROW-optimal $\lambda^*$ for each alternative.
  • Figure 3: Hybrid monitoring on a single trial under $H_1$. Top panel: the log-e-process with its rejection threshold. Bottom panel: the $z$-statistic at each look with the O'Brien--Fleming boundary. Both streams cross their respective thresholds, but the timing and trajectory differ.
  • Figure 4: E-value vs group sequential power under four monitoring schedules. The e-value threshold is fixed; the GS boundary is recalibrated for each schedule. Under continuous monitoring, the GS power collapses while e-value power increases.
  • Figure 5: Futility monitoring for a trial with a subclinical effect ($p_T = 0.33$, $p_C = 0.30$, $\delta_{\min} = 0.10$). Left: the 95% confidence sequence with the MCID threshold; futility is declared when the upper bound falls below $\delta_{\min}$. Right: the reciprocal e-process testing $H_0'{:}\, \delta \ge \delta_{\min}$; futility is declared when the process crosses $\log(1/\alpha_f)$.
  • ...and 3 more figures
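
The captions for Figures 1 and 2 refer to an e-process, a betting fraction $\lambda$, and a GROW-optimal $\lambda^*$. This summary does not give the paper's exact construction, so the following is only a minimal sketch under one common assumption: patients are analyzed in treatment/control pairs, and the bettor stakes a fraction $\lambda \in [0, 1)$ of current wealth on the paired difference $X_{T,i} - X_{C,i}$. Under $H_0{:}\, p_T \le p_C$ each factor $1 + \lambda (X_{T,i} - X_{C,i})$ has expectation at most 1, so the running product is a nonnegative supermartingale. All function names (`e_process`, `log_growth`, `grow_lambda`, `first_crossing`) are hypothetical and are not the API of `evalinger`; the paper's definition of $g(\lambda;\, p_T, p_C)$ may differ.

```python
import math


def e_process(treat, ctrl, lam):
    """Running wealth E_n = prod_i (1 + lam * (x_T,i - x_C,i)) over pairs.

    Assumed paired-difference bet; outcomes are 0/1 and lam is in [0, 1).
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must lie in [0, 1)")
    wealth, path = 1.0, []
    for x_t, x_c in zip(treat, ctrl):
        # Each factor is 1 + lam (win), 1 (tie), or 1 - lam (loss).
        wealth *= 1.0 + lam * (x_t - x_c)
        path.append(wealth)
    return path


def log_growth(lam, p_t, p_c):
    """Expected log-growth per pair for independent Bernoulli arms."""
    win = p_t * (1.0 - p_c)    # treatment success, control failure
    lose = p_c * (1.0 - p_t)   # treatment failure, control success
    return win * math.log1p(lam) + lose * math.log1p(-lam)


def grow_lambda(p_t, p_c):
    """Closed-form maximizer of log_growth over lam in [0, 1)."""
    win = p_t * (1.0 - p_c)
    lose = p_c * (1.0 - p_t)
    if win + lose == 0.0:
        return 0.0
    lam = (win - lose) / (win + lose)
    # Clip so the bet stays non-negative and strictly below 1.
    return min(max(0.0, lam), 1.0 - 1e-9)


def first_crossing(path, alpha):
    """1-based index of the first n with E_n >= 1/alpha, or None."""
    for n, e in enumerate(path, start=1):
        if e >= 1.0 / alpha:
            return n
    return None


# Design alternative from the Figure 2 caption: p_C = 0.30, delta = 0.10.
lam_star = grow_lambda(0.40, 0.30)
path = e_process([1, 1, 0, 1], [0, 0, 0, 1], lam=0.5)
```

Under this assumed construction the maximizer has the closed form $\lambda^* = (p_T - p_C) / (p_T + p_C - 2 p_T p_C)$, obtained by setting the derivative of the expected log-growth to zero. A fixed $\lambda$ chosen before the trial keeps the process valid under the null whatever the true alternative turns out to be; only the growth rate, and hence power, depends on how well the design alternative matches reality.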