Peeking with PEAK: Sequential, Nonparametric Composite Hypothesis Tests for Means of Multiple Data Streams

Brian Cho; Kyra Gan; Nathan Kallus

TL;DR

PEAK delivers a nonparametric, sequential testing framework for composite mean hypotheses across multiple data streams by leveraging $e$-processes and a novel averaging scheme that avoids union bounds. It generalizes a single-stream betting rule to the multi-stream setting, yielding type-I error control and power-one guarantees under mild sampling assumptions, while maintaining computational tractability via convex optimization for region-based hypotheses. The approach supports THR and BAI as convex-region examples and demonstrates substantial practical gains, including up to $85\%$ reduction in samples before stopping and favorable runtimes on real HeartSteps data. These contributions offer a robust, anytime-valid alternative to parametric sequential tests for adaptive experiments in healthcare and related domains.

Abstract

We propose a novel nonparametric sequential test for composite hypotheses for means of multiple data streams. Our proposed method, peeking with expectation-based averaged capital (PEAK), builds upon the testing-by-betting framework and provides a non-asymptotic $α$-level test across any stopping time. Our contributions are two-fold: (1) we propose a novel betting scheme and provide theoretical guarantees on type-I error control, power, and asymptotic growth rate/$e$-power in the setting of a single data stream; (2) we introduce PEAK, a generalization of this betting scheme to multiple streams, that (i) avoids using wasteful union bounds via averaging, (ii) is a test of power one under mild regularity conditions on the sampling scheme of the streams, and (iii) reduces computational overhead when applying testing-by-betting approaches to pure-exploration bandit problems. We illustrate the practical benefits of PEAK using both synthetic and real-world HeartSteps datasets. Our experiments show that PEAK provides up to an 85% reduction in the number of samples before stopping compared to existing stopping rules for pure-exploration bandit problems, and matches the performance of state-of-the-art sequential tests while improving upon computational complexity.
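The abstract's remark about union bounds can be illustrated with a small comparison. The sketch below is not PEAK itself: it assumes we already hold one $e$-value per stream, each valid under the same joint null, and compares two generic rejection rules. The function names are hypothetical. Because $e$-values are nonnegative, the average is at least the maximum divided by the number of streams $W$, so the averaging rule rejects whenever the union-bound rule does, and it can also reject when evidence is spread across streams.

```python
def average_rejects(e_values, alpha):
    """Reject when the average of the per-stream e-values reaches 1/alpha."""
    return sum(e_values) / len(e_values) >= 1.0 / alpha


def union_bound_rejects(e_values, alpha):
    """Reject when any single stream reaches the Bonferroni level W/alpha."""
    return max(e_values) >= len(e_values) / alpha


# Evidence spread evenly over W = 3 streams (illustrative numbers only).
spread = [25.0, 25.0, 25.0]
print(average_rejects(spread, 0.05), union_bound_rejects(spread, 0.05))

# Evidence concentrated in one stream.
concentrated = [90.0, 0.5, 0.5]
print(average_rejects(concentrated, 0.05), union_bound_rejects(concentrated, 0.05))
```

With evenly spread evidence the averaging rule rejects while the union-bound rule does not; with concentrated evidence both reject.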

Paper Structure (41 sections, 13 theorems, 75 equations, 3 figures, 4 tables)


Introduction
Problem Formulation and Related Work
$e$-Processes and Testing-by-Betting Framework
Hypothesis Testing of Means for Multiple Streams
Tests for a Single Stream of Data
Asymptotic $e$-Power/Growth Rate
Testing in the Multi-Stream Setting
Why should we average the evidence?
Theoretical Guarantees for Joint Capital Process
Testing Composite Convex Hypotheses
Example 1: THR.
Example 2: BAI.
Empirical Results
Synthetic Experiments
Experiments in the Single-Stream Setting
...and 26 more sections

Key Result

Theorem 1

For all $c \geq 1/4$, $T_t(m, \alpha)$ defines a sequential test with type-I error at most $\alpha$ for the null hypothesis $\mu = m$: if $\mu = m$ (equivalently, $P_1\in\mathcal{P}(m)$), then $\mathbb{P}(\exists t \in \mathbb{N}: T_t(m, \alpha) = 1) \leq \alpha$.
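Anytime-valid guarantees of this form are typically obtained from Ville's inequality: a nonnegative supermartingale that starts at 1 ever reaches $1/\alpha$ with probability at most $\alpha$. The sketch below shows a generic testing-by-betting stopping rule for observations in $[0, 1]$. It is not the paper's betting scheme and does not use its constant $c$; the fixed bet fraction `lam` and the function name `betting_test` are assumptions made for illustration.

```python
def betting_test(xs, m, alpha=0.05, lam=0.5):
    """Generic two-sided betting test of H0: mean == m for data in [0, 1].

    Two wealth processes bet that the mean is above or below m. Each
    per-step factor is at least 1 - lam > 0 and has expectation 1 under H0,
    so their average is a nonnegative martingale starting at 1.
    Returns the first time the averaged wealth reaches 1/alpha, else None.
    """
    assert 0.0 < m < 1.0 and 0.0 < lam < 1.0
    up = down = 1.0
    for t, x in enumerate(xs, start=1):
        up *= 1.0 + lam * (x - m) / m            # gains when x > m
        down *= 1.0 - lam * (x - m) / (1.0 - m)  # gains when x < m
        if 0.5 * (up + down) >= 1.0 / alpha:
            return t
    return None


# Data far from the null accumulates wealth quickly; data equal to the
# null value leaves the wealth at 1 and never triggers a rejection.
print(betting_test([1.0] * 50, m=0.5))
print(betting_test([0.5] * 50, m=0.5))
```

A fixed bet is the simplest choice that keeps the wealth nonnegative; adaptive bets are what give betting tests their power in practice.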

Figures (3)

Figure 1: Growth rate visualization for different ground truth $\mu$ and hypothesis $m$ combinations under Bernoulli distributions, $P \equiv \text{Bern}(\mu)$, and $c=0.26$. Left: Asymptotic growth rate of $K_t(m)$, $G(c, m , P)$, for null hypothesis $m$, with darker colors representing larger growth rates. Right: The ratio of $G(c, m, P)$ to the best achievable growth rate, $f(c, m , P)$, for null hypothesis $m$, with darker colors representing larger ratios.
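For intuition about growth rates under Bernoulli data, a standard fact can be checked numerically: for a wealth factor of the form $1 + \lambda (X - m)$ with $X \sim \text{Bern}(\mu)$, the expected log-growth is maximized at $\lambda^\star = (\mu - m) / (m(1 - m))$, where it equals the Kullback-Leibler divergence between $\text{Bern}(\mu)$ and $\text{Bern}(m)$. The sketch below uses this generic fixed-bet process as a stand-in; it does not compute the paper's $G(c, m, P)$ or $f(c, m, P)$, and all function names are hypothetical.

```python
import math


def expected_log_growth(mu, m, lam):
    """E[log(1 + lam * (X - m))] for X ~ Bern(mu)."""
    return mu * math.log(1.0 + lam * (1.0 - m)) + (1.0 - mu) * math.log(1.0 - lam * m)


def kl_bernoulli(mu, m):
    """KL divergence from Bern(mu) to Bern(m), for mu and m in (0, 1)."""
    return mu * math.log(mu / m) + (1.0 - mu) * math.log((1.0 - mu) / (1.0 - m))


def best_fixed_bet(mu, m):
    """Bet fraction that maximizes the expected log-growth above."""
    return (mu - m) / (m * (1.0 - m))


mu, m = 0.7, 0.5
print(expected_log_growth(mu, m, best_fixed_bet(mu, m)), kl_bernoulli(mu, m))
print(expected_log_growth(mu, m, 0.4))  # a smaller bet grows more slowly
```

The ratio of a given bet's growth rate to this maximum is one generic way to read a "fraction of the best achievable rate" plot.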
Figure 2: Log-widths of the confidence sequences across 5000 time steps. Curves represent the average width across 30 simulations.
Figure 2.2: Visualization of the minima within each region at time $t$, obtained by projecting the global minimizer of $E_t(m)$ onto the regions implied by THR (left) and BAI (right) for the $W=2$ case. In both plots, the current global minimum at time $t$ lies in Region 1 and is projected to obtain the minima in all other regions.

Theorems & Definitions (29)

Definition 1: $e$-process (Definition 1 of grünwald2023safe)
Definition 2: $e$-power
Definition 3: Single-Arm Capital Process
Definition 4
Theorem 1
Theorem 2
Proposition 1
Remark 1
Definition 5
Lemma 1: Asymptotic Growth Rate as a Function of $c$
...and 19 more
