
Hypothesis testing with e-values

Aaditya Ramdas, Ruodu Wang

TL;DR

This work consolidates e-values as a unifying framework for hypothesis testing, linking them to p-values via calibrators and establishing both validity under the null and efficiency under alternatives. It develops foundational theory (e-values, e-processes, and calibration) and practical machinery (universal inference, mixture/plug-in methods, and post-hoc decision rules) to handle irregular models and sequential, anytime-valid settings. The text further explores multiple testing, confidence sequences, and risk-aware decision making, showing how e-values enable robust, data-adaptive inference with strong theoretical guarantees. Collectively, it provides a comprehensive toolkit, spanning theory, methodology, and numerical illustrations, for reliable and reproducible statistical inference across diverse settings.
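The summary mentions calibrators linking p-values and e-values. As a minimal sketch (not code from the book), the standard power calibrator $p \mapsto \kappa p^{\kappa-1}$ with $\kappa \in (0,1)$ turns a valid p-value into an e-value, because the map is decreasing and integrates to 1 on $[0,1]$; in the other direction, $e \mapsto \min(1, 1/e)$ is a valid p-value by Markov's inequality. The function names, the default $\kappa = 0.5$, and the use of a two-sided standard normal test as the p-value source are illustrative assumptions.

```python
import math


def two_sided_normal_p(z: float) -> float:
    """Two-sided p-value 2 * (1 - Phi(|z|)) for a standard normal statistic z."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def p_to_e(p: float, kappa: float = 0.5) -> float:
    """Power calibrator p -> kappa * p**(kappa - 1), with kappa in (0, 1).

    The map is decreasing and integrates to 1 on [0, 1], so applying it to a
    valid p-value gives a nonnegative variable with null expectation <= 1.
    """
    if not 0.0 < kappa < 1.0:
        raise ValueError("kappa must lie in (0, 1)")
    if p <= 0.0:
        return math.inf  # the calibrator diverges as p -> 0
    return kappa * p ** (kappa - 1.0)


def e_to_p(e: float) -> float:
    """e-to-p calibrator: min(1, 1/e) is a valid p-value by Markov's inequality."""
    return 1.0 if e <= 1.0 else 1.0 / e


if __name__ == "__main__":
    for z in (1.0, 1.96, 3.0):
        p = two_sided_normal_p(z)
        e = p_to_e(p)
        print(f"z={z:.2f}  p={p:.4f}  e={e:.3f}  min(1, 1/e)={e_to_p(e):.4f}")
```

Calibrating a p-value and then converting back does not return the original p-value; each direction loses some power, which is one reason to construct e-values directly when possible.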

Abstract

This book is written to offer a humble, but unified, treatment of e-values in hypothesis testing. It is organized into three parts: Fundamental Concepts, Core Ideas, and Advanced Topics. The first part includes four chapters that introduce the basic concepts. The second part includes five chapters of core ideas such as universal inference, log-optimality, e-processes, operations on e-values, and e-values in multiple testing. The third part contains seven chapters of advanced topics. The book collates important results from a variety of modern papers on e-values and related concepts, and also contains many results not published elsewhere. It offers a coherent and comprehensive picture of a fast-growing research area and is ready to use as the basis of a graduate course in statistics and related fields.
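For e-values in multiple testing, one widely used procedure is e-BH: given $K$ e-values, let $k^*$ be the largest $k$ such that the $k$-th largest e-value is at least $K/(\alpha k)$, and reject the $k^*$ hypotheses with the largest e-values. It controls the false discovery rate at level $\alpha$ under arbitrary dependence among the e-values. The sketch below assumes this standard form; the summary does not show how the book states it, and the function name `e_bh` is hypothetical.

```python
def e_bh(e_values, alpha):
    """Return the sorted indices rejected by the e-BH procedure at level alpha."""
    K = len(e_values)
    # Indices ordered from the largest to the smallest e-value.
    order = sorted(range(K), key=lambda i: e_values[i], reverse=True)
    k_star = 0
    for k in range(1, K + 1):
        # Step-up rule: keep the largest k whose k-th largest e-value clears K / (alpha * k).
        if e_values[order[k - 1]] >= K / (alpha * k):
            k_star = k
    return sorted(order[:k_star])


if __name__ == "__main__":
    print(e_bh([100.0, 1.0, 30.0, 0.5], alpha=0.1))  # → [0, 2]
```

Because the rule is step-up, a hypothesis can be rejected even when the single largest e-value fails its own threshold, as long as some larger $k$ passes.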

Paper Structure

This book contains 184 sections, 619 equations, 22 figures, 6 tables, and 4 algorithms.

Figures (22)

  • Figure 1.5: A few ways of constructing e-values from likelihood ratio processes. Left: one run. Right: the average of 1000 runs (with the average taken over the log values).
  • Figure 2.1: A comparison of e-values and p-values for the two-sided normal test in Section \ref{sec:LR-e-variable}. The curves show the e-values or p-values as a function of the test statistic $Z=x$. The top four horizontal dotted lines correspond to the e-value thresholds $10^{\beta}$ with $\beta \in \{2, 1.5, 1, 0.5\}$. The bottom two horizontal dotted lines correspond to the p-value thresholds $\alpha \in \{0.05, 0.01\}$. (Section \ref{sec:c2-jeffrey} explains why e-values are compared with the levels $10^{\beta}$ for these values of $\beta$.)
  • Figure 5.9: Expectation (black), lower bound (blue), and upper bound (red) of $\mathbb{E}\left[r^2(C(X^n)) / r^2(C^\mathrm{LRT}(X^n))\right]$. Data points correspond to values at $\alpha = \exp(-10^x)$ for $x$ from 8 to 0 in increments of $-0.5$.
  • Figure 5.9: Coverage regions of classical LRT (black), subsampling LRT (blue), cross-fit LRT (red), and split LRT (orange) at $\alpha = 0.1$. The six simulations use the same 1000 observations from $\mathrm{N}((0, 0), I_2)$.
  • Figure 5.9: Squared radius of the multivariate normal split LRT region with varying $p_0$. We simulate $X_1, \ldots, X_{1000} \buildrel \mathrm{d} \over \sim \mathrm{N}(0, I_d)$ and compute the split LRT region at $\alpha = 0.10$ and varying $p_0$. We repeat this simulation 1000 times. At each $p_0$, the circular point is the mean squared radius and the error bar represents the mean squared radius $\pm$ 1.96 standard deviations. Hence, the error bars represent a typical range of squared radius values for each $d$ and $p_0$. Blue points/lines correspond to $p_0^*$. The red curve is the expected squared radius. See the proof of Theorem \ref{thm:split_p0} in the supplement for a derivation of the expected squared radius at $p_0$.
  • ...and 17 more figures
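The Figure 5.9 captions compare the split LRT with the classical LRT for a normal mean. As a minimal sketch, assume the model $\mathrm{N}(\theta, I_d)$, that a fraction $p_0$ of the data ($D_0$, size $n_0$) evaluates the likelihood, and that the rest ($D_1$) gives $\hat\theta_1$ as its sample mean. Then $\mathcal{L}_0(\hat\theta_1)/\mathcal{L}_0(\theta) < 1/\alpha$ is equivalent to $n_0\left(\|\bar X_0 - \theta\|^2 - \|\bar X_0 - \hat\theta_1\|^2\right) < 2\log(1/\alpha)$, a ball centred at the $D_0$ mean $\bar X_0$ with squared radius $\|\bar X_0 - \hat\theta_1\|^2 + 2\log(1/\alpha)/n_0$. The code restricts the classical comparison to $d = 2$, where the $\chi^2_2$ upper-$\alpha$ quantile is exactly $2\log(1/\alpha)$. All helper names are hypothetical, and this is not the simulation code behind the figures.

```python
import math
import random


def mean(vectors):
    """Coordinate-wise mean of a list of equal-length vectors."""
    n, d = len(vectors), len(vectors[0])
    return [sum(v[j] for v in vectors) / n for j in range(d)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def split_lrt_sq_radius(xs, p0, alpha, rng):
    """Squared radius of the split LRT ball for the mean of N(theta, I_d).

    Assumption: a random fraction p0 of the data (D0) evaluates the likelihood,
    and the remaining points (D1) estimate theta by their sample mean.
    """
    data = list(xs)
    rng.shuffle(data)
    n0 = round(p0 * len(data))
    if not 1 <= n0 < len(data):
        raise ValueError("p0 must leave at least one point in each split")
    d0, d1 = data[:n0], data[n0:]
    return sq_dist(mean(d0), mean(d1)) + 2.0 * math.log(1.0 / alpha) / n0


def classical_lrt_sq_radius_2d(n, alpha):
    """Classical LRT squared radius for d = 2: chi2_2 upper-alpha quantile / n."""
    return 2.0 * math.log(1.0 / alpha) / n


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(1000)]
    for p0 in (0.25, 0.5, 0.75):
        r2_split = split_lrt_sq_radius(xs, p0, 0.10, rng)
        r2_lrt = classical_lrt_sq_radius_2d(len(xs), 0.10)
        print(f"p0={p0:.2f}  split r^2={r2_split:.5f}  classical r^2={r2_lrt:.5f}")
```

Since $n_0 < n$ and the distance term is nonnegative, the split LRT ball in this sketch is always larger than the classical one; that is the price paid for a set that remains valid without the regularity conditions the classical $\chi^2$ calibration needs.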

Theorems & Definitions (15)

  • Remark 3.2
  • Remark 3.4
  • Remark 3.8
  • Remark 4.2
  • Remark 5.2: Mixture universal inference
  • Remark 8.2
  • Remark 9.16
  • Remark 10.4
  • Remark 10.21
  • Remark 12.3
  • ...and 5 more