Continuous Testing: Unifying Tests and E-values

Nick W. Koning

TL;DR

This work develops a unified continuous-testing framework that recasts hypothesis testing in terms of continuous evidence on the interval $[0,1]$ and its level-scaled counterpart $[0,1/\alpha]$, thereby unifying classical tests with $e$-values. By relating continuous tests to randomized tests, rescaling by level, and adopting generalized-mean power targets (which include the Neyman-Pearson and log-optimal targets as special cases), the paper derives existence, optimality, and duality results for both simple and composite hypotheses. It provides concrete constructions in Gaussian location models, relates $p$-values to continuous tests (including post-hoc validity), and shows that $e$-values are the level-0 case of level-$\alpha$ continuous tests, with sequential testing available via $e$-processes. Continuous tests offer a stronger and arguably more appropriate guarantee than $p$-values when used as a continuous measure of evidence, and the framework yields practical guidance for reporting and combining tests across levels. Overall, continuous testing gives $e$-values a foundational role in hypothesis testing, bridging classical testing theory and modern $e$-value methodology.

Abstract

The e-value is swiftly rising in prominence in many applications of hypothesis testing and multiple testing, yet its relationship to classical testing theory remains elusive. We unify e-values and classical testing into a single 'continuous testing' framework: we argue that e-values are simply the continuous generalization of a test. This cements their foundational role in hypothesis testing. Such continuous tests relate to the rejection probability of classical randomized tests, offering the benefits of randomized tests without the downsides of a randomized decision. By generalizing the traditional notion of power, we obtain a unified theory of optimal continuous testing that nests both classical Neyman-Pearson-optimal tests and log-optimal e-values as special cases. This implies the only difference between typical classical tests and typical e-values is a different choice of power target. We visually illustrate this in a Gaussian location model, where such tests are easy to express. Finally, we describe the relationship to the traditional p-value, and show that continuous tests offer a stronger and arguably more appropriate guarantee than p-values when used as a continuous measure of evidence.

Paper Structure (53 sections, 13 theorems, 106 equations, 3 figures)

This paper contains 53 sections, 13 theorems, 106 equations, 3 figures.

Key Result

Proposition 1

If $e$ is a valid $e$-value for $H$, then

Figures (3)

  • Figure 1: Optimal Gaussian $h$-generalized mean level $\alpha = 0$ continuous test $d\mathcal{N}(\mu / (1-h), \sigma^2) / d\mathcal{N}(0, \sigma^2)(X)$ plotted over $X \in [0, 10]$ for $\mu = 1$, $\sigma = 1$ and various values of $h$. For $h = 0$ this equals the likelihood ratio between the Gaussian distributions with means $\mu$ and 0. For larger $h$ the continuous tests steepen, and for smaller $h$ they flatten out. The $h = 1$ case is not plotted, as the test effectively becomes a vertical line at $\infty$ as $h \to 1$.
  • Figure 2: Optimal Gaussian level $\alpha = 0.05$, $h$-generalized mean continuous test $b_\alpha \, d\mathcal{N}(\mu / (1-h), \sigma^2) / d\mathcal{N}(0, \sigma^2)(X) \wedge 1/\alpha$ plotted over $X \in [0, 10]$ for $\mu = 1$, $\sigma = 1$ and various values of $h$. Compared to Figure 1, the values are capped and inflated here. For small $h$ this inflation is negligible, but for large values of $h$ the capping has a substantial impact. The $h = 1$ case (the Neyman-Pearson-optimal one-sided $Z$ test) is also pictured here; it equals $1/0.05 = 20$ if $X$ exceeds the $1-\alpha$ quantile of $\mathcal{N}(0, \sigma^2)$ ($\approx 1.64$). The $h = 0.9$ case is close to the $h = 1$ case, but slightly smoothed out.
  • Figure 3: The $p$-value of a collection of tests $\{\tau_\alpha\}_{\alpha > 0}$.
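
The test functions described in the captions of Figures 1 and 2 can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the paper's code. All function names are hypothetical. The calibration of $b_\alpha$ is also an assumption: the caption only says the values are "inflated", and here $b_\alpha$ is chosen so that the capped test has expectation exactly 1 under $\mathcal{N}(0, \sigma^2)$. The sketch requires $\mu > 0$, $0 < \alpha < 1$ and $h < 1$ (with $h = 1$ handled separately as the one-sided $Z$ test).

```python
import math
from statistics import NormalDist


def norm_cdf(z):
    """Standard normal CDF, via erfc for accuracy in the lower tail."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def norm_sf(z):
    """Standard normal survival function 1 - CDF, accurate in the upper tail."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def uncapped_test(x, mu=1.0, sigma=1.0, h=0.0):
    """Figure 1: density ratio dN(mu/(1-h), sigma^2) / dN(0, sigma^2) at x (h < 1)."""
    m = mu / (1.0 - h)
    return math.exp((m * x - 0.5 * m * m) / sigma**2)


def capped_null_mean(log_b, alpha, mu=1.0, sigma=1.0, h=0.0):
    """Closed-form E[min(b * LR(X), 1/alpha)] for X ~ N(0, sigma^2), b = exp(log_b)."""
    m = mu / (1.0 - h)
    # b * LR(x) is increasing in x (for mu > 0) and reaches the cap 1/alpha at x = c.
    c = sigma**2 * (-math.log(alpha) - log_b) / m + 0.5 * m
    below = math.exp(log_b) * norm_cdf((c - m) / sigma)  # E[b * LR(X); X <= c]
    above = norm_sf(c / sigma) / alpha                   # E[1/alpha; X > c]
    return below + above


def solve_log_b(alpha, mu=1.0, sigma=1.0, h=0.0, tol=1e-12):
    """Assumed calibration: find log(b_alpha) with null expectation equal to 1.

    The null mean is nondecreasing in b and at most 1 at b = 1, so bisection
    on log(b) over [0, hi] works. Values of h very close to 1 can overflow exp.
    """
    lo, hi = 0.0, 1.0
    while capped_null_mean(hi, alpha, mu, sigma, h) < 1.0:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if capped_null_mean(mid, alpha, mu, sigma, h) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def capped_test(x, log_b, alpha, mu=1.0, sigma=1.0, h=0.0):
    """Figure 2: b_alpha * LR(x) capped at 1/alpha."""
    return min(math.exp(log_b) * uncapped_test(x, mu, sigma, h), 1.0 / alpha)


def np_test(x, alpha, sigma=1.0):
    """h = 1 case: one-sided Z test on the [0, 1/alpha] scale."""
    threshold = NormalDist(0.0, sigma).inv_cdf(1.0 - alpha)
    return 1.0 / alpha if x > threshold else 0.0


if __name__ == "__main__":
    alpha = 0.05
    for h in (0.0, 0.5, 0.9):
        log_b = solve_log_b(alpha, h=h)
        print(h, math.exp(log_b), capped_test(2.0, log_b, alpha, h=h))
```

Working with $\log b_\alpha$ rather than $b_\alpha$ keeps the bisection stable when $h$ is close to 1, where the calibrated constant becomes very large. Under this calibration, the solved constant should be close to 1 for small $h$, matching the caption's remark that the inflation is negligible there.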

Theorems & Definitions (46)

  • Remark 1
  • Remark 2
  • Remark 3: Cross-level interpretation
  • Remark 4: Betting and rescaling
  • Remark 5: Level 0 test on original scale
  • Remark 6: Betting and level 0
  • Remark 7: Unbounded measure of evidence
  • Remark 8
  • Proposition 1
  • proof
  • ...and 36 more