Continuous Testing: Unifying Tests and E-values
Nick W. Koning
TL;DR
This work develops a unified continuous-testing framework that recasts hypothesis testing in terms of continuous evidence on the interval $[0,1]$ and its level-scaled counterpart $[0,1/\alpha]$, thereby unifying classical tests with e-values. By relating continuous tests to randomized tests, rescaling by the level, and adopting generalized-mean power targets, which include Neyman-Pearson-optimal tests and log-optimal e-values as special cases, the paper derives existence, optimality, and duality results for both simple and composite hypotheses. It provides concrete constructions in Gaussian location models, relates p-values to continuous tests (including post-hoc validity), and shows that e-values arise as level-0 continuous tests, which enables sequential testing via e-processes. The framework offers a stronger and more interpretable guarantee for evidence than p-values and yields practical guidance for reporting and combining tests across levels. Overall, continuous testing gives e-values a foundational role in hypothesis testing, bridging classical testing with modern e-value methodology.
Abstract
The e-value is swiftly rising in prominence in many applications of hypothesis testing and multiple testing, yet its relationship to classical testing theory remains elusive. We unify e-values and classical testing into a single 'continuous testing' framework: we argue that e-values are simply the continuous generalization of a test. This cements their foundational role in hypothesis testing. Such continuous tests relate to the rejection probability of classical randomized tests, offering the benefits of randomized tests without the downsides of a randomized decision. By generalizing the traditional notion of power, we obtain a unified theory of optimal continuous testing that nests both classical Neyman-Pearson-optimal tests and log-optimal e-values as special cases. This implies the only difference between typical classical tests and typical e-values is a different choice of power target. We visually illustrate this in a Gaussian location model, where such tests are easy to express. Finally, we describe the relationship to the traditional p-value, and show that continuous tests offer a stronger and arguably more appropriate guarantee than p-values when used as a continuous measure of evidence.
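
As a rough illustration of the Gaussian location setting mentioned above, the following Python sketch compares two continuous tests of the null N(0, 1) against the alternative N(mu, 1): the likelihood ratio, which is the log-optimal e-value for this simple pair, and the one-sided Neyman-Pearson z-test rescaled by 1/alpha so that it takes values in {0, 1/alpha}. The specific values `mu = 1.0` and `alpha = 0.05`, the function names, and the grid-based integration are assumptions made for this example and are not taken from the paper.

```python
import math
from statistics import NormalDist

# Illustrative choices (assumptions, not from the paper).
mu = 1.0        # mean under the alternative N(mu, 1)
alpha = 0.05    # level used to rescale the classical test

null = NormalDist(0.0, 1.0)
alternative = NormalDist(mu, 1.0)
critical_value = null.inv_cdf(1.0 - alpha)  # one-sided z-test threshold


def log_likelihood_ratio(x):
    """Log of the N(mu, 1) to N(0, 1) density ratio at x."""
    return mu * x - 0.5 * mu ** 2


def likelihood_ratio(x):
    """Likelihood-ratio e-value; unbounded, so it lives on [0, infinity)."""
    return math.exp(log_likelihood_ratio(x))


def rescaled_np_test(x):
    """Neyman-Pearson z-test on the e-value scale: 1/alpha if it rejects, else 0."""
    return 1.0 / alpha if x > critical_value else 0.0


def expectation(f, dist, lo=-12.0, hi=12.0, n=48001):
    """Trapezoid-rule approximation of E[f(X)] for X distributed as dist."""
    step = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * step
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * f(x) * dist.pdf(x)
    return total * step


# Validity: both have expectation (approximately) 1 under the null.
null_mean_lr = expectation(likelihood_ratio, null)
null_mean_np = expectation(rescaled_np_test, null)

# Two power targets under the alternative.
alt_mean_lr = expectation(likelihood_ratio, alternative)
alt_mean_np = expectation(rescaled_np_test, alternative)
alt_log_growth_lr = expectation(log_likelihood_ratio, alternative)

print("null expectation, likelihood ratio:", null_mean_lr)
print("null expectation, rescaled NP test:", null_mean_np)
print("alternative expectation, likelihood ratio:", alt_mean_lr)
print("alternative expectation, rescaled NP test:", alt_mean_np)
print("alternative expected log, likelihood ratio:", alt_log_growth_lr)
```

Under these assumptions both tests are valid in the sense that their null expectation is approximately 1, yet they behave differently under the alternative. The rescaled z-test attains the larger expected value but equals zero with positive probability, so its expected logarithm is minus infinity, whereas the likelihood ratio has a finite expected logarithm of mu^2 / 2. This is one way to see the difference in power target described in the abstract.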
