Post-hoc $α$ Hypothesis Testing and the Post-hoc $p$-value

Nick W. Koning

TL;DR

The paper tackles long-standing issues in hypothesis testing linked to pre-specified significance levels by introducing a framework for testing with data-dependent levels that guarantees no size distortion in expectation. It shows that $p$ is a valid post-hoc $p$-value if and only if $e = 1/p$ is an $e$-value, offering a unified decision-theoretic view in which $e$-values are $p$-values under a stronger error guarantee. By deriving Neyman–Pearson-type optimality results for post-hoc testing and embedding the construction within an abstract evidence framework, the work also provides tools for combining evidence, sequential testing, and generalized means. The theoretical development culminates in links to Markov's inequality and Ville-type inequalities, and it sets the stage for broader notions of evidence beyond traditional $p$- and $e$-values, including certainty-equivalent and $h$-mean validity concepts.

Abstract

In traditional hypothesis testing one must pre-specify the significance level $α$ to bound the 'size' of the test: its probability of falsely rejecting the hypothesis. Indeed, a data-dependent selection of $α$ would generally distort the size, possibly making it larger than the specified level $α$. We explore hypothesis testing with a data-dependent choice of $α$ by guaranteeing that there is no such size distortion in expectation, even if the level $α$ is arbitrarily selected based on the data. Unlike regular $p$-values, the resulting 'post-hoc $p$-values' allow us to 'reject at level $p$' and still provide this guarantee. Interestingly, we find that $p$ is a post-hoc $p$-value if and only if $1/p$ is an $e$-value, a recently introduced measure of evidence. While often treated as different paradigms, this reveals that $e$-values are simply $p$-values under a stronger error guarantee, thinly veiled by the reciprocal $p = 1/e$. Moreover, we extend classical optimal testing to optimal post-hoc testing. Finally, we apply our work to close Markov's inequality into a post-hoc $α$ equality, and we study more general forms of post-hoc testing that require us to generalize beyond $e$-values.
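
The equivalence between post-hoc $p$-values and $e$-values can be illustrated with a small exact calculation. The sketch below is not taken from the paper. It assumes that 'no size distortion in expectation' when rejecting at level $p$ means $E[1/p] \le 1$ under the hypothesis, and that an $e$-value is a nonnegative statistic with $E[e] \le 1$ under the hypothesis. The toy distributions `null` and `alt` and the helper names are hypothetical, chosen only for illustration.

```python
from fractions import Fraction

def expected_inverse(dist):
    """E[1/p] under the null for a finite distribution {p_value: probability}."""
    return sum(prob / p for p, prob in dist.items())

def is_ordinary_valid(dist):
    """Check P(p <= a) <= a at every support point a (enough for a finite distribution)."""
    return all(sum(pr for p, pr in dist.items() if p <= a) <= a for a in dist)

# An ordinary p-value: uniform on the grid {1/n, ..., 1} under the null.
n = 100
ordinary = {Fraction(k, n): Fraction(1, n) for k in range(1, n + 1)}

# A likelihood-ratio e-value on a three-point sample space (hypothetical toy model).
null = {0: Fraction(1, 2), 1: Fraction(3, 10), 2: Fraction(1, 5)}
alt = {0: Fraction(1, 5), 1: Fraction(3, 10), 2: Fraction(1, 2)}
e_value = {x: alt[x] / null[x] for x in null}

# Candidate post-hoc p-value p = 1/e; note that it is allowed to exceed 1.
post_hoc = {}
for x, prob in null.items():
    p = 1 / e_value[x]
    post_hoc[p] = post_hoc.get(p, 0) + prob

# Assumed distortion measure for "rejecting at level p": E[1/p].
print("ordinary p-value: E[1/p] =", float(expected_inverse(ordinary)))
print("post-hoc p-value: E[1/p] =", expected_inverse(post_hoc))
```

Under these assumptions, the grid-uniform $p$-value is valid in the usual sense, yet its expected relative distortion equals the harmonic number $H_n$, which grows without bound as the grid is refined. The reciprocal of the likelihood ratio keeps $E[1/p]$ at exactly 1 in this toy model, because $E[1/p] = E[e]$.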

Paper Structure (48 sections, 34 theorems, 78 equations, 2 figures)

Introduction
Contributions to the literature
Traditional hypothesis testing
The problem
Test functions
$p$-values
Relative size distortion
Testing with data-dependent $\alpha$
Generalizing size to data-dependent $\alpha$
Validity for a data-dependent level $\alpha$
Examples
Post-hoc $\alpha$ hypothesis testing
Post-hoc $p$-values: a simplification
Post-hoc $\alpha$ power
Generalizing traditional power?
...and 33 more sections

Key Result

Proposition 1

We have

Figures (2)

Figure 1: Realization of a test function and its $p$-value.
Figure 2: Illustration of (realized) test family (left) and its associated $p$-family (right). We can see the relationship between test families and $p$-families by swapping the horizontal and vertical axes.

Theorems & Definitions (86)

Remark 1
Remark 2
Example 1: $\alpha$-hacking
Definition 1: Validity for data-dependent level $\widetilde{\alpha}$
Proposition 1
proof
Example 2: A conservative data-dependent level $\widetilde{\alpha}$
Example 3: Discontinuity of maximum size distortion
Example 4: Size distortion when $\alpha$-hacking
Example 5: Rejecting at level $p$
...and 76 more
