Post-hoc $α$ Hypothesis Testing and the Post-hoc $p$-value
Nick W. Koning
TL;DR
The paper addresses a long-standing limitation of hypothesis testing, the need to pre-specify the significance level, by introducing a framework for testing with data-dependent levels that guarantees no size distortion in expectation. It shows that $p$ is a post-hoc $p$-value if and only if $e = 1/p$ is an $e$-value, so that $e$-values can be read as $p$-values under a stronger error guarantee. The paper also derives Neyman–Pearson-type optimality results for post-hoc testing and embeds the construction in an abstract evidence framework that covers combining evidence, sequential testing, and generalized means. It connects the theory to Markov's inequality and Ville-type inequalities, and it considers broader notions of evidence beyond traditional $p$- and $e$-values, including certainty-equivalent and $h$-mean validity concepts.
Abstract
In traditional hypothesis testing, one must pre-specify the significance level $α$ to bound the `size' of the test: the probability of falsely rejecting the hypothesis. Indeed, a data-dependent selection of $α$ would generally distort the size, possibly making it larger than the specified level $α$. We explore hypothesis testing with a data-dependent choice of $α$ by guaranteeing that there is no such size distortion in expectation, even if the level $α$ is arbitrarily selected based on the data. Unlike regular $p$-values, the resulting `post-hoc $p$-values' allow us to `reject at level $p$' and still provide this guarantee. Interestingly, we find that $p$ is a post-hoc $p$-value if and only if $1/p$ is an $e$-value, a recently introduced measure of evidence. Although the two are often treated as different paradigms, this reveals that $e$-values are simply $p$-values under a stronger error guarantee, thinly veiled by the reciprocal $p = 1/e$. Moreover, we extend classical optimal testing to optimal post-hoc testing. Finally, we apply our work to close Markov's inequality into a post-hoc $α$ equality, and we study more general forms of post-hoc testing that require us to generalize beyond $e$-values.
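The reciprocal relationship $p = 1/e$ can be illustrated with a small simulation. The sketch below is not taken from the paper. It assumes a toy Gaussian null $X \sim N(0,1)$, uses the likelihood ratio against $N(\mu,1)$ as the $e$-value, and reads `no size distortion in expectation' as the requirement that the expected value of $\mathbb{1}\{p \le α\}/α$ stays at most 1 even when $α$ depends on the data. The function name `simulate` and its parameters are hypothetical. If the level is set to $α = p$, the test always rejects, so the ratio reduces to $1/p$.

```python
import math
import numpy as np

def simulate(n=200_000, mu=1.0, seed=0):
    """Compare E[1/p] under H0 for a post-hoc p-value and a classical p-value.

    Assumed toy setting (not from the paper): H0 is X ~ N(0, 1), and the
    e-value is the likelihood ratio of N(mu, 1) against N(0, 1).
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)  # data generated under H0

    # E-value: likelihood ratio, whose expectation under H0 equals 1.
    e = np.exp(mu * x - 0.5 * mu**2)
    p_posthoc = 1.0 / e  # candidate post-hoc p-value

    # Classical one-sided p-value 1 - Phi(x), uniform on (0, 1) under H0.
    p_classic = 0.5 * np.vectorize(math.erfc)(x / math.sqrt(2.0))

    return {
        # Rejecting at the data-dependent level alpha = p gives 1{p <= alpha}/alpha = 1/p.
        "posthoc_mean_inv_p": float(np.mean(1.0 / p_posthoc)),
        "classic_mean_inv_p": float(np.mean(1.0 / p_classic)),
        # With a pre-specified level, the classical p-value controls size as usual.
        "classic_size_fixed_alpha": float(np.mean(p_classic <= 0.05)),
    }

if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions, the average of $1/p$ for the post-hoc $p$-value should sit close to 1, while for the classical $p$-value it is far larger and keeps growing with the sample size, because $1/U$ has infinite expectation for a uniform $U$. The classical $p$-value still rejects at the fixed level $0.05$ in roughly 5% of the draws, which is the usual pre-specified guarantee.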
