Post-hoc $α$ Hypothesis Testing and the Post-hoc $p$-value

Nick W. Koning

TL;DR

The paper addresses a long-standing restriction in hypothesis testing, the need to pre-specify the significance level, by introducing a framework for testing with data-dependent levels that guarantees no size distortion in expectation. It shows that $p$ is a post-hoc $p$-value if and only if $1/p$ is an $e$-value, so the reciprocal relationship $e = 1/p$ gives a unified decision-theoretic view in which $e$-values are $p$-values under a stronger error guarantee. By deriving Neyman–Pearson-type optimality results for post-hoc testing and embedding the construction within an abstract evidence framework, the work also provides tools for combining evidence, sequential testing, and generalized means. The theoretical development connects to Markov’s inequality and Ville-type inequalities, and it sets the stage for broader notions of evidence beyond traditional $p$- and $e$-values, including certainty-equivalent and $h$-mean validity concepts.

Abstract

In traditional hypothesis testing one must pre-specify the significance level $α$ to bound the 'size' of the test: its probability of falsely rejecting the hypothesis. Indeed, a data-dependent selection of $α$ would generally distort the size, possibly making it larger than the specified level $α$. We explore hypothesis testing with a data-dependent choice of $α$ by guaranteeing that there is no such size distortion in expectation, even if the level $α$ is arbitrarily selected based on the data. Unlike regular $p$-values, the resulting 'post-hoc $p$-values' allow us to 'reject at level $p$' and still provide this guarantee. Interestingly, we find that $p$ is a post-hoc $p$-value if and only if $1/p$ is an $e$-value, a recently introduced measure of evidence. While often treated as different paradigms, this reveals that $e$-values are simply $p$-values under a stronger error guarantee, thinly veiled by the reciprocal $p = 1/e$. Moreover, we extend classical optimal testing to optimal post-hoc testing. Finally, we apply our work to close Markov's inequality into a post-hoc $α$ equality, and we study more general forms of post-hoc testing that require us to generalize beyond $e$-values.
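
To make the guarantee concrete, the following is a minimal simulation sketch, not taken from the paper. It assumes that "no size distortion in expectation" means that the expected ratio of the rejection indicator to the data-dependent level, $E[\mathbb{1}\{p \le \widetilde{α}\}/\widetilde{α}]$, is at most 1. The Gaussian model, the likelihood-ratio $e$-value, the grid of levels, and the function names `hacked_ratio` and `simulate` are all illustrative assumptions. The data-dependent level is chosen by "$α$-hacking": after seeing the data, pick the smallest level in a grid at which the test rejects.

```python
import math
import random


def hacked_ratio(p, grid):
    """Return 1/alpha for the smallest grid level alpha with p <= alpha, else 0.

    This is the rejection indicator divided by the data-dependent level
    when the level is picked after seeing p ("alpha-hacking").
    """
    for alpha in sorted(grid):
        if p <= alpha:
            return 1.0 / alpha
    return 0.0  # no rejection at any level in the grid


def simulate(n=200_000, mu=1.0, grid=(0.01, 0.05, 0.10), seed=0):
    """Monte Carlo estimate of the expected ratio under the null N(0, 1).

    Assumed setup (hypothetical, for illustration only):
      - regular p-value: one-sided Gaussian tail probability, uniform under the null
      - e-value: likelihood ratio of N(mu, 1) against N(0, 1), mean 1 under the null
      - post-hoc p-value: reciprocal of the e-value
    """
    rng = random.Random(seed)
    total_e = total_regular = total_posthoc = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)                          # draw data under the null
        p_regular = 0.5 * math.erfc(x / math.sqrt(2.0))  # P(Z >= x) for Z ~ N(0, 1)
        e = math.exp(mu * x - 0.5 * mu * mu)             # likelihood-ratio e-value
        p_posthoc = 1.0 / e                              # reciprocal relationship p = 1/e
        total_e += e
        total_regular += hacked_ratio(p_regular, grid)
        total_posthoc += hacked_ratio(p_posthoc, grid)
    return {
        "mean_e": total_e / n,
        "regular": total_regular / n,
        "posthoc": total_posthoc / n,
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.3f}")
```

In this sketch the ratio for $p = 1/e$ can never exceed $e$ itself: whenever $1/e \le α$ we have $1/α \le e$. Its expectation is therefore bounded by $E[e] \le 1$ however the level is chosen. A regular $p$-value carries no such bound, so the same selection rule can push the expected ratio above 1.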

Paper Structure

This paper contains 48 sections, 34 theorems, 78 equations, and 2 figures.

Key Result

Proposition 1

We have

Figures (2)

  • Figure 1: Realization of a test function and its $p$-value.
  • Figure 2: Illustration of (realized) test family (left) and its associated $p$-family (right). We can see the relationship between test families and $p$-families by swapping the horizontal and vertical axes.

Theorems & Definitions (86)

  • Remark 1
  • Remark 2
  • Example 1: $\alpha$-hacking
  • Definition 1: Validity for data-dependent level $\widetilde{\alpha}$
  • Proposition 1
  • proof
  • Example 2: A conservative data-dependent level $\widetilde{\alpha}$
  • Example 3: Discontinuity of maximum size distortion
  • Example 4: Size distortion when $\alpha$-hacking
  • Example 5: Rejecting at level $p$
  • ...and 76 more