
Combining exchangeable p-values

Matteo Gasparin, Ruodu Wang, Aaditya Ramdas

TL;DR

This work addresses the merging of multiple p-values under exchangeability, showing that existing rules can be made strictly more powerful by exploiting exchangeable structure or external randomization. Central to the approach is a calibrator–e-value representation that ties p-merging rules to e-values and Markov-type inequalities, enabling both exchangeable and randomized enhancements of rules such as Rüger, Hommel, and the arithmetic, harmonic, and geometric means. The authors derive new exchangeable (ex-p) and randomized p-merging functions, prove that they dominate existing symmetric procedures, and provide sequential, asymptotic, and simulation results to illustrate practical gains and implementation considerations. The findings give guidance for global null testing and multiple testing when p-values arise in exchangeable streams or under unknown dependence, with applications to sample splitting and online updating. Overall, the paper contributes a systematic, theory-backed framework for exchangeability-aware p-value merging.

Abstract

The problem of combining p-values is an old and fundamental one, and the classic assumption of independence is often violated or unverifiable in many applications. There are many well-known rules that can combine a set of arbitrarily dependent p-values (for the same hypothesis) into a single p-value. We show that essentially all these existing rules can be strictly improved when the p-values are exchangeable, or when external randomization is allowed (or both). For example, we derive randomized and/or exchangeable improvements of well-known rules like "twice the median" and "twice the average", as well as geometric and harmonic means. Exchangeable p-values are often produced one at a time (for example, under repeated tests involving data splitting), and our rules can combine them sequentially as they are produced, stopping when the combined p-values stabilize. Our work also improves rules for combining arbitrarily dependent p-values, since the latter become exchangeable if they are presented to the analyst in a random order. The main technical advance is to show that all existing combination rules can be obtained by calibrating the p-values to e-values (using an $\alpha$-dependent calibrator), averaging those e-values, converting to a level-$\alpha$ test using Markov's inequality, and finally obtaining p-values by combining this family of tests; the improvements are delivered via recent randomized and exchangeable variants of Markov's inequality.
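To make the "twice the average" example concrete, here is a minimal Python sketch of the classical rule next to one exchangeable and one randomized variant. The variants are our own reading of the calibrate, average, and apply-Markov recipe described in the abstract, using the calibrator $f(p) = (2 - 2p)_+$; the function names are hypothetical and the paper's exact definitions may differ, so treat this as an illustration rather than the authors' implementation.

```python
import random
from statistics import mean


def twice_average(pvals):
    """Classical rule: twice the arithmetic mean, capped at 1."""
    return min(1.0, 2.0 * mean(pvals))


def exchangeable_twice_average(pvals):
    """Assumed exchangeable variant: running minimum of twice the running mean.

    Returns the whole path, so the combined p-value can be read off after
    each new p-value arrives (the sequential use mentioned in the abstract).
    The order of pvals matters; validity relies on exchangeability.
    """
    path, total, best = [], 0.0, 1.0
    for m, p in enumerate(pvals, start=1):
        total += p
        best = min(best, 2.0 * total / m)  # twice the mean of the first m p-values
        path.append(best)
    return path


def randomized_twice_average(pvals, u):
    """Assumed randomized variant: 2 * mean / (2 - u), capped at 1.

    u must be drawn from Uniform(0, 1) independently of the p-values.
    """
    return min(1.0, 2.0 * mean(pvals) / (2.0 - u))


if __name__ == "__main__":
    pvals = [0.04, 0.01, 0.30, 0.02]  # made-up p-values for illustration only
    u = random.random()               # external randomization
    print("classical   :", twice_average(pvals))
    print("exchangeable:", exchangeable_twice_average(pvals)[-1])
    print("randomized  :", randomized_twice_average(pvals, u))
```

In this sketch both variants are never larger than the classical value: the last prefix in the running minimum is the full sample, and $2 - u \ge 1$.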

Paper Structure

This paper contains 42 sections, 36 theorems, 143 equations, 6 figures, 1 table, 1 algorithm.

Key Result

Theorem 2.2

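For any admissible homogeneous p-merging function $F$, there exist $(\lambda_1,\dots,\lambda_K)\in\Delta_K$ and admissible calibrators $f_1,\dots,f_K$ such that

$$R_\alpha(F) = \Bigl\{\mathbf{p} : \sum_{k=1}^K \lambda_k f_k\Bigl(\frac{p_k}{\alpha}\Bigr) \ge 1\Bigr\} \quad \text{for each } \alpha\in(0,1),$$

where $R_\alpha(F) = \{\mathbf{p} : F(\mathbf{p}) \le \alpha\}$ denotes the level-$\alpha$ rejection region of $F$. Conversely, for any $(\lambda_1,\dots,\lambda_K)\in\Delta_K$ and calibrators $f_1,\dots,f_K$, this representation determines a homogeneous p-merging function.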
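As a hedged worked example of a calibrate-and-average representation of the form $\sum_k \lambda_k f_k(p_k/\alpha) \ge 1$ (our own illustration, not quoted from the paper), take equal weights $\lambda_k = 1/K$ and the calibrator $f(p) = (2 - 2p)_+$, which integrates to 1 over $[0,1]$; we do not claim that this calibrator is admissible. Writing $A(\mathbf{p})$ for the arithmetic mean of the p-values:

```latex
\[
\frac{1}{K}\sum_{k=1}^{K}\Bigl(2 - \frac{2p_k}{\alpha}\Bigr)_{+}
\;\ge\;
\frac{1}{K}\sum_{k=1}^{K}\Bigl(2 - \frac{2p_k}{\alpha}\Bigr)
\;=\;
2 - \frac{2A(\mathbf{p})}{\alpha},
\qquad
A(\mathbf{p}) = \frac{1}{K}\sum_{k=1}^{K} p_k .
\]
```

Hence $2A(\mathbf{p}) \le \alpha$ forces the calibrated average to be at least 1, so the rejection region of "twice the average" sits inside the region induced by this calibrator.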

Figures (6)

  • Figure 1: Combination of p-values using different ex-p-merging functions under high (left) and low (right) dependence. The performance of the different ex-p-merging functions is almost reversed in the two situations.
  • Figure 2: Combination of p-values using different ex-p-merging functions and different ordering based on the sample size. Non ex-p-merging functions valid under arbitrary dependence are added for comparison. The ex-p-merging rules are more powerful if p-values are ordered in decreasing order with respect to the sample size.
  • Figure 3: Combination of p-values using different rules. Every subplot illustrates power against $\mu$. The left endpoint, $\mu = 0$, represents the empirical type I error, which is controlled at the nominal level $\alpha = 0.05$ for all methods proposed. The first column has $\rho=0.9$, while the second column has $\rho=0.1$; as expected, the Bonferroni correction is more powerful near independence, but is less powerful under strong dependence. Further, our exchangeable and randomized improvements offer sizeable increases in power over the original variants in all settings.
  • Figure 4: Combination of p-values using different randomized p-merging functions. The order of the performance of the different ex-p-merging functions is almost the opposite in the two situations.
  • Figure 5: Combination of p-values using different randomized combination rules based on the average of p-values (and the Bonferroni method, for comparison). $F^*_{\mathrm{UA}}$ is more powerful than $F'_{\mathrm{UA}}$ only when $\mu \lesssim 2.5$.
  • ...and 1 more figure

Theorems & Definitions (77)

  • Definition 2.1
  • Theorem 2.2 (Vovk et al., 2022, Theorem 5.1)
  • Lemma 2.3
  • Theorem 2.4: Exchangeable Markov Inequality
  • Theorem 2.5: Uniformly-randomized Markov Inequality
  • Theorem 2.6: Exchangeable and Uniformly-randomized Markov Inequality
  • Remark 3.1
  • Theorem 3.2
  • Definition 3.3
  • Theorem 3.4
  • ...and 67 more