Combining exchangeable p-values
Matteo Gasparin, Ruodu Wang, Aaditya Ramdas
TL;DR
This work tackles the problem of merging multiple p-values under exchangeability. It shows that classical merging rules can be substantially improved by exploiting exchangeable structure or external randomization. Central to the approach is a dual calibrator–e-value representation that ties p-merging rules to e-values and Markov-type inequalities. This representation enables exchangeable and randomized improvements of Rüger's and Hommel's rules and of the arithmetic, harmonic, and geometric means. The authors derive new exchangeable (ex-p) and randomized p-merging functions and prove that these dominate existing symmetric procedures. They also provide sequential, asymptotic, and simulation results that illustrate the practical gains and implementation considerations. The findings offer actionable guidance for global null testing and for multiple testing settings where p-values arise in exchangeable streams or under unknown dependence, with applications to sample splitting and online updating. Overall, the paper contributes a systematic, theory-backed framework for stronger, exchangeability-aware p-value merging with broad applicability in statistics and related fields.
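The classical rules named above, which are valid for arbitrarily dependent p-values, can be sketched in a few lines of Python. This is a minimal sketch assuming the standard constants from the literature: 2 for the arithmetic mean, $n/k$ for Rüger's order-statistic rule, $e$ for the geometric mean, $e \ln n$ for the harmonic mean (stated for $n \ge 3$), and the harmonic sum $1 + 1/2 + \dots + 1/n$ for Hommel's rule. The function names are hypothetical. These are the baselines the paper improves, not its new rules.

```python
import math
from statistics import fmean, geometric_mean, harmonic_mean

# Hypothetical helper names; each returns a merged p-value capped at 1 and is
# valid under arbitrary dependence among the input p-values.


def twice_average(p):
    """Arithmetic mean multiplied by 2."""
    return min(1.0, 2 * fmean(p))


def ruger(p, k):
    """Rüger's rule: (n / k) times the k-th smallest p-value, 1 <= k <= n."""
    n = len(p)
    return min(1.0, n / k * sorted(p)[k - 1])


def geometric_rule(p):
    """Geometric mean multiplied by e (requires all p-values > 0)."""
    return min(1.0, math.e * geometric_mean(p))


def harmonic_rule(p):
    """Harmonic mean multiplied by e * ln(n); this constant is stated for n >= 3."""
    n = len(p)
    if n < 3:
        raise ValueError("the e * ln(n) constant is stated for n >= 3")
    return min(1.0, math.e * math.log(n) * harmonic_mean(p))


def hommel(p):
    """Hommel's rule: (1 + 1/2 + ... + 1/n) * min_k (n / k) * p_(k)."""
    n = len(p)
    harmonic_sum = sum(1 / j for j in range(1, n + 1))
    s = sorted(p)
    return min(1.0, harmonic_sum * min(n / k * s[k - 1] for k in range(1, n + 1)))


p = [0.01, 0.02, 0.03, 0.04]
print(twice_average(p), ruger(p, 2), geometric_rule(p), harmonic_rule(p), hommel(p))
```

Here `ruger(p, 1)` is the Bonferroni correction. For even $n$, `ruger(p, n // 2)` is twice the lower median (the $(n/2)$-th smallest p-value).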
Abstract
The problem of combining p-values is an old and fundamental one, and the classic assumption of independence is violated or unverifiable in many applications. There are many well-known rules that can combine a set of arbitrarily dependent p-values (for the same hypothesis) into a single p-value. We show that essentially all these existing rules can be strictly improved when the p-values are exchangeable, or when external randomization is allowed (or both). For example, we derive randomized and/or exchangeable improvements of well-known rules like "twice the median" and "twice the average", as well as geometric and harmonic means. Exchangeable p-values are often produced one at a time (for example, under repeated tests involving data splitting), and our rules can combine them sequentially as they are produced, stopping when the combined p-values stabilize. Our work also improves rules for combining arbitrarily dependent p-values, since such p-values become exchangeable if they are presented to the analyst in a uniformly random order. The main technical advance is to show that all existing combination rules can be obtained in four steps: calibrating the p-values to e-values (using an $\alpha$-dependent calibrator), averaging those e-values, converting to a level-$\alpha$ test using Markov's inequality, and finally obtaining p-values by combining this family of tests. The improvements are delivered via recent randomized and exchangeable variants of Markov's inequality.
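The calibrate, average, apply Markov, and invert recipe can be illustrated for two of the rules mentioned: Rüger's rule (of which "twice the median" is a special case) and "twice the average". The Python below is a minimal sketch, not the authors' implementation.

The function names and the two calibrators are illustrative choices:

* $f_\alpha(x) = \frac{n}{k\alpha}\mathbf{1}\{x \le \alpha k/n\}$
* $f_\alpha(x) = \frac{2}{\alpha}(1 - x/\alpha)_+$

Each calibrator is decreasing and integrates to 1 over $[0,1]$.

The sketch assumes two forms of Markov's inequality for nonnegative variables with mean at most 1:

* the randomized form $P(X \ge U/\alpha) \le \alpha$, with $U \sim \mathrm{Uniform}(0,1)$ independent of $X$;
* the exchangeable form $P(\exists\, t \le n : \frac{1}{t}\sum_{i \le t} X_i \ge 1/\alpha) \le \alpha$, for exchangeable $X_1, \dots, X_n$.

With the first calibrator, ordinary Markov applied to the average e-value recovers Rüger's rule $(n/k)\,p_{(k)}$ exactly. The functions below invert the randomized and exchangeable tests over $\alpha$ in closed form. They may differ from the paper's own ex-p and randomized rules.

```python
import math
import random
from statistics import fmean

# Sketch of calibrate -> average -> (variant of) Markov -> invert over alpha.
# Hypothetical function names; not the paper's reference implementation.


def ex_ruger(p, k):
    """Exchangeable variant of Rüger's rule.

    Calibrator: f_alpha(x) = (n / (k * alpha)) * 1{x <= alpha * k / n}.
    Exchangeable Markov: reject at alpha if, for some t <= n, the average of the
    first t e-values is >= 1/alpha, i.e. if at least ceil(k*t/n) of the first t
    p-values are <= alpha * k / n. Inverting over alpha gives the min below.
    """
    n = len(p)
    best = 1.0
    for t in range(1, n + 1):
        m_t = -(-k * t // n)  # ceil(k * t / n); satisfies 1 <= m_t <= t
        best = min(best, n / k * sorted(p[:t])[m_t - 1])
    return best


def randomized_ruger(p, k, u):
    """Randomized Rüger: same calibrator, but reject when the mean e-value is
    >= u / alpha, with u ~ Uniform(0, 1) drawn independently of the p-values."""
    n = len(p)
    m = max(1, math.ceil(k * u))  # need at least ceil(k*u) p-values <= alpha*k/n
    return min(1.0, n / k * sorted(p)[m - 1])


def ex_twice_average(p):
    """Exchangeable variant of twice the average: min over t of 2 * mean(p[:t]).

    Calibrator: f_alpha(x) = (2 / alpha) * max(0, 1 - x / alpha). If
    2 * mean(p[:t]) <= alpha, the average of the first t e-values is >= 1/alpha,
    so exchangeable Markov controls the rejection probability. Only p-values
    seen so far are used, so the value can be updated as each one arrives.
    """
    best, total = 1.0, 0.0
    for t, x in enumerate(p, start=1):
        total += x
        best = min(best, 2 * total / t)
    return best


def randomized_twice_average(p, u):
    """Randomized twice the average: 2 * mean(p) / (2 - u), u ~ Uniform(0, 1)."""
    return min(1.0, 2 * fmean(p) / (2 - u))


rng = random.Random(0)
p = [0.04, 0.01, 0.03, 0.02]
rng.shuffle(p)  # a uniformly random order makes arbitrarily dependent p-values exchangeable
u = rng.random()  # external randomization, independent of the p-values
print(ex_ruger(p, 2), randomized_ruger(p, 2, u))
print(ex_twice_average(p), randomized_twice_average(p, u))
```

At $t = n$ the exchangeable rules reduce to the original ones. Similarly, $u < 1$ can only lower the randomized thresholds, so no variant is ever larger than the rule it modifies. The `shuffle` line reflects the abstract's remark: if the p-values are only arbitrarily dependent, presenting them in a uniformly random order (independent of the data) makes them exchangeable.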
