Sub-uniformity of harmonic mean p-values
Yuyu Chen, Ruodu Wang, Yuming Wang, Wenhao Zhu
TL;DR
The paper investigates when generalized means of dependent p-values, especially the harmonic mean, are sub-uniform, meaning that using them directly as p-values is anti-conservative at every significance level. By linking p-value merging to risk aggregation, it studies the stochastic ordering $M^{\mathbf w}_r(U_1,\dots,U_n)\preceq_{st} U_1$, where $M^{\mathbf w}_r$ is the weighted generalized mean of order $r$, under dependence models such as negative upper orthant dependence (NUOD), extremal mixture copulas, and Clayton copulas, and it proves sub-uniformity for $r\le -1$ in several cases. It gives explicit threshold adjustments under independence, where the required multiplier grows on the order of $\log n$, and shows that the discrete uniform case approaches the continuous behavior as the discretization becomes finer. The guidance is to prefer Simes or the Cauchy method for robust validity under dependence, with limited but useful corrections available for certain positive-dependence structures such as Clayton copulas. Overall, the work clarifies that harmonic-mean-based merging is valid only with a proper threshold or multiplier adjustment, and it highlights a persistent need for dependence-aware adjustments in multiple testing practice.
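For reference, the notation above can be spelled out as follows. This is a sketch that assumes the standard weighted power-mean definition, with weights $w_1,\dots,w_n\ge 0$ summing to one and $U_1,\dots,U_n$ standard uniform; the paper's exact conventions may differ in detail.

$$
M^{\mathbf w}_r(u_1,\dots,u_n)=\Big(\sum_{i=1}^n w_i\,u_i^{\,r}\Big)^{1/r}\quad (r\neq 0),
\qquad
M^{\mathbf w}_{-1}(u_1,\dots,u_n)=\Big(\sum_{i=1}^n \frac{w_i}{u_i}\Big)^{-1}.
$$

Under this reading, the ordering $M^{\mathbf w}_r(U_1,\dots,U_n)\preceq_{st} U_1$ says that

$$
\mathbb P\big(M^{\mathbf w}_r(U_1,\dots,U_n)\le p\big)\;\ge\;p\quad\text{for all } p\in(0,1),
$$

so rejecting when the merged value falls below $\alpha$ gives a type I error of at least $\alpha$, and strictly more than $\alpha$ when the inequality is strict.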
Abstract
We obtain several inequalities on the generalized means of dependent p-values. In particular, the weighted harmonic mean of p-values is strictly sub-uniform under several dependence assumptions of p-values, including independence, negative upper orthant dependence, the class of extremal mixture copulas, and some Clayton copulas. Sub-uniformity of the harmonic mean of p-values has an important implication in multiple hypothesis testing: It is statistically invalid (anti-conservative) to merge p-values using the harmonic mean unless a proper threshold or multiplier adjustment is used, and this applies across all significance levels. The required multiplier adjustment on the harmonic mean p-value grows sub-linearly to infinity as the number of p-values increases, and hence there does not exist a constant multiplier that works for any number of p-values, even under independence.
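The anti-conservativeness under independence can be checked numerically. The following is a minimal Monte Carlo sketch, not taken from the paper: the function names `harmonic_mean_pvalue` and `rejection_rate`, the equal default weights, and the choices of `n`, `alpha`, and `trials` are all illustrative assumptions.

```python
import numpy as np


def harmonic_mean_pvalue(p, w=None):
    """Weighted harmonic mean of p-values along the last axis.

    Weights default to 1/n each (an assumption made for illustration).
    """
    p = np.asarray(p, dtype=float)
    n = p.shape[-1]
    if w is None:
        w = np.full(n, 1.0 / n)
    w = np.asarray(w, dtype=float)
    return 1.0 / np.sum(w / p, axis=-1)


def rejection_rate(n, alpha, trials=100_000, seed=0):
    """Estimate P(harmonic mean <= alpha) for n independent uniform p-values."""
    rng = np.random.default_rng(seed)
    # rng.random draws from [0, 1); use 1 - u so values lie in (0, 1]
    # and the division in the harmonic mean never hits zero.
    u = 1.0 - rng.random((trials, n))
    hm = harmonic_mean_pvalue(u)
    return float(np.mean(hm <= alpha))


if __name__ == "__main__":
    alpha = 0.05
    for n in (2, 20, 200):
        rate = rejection_rate(n, alpha)
        print(f"n={n:4d}  estimated P(HM <= {alpha}) = {rate:.4f}")
```

If the abstract's claim holds, the estimated rejection rate should exceed the nominal level `alpha` for each `n`, and the gap should widen slowly as `n` grows, consistent with a multiplier that increases without bound but sub-linearly.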
