Aggregating Dependent Signals with Heavy-Tailed Combination Tests

Lin Gui; Yuchao Jiang; Jingshu Wang

TL;DR

This work analyzes heavy-tailed p-value combination tests (notably the Cauchy combination test and the harmonic mean p-value) for aggregating dependent signals when the number of base tests is fixed and the global significance level $\alpha$ tends to zero. It develops a unified theory for one-sided and two-sided p-values within the regularly varying tail framework, proving asymptotic validity under pairwise quasi-asymptotic independence and, when the test statistics are not perfectly correlated, asymptotic equivalence to Bonferroni for two-sided p-values. Empirical results show that under asymptotic independence these tests behave like Bonferroni at very small $\alpha$, whereas under asymptotic dependence (e.g., multivariate $t$) they can offer substantial power gains, especially for dense signals and heavier tails (tail index $\gamma < 1$). Real-data applications in circadian rhythm detection and GWAS demonstrate improved power and computational efficiency over Bonferroni, with practical recommendations such as using left-truncated $t_1$ distributions to mitigate problems caused by transformed statistics with negative support. Overall, the paper highlights when heavy-tailed combination tests add value and how to implement them robustly in dependent settings.

Abstract

Combining dependent p-values poses a long-standing challenge in statistical inference, particularly when aggregating findings from multiple methods to enhance signal detection. Recently, p-value combination tests based on regularly varying-tailed distributions, such as the Cauchy combination test and harmonic mean p-value, have attracted attention for their robustness to unknown dependence. This paper provides a theoretical and empirical evaluation of these methods under an asymptotic regime where the number of p-values is fixed and the global test significance level approaches zero. We examine two types of dependence among the p-values. First, when p-values are pairwise asymptotically independent, such as with bivariate normal test statistics with no perfect correlation, we prove that these combination tests are asymptotically valid. However, they become equivalent to the Bonferroni test as the significance level tends to zero for both one-sided and two-sided p-values. Empirical investigations suggest that this equivalence can emerge at moderately small significance levels. Second, under pairwise quasi-asymptotic dependence, such as with bivariate t-distributed test statistics, our simulations suggest that these combination tests can remain valid and exhibit notable power gains over Bonferroni, even as the significance level diminishes. These findings highlight the potential advantages of these combination tests in scenarios where p-values exhibit substantial dependence. Our simulations also examine how test performance depends on the support and tail heaviness of the underlying distributions.
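As a rough illustration of the two combination rules named in the abstract, the following Python sketch computes a Cauchy combination p-value, a harmonic mean p-value, and a Bonferroni p-value from a fixed set of base p-values. The function names, the equal default weights, and the use of the raw harmonic mean without any further calibration are assumptions made for this sketch; they are not taken from the paper's code.

```python
import math


def cauchy_combination(pvalues, weights=None):
    """Cauchy combination p-value; inputs are assumed to lie strictly in (0, 1)."""
    n = len(pvalues)
    if weights is None:
        weights = [1.0 / n] * n  # assumption: equal weights that sum to one
    # Map each p-value to a standard Cauchy quantile and take the weighted sum.
    t = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, pvalues))
    # Standard Cauchy upper tail at t; atan2 avoids cancellation for large t.
    return math.atan2(1.0, t) / math.pi


def harmonic_mean_p(pvalues):
    """Raw harmonic mean of the p-values (no calibration applied)."""
    return len(pvalues) / sum(1.0 / p for p in pvalues)


def bonferroni(pvalues):
    """Bonferroni-adjusted minimum p-value, capped at one."""
    return min(1.0, len(pvalues) * min(pvalues))


if __name__ == "__main__":
    sparse = [1e-6, 0.5, 0.5]    # one strong signal among uninformative tests
    dense = [0.02, 0.02, 0.02]   # several moderate signals
    for name, ps in (("sparse", sparse), ("dense", dense)):
        print(name, cauchy_combination(ps), harmonic_mean_p(ps), bonferroni(ps))
```

With one very small p-value among uninformative ones, the three results nearly coincide, which matches the Bonferroni equivalence described above. With several moderately small p-values, the heavy-tailed rules return a much smaller combined value than Bonferroni; whether that smaller value is a valid p-value depends on the dependence structure, which is the question the paper studies.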

Paper Structure (34 sections, 13 theorems, 146 equations, 13 figures, 6 tables, 1 algorithm)

Introduction
Model setup and theoretical results
Model setup
Tail properties of the sum $S_n$
Asymptotic validity of the heavy-tailed combination tests
Asymptotic equivalence to the Bonferroni test
Empirical evaluations of the heavy-tailed combination tests under asymptotic independence
Empirical validity of the combination tests
Empirical comparison with the Bonferroni test
The combination test under asymptotic dependence
Real Data Examples
Circadian rhythm detection
SNP-based gene level association testing in GWAS
Discussion
Type-I error of the combination test with negatively correlated p-values
...and 19 more sections

Key Result

Theorem 2.1

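Let $X_{1}, \ldots, X_{n}$ be $n$ pairwise quasi-asymptotically independent real-valued random variables with distributions $F_{1}, \ldots, F_{n} \in \mathscr{C}$, respectively. Denote $S_n=\sum_{i=1}^n X_i$. Then, it holds that

$$P(S_n > x) \sim \sum_{i=1}^n \overline{F}_i(x), \quad x \to \infty,$$

where $\overline{F}_i = 1 - F_i$ denotes the tail of $F_i$.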
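One case where this kind of tail statement can be checked in closed form is a sum of independent standard Cauchy variables. Independence is a special case of pairwise quasi-asymptotic independence, the Cauchy distribution has a regularly varying tail (which places it in the consistently varying class), and the sum of $n$ independent standard Cauchy variables is again Cauchy with scale $n$. The sketch below, in Python with hypothetical helper names, compares the exact tail $P(S_n > x)$ with the sum of marginal tails $n\,P(X_1 > x)$; under the tail-sum equivalence that results of this type deliver, the ratio should approach 1 as $x$ grows. It illustrates the statement and is not part of the paper's proof.

```python
import math


def cauchy_sf(x, scale=1.0):
    """Upper tail P(X > x) of a centered Cauchy variable with the given scale."""
    # atan2(scale, x) equals pi/2 - atan(x / scale) for scale > 0 and stays
    # accurate for large x, where the direct subtraction would lose digits.
    return math.atan2(scale, x) / math.pi


def tail_ratio(n, x):
    """Ratio of the exact tail of S_n to the sum of the n marginal tails."""
    exact = cauchy_sf(x, scale=n)    # S_n is Cauchy with scale n (iid case only)
    marginal_sum = n * cauchy_sf(x)  # sum of n identical marginal tails
    return exact / marginal_sum


if __name__ == "__main__":
    for x in (10.0, 100.0, 1e4):
        print(x, tail_ratio(5, x))
```

The ratio sits below 1 at moderate thresholds and climbs toward 1 as the threshold increases, so in this independent example the sum of marginal tails slightly overstates the tail of the sum before the asymptotic regime sets in.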

Figures (13)

Figure 1: Rejection regions of the Bonferroni (black), Fisher (blue), Cauchy (red), and Fréchet $\gamma=1$ (green) combination tests for two-sided p-values, shown in the space of test statistics with $n=2$ base hypotheses at different significance levels $\alpha$. Colored lines mark the boundaries of the rejection regions; each rejection region is the area outside its boundary, on the side that does not contain the origin.
Figure 2: Type-I error of the combination test with $n = 5$ under different distributions: Cauchy (star point), inverse Gamma (blue), Fréchet (green), Pareto (purple), Student's $t$ (red), and left-truncated $t$ with truncation threshold $p_0=0.9$ (dark orange), $p_0=0.7$ (orange), and $p_0=0.5$ (light orange). The vertical axis shows the empirical type-I error and the horizontal axis shows the tail index $\gamma$.
Figure 3: Power of the combination test compared with the Bonferroni test for different distributions: Lévy (turquoise with diamond dot), Cauchy (red with round dot), Fréchet $\gamma=1$ (green with square dot), Pareto $\gamma=1$ (purple with triangular dot), and left-truncated $t_1$ with truncation threshold $p_0=0.9$ (dark orange with inverted-triangle dot). Left plots correspond to dense signals and right plots to sparse signals.
Figure 4: The difference between the combination test and the Bonferroni test diminishes as the significance level converges to 0. The left plot simulates the ratio in the theorem on asymptotic equivalence to the Bonferroni test (\ref{thm:same_as_bonferroni}) with fixed $\rho_{ij}=0.5$ under the global null. The right plots simulate the same ratio under the global alternative with dense and sparse signals. The combination tests use different distributions: Lévy (turquoise), Cauchy (red), Fréchet $\gamma=1$ (green), Pareto $\gamma=1$ (purple), and left-truncated $t_1$ with truncation threshold $p_0=0.9$ (dark orange). The number of repeated simulations is $10^8$.
Figure 5: Power of the combination test compared with the Bonferroni test when asymptotic independence is violated, for different distributions: Cauchy (red with round dot), Fréchet $\gamma=1$ (green with square dot), Pareto $\gamma=1$ (purple with triangular dot), and left-truncated $t_1$ with truncation threshold $p_0=0.9$ (dark orange with inverted-triangle dot). Left plots correspond to dense signals and right plots to sparse signals. The maximum power gain is defined as the maximum, over all possible values of $\mu$, of the difference in empirical power between the proposed test and the Bonferroni test.
...and 8 more figures

Theorems & Definitions (36)

Definition 2.1: Quasi-asymptotic independence
Definition 2.2: Consistently-varying class $\mathscr{C}$
Theorem 2.1: Theorem 3.1 of chen2009sums
Corollary 2.1
Remark 2.1
Definition 2.3: Regularly varying tailed class $\mathscr{R}_{-\gamma}$
Definition 2.4: Combination test
Definition 2.5: Average-based combination test
Definition 2.6: Weighted combination test
Remark 2.2
...and 26 more
