Aggregating Dependent Signals with Heavy-Tailed Combination Tests

Lin Gui, Yuchao Jiang, Jingshu Wang

TL;DR

This work analyzes heavy-tailed p-value combination tests (notably the Cauchy and harmonic mean p-values) for aggregating dependent signals under a fixed number of base tests as the global significance level $\alpha$ tends to zero. It develops a unified theory for both one-sided and two-sided p-values within the regularly varying tail framework, proving asymptotic validity under pairwise quasi-asymptotic independence and, when correlations are not perfectly aligned, asymptotic equivalence to Bonferroni for two-sided p-values. Empirical results show that under asymptotic independence, these tests behave like Bonferroni at very small $\alpha$, whereas under asymptotic dependence (e.g., multivariate $t$) they can offer substantial power gains, especially for dense signals and heavier tails (tail index $\gamma < 1$). Real-data applications in circadian rhythm detection and GWAS demonstrate improved power and computational efficiency over Bonferroni, with practical recommendations such as left-truncated $t_1$ distributions to mitigate issues with negatively supported transformed statistics. Overall, the paper highlights when heavy-tailed combination tests add value and how to implement them robustly in dependent settings.
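To make the objects in the summary concrete, the following is a minimal sketch of the three combination rules it names, assuming their commonly used textbook forms: the Cauchy combination p-value with equal weights by default, the raw (uncalibrated) harmonic mean of the p-values, and the Bonferroni-adjusted minimum. The function names and the example p-values are illustrative assumptions, not taken from the paper.

```python
import math


def cauchy_combination(pvalues, weights=None):
    """Cauchy combination p-value (weights assumed nonnegative, summing to 1)."""
    n = len(pvalues)
    if weights is None:
        weights = [1.0 / n] * n  # equal weights by default
    # Map each p-value to a standard Cauchy quantile, then average.
    stat = sum(w * math.tan((0.5 - p) * math.pi)
               for w, p in zip(weights, pvalues))
    # Upper-tail probability of a standard Cauchy variable at the statistic.
    return 0.5 - math.atan(stat) / math.pi


def harmonic_mean_p(pvalues):
    """Raw harmonic mean of the p-values (no calibration applied)."""
    return len(pvalues) / sum(1.0 / p for p in pvalues)


def bonferroni(pvalues):
    """Bonferroni-adjusted global p-value: n times the smallest p-value."""
    return min(1.0, len(pvalues) * min(pvalues))


if __name__ == "__main__":
    p = [0.001, 0.2, 0.6]  # hypothetical base p-values
    print("Cauchy    :", cauchy_combination(p))
    print("Harmonic  :", harmonic_mean_p(p))
    print("Bonferroni:", bonferroni(p))
```

All three rules are driven mostly by the smallest p-values, which is the intuition behind their close agreement at small significance levels.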

Abstract

Combining dependent p-values poses a long-standing challenge in statistical inference, particularly when aggregating findings from multiple methods to enhance signal detection. Recently, p-value combination tests based on regularly varying-tailed distributions, such as the Cauchy combination test and harmonic mean p-value, have attracted attention for their robustness to unknown dependence. This paper provides a theoretical and empirical evaluation of these methods under an asymptotic regime where the number of p-values is fixed and the global test significance level approaches zero. We examine two types of dependence among the p-values. First, when p-values are pairwise asymptotically independent, such as with bivariate normal test statistics with no perfect correlation, we prove that these combination tests are asymptotically valid. However, they become equivalent to the Bonferroni test as the significance level tends to zero for both one-sided and two-sided p-values. Empirical investigations suggest that this equivalence can emerge at moderately small significance levels. Second, under pairwise quasi-asymptotic dependence, such as with bivariate t-distributed test statistics, our simulations suggest that these combination tests can remain valid and exhibit notable power gains over Bonferroni, even as the significance level diminishes. These findings highlight the potential advantages of these combination tests in scenarios where p-values exhibit substantial dependence. Our simulations also examine how test performance depends on the support and tail heaviness of the underlying distributions.
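The asymptotically independent case described in the abstract can be explored with a small Monte Carlo sketch. It assumes two base tests with bivariate normal statistics under the global null, correlation $\rho = 0.5$, two-sided p-values, and an equal-weight Cauchy combination; the function names, sample size, and seed are illustrative choices rather than the paper's simulation design.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value of a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def cauchy_p(pvalues):
    """Equal-weight Cauchy combination p-value."""
    stat = sum(math.tan((0.5 - p) * math.pi) for p in pvalues) / len(pvalues)
    return 0.5 - math.atan(stat) / math.pi


def null_rejection_rates(alpha, rho=0.5, draws=200_000, seed=0):
    """Empirical type-I error of the Cauchy and Bonferroni tests (n = 2)."""
    rng = random.Random(seed)
    rej_cauchy = 0
    rej_bonf = 0
    for _ in range(draws):
        # Correlated standard normal pair under the global null.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        p = [two_sided_p(z1), two_sided_p(z2)]
        if cauchy_p(p) <= alpha:
            rej_cauchy += 1
        if min(p) <= alpha / len(p):
            rej_bonf += 1
    return rej_cauchy / draws, rej_bonf / draws


if __name__ == "__main__":
    for alpha in (0.05, 0.01, 0.001):
        cauchy_rate, bonf_rate = null_rejection_rates(alpha)
        print(alpha, cauchy_rate, bonf_rate)
```

Comparing the two rates across decreasing values of `alpha` gives a rough view of how closely the tests agree; very small levels need far more draws than the default used here.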

Paper Structure

This paper contains 34 sections, 13 theorems, 146 equations, 13 figures, 6 tables, 1 algorithm.

Key Result

Theorem 2.1

Let $X_{1}, \ldots, X_{n}$ be $n$ pairwise quasi-asymptotically independent real-valued random variables with distributions $F_{1}, \ldots, F_{n} \in \mathscr{C}$, respectively. Denote $S_n=\sum_{i=1}^nX_i$. Then, it holds that
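The displayed conclusion does not appear in this excerpt. As a hedged reconstruction, assuming the statement follows the usual form of tail-equivalence results for sums of pairwise quasi-asymptotically independent variables in $\mathscr{C}$, and writing $\overline{F}_i = 1 - F_i$ for the tail of $F_i$ (notation assumed here), it would read:

$$
\lim_{x \to \infty} \frac{\mathbb{P}(S_n > x)}{\sum_{i=1}^{n} \overline{F}_i(x)} = 1 .
$$

In words, the tail of the sum behaves like the sum of the individual tails, which is the mechanism behind the comparison with Bonferroni at small significance levels.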

Figures (13)

  • Figure 1: Rejection regions for Bonferroni (black), Fisher (blue), Cauchy (red), and Fréchet $\gamma=1$ (green) combination tests for a two-sided test in test statistics space when the number of base hypotheses is $n=2$, at different significance levels $\alpha$. The boundaries of the rejection regions are shown with different colored lines, and the rejection regions are the areas outside of these boundaries that do not include the origin.
  • Figure 2: The type-I error of the combination test when $n = 5$ with different distributions: Cauchy (star point), inverse Gamma (blue), Fréchet (green), Pareto (purple), Student's t (red), left-truncated t with truncation threshold $p_0=0.9$ (dark orange), left-truncated t with truncation threshold $p_0=0.7$ (orange), left-truncated t with truncation threshold $p_0=0.5$ (light orange). The vertical axis represents the empirical type-I error, and the horizontal axis represents the tail index $\gamma$.
  • Figure 3: Power comparison between the Bonferroni test and the combination test with different distributions: Lévy (turquoise with diamond dot), Cauchy (red with round dot), Fréchet $\gamma=1$ (green with square dot), Pareto $\gamma=1$ (purple with triangular dot), left-truncated $t_1$ with truncation threshold $p_0=0.9$ (dark orange with inverted-triangle dot). Left plots correspond to dense signals, and right plots correspond to sparse signals.
  • Figure 4: The difference between the combination test and the Bonferroni test diminishes as the significance level converges to 0. The left plot simulates the ratio in the Bonferroni-equivalence theorem with fixed $\rho_{ij}=0.5$ under the global null. Right plots simulate the same ratio under the global alternative with dense and sparse signals. The combination tests use different distributions: Lévy (turquoise), Cauchy (red), Fréchet $\gamma=1$ (green), Pareto $\gamma=1$ (purple), left-truncated $t_1$ with truncation threshold $p_0=0.9$ (dark orange). The number of repeated simulations is $10^8$.
  • Figure 5: Power comparison between the Bonferroni test and the combination test with different distributions when asymptotic independence is violated: Cauchy (red with round dot), Fréchet $\gamma=1$ (green with square dot), Pareto $\gamma=1$ (purple with triangular dot), left-truncated $t_1$ with truncation threshold $p_0=0.9$ (dark orange with inverted-triangle dot). Left plots correspond to dense signals, and right plots correspond to sparse signals. The maximum power gain is defined as the maximum of the empirical power difference between the proposed test and the Bonferroni test over all possible values of $\mu$.
  • ...and 8 more figures

Theorems & Definitions (36)

  • Definition 2.1: Quasi-asymptotic independence
  • Definition 2.2: Consistently-varying class $\mathscr{C}$
  • Theorem 2.1: Theorem 3.1 of Chen and Yuen (2009)
  • Corollary 2.1
  • Remark 2.1
  • Definition 2.3: Regularly varying tailed class $\mathscr{R}_{-\gamma}$
  • Definition 2.4: Combination test
  • Definition 2.5: Average-based combination test
  • Definition 2.6: Weighted combination test
  • Remark 2.2
  • ...and 26 more