False Discovery Rate Control via Frequentist-assisted Horseshoe

Qiaoyu Liang, Zihan Zhu, Ziang Fu, Michael Evans

TL;DR

This work tackles the challenge of achieving finite-sample false discovery rate (FDR) control when using the horseshoe global-local shrinkage prior in high-dimensional normal means testing. It introduces frequentist-assisted horseshoe (FAHS), with two variants, $m$-FAHS and $e$-FAHS, which estimate the global shrinkage parameter $\xi$ via minimax considerations and integrate this into the horseshoe framework while preserving Bayesian uncertainty quantification. Empirical results across independent and correlated tests, plus a real prostate cancer dataset, show that FAHS delivers robust finite-sample FDR control and often superior stability compared to BH, q-value, two-groups empirical Bayes, and vanilla horseshoe. The study also establishes theoretical connections between minimax estimation and FDR control and outlines potential generalizations to other global-local priors and generalized linear models, with future work including e-values and prior–data conflict detection for enhanced robustness.

Abstract

The horseshoe prior, a widely used and convenient alternative to the spike-and-slab prior, has proven to be an exceptional default global-local shrinkage prior in Bayesian inference and machine learning. However, designing tests with frequentist false discovery rate (FDR) control using the horseshoe prior, or the general class of global-local shrinkage priors, remains an open problem. In this paper, we propose a frequentist-assisted horseshoe procedure that not only resolves this long-standing FDR control issue for the high-dimensional normal means testing problem but also exhibits satisfactory finite-sample FDR control at any desired nominal level, for both independent and correlated large-scale multiple tests. We carry out the frequentist-assisted horseshoe procedure in an easy and intuitive way: we use the minimax estimator of the global parameter of the horseshoe prior while keeping the rest of the full Bayes vanilla horseshoe structure. Results from intensive simulations under different sparsity levels and from real-world data demonstrate that the frequentist-assisted horseshoe procedure consistently achieves robust finite-sample FDR control, whereas existing frequentist and Bayesian FDR control procedures can lose finite-sample FDR control in a variety of common sparse cases. Based on the close relationship between minimax estimation and the level of FDR control discovered in this work, we point out potential generalizations to achieve FDR control for both more complicated models and the general global-local shrinkage prior family.
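
For orientation, the setting described in the abstract can be written down as follows. This is a sketch in the standard horseshoe formulation, not a transcription of the paper's equations: the unit noise variance, the use of $\xi$ as a global scale (rather than a variance), and the symbols $\lambda_i$, $V$, and $R$ are assumptions introduced here, and the paper's own parameterization may differ.

```latex
\begin{align*}
% Normal means model: one observation per hypothesis (unit variance assumed)
y_i \mid \theta_i &\sim \mathcal{N}(\theta_i, 1), \qquad i = 1, \dots, m, \\
% Horseshoe prior: local scales \lambda_i, global scale \xi
\theta_i \mid \lambda_i, \xi &\sim \mathcal{N}(0, \lambda_i^2 \xi^2), \qquad
\lambda_i \sim \mathrm{C}^{+}(0, 1), \\
% Hypotheses being tested
H_{0i} &: \theta_i = 0 \quad \text{vs.} \quad H_{1i} : \theta_i \neq 0, \\
% V = number of false rejections, R = total number of rejections
\mathrm{FDP} &= \frac{V}{\max(R, 1)}, \qquad \mathrm{FDR} = \mathbb{E}[\mathrm{FDP}].
\end{align*}
```

In this notation, the vanilla procedures either place a prior on $\xi$ (full Bayes) or estimate it from the data (empirical Bayes), whereas the frequentist-assisted procedure plugs in a minimax-motivated estimate of $\xi$ and leaves the rest of the hierarchy unchanged.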

Paper Structure

This paper contains 32 sections, 1 theorem, 23 equations, 13 figures, 3 tables, 4 algorithms.

Key Result

Theorem 1

(Theorem 2.1. song2020BSSM) Given a positive constant $\omega$ and some $\alpha > 1$, if $\xi^{\alpha-1} \geq(m_1 / m)^c\{\log (m / m_1)\}^{1 / 2}$ for some $c \in(0,1+\omega / 2)$, and $\xi^{\alpha-1} \prec\{(m_1 / m) \log (m / m_1)\}^\alpha$, then where $C_1(\omega)=\sqrt{2+\omega}+\sqrt{\omega}$ and it satisfies $\lim _{\omega \downarrow 0} C_1(\omega)=\sqrt{2}$. If furthermore, $\xi^{\alpha-1

Figures (13)

  • Figure 1: Tests using the empirical Bayes vanilla horseshoe (EBHS) and the full Bayes vanilla horseshoe (FBHS) gradually lose FDR control as the nominal FDR level gets tighter. Boxplots show FDP distributions and black crosses show FDRs. The experiment uses 100 replications, each testing 200 hypotheses with a signal proportion of 0.1.
  • Figure 2: FDP distributions (boxplots) and FDRs (black crosses) for multiple FDR control methods in the multiple independent testing setting, where m-FAHS and e-FAHS show more robust FDR and FDP control than the other procedures. Black long-dashed lines mark the nominal FDR levels. The experiment uses 100 replications, each testing 10000 independent hypotheses simultaneously, with signal proportions ranging from 0.05 to 0.5 and nominal FDR levels of 0.1 and 0.12. A comprehensive figure for the multiple independent testing setting can be found in Figure 9.
  • Figure 3: FDP distributions (boxplots) and FDRs (black crosses) for multiple FDR control methods in the multiple correlated testing setting (equicorrelation structure with correlation 0.3), where m-FAHS and e-FAHS show more robust FDR and FDP control than the other procedures. Black vertical dashed lines mark the nominal FDR levels. The experiment uses 100 replications, each testing 10000 correlated hypotheses simultaneously, with signal proportions ranging from 0.05 to 0.5 and nominal FDR levels of 0.1 and 0.12. Comprehensive figures for the multiple correlated testing setting can be found in Figures 10 to 12.
  • Figure 4: Empirical means of the global shrinkage parameter $\xi$ for four horseshoe procedures across 100 replications in the multiple independent testing setting.
  • Figure 5: Top left: no prior-data conflict (i.e., the horseshoe prior and the data agree), and the corresponding FDP is under control. Top right: a prior-data conflict exists, and the corresponding FDP is far from controlled. Bottom: FDPs lose control more severely as prior-data conflicts become more severe. The black dashed line is the pre-specified threshold for prior-data conflicts. FDPs increase noticeably as the tail probabilities fall further below the threshold.
  • ...and 8 more figures
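
The quantities plotted in these figures (per-replication FDPs and their average, the empirical FDR) can be sketched with a small simulation. The code below is illustrative only: it uses the Benjamini-Hochberg (BH) procedure, one of the comparison baselines named in the TL;DR, and does not implement FAHS or any horseshoe variant. The replication count, number of hypotheses, and signal proportion follow the Figure 1 caption; the signal mean of 4.0, the unit noise variance, the seed, and all function names are assumptions made here for illustration.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value P(|N(0,1)| >= |z|) for a standard normal test."""
    return math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(pvals, q):
    """BH step-up procedure: return the set of rejected indices at level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0  # largest rank whose sorted p-value is below its BH threshold
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= q * rank / m:
            k = rank
    return set(order[:k])


def fdp(rejected, is_signal):
    """False discovery proportion: false rejections / max(rejections, 1)."""
    if not rejected:
        return 0.0
    false_rejections = sum(1 for i in rejected if not is_signal[i])
    return false_rejections / len(rejected)


def simulate(m=200, signal_prop=0.1, signal_mean=4.0, q=0.1, reps=100, seed=0):
    """Return one FDP per replication for BH in a normal means model.

    signal_mean is a hypothetical signal strength; the source does not
    state the value used in the paper.
    """
    rng = random.Random(seed)
    m1 = int(round(m * signal_prop))  # number of true signals
    is_signal = [i < m1 for i in range(m)]
    fdps = []
    for _ in range(reps):
        y = [rng.gauss(signal_mean if s else 0.0, 1.0) for s in is_signal]
        pvals = [two_sided_p(v) for v in y]
        fdps.append(fdp(benjamini_hochberg(pvals, q), is_signal))
    return fdps


fdps = simulate()
fdr_hat = sum(fdps) / len(fdps)  # empirical FDR = mean FDP over replications
print(f"empirical FDR over {len(fdps)} replications: {fdr_hat:.3f}")
```

The list `fdps` corresponds to what a boxplot in the figures summarizes for one method and one nominal level, and `fdr_hat` corresponds to a black cross. Comparing `fdr_hat` with `q` is the finite-sample check that the captions refer to as FDR control.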

Theorems & Definitions (1)

  • Theorem 1