Adaptive procedures for boundary FDR control

Sarah Mostow, Daniel Xiang

Abstract

A cornerstone of the multiple testing literature is the Benjamini-Hochberg (BH) procedure, which guarantees control of the FDR when $p$-values are independent or positively dependent. While BH controls the average quality of rejections, it does not provide guarantees for individual discoveries, particularly those near the rejection threshold, which are more likely to be false than the average rejection. For independent $p$-values with Uniform$(0,1)$ null distribution, the Support Line procedure (SL; arXiv:2207.07299) provably controls the error probability for the rejection at the edge of the discovery set (i.e., the one with the largest $p$-value) at level $q m_0/m$, where $m_0$ is the number of true null hypotheses and $q$ is a tuning parameter. In this work, we study adaptive versions of the SL procedure that operate in two steps: the first step estimates $m_0$ from non-significant statistics, and the second step runs the SL procedure at an adjusted level $q m / \hat{m}_0$. The adaptive procedures are shown to control the false discovery probability for the "boundary" rejection under an independence assumption. Simulation studies suggest that some but not all of the two-stage procedures maintain error control under positive dependence, and that substantial power is gained relative to the original SL procedure. We illustrate differences between the procedures on meta-data from the recent literature in behavioral psychology on growth mindset and nudge interventions.
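
The two-step structure described in the abstract can be illustrated with a minimal Python sketch. It rests on two assumptions that this page does not confirm. First, the SL rule is taken to reject the $R$ smallest $p$-values with $R = \arg\max_{0 \le k \le m} \{ qk/m - p_{(k)} \}$ and $p_{(0)} = 0$, which is how the rule is recalled from arXiv:2207.07299. Second, the first-step estimate is taken to be $\hat{m}_0 = m - R_1$, borrowed from the two-stage BH procedure of benjamini2006adaptive; the estimators studied in the paper may differ. The function names `support_line` and `two_stage_sl`, the tie-breaking rule, and the cap of the adjusted level at 1 are hypothetical choices made for illustration.

```python
import numpy as np


def support_line(pvals, q):
    """Number of rejections under the assumed SL rule.

    Assumed form: R = argmax over k in {0, ..., m} of q*k/m - p_(k),
    with p_(0) = 0. Ties are broken toward the larger k (an assumption).
    """
    p_sorted = np.sort(np.asarray(pvals, dtype=float))
    m = p_sorted.size
    # Prepend p_(0) = 0 so that k = 0 (no rejections) is a candidate.
    p_ext = np.concatenate(([0.0], p_sorted))
    k = np.arange(m + 1)
    gaps = q * k / m - p_ext
    # argmax on the reversed array returns the last maximiser of `gaps`.
    return int(m - np.argmax(gaps[::-1]))


def two_stage_sl(pvals, q):
    """Hypothetical two-step adaptive SL sketch (not the paper's exact rule)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    # Step 1: run SL at level q and estimate m0 from the non-rejected p-values.
    r1 = support_line(p, q)
    m0_hat = m - r1  # assumed estimator, in the style of two-stage BH
    if m0_hat == 0:
        return m  # everything was rejected in step 1
    # Step 2: rerun SL at the adjusted level q * m / m0_hat (capped at 1).
    q_adj = min(1.0, q * m / m0_hat)
    return support_line(p, q_adj)


pvals = [0.001, 0.002, 0.003, 0.035, 0.6, 0.7, 0.9, 0.95]
r_single = support_line(pvals, q=0.2)
r_adaptive = two_stage_sl(pvals, q=0.2)
```

On this toy input the single-step rule rejects three hypotheses, while the sketched two-step rule rejects four, because the adjusted level $0.2 \cdot 8/5 = 0.32$ steepens the line enough to bring the fourth $p$-value into the rejection set. This only illustrates the mechanism behind the power gain; it says nothing about error control.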

Paper Structure

This paper contains 28 sections, 8 theorems, 62 equations, 11 figures, 3 tables.

Key Result

Theorem 3.1

Let $H_1,\dots,H_m$ denote $m$ null hypotheses, with independent $p$-values $p_1,\dots,p_m$. Suppose that $p_i \sim \text{Uniform}(0,1)$ if $H_i$ is true. Then where $\mathcal{R}_2 \coloneqq \{i : p_i \leq p_{(R_2)}\}$ is the rejection set for the two-stage procedure, and the $\textnormal{bFDR}$ of a rejection set is defined in def:bfdr.

Figures (11)

  • Figure 1: In the left panel, the TST$(q')$, SL$(\ell)$ and TSSL$(\ell')$ procedures are illustrated on an educational dataset from williams1999controlling, with $q=0.1$ and $\ell=0.2$. In the right panel, we plot the bFDR curve for the procedures in the "1234 configuration" simulation setting of benjamini2006adaptive, where $\pi_0=1/2$ and non-null $p$-values are computed from a one-sided Gaussian location test with mean $\mu$ equal to 1, 2, 3, or 4 with equal weight.
  • Figure 2: Boundary FDR versus tuning parameter for the alternating configuration (independent Gaussian test statistics), $N = 10{,}000$ simulations, $m = 64$ $p$-values, with $\pi_0 = 0.75$ (left) and $\pi_0 = 0.5$ (right).
  • Figure 3: Boundary FDR versus tuning parameter for the all-at-5 configuration with $N = 10{,}000$ simulations, $m = 64$ $p$-values, with $\pi_0 = 0.75$ (left) and $\pi_0 = 0.5$ (right).
  • Figure 4: Boundary FDR (lines) and $\hat{\pi}_0$ (boxplots) versus correlation parameter for the alternating configuration (equicorrelated Gaussian test statistics), $N = 10{,}000$ simulations, $m = 64$ $p$-values, $q = 0.2$, and $\pi_0 = 0.75$.
  • Figure 5: Power for each procedure relative to oracle SL, using the alternating configuration with $\pi_0 = 0.25, 0.5, 0.75$, $m = 16, 64, 256$, and $q = 0.2$, with $N = 10{,}000$ simulations.
  • ...and 6 more figures

Theorems & Definitions (17)

  • Theorem 3.1
  • Remark 3.1
  • Proposition 3.1
  • proof
  • proof
  • Lemma 7.1
  • proof
  • Lemma 7.2: Adapted from Lemma 1 in benjamini2006adaptive
  • proof
  • Lemma 7.3: Lemma 2 in soloff2024edge
  • ...and 7 more