Nonparametric methods controlling the median of the false discovery proportion

Jesse Hemerik

Abstract

When testing many hypotheses, we often do not have strong expectations about the directions of the effects. In some situations, however, the alternative hypotheses state that the parameters lie in a certain direction or interval, and it is in fact expected that most hypotheses are false. This is often the case when researchers perform multiple noninferiority or equivalence tests, e.g. when testing food safety with metabolite data. The goal is then to use data to corroborate the expectation that most hypotheses are false. We propose a nonparametric multiple testing approach that is powerful in such situations. If the user's expectations are wrong, our approach remains valid but has low power. Of course, all multiple testing methods become more powerful when appropriate one-sided rather than two-sided tests are used, but even then our approach often has superior power. The proposed methods are not limited to safety testing and can be used for testing hypotheses about various kinds of parameters, such as coefficients of a model. The methods in this paper control the median of the false discovery proportion (FDP), which is the fraction of false discoveries among the rejected hypotheses. This approach is comparable to false discovery rate control, where one ensures that the mean rather than the median of the FDP is small. Our procedures make use of a symmetry property of the test statistics, do not require independence and have finite-sample properties.

Paper Structure

This paper contains 49 sections, 18 theorems, 193 equations, 5 figures.

Key Result

Theorem 4.1

Under Assumption \ref{asssym}, the bound $\tilde{FDP}$ satisfies equation \eqref{eq:mainpr}.

Figures (5)

  • Figure 1: In some situations, e.g. food safety testing, one wants to show that many parameters lie in a certain interval or in a particular direction. In the left and right panels, the parameter estimates suggest that most parameters lie within the regions $(-1,1)$ and $(-1,\infty)$ respectively. In many such problems, based on prior knowledge, it is already expected that most parameters lie within such a region, and the goal is to corroborate that with data. This paper proposes powerful, nonparametric (or semiparametric) procedures that do this. In Sections \ref{secnovel}-\ref{secet}, we select indices of parameters that seem to fall within the region, and estimate the number of incorrectly selected indices, i.e. the indices of parameters that in fact lie outside the region. In Section \ref{secfdx}, we provide methods that select a set of indices such that the median of the fraction of incorrectly selected indices (i.e., the median of the FDP) stays below some chosen small value $\gamma\in[0,1)$.
  • Figure 2: This figure illustrates the construction of $s^+$. The smaller the target FDP $\gamma$ is chosen to be, the larger---i.e., stricter---the threshold $s^+$ is. In case of directional testing, our method rejects all hypotheses $H_j$ with $T_j>\delta_j+s^+$. In case of equivalence testing, it rejects all hypotheses $H_j$ with $|T_j|<\delta_j-s^+$.
  • Figure 3: The average FDP estimate of our novel estimator $\tilde{FDP}$ from Section \ref{secnovel} (solid lines) and SAM (dashed lines) as a function of the fraction $\pi_0$ of true hypotheses, the homogeneous correlation $\rho$ between the test statistics and the effect size $d$. The number of hypotheses was $m=500$. Each estimate is based on $5\cdot 10^3$ simulations.
  • Figure 4: The average FDP estimate of our novel estimator $\tilde{FDP}$ from Section \ref{secnovel} (solid lines), SAM (dashed) and SAM+CT (dotted) as a function of the fraction $\pi_0$ of true hypotheses, the correlation $\rho$ between the test statistics and the effect size $d$. The number of hypotheses was taken to be $m=50$, since SAM+CT is computationally infeasible for large $m$. Each estimate is based on $10^3$ simulations.
  • Figure 5: The power of our novel method from Section \ref{seckorn} (solid lines) versus Benjamini-Hochberg (dashed), Romano-Wolf (dotted) and the method from \cite{hemerik2024flexible} (dash-dotted) as a function of the fraction $\pi_0$ of true hypotheses, the correlation $\rho$ between the test statistics and the effect size $d$. The number of hypotheses was $m=500$. Each estimate is based on $5\cdot 10^3$ simulations.
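
The rejection rules stated in the caption of Figure 2 and the definition of the FDP from the abstract can be illustrated with a minimal sketch. It assumes the threshold $s^+$ has already been computed and is simply passed in; the construction of $s^+$ itself is not reproduced here. The function names (`reject_directional`, `reject_equivalence`, `fdp`), the toy values, and the convention that the FDP equals 0 when nothing is rejected are assumptions made for illustration, and the set of true hypotheses is only known in a simulation setting.

```python
def reject_directional(t, delta, s_plus):
    """Indices j with T_j > delta_j + s_plus (directional rule, Figure 2)."""
    return [j for j, (tj, dj) in enumerate(zip(t, delta)) if tj > dj + s_plus]


def reject_equivalence(t, delta, s_plus):
    """Indices j with |T_j| < delta_j - s_plus (equivalence rule, Figure 2)."""
    return [j for j, (tj, dj) in enumerate(zip(t, delta)) if abs(tj) < dj - s_plus]


def fdp(rejected, true_nulls):
    """Fraction of rejected hypotheses that are true (false discoveries).

    Assumed convention: the FDP is 0 when no hypothesis is rejected.
    """
    if not rejected:
        return 0.0
    false_discoveries = sum(1 for j in rejected if j in true_nulls)
    return false_discoveries / len(rejected)


# Toy test statistics (hypothetical values, not taken from the paper).
t = [2.5, 0.3, 1.9, -0.4]

# Directional testing with delta_j = 0 and an assumed threshold s_plus = 1.0.
rejected = reject_directional(t, [0.0] * 4, 1.0)
print(rejected)  # [0, 2]

# Equivalence testing with delta_j = 1.5 and an assumed threshold s_plus = 0.5.
print(reject_equivalence(t, [1.5] * 4, 0.5))  # [1, 3]

# If hypothesis 2 were in fact true, one of the two rejections is false.
print(fdp(rejected, {2}))  # 0.5
```

A larger `s_plus` makes both rules stricter, which matches the caption's remark that a smaller target FDP $\gamma$ leads to a larger threshold.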

Theorems & Definitions (45)

  • Theorem 4.1
  • Lemma 4.2
  • Proposition 4.3
  • Theorem 4.4
  • Theorem 4.5
  • Theorem 5.1
  • Theorem 5.2
  • Example
  • Theorem 5.3
  • Theorem 5.4
  • ...and 35 more