Two-Stage Multiple Test Procedures Controlling False Discovery Rate with auxiliary variable and their Application to Set4Delta Mutant Data
Seohwa Hwang, Mark Louie Ramos, DoHwan Park, Junyong Park, Johan Lim, Erin Green
TL;DR
This paper addresses improving false discovery rate (FDR) control in multiple testing by leveraging auxiliary information through a copula-based joint model of the primary statistic and an auxiliary variable. It introduces two-stage FDR procedures, Two-Stage FDR(H) and Two-Stage FDR(S), which use hard or soft thresholds on the auxiliary variable to refine testing of the primary variable while maintaining FDR at a pre-specified level. Through simulations and a Set4$\Delta$ yeast dataset, the methods demonstrate higher power than traditional one-stage approaches and many covariate-assisted methods, with robust FDR control even when the copula is misspecified. The work provides practical benefits for gene discovery under stress conditions and offers data and code for reproducibility and broader application to problems with a primary and auxiliary variable.
Abstract
In this paper, we present novel methodologies that incorporate auxiliary variables into multiple hypothesis testing on a primary variable of interest while controlling the false discovery rate (FDR). When conducting multiple tests on the primary variable, researchers can use auxiliary variables to set preconditions for the significance of the primary variable, thereby enhancing test efficacy. Depending on the auxiliary variable's role, we propose two approaches: one terminates testing of the primary variable if the auxiliary variable does not meet predefined conditions, and the other adjusts the evaluation criteria based on the auxiliary variable. Using a copula, we model the dependence between the auxiliary and primary variables, deriving their joint distribution from the individual marginal distributions. Our numerical studies, which include comparisons with existing methods, demonstrate that the proposed methodologies effectively control the FDR and yield greater statistical power than previous approaches based solely on the primary variable. As an illustrative example, we apply our methods to the Set4$\Delta$ mutant dataset. Our findings highlight the distinctions between our methodologies and traditional approaches, emphasising the potential advantage of introducing the auxiliary variable: more genes are selected.
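To make the first (hard-threshold) idea concrete, the following is a minimal Python sketch, not the authors' procedure. It assumes that each hypothesis comes with a primary p-value and an auxiliary p-value, drops hypotheses whose auxiliary p-value exceeds a cutoff, and then applies the standard Benjamini-Hochberg step-up procedure to the survivors. The function names, the cutoff `aux_cutoff`, and the toy p-values are all hypothetical. The sketch also ignores the dependence between the two variables, which the paper models with a copula, so it should not be read as carrying the paper's FDR guarantee.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return the (sorted) indices rejected by the BH step-up procedure."""
    m = len(pvalues)
    if m == 0:
        return []
    # Indices of the hypotheses, ordered by increasing p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest 1-based rank k with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            k_max = rank
    return sorted(order[:k_max])


def two_stage_hard_filter(primary_p, auxiliary_p, aux_cutoff=0.1, alpha=0.05):
    """Illustrative hard-threshold scheme (an assumption, not the paper's method).

    Stage 1: keep hypotheses whose auxiliary p-value is at most aux_cutoff.
    Stage 2: run BH at level alpha on the primary p-values of the survivors.
    """
    kept = [i for i, q in enumerate(auxiliary_p) if q <= aux_cutoff]
    rejected_local = benjamini_hochberg([primary_p[i] for i in kept], alpha)
    # Map positions within the filtered list back to the original indices.
    return [kept[j] for j in rejected_local]


# Hypothetical toy data, for illustration only.
primary_p = [0.001, 0.02, 0.03, 0.04, 0.5, 0.8]
auxiliary_p = [0.01, 0.02, 0.9, 0.03, 0.05, 0.7]

one_stage = benjamini_hochberg(primary_p, alpha=0.05)
two_stage = two_stage_hard_filter(primary_p, auxiliary_p, aux_cutoff=0.1, alpha=0.05)
print("one-stage rejections:", one_stage)
print("two-stage rejections:", two_stage)
```

On this toy input the filtered procedure rejects one more hypothesis than plain Benjamini-Hochberg, because the first stage shrinks the number of tests that the second stage must correct for. This is the intuition behind the power gain described above, although a valid procedure must also account for how the auxiliary and primary variables depend on each other.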
