DART2: a robust multiple testing method to smartly leverage helpful or misleading ancillary information
Xuechan Li, Jichun Xie
TL;DR
DART2 addresses the problem of leveraging ancillary information robustly in large-scale multiple testing. It is a two-stage procedure that first screens hypotheses through an aggregation-tree representation and then refines results on screened-out nodes. It does not rely on correct prior assumptions about the ancillary information: asymptotic FDR control is guaranteed, and power is at least as high as that of the Benjamini-Hochberg (BH) procedure when the ancillary information is uninformative, with potential gains when it is informative. The method computes node-level statistics $T_S$ by Stouffer aggregation across tree nodes and uses a robust refining threshold to guard against misleading ancillary information. In simulations and a breast-cancer gene-expression application, DART2 shows robust FDR control, improved power over competing methods, and substantial computational efficiency, highlighting its practical value for genomics and other domains with heterogeneous ancillary information.
Abstract
In many applications of multiple testing, ancillary information is available that reflects whether each hypothesis is null or alternative. Several methods have been developed to leverage this ancillary information to enhance testing power, but they typically require the ancillary information to be sufficiently helpful to ensure favorable performance. In this paper, we develop a distance-assisted multiple testing procedure named DART2, designed to be powerful and robust regardless of the quality of the ancillary information. When the ancillary information is helpful, DART2 asymptotically controls the FDR while improving power; otherwise, DART2 still controls the FDR and maintains power at least as high as that of a procedure that ignores the ancillary information. We demonstrate DART2's superior performance compared with existing methods through numerical studies under various settings. In addition, we apply DART2 to a gene association study and show its superior accuracy and robustness under two different types of ancillary information.
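The two generic building blocks named in the TL;DR, Stouffer aggregation and the BH baseline, can be illustrated with a minimal sketch. This is not the DART2 algorithm: the tree construction, the exact form of $T_S$, and the refining threshold are not specified in this section. The sketch assumes that a node statistic is the standard Stouffer combination of its members' z-scores (sum divided by the square root of the node size), and the function names `stouffer`, `one_sided_p`, and `benjamini_hochberg` are hypothetical.

```python
import math
import numpy as np


def stouffer(z_scores):
    """Standard Stouffer combination: sum of z-scores over sqrt(count).

    Assumption: z-scores within a node are independent N(0, 1) under the null,
    so the combined statistic is also N(0, 1) under the null.
    """
    z = np.asarray(z_scores, dtype=float)
    return z.sum() / math.sqrt(z.size)


def one_sided_p(z):
    """Upper-tail p-value of a standard normal statistic."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure; returns a boolean rejection mask."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                      # indices of p-values, ascending
    thresholds = alpha * np.arange(1, m + 1) / m
    passed = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.nonzero(passed)[0].max()        # largest rank meeting its threshold
        reject[order[: k + 1]] = True          # reject all hypotheses up to that rank
    return reject


# Hypothetical node containing four hypotheses with moderate z-scores.
node_stat = stouffer([1.0, 1.0, 1.0, 1.0])
node_p = one_sided_p(node_stat)

# BH applied to hypothesis-level p-values, ignoring any ancillary information.
mask = benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.9], alpha=0.05)
```

In this toy node, four individually weak z-scores of 1.0 combine into a node statistic of 2.0, which illustrates why aggregating neighboring hypotheses can raise power when the grouping is informative, and why a safeguard is needed when it is not.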
