Improving the adjusted Benjamini-Hochberg method using e-values in knockoff-assisted variable selection
Aniket Biswas, Aaditya Ramdas
TL;DR
The paper addresses variable selection in linear regression with many predictors, targeting accurate FDR control within the knockoff framework. It develops a unified two-stage framework based on p-to-e calibration, in which first-stage p-values are transformed into e-values that weight the second-stage p-values; Sarkar and Tang's Bon-BH arises as a special case. It introduces three concrete procedures (Methods 1–3) with exact or asymptotic FDR guarantees and shows, through simulations and an HIV-1 drug resistance analysis, that these methods improve power while maintaining error control, especially in sparse or weak-signal regimes. The results suggest practical robustness and possible extensions to GLMs and other error-rate metrics, offering a flexible, theoretically grounded tool for high-dimensional variable selection and multiple testing in genomics and beyond.
Abstract
Considering the knockoff-based multiple testing framework of Barber and Candès [2015], we revisit the method of Sarkar and Tang [2022] and identify it as a specific case of an unnormalized e-value weighted Benjamini-Hochberg procedure. Building on this insight, we extend the method to use bounded p-to-e calibrators that enable more refined and flexible weight assignments. Our approach generalizes the method of Sarkar and Tang [2022], which emerges as a special case corresponding to an extreme calibrator. Within this framework, we propose three procedures: an e-value weighted Benjamini-Hochberg method, its adaptive extension using an estimate of the proportion of true null hypotheses, and an adaptive weighted Benjamini-Hochberg method. We establish control of the false discovery rate (FDR) for the proposed methods. While we do not formally prove that the proposed methods outperform those of Barber and Candès [2015] and Sarkar and Tang [2022], simulation studies and real-data analysis demonstrate large and consistent improvement over the latter in all cases, and better performance than the knockoff method in scenarios with low target FDR, a small number of signals, and weak signal strength. The simulations and a real-data application in HIV-1 drug resistance analysis also show strong finite-sample FDR control and improved, or at least competitive, power relative to the aforementioned methods.
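To make the idea of an e-value weighted Benjamini-Hochberg step concrete, the following is a minimal sketch and not the authors' procedure. It assumes generic first-stage and second-stage p-values supplied as arrays, uses the all-or-nothing calibrator e = 1{p <= tau}/tau as one simple example of a bounded p-to-e calibrator, and applies the standard BH step-up rule to the ratios p/e. The function names, the value of tau, and the toy inputs are hypothetical; how the two stages of p-values are constructed from knockoffs, and the conditions under which FDR control holds, are not shown here.

```python
import numpy as np

def threshold_calibrator(p, tau=0.1):
    """Illustrative bounded p-to-e calibrator (an assumption, not the paper's
    choice): e = 1/tau if p <= tau, else 0. It integrates to 1 over [0, 1]."""
    p = np.asarray(p, dtype=float)
    return (p <= tau) / tau

def e_weighted_bh(p_second, e_first, alpha=0.1):
    """Apply the BH step-up rule to p / e, using e-values as unnormalized
    weights. Returns the sorted indices of the rejected hypotheses."""
    p_second = np.asarray(p_second, dtype=float)
    e_first = np.asarray(e_first, dtype=float)
    m = p_second.size

    # A zero e-value yields an infinite weighted p-value (never rejected).
    weighted = np.full(m, np.inf)
    positive = e_first > 0
    weighted[positive] = p_second[positive] / e_first[positive]

    # Step-up rule: find the largest k with weighted_(k) <= alpha * k / m.
    order = np.argsort(weighted)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = np.nonzero(weighted[order] <= thresholds)[0]
    if below.size == 0:
        return np.array([], dtype=int)
    k = below[-1] + 1
    return np.sort(order[:k])

# Toy inputs (hypothetical numbers, for illustration only).
p_first = [0.01, 0.5, 0.03, 0.9]
p_second = [0.02, 0.001, 0.2, 0.04]
e_first = threshold_calibrator(p_first, tau=0.1)
rejected = e_weighted_bh(p_second, e_first, alpha=0.1)
print(rejected)  # indices 0 and 2
```

In this toy run, hypothesis 1 has the smallest second-stage p-value but receives a zero e-value from the first stage, so it is screened out, while hypothesis 2 is rejected despite a second-stage p-value of 0.2 because its weight is 10. Smoother bounded calibrators would spread the weights less abruptly than this threshold rule.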
