Improving the adjusted Benjamini–Hochberg method using e-values in knockoff-assisted variable selection

Aniket Biswas; Aaditya Ramdas

TL;DR

The paper addresses variable selection in linear regression with many predictors, targeting false discovery rate (FDR) control within the knockoff framework. It develops a unified two-stage framework based on p-to-e calibration: first-stage $p$-values are transformed into $e$-values, which then weight the second-stage $p$-values. Sarkar–Tang's Bon-BH arises as a special case. The paper introduces three concrete procedures (Methods 1–3) with exact or asymptotic FDR guarantees and shows, through simulations and an HIV-1 drug resistance analysis, that they improve power while maintaining error control, especially in sparse or weak-signal regimes. The approach may extend to GLMs and other error-rate metrics, offering a flexible, theoretically grounded tool for high-dimensional variable selection and multiple testing in genomics and beyond.

Abstract

Considering the knockoff-based multiple testing framework of Barber and Candès [2015], we revisit the method of Sarkar and Tang [2022] and identify it as a specific case of an un-normalized e-value weighted Benjamini-Hochberg procedure. Building on this insight, we extend the method to use bounded p-to-e calibrators that enable more refined and flexible weight assignments. Our approach generalizes the method of Sarkar and Tang [2022], which emerges as a special case corresponding to an extreme calibrator. Within this framework, we propose three procedures: an e-value weighted Benjamini-Hochberg method, its adaptive extension using an estimate of the proportion of true null hypotheses, and an adaptive weighted Benjamini-Hochberg method. We establish control of the false discovery rate (FDR) for the proposed methods. While we do not formally prove that the proposed methods outperform those of Barber and Candès [2015] and Sarkar and Tang [2022], simulation studies and real-data analysis demonstrate large and consistent improvement over the latter in all cases, and better performance than the knockoff method in scenarios with low target FDR, a small number of signals, and weak signal strength. Simulation studies and a real-data application in HIV-1 drug resistance analysis demonstrate strong finite sample FDR control and exhibit improved, or at least competitive, power relative to the aforementioned methods.
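To make the idea of an e-value weighted Benjamini-Hochberg step concrete, here is a minimal Python sketch. It is not the authors' procedure: the function names, the threshold calibrator $f(p) = \mathbf{1}\{p \le t\}/t$, and the toy inputs are assumptions chosen for illustration, and the knockoff-based construction of the two stages of p-values is not reproduced. The sketch calibrates first-stage p-values into bounded e-values, divides the second-stage p-values by them, and applies the usual BH step-up rule to the weighted p-values.

```python
from typing import Callable, List, Sequence


def threshold_calibrator(p: float, t: float = 0.5) -> float:
    """Hypothetical bounded p-to-e calibrator f(p) = 1{p <= t} / t.

    It is non-increasing in p, bounded by 1/t, and integrates to 1 on [0, 1].
    """
    return 1.0 / t if p <= t else 0.0


def e_weighted_bh(
    p_first: Sequence[float],
    p_second: Sequence[float],
    alpha: float,
    calibrator: Callable[[float], float] = threshold_calibrator,
) -> List[int]:
    """Illustrative e-value weighted BH; returns sorted indices of rejections.

    Assumption: p_first[i] and p_second[i] refer to the same hypothesis i.
    """
    m = len(p_second)
    # Stage 1: turn first-stage p-values into e-values used as weights.
    e_values = [calibrator(p) for p in p_first]
    # Stage 2: weighted p-values p / e (infinite when the weight is zero).
    weighted = [
        p / e if e > 0 else float("inf") for p, e in zip(p_second, e_values)
    ]
    # BH step-up: largest rank k with weighted_(k) <= alpha * k / m.
    order = sorted(range(m), key=lambda i: weighted[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if weighted[i] <= alpha * rank / m:
            k = rank
    return sorted(order[:k])


# Toy inputs (made up for illustration only).
rejected = e_weighted_bh(
    p_first=[0.01, 0.9, 0.2, 0.6],
    p_second=[0.001, 0.001, 0.04, 0.5],
    alpha=0.1,
)
print(rejected)
```

In this toy run, hypotheses whose first-stage p-value exceeds the threshold receive weight zero and cannot be rejected, however small their second-stage p-value is. Smoother bounded calibrators would spread the weight less abruptly.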

Paper Structure (11 sections, 2 theorems, 46 equations, 12 figures)

Keywords:
MSC (2020):
Introduction
Review of related methods
General procedures
Knockoff-filter and knockoff-assisted procedures
Proposed methods
Bounded calibration
Simulation study
Data analysis
Remarks

Key Result

Theorem 3.1

Suppose that the model in (eq:regmodel) holds. Then:

Figures (12)

Figure 1: Comparison of false discovery rate and power across different settings $(n, m, k)$ with $\rho=0.5$ and $\alpha = 0.05$. Each row corresponds to a different simulation setting, and signal strength values ($\gamma = 2, 4, 6, 8, 10$) are shown on the x-axis.
Figure 2: Comparison of false discovery rate and power across different settings $(n, m, k)$ with $\rho=0.5$ and $\alpha = 0.1$. Each row corresponds to a different simulation setting, and signal strength values ($\gamma = 2, 4, 6, 8, 10$) are shown on the x-axis.
Figure 3: Variable selection results at false discovery rate level $\alpha = 0.05$ for each drug. Blue bars indicate positions with prior biological support; orange bars represent novel discoveries.
Figure 4: Variable selection results at false discovery rate level $\alpha = 0.1$ for each drug. Blue bars indicate positions with prior biological support; orange bars represent novel discoveries.
Figure 5: Comparison of false discovery rate and power across different settings $(n, m, k)$ with $\rho=0.1$ and $\alpha = 0.05$. Each row corresponds to a different simulation setting, and signal strength values ($\gamma = 2, 4, 6, 8, 10$) are shown on the x-axis.
...and 7 more figures

Theorems & Definitions (5)

Theorem 3.1
Proof of Theorem 3.1
Lemma 7.1
Proof of Lemma 7.1
Proof of Theorem 3.1
