Controlling the false discovery rate in high-dimensional linear models using model-X knockoffs and $p$-values

Jinyuan Chang; Chenlong Li; Cheng Yong Tang; Zhengtian Zhu

TL;DR

The paper addresses FDR control in high-dimensional linear models by integrating model-X knockoffs with debiased penalized regression to produce valid $p$-values. It develops two paired sets of test statistics $(t_{1,j}, t_{2,j})$ from a debiased estimator fitted to the augmented model and proves asymptotic normality and mutual independence for one of the sets. This justifies the standard BH procedure, and the paired structure supports a two-step Bonferroni–BH approach that improves power. The authors establish rigorous FDR control under unknown dependence and demonstrate superior power, particularly in low-signal, small-sample settings, through extensive simulations and a real-data HIV mutation analysis. The methodology relies on CLIME for precision matrix estimation and the scaled Lasso for variance estimation, with a debiased estimator built on the augmented design $Z=(X, \tilde X)$, and is complemented by practical guidance and public code. Collectively, the work offers a principled, scalable framework for reliable variable selection in high-dimensional inference where dependence structures are complex and not fully known.
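To make the augmented design $Z=(X, \tilde X)$ concrete, the following is a minimal sketch of one standard way to sample model-X knockoffs. It assumes Gaussian features with a known correlation matrix `Sigma` and uses the equicorrelated construction; this summary does not say how the authors generate their knockoffs, so the function names, the `shrink` factor, and the Gaussian assumption are all illustrative choices rather than the paper's procedure.

```python
import numpy as np


def equicorrelated_s(Sigma, shrink=0.999):
    """Equicorrelated knockoff parameter s = min(1, 2 * lambda_min(Sigma)).

    Assumes Sigma is a positive-definite correlation matrix. The `shrink`
    factor is an illustrative safeguard that keeps the conditional
    covariance strictly positive definite for the Cholesky step below.
    """
    lam_min = np.linalg.eigvalsh(Sigma)[0]  # eigenvalues come back in ascending order
    return shrink * min(1.0, 2.0 * lam_min)


def gaussian_knockoffs(X, Sigma, rng):
    """Sample knockoffs for rows X_i ~ N(0, Sigma).

    Uses the Gaussian conditional law
        X_tilde | X ~ N(X - X Sigma^{-1} D, 2D - D Sigma^{-1} D),  D = s * I.
    """
    n, p = X.shape
    D = equicorrelated_s(Sigma) * np.eye(p)
    Sigma_inv_D = np.linalg.solve(Sigma, D)   # Sigma^{-1} D
    cond_mean = X - X @ Sigma_inv_D
    cond_cov = 2.0 * D - D @ Sigma_inv_D
    L = np.linalg.cholesky(cond_cov)
    return cond_mean + rng.standard_normal((n, p)) @ L.T


def augmented_design(X, Sigma, rng):
    """Stack the original features and their knockoffs: Z = (X, X_tilde)."""
    return np.hstack([X, gaussian_knockoffs(X, Sigma, rng)])
```

Under these assumptions, each row of `Z` has covariance with `Sigma` on both diagonal blocks and `Sigma - D` on the off-diagonal blocks, which is the exchangeability structure that knockoff-based procedures rely on.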

Abstract

In this paper, we propose novel multiple testing methods for controlling the false discovery rate (FDR) in the context of high-dimensional linear models. Our development integrates model-X knockoff techniques with debiased penalized regression estimators. The proposed approach addresses two fundamental challenges in high-dimensional statistical inference: (i) constructing valid test statistics and corresponding $p$-values in problems with a diverging number of model parameters, and (ii) ensuring FDR control under complex and unknown dependence structures among test statistics. A central contribution of our methodology lies in the rigorous construction and theoretical analysis of two paired sets of test statistics. Based on these test statistics, our methodology adopts two $p$-value-based multiple testing algorithms. The first applies the conventional Benjamini-Hochberg procedure, justified by the asymptotic mutual independence and normality of one set of the test statistics. The second leverages the paired structure of both sets of test statistics to improve detection power while maintaining rigorous FDR control. We provide comprehensive theoretical analysis, establishing the validity of the debiasing framework and ensuring that the proposed methods achieve proper FDR control. Extensive simulation studies demonstrate that our procedures outperform existing approaches, particularly those relying on empirical evaluations of false discovery proportions, in terms of both power and empirical control of the FDR. Notably, our methodology yields substantial improvements in settings characterized by weaker signals, smaller sample sizes, and lower pre-specified FDR levels.
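The first algorithm described above, the conventional Benjamini-Hochberg procedure applied to $p$-values from asymptotically normal test statistics, can be sketched as follows. This is a generic illustration and not the authors' code: the function names are hypothetical, and the input `z` values stand in for whatever standardized statistics the debiasing step produces.

```python
import math


def two_sided_pvalue(z):
    """Two-sided p-value for a statistic that is N(0, 1) under the null.

    Uses P(|N(0,1)| >= |z|) = erfc(|z| / sqrt(2)).
    """
    return math.erfc(abs(z) / math.sqrt(2.0))


def benjamini_hochberg(pvalues, alpha):
    """Return the sorted indices rejected by the BH step-up procedure.

    Finds the largest rank k with p_(k) <= alpha * k / m and rejects the
    hypotheses holding the k smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda j: pvalues[j])
    k = 0
    for rank, j in enumerate(order, start=1):
        if pvalues[j] <= alpha * rank / m:
            k = rank  # keep the largest rank that passes its threshold
    return sorted(order[:k])


# Illustrative statistics: three large in magnitude, three near zero.
z_stats = [4.0, 0.3, -3.5, 1.0, -0.2, 2.9]
pvals = [two_sided_pvalue(z) for z in z_stats]
selected = benjamini_hochberg(pvals, alpha=0.1)
```

With these illustrative inputs the three statistics that are large in magnitude are selected. BH is known to control the FDR at the nominal level for independent $p$-values, which is why the asymptotic mutual independence result matters for this first algorithm.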
