Asymptotic FDR Control with Model-X Knockoffs: Is Moments Matching Sufficient?
Yingying Fan, Lan Gao, Jinchi Lv, Xiaocong Xu
TL;DR
This work develops a unified framework for studying the robustness of model-X knockoffs when the covariate distribution is misspecified, and shows that approximate knockoffs achieve asymptotic FDR control under three conditions on the approximate knockoff statistics. Building on this general three-condition theorem, it proves that Gaussian knockoffs generated by matching the first two moments suffice for asymptotic FDR control with common knockoff statistics such as the marginal correlation and regression-coefficient-difference statistics. The paper thereby gives a theoretical justification for the practical success of Gaussian knockoffs on non-Gaussian data, and validates the results through simulations and a real HIV drug-resistance dataset. It also discusses extensions to heavy-tailed covariates and offers guidance for practitioners on selecting knockoff statistics and moments-matching strategies to preserve FDR control in finite samples.
Abstract
We propose a unified theoretical framework for studying the robustness of the model-X knockoffs framework by investigating the asymptotic false discovery rate (FDR) control of the approximate knockoffs procedure that is implemented in practice. This procedure deviates from the model-X knockoffs framework by substituting the true covariate distribution with a user-specified distribution that can be learned from in-sample observations. By replacing the distributional exchangeability condition on the model-X knockoff variables with three conditions on the approximate knockoff statistics, we establish that the approximate knockoffs procedure achieves asymptotic FDR control. Using our unified framework, we further prove that arguably the most widely used knockoff variable generation method, the Gaussian knockoffs generator based on matching the first two moments, achieves asymptotic FDR control when two-moment-based knockoff statistics are employed in the knockoffs inference procedure. For the first time in the literature, our theoretical results formally justify the effectiveness and robustness of the Gaussian knockoffs generator. Simulations and real-data examples validate the theoretical findings.
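To make the procedure concrete, the following is a minimal Python sketch of a two-moment Gaussian knockoffs generator paired with a marginal-correlation-type knockoff statistic and the standard knockoff+ threshold. It is an illustration under stated assumptions, not the authors' implementation: it assumes n > p so that the sample covariance is invertible, estimates the mean and covariance from the sample, and uses the equicorrelated choice of the diagonal matrix D = diag(s). The function names (gaussian_knockoffs, knockoff_plus_threshold, knockoff_select) and the shrink factor are hypothetical choices made for this example.

```python
import numpy as np


def gaussian_knockoffs(X, rng, shrink=0.999):
    """Sample knockoffs matching the first two sample moments of X (assumes n > p)."""
    n, p = X.shape
    mu = X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False)
    sd = np.sqrt(np.diag(Sigma))
    corr = Sigma / np.outer(sd, sd)
    # Equicorrelated choice on the correlation scale: s = min(2 * lambda_min, 1),
    # shrunk slightly (an assumption of this sketch) so the conditional
    # covariance below stays strictly positive definite.
    lam_min = np.linalg.eigvalsh(corr)[0]
    s = shrink * min(2.0 * lam_min, 1.0) * sd**2
    D = np.diag(s)
    Sigma_inv_D = np.linalg.solve(Sigma, D)  # Sigma^{-1} D
    # Gaussian conditional law of the knockoffs given X:
    #   mean = X - (X - mu) Sigma^{-1} D,   cov = 2D - D Sigma^{-1} D
    cond_mean = X - (X - mu) @ Sigma_inv_D
    V = 2.0 * D - D @ Sigma_inv_D
    V = (V + V.T) / 2.0  # symmetrize against round-off
    L = np.linalg.cholesky(V + 1e-10 * np.eye(p))
    return cond_mean + rng.standard_normal((n, p)) @ L.T


def knockoff_plus_threshold(W, q):
    """Smallest t among |W_j| with (1 + #{W_j <= -t}) / max(#{W_j >= t}, 1) <= q."""
    for t in np.sort(np.abs(W[W != 0])):
        fdp_hat = (1 + np.sum(W <= -t)) / max(np.sum(W >= t), 1)
        if fdp_hat <= q:
            return t
    return np.inf


def knockoff_select(X, X_tilde, y, q=0.2):
    """Select variables using W_j = |X_j' y| / n - |X~_j' y| / n with centered y."""
    yc = y - y.mean()
    n = X.shape[0]
    W = np.abs(X.T @ yc) / n - np.abs(X_tilde.T @ yc) / n
    return np.flatnonzero(W >= knockoff_plus_threshold(W, q))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 2000, 20
    # Non-Gaussian covariates (Student t, 5 degrees of freedom) with some correlation.
    Z = rng.standard_t(df=5, size=(n, p))
    X = Z + 0.5 * Z[:, [0]]
    beta = np.zeros(p)
    beta[:8] = 1.0
    y = X @ beta + rng.standard_normal(n)
    X_tilde = gaussian_knockoffs(X, rng)
    print("selected:", knockoff_select(X, X_tilde, y, q=0.2))
```

The statistic here depends on the data only through first and second moments, which is the kind of two-moment-based statistic the abstract refers to. Other choices of s (for example, one obtained by semidefinite programming) could be substituted without changing the rest of the sketch.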
