Learning False Discovery Rate Control via Model-Based Neural Networks
Arnau Vilella, Jasin Machkour, Michael Muma, Daniel P. Palomar
TL;DR
The paper tackles high-dimensional FDR control, where existing methods with provable guarantees are often overly conservative and therefore sacrifice power. It introduces a learning-augmented enhancement of the T-Rex Selector that replaces its conservative analytical FDP estimator with a neural network–based estimator, bringing the realized FDP closer to the target FDR level. The network is trained only on synthetic data (about 1.4 million systems across fourteen distributions) and evaluated in simulations and on a synthetic GWAS, where the approach yields substantial gains in true positive rate while maintaining approximate FDR control. This data-driven calibration improves the detection of true signals in genomics-style high-dimensional problems.
Abstract
Controlling the false discovery rate (FDR) in high-dimensional variable selection requires balancing rigorous error control with statistical power. Existing methods with provable guarantees are often overly conservative, creating a persistent gap between the realized false discovery proportion (FDP) and the target FDR level. We introduce a learning-augmented enhancement of the T-Rex Selector framework that narrows this gap. Our approach replaces the analytical FDP estimator with a neural network trained solely on diverse synthetic datasets, enabling a substantially tighter and more accurate approximation of the FDP. This refinement allows the procedure to operate much closer to the desired FDR level, thereby increasing discovery power while maintaining effective approximate control. Through extensive simulations and a challenging synthetic genome-wide association study (GWAS), we demonstrate that our method achieves superior detection of true variables compared to existing approaches.
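To make the quantities in the abstract concrete, the following minimal sketch computes the realized FDP and the true positive rate (TPR) for a single selection, using the standard definitions (FDP is the share of false discoveries among all selected variables, and the FDR is its expectation). The function name `fdp_and_tpr` and the example index sets are illustrative assumptions, not taken from the paper, and the sketch does not reproduce the T-Rex Selector or its learned estimator.

```python
def fdp_and_tpr(selected, true_support):
    """Return (FDP, TPR) for one selection of variable indices.

    Hypothetical helper for illustration only; not part of the paper's method.
    """
    selected = set(selected)
    true_support = set(true_support)

    false_discoveries = len(selected - true_support)  # selected but not active
    true_discoveries = len(selected & true_support)   # selected and active

    # max(., 1) avoids division by zero when nothing is selected or active.
    fdp = false_discoveries / max(len(selected), 1)
    tpr = true_discoveries / max(len(true_support), 1)
    return fdp, tpr


# Toy example: 4 variables selected, 5 truly active, 3 in common.
fdp, tpr = fdp_and_tpr(selected=[0, 1, 2, 7], true_support=[0, 1, 2, 3, 4])
print(f"FDP = {fdp:.2f}, TPR = {tpr:.2f}")  # FDP = 0.25, TPR = 0.60
```

In this toy case, with a target FDR of, say, 0.3, a conservative procedure that typically realizes an FDP well below 0.3 leaves room for additional discoveries; narrowing that gap is what the abstract describes as increasing power while maintaining approximate control.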
