Learning False Discovery Rate Control via Model-Based Neural Networks

Arnau Vilella, Jasin Machkour, Michael Muma, Daniel P. Palomar

TL;DR

The paper tackles high-dimensional FDR control, where existing methods with provable guarantees are often overly conservative and therefore lose statistical power. It introduces a learning-augmented enhancement to the T-Rex Selector that replaces its conservative FDP estimator with a neural network–based estimator, bringing the realized FDP closer to the target FDR level. After training on synthetic data (about 1.4 million systems across fourteen distributions) and validation on a synthetic GWAS-style genomics dataset, the approach yields substantial gains in true positive rate while maintaining approximate FDR control. This data-driven calibration offers a scalable way to identify more true signals in genomics-style high-dimensional problems.

Abstract

Controlling the false discovery rate (FDR) in high-dimensional variable selection requires balancing rigorous error control with statistical power. Existing methods with provable guarantees are often overly conservative, creating a persistent gap between the realized false discovery proportion (FDP) and the target FDR level. We introduce a learning-augmented enhancement of the T-Rex Selector framework that narrows this gap. Our approach replaces the analytical FDP estimator with a neural network trained solely on diverse synthetic datasets, enabling a substantially tighter and more accurate approximation of the FDP. This refinement allows the procedure to operate much closer to the desired FDR level, thereby increasing discovery power while maintaining effective approximate control. Through extensive simulations and a challenging synthetic genome-wide association study (GWAS), we demonstrate that our method achieves superior detection of true variables compared to existing approaches.
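
To make the quantities in the abstract concrete, the following minimal sketch computes the false discovery proportion (FDP) and the true positive proportion (TPP) for a single selection, using the standard definitions; the FDR and TPR are the expectations of these two quantities over repeated experiments. The function name `fdp_and_tpp` and the example index sets are hypothetical and are not taken from the paper.

```python
from typing import Iterable, Tuple


def fdp_and_tpp(selected: Iterable[int], true_active: Iterable[int]) -> Tuple[float, float]:
    """Return (FDP, TPP) for one selected set of variable indices.

    FDP = (# selected nulls) / max(1, # selected)
    TPP = (# selected true actives) / max(1, # true actives)
    The name and signature are illustrative assumptions, not the paper's API.
    """
    selected_set = set(selected)
    active_set = set(true_active)
    true_pos = len(selected_set & active_set)   # correctly selected variables
    false_pos = len(selected_set - active_set)  # selected null variables
    fdp = false_pos / max(1, len(selected_set))
    tpp = true_pos / max(1, len(active_set))
    return fdp, tpp


# Hypothetical example: 4 variables selected, 3 of them truly active,
# out of 5 truly active variables in total.
fdp, tpp = fdp_and_tpp(selected=[0, 1, 2, 7], true_active=[0, 1, 2, 3, 4])
print(f"FDP = {fdp:.2f}, TPP = {tpp:.2f}")
```

A conservative procedure keeps the average FDP well below the target level $\alpha$, which typically comes at the cost of a lower TPP; this is the gap the abstract describes.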

Paper Structure (6 sections, 6 equations, 4 figures)

Figures (4)

  • Figure 1: The T-Rex Selector framework [machkour2025trex] provides a provably conservative estimate $\widehat{\mathrm{FDP}}$. This may lead to a considerable gap between the actual FDP and the target FDR level $\alpha$, which results in a potentially reduced TPP.
  • Figure 2: T-Rex Selector framework with the proposed enhancement using a neural network.
  • Figure 3: Comparison of the original T-Rex Selector and our learning-based enhancement for different test SNR levels. The neural network is trained on multiple distributions and tested on the held-out Gaussian mixture models.
  • Figure 4: Average $\widehat{\text{FDP}}_{L=p}(v, T)$ produced by the neural network compared to the true $\text{FDP}$. In this case the network overestimates the $\text{FDP}$ at all points, guaranteeing $\text{FDR}$ control.