Searching for local associations while controlling the false discovery rate

Paula Gablenz; Matteo Sesia; Tianshu Sun; Chiara Sabatti

TL;DR

This work addresses heterogeneity in high-dimensional settings by introducing local conditional hypotheses, which allow each explanatory variable to have context-specific associations with an outcome across covariate-defined environments. It extends the model-X knockoff filter to adaptive testing (the adaptive Local Knockoff Filter, aLKF), enabling discovery of local associations in both fixed and data-driven environments under FDR control. The method does not require sample splitting; it uses a data-cloaking strategy to prevent selection bias. The authors demonstrate the method on simulations and real GWAS data, showing improved localization of causal signals and the ability to identify sex- or environment-specific genetic effects, as illustrated by an analysis of waist-hip ratio (WHR) in the UK Biobank. The approach provides a principled, scalable framework for uncovering subgroup-specific mechanisms in heterogeneous data with rigorous error control, with potential applications in precision medicine and complex trait genetics.

Abstract

We introduce local conditional hypotheses that express how the relation between explanatory variables and outcomes changes across different contexts, described by covariates. By expanding upon the model-X knockoff filter, we show how to adaptively discover these local associations, all while controlling the false discovery rate. Our enhanced inferences can help explain sample heterogeneity and uncover interactions, making better use of the capabilities offered by modern machine learning models. Specifically, our method is able to leverage any model for the identification of data-driven hypotheses pertaining to different contexts. Then, it rigorously tests these hypotheses without succumbing to selection bias. Importantly, our approach is efficient and does not require sample splitting. We demonstrate the effectiveness of our method through numerical experiments and by studying the genetic architecture of waist-hip ratio across sexes in the UK Biobank.
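
For readers unfamiliar with the knockoff filter that the abstract builds on, the following is a minimal sketch of the standard knockoff+ selection step: given one signed statistic per hypothesis (large positive values suggest a real association), it finds the smallest threshold whose estimated false discovery proportion is at most the target level q and selects the hypotheses above it. This is the generic rule from the knockoff literature, not the aLKF procedure itself; the function names and the example statistics are hypothetical, and how the statistics are computed for local hypotheses is not shown here.

```python
def knockoff_plus_threshold(w, q):
    """Return the knockoff+ threshold for signed statistics w at level q.

    The threshold is the smallest t among the nonzero |w_j| such that
    (1 + #{j: w_j <= -t}) / max(1, #{j: w_j >= t}) <= q.
    Returns infinity when no such t exists (nothing is selected).
    """
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        negatives = sum(1 for x in w if x <= -t)  # proxy for false positives
        positives = sum(1 for x in w if x >= t)   # candidate discoveries
        if (1 + negatives) / max(1, positives) <= q:
            return t
    return float("inf")


def knockoff_select(w, q):
    """Return the indices of hypotheses whose statistic reaches the threshold."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]


# Hypothetical statistics, e.g. one per hypothesis under test.
example_w = [5.0, 4.0, 3.0, -1.0, 2.5, 0.5, -0.2, 6.0]
selected = knockoff_select(example_w, q=0.25)
```

With these made-up statistics and q = 0.25, the threshold is 2.5 and five hypotheses are selected; at q = 0.1 no threshold satisfies the bound and nothing is selected, which reflects how conservative the rule is when there are few hypotheses.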
