A fast algorithm to compute a curve of confidence upper bounds for the False Discovery Proportion using a reference family with a forest structure
Guillermo Durand
TL;DR
The paper addresses the computation of a full curve of post hoc FDP bounds $V^*_{\mathfrak{R}}(S_t)$ along a nested sequence of hypothesis sets, exploiting reference families with a forest structure. It introduces a pruning step for the forest and, most importantly, a fast $O(m|\mathcal{K}|)$ curve-computation algorithm that maintains per-region counters and a partition so that the curve is updated efficiently; the core identity $V^*_{\mathfrak{R}}(S_t)=\sum_{k\in\mathcal{P}^t} \zeta_k \wedge |S_t\cap R_k|$ lets each step along the path be handled by a local update rather than a full recomputation. The author proves the correctness of the curve updates, implements the methods in the RR-base package, and reports substantial speedups in numerical experiments on large-scale scenarios. The result makes exact computation of entire FDP-bound curves practical in high-dimensional multiple testing, which in turn allows broader empirical study of post hoc inference strategies.
Abstract
This paper presents a new algorithm (and an additional trick) that makes it possible to quickly compute an entire curve of post hoc bounds for the False Discovery Proportion when the construction of the underlying bound $V^*_{\mathfrak{R}}$ is based on a reference family $\mathfrak{R}$ with a forest structure à la Durand et al. (2020). By an entire curve, we mean the values $V^*_{\mathfrak{R}}(S_1),\dotsc,V^*_{\mathfrak{R}}(S_m)$ computed on a path of increasing selection sets $S_1\subsetneq\dotsb\subsetneq S_m$, $|S_t|=t$. The new algorithm leverages the fact that going from $S_t$ to $S_{t+1}$ is done by adding only one hypothesis.
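To illustrate the one-hypothesis-at-a-time idea, here is a minimal sketch in a deliberately simplified special case. It is not the paper's algorithm. It assumes the forest is flat, meaning the regions $R_k$ are disjoint and cover all hypotheses, so the partition never changes along the path and the bound reduces to $\sum_k \zeta_k \wedge |S_t\cap R_k|$, with $\wedge$ denoting the minimum. The names `bound_curve`, `naive_bound`, `region_of`, `zeta` and `path` are hypothetical and chosen for this illustration only.

```python
from typing import Dict, List, Sequence


def naive_bound(selected: Sequence[int], region_of: Sequence[int],
                zeta: Sequence[int]) -> int:
    """Recompute sum_k min(zeta_k, |S ∩ R_k|) from scratch for one set S.

    Assumption: regions are disjoint and cover every hypothesis, so
    region_of[i] gives the unique region index of hypothesis i.
    """
    counts: Dict[int, int] = {}
    for i in selected:
        k = region_of[i]
        counts[k] = counts.get(k, 0) + 1
    return sum(min(zeta[k], c) for k, c in counts.items())


def bound_curve(path: Sequence[int], region_of: Sequence[int],
                zeta: Sequence[int]) -> List[int]:
    """Compute the bound for S_1, ..., S_m in a single pass.

    path[t-1] is the hypothesis added when going from S_{t-1} to S_t.
    One counter per region is kept; adding a hypothesis to region k
    raises the bound by 1 only while the counter has not exceeded zeta_k.
    """
    counts = [0] * len(zeta)
    curve: List[int] = []
    current = 0
    for i in path:
        k = region_of[i]
        counts[k] += 1
        if counts[k] <= zeta[k]:
            current += 1
        curve.append(current)
    return curve


# Toy data (made up for illustration): 6 hypotheses in 3 regions.
region_of = [0, 0, 0, 1, 1, 2]
zeta = [1, 2, 0]
path = [3, 0, 1, 5, 4, 2]

curve = bound_curve(path, region_of, zeta)
print(curve)  # [1, 2, 2, 2, 3, 3]
```

In this flat case the whole curve costs one counter update per step, whereas calling `naive_bound` on each prefix of `path` recomputes every region count at every step. The general forest case treated in the paper additionally has to update the partition $\mathcal{P}^t$ along the path, which this sketch does not attempt.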
