Data-Adaptive Automatic Threshold Calibration for Stability Selection
Martin Huang, Samuel Muller, Garth Tarr
TL;DR
This work addresses the sensitivity of stability selection to the stable threshold parameter $\pi$ by introducing Exclusion Automatic Threshold Selection (EATS), a data-adaptive procedure that first filters out potential noise variables using an exclusion threshold derived from shuffled data, and then selects $\hat{\pi}$ by the elbow method to form the stable set. The method combines Automatic Threshold Selection (ATS) with an Exclusion Probability Threshold (EPT) to calibrate $\pi$ without manual tuning, while preserving error control under standard exchangeability assumptions. Across extensive simulated and real-data experiments, EATS achieves higher Matthews correlation coefficients and less overselection, particularly in high-dimensional settings where $p>n$, and is robust to the choice of stability-selection procedure. Running stability selection a second time on the shuffled data doubles the computation, but the approach yields a practical, tuning-free default for stability selection, with clear applicability to high-dimensional genomic and proteomic problems and potential extensions to complementary-pairs stability selection.
Abstract
Stability selection has gained popularity as a method for enhancing the performance of variable selection algorithms while controlling false discovery rates. However, achieving these desirable properties depends on correctly specifying the stable threshold parameter, which can be challenging. An arbitrary choice of this parameter can substantially alter the set of selected variables, as the variables' selection probabilities are inherently data-dependent. To address this issue, we propose Exclusion Automatic Threshold Selection (EATS), a data-adaptive algorithm that streamlines stability selection by automating the threshold specification process. EATS initially filters out potential noise variables using an exclusion probability threshold, derived from applying stability selection to a randomly shuffled version of the dataset. Following this, EATS selects the stable threshold parameter using the elbow method, balancing the marginal utility of including additional variables against the risk of selecting superfluous variables. We evaluate our approach through an extensive simulation study, benchmarking across commonly used variable selection algorithms and static stable threshold values.
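The two steps described above (exclusion filtering, then elbow-based threshold choice) can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions that the abstract does not specify: the selection probabilities for the original and shuffled data are taken as already computed, the exclusion threshold is assumed to be the largest selection probability observed on the shuffled data, and the elbow is assumed to sit at the largest drop between consecutive sorted probabilities. The names `exclusion_threshold`, `elbow_threshold`, and `eats_sketch` are hypothetical.

```python
def exclusion_threshold(null_probs):
    """Assumed rule: the largest selection probability seen on shuffled data."""
    return max(null_probs)


def elbow_threshold(probs):
    """Assumed elbow rule: cut at the largest drop between consecutive
    selection probabilities once they are sorted in decreasing order."""
    ordered = sorted(probs, reverse=True)
    if len(ordered) < 2:
        # With a single candidate there is no gap to inspect.
        return ordered[0]
    gaps = [ordered[i] - ordered[i + 1] for i in range(len(ordered) - 1)]
    cut = gaps.index(max(gaps))  # first position of the largest gap
    return ordered[cut]          # last probability before the drop


def eats_sketch(probs, null_probs):
    """Return (stable_set, exclusion_threshold, pi_hat) for given probabilities.

    probs      : selection probabilities from the original data
    null_probs : selection probabilities from the shuffled data
    """
    ept = exclusion_threshold(null_probs)
    # Step 1: drop variables that do not exceed the exclusion threshold.
    candidates = {j: p for j, p in enumerate(probs) if p > ept}
    if not candidates:
        return [], ept, None
    # Step 2: choose pi_hat by the elbow rule on the surviving variables.
    pi_hat = elbow_threshold(list(candidates.values()))
    stable = sorted(j for j, p in candidates.items() if p >= pi_hat)
    return stable, ept, pi_hat


# Illustrative, made-up selection probabilities (not results from the paper).
probs = [0.95, 0.90, 0.88, 0.40, 0.35, 0.10, 0.05]
null_probs = [0.12, 0.20, 0.08, 0.15, 0.18, 0.05, 0.11]
stable_set, ept, pi_hat = eats_sketch(probs, null_probs)
print(stable_set, ept, pi_hat)
```

In this toy input, the last two variables fall below the exclusion threshold and are removed before the elbow is located, so the low-probability tail cannot influence where the cut is placed. Both rules used here are placeholders, and other definitions of the exclusion threshold or the elbow would slot into the same structure.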
