Asymptotically-exact selective inference for quantile regression
Yumeng Wang, Snigdha Panigrahi, Xuming He
TL;DR
This work develops an asymptotically-exact selective inference framework for quantile regression after model selection by coupling smoothed quantile regression with external randomization. The authors construct a one-dimensional pivot that accounts for the selection event and yields valid confidence intervals for the effects of selected variables on conditional quantile functions, without relying on strong distributional assumptions. The method uses all available data for both selection and inference. In simulations and in an analysis of a real birth-weight dataset, it achieves more reliable coverage, shorter intervals, and more accurate variable selection than data-splitting or naive approaches. The guarantees hold uniformly over a broad class of data-generating distributions; the approach scales well in practice and may extend to other penalties and nonlinear models.
Abstract
In modern data analysis, it is common to select a model before performing statistical inference. Selective inference tools adjust for the model selection process to ensure reliable inference after selection. In this paper, we introduce an asymptotic pivot for inference on the effects of selected variables on conditional quantile functions. Built on estimators from smoothed quantile regression, our pivot is easy to compute and yields asymptotically-exact selective inference without strict distributional assumptions on the response variable. At the core of our pivot are external randomization variables, which allow us to use all available samples for both selection and inference, without partitioning the data into independent subsets or discarding samples at any step. Our simulation studies show that: (i) the asymptotic confidence intervals based on our pivot achieve the desired coverage rates, even when sample splitting fails because too few samples remain for inference; (ii) our intervals are consistently shorter than those produced by sample splitting across various models and signal settings. We report similar findings when we apply our approach to study risk factors for low birth weight in a publicly accessible dataset of US birth records from 2022.
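The two ingredients named above, smoothed quantile regression and external randomization, can be illustrated with a small sketch. Convolving the check loss ρ_τ(u) = u(τ − 1{u < 0}) with a Gaussian kernel of bandwidth h gives the smooth loss l_h(u) = u(Φ(u/h) − (1 − τ)) + h·φ(u/h), whose derivative is Φ(u/h) − (1 − τ). The Python code below is an illustrative sketch under stated assumptions, not the paper's implementation: it selects variables by minimizing (1/n) Σ l_h(y_i − x_iᵀβ) + λ‖β‖₁ − ωᵀβ with proximal gradient descent, where ω is a Gaussian randomization vector drawn independently of the data. The Gaussian kernel, the ℓ1 penalty, the values of `h`, `lam` and the randomization scale, and all helper names (such as `randomized_lasso_sqr`) are hypothetical choices. The sketch covers only the randomized selection step, not the pivot used for inference.

```python
import numpy as np
from scipy.stats import norm


def smoothed_check_loss(u, tau, h):
    """Gaussian-kernel convolution of the check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    z = u / h
    return u * (norm.cdf(z) - (1.0 - tau)) + h * norm.pdf(z)


def smoothed_check_grad(u, tau, h):
    """Derivative of smoothed_check_loss with respect to u."""
    return norm.cdf(u / h) - (1.0 - tau)


def soft_threshold(v, thresh):
    """Proximal operator of thresh * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - thresh, 0.0)


def randomized_lasso_sqr(X, y, tau, lam, omega, h, n_iter=2000):
    """Hypothetical helper: minimize
    (1/n) sum_i l_h(y_i - x_i^T b) + lam * ||b||_1 - omega^T b
    by proximal gradient descent, where omega is an external randomization vector."""
    n, p = X.shape
    # The second derivative of l_h is phi(u/h)/h <= phi(0)/h, which bounds
    # the Lipschitz constant of the gradient of the smooth part.
    lip = np.linalg.norm(X, 2) ** 2 * norm.pdf(0.0) / (n * h)
    step = 1.0 / lip
    beta = np.zeros(p)
    for _ in range(n_iter):
        r = y - X @ beta
        # Gradient of the smoothed loss minus the linear randomization term.
        grad = -X.T @ smoothed_check_grad(r, tau, h) / n - omega
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta


# Toy example (hypothetical settings): median regression with three strong signals.
rng = np.random.default_rng(0)
n, p, tau, h, lam = 500, 20, 0.5, 0.5, 0.1
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -3.0, 2.0]
y = X @ beta_true + rng.standard_normal(n)

# External randomization, drawn independently of (X, y).
omega = rng.normal(scale=0.05, size=p)
beta_hat = randomized_lasso_sqr(X, y, tau, lam, omega, h)
selected = np.flatnonzero(beta_hat)
print("selected variables:", selected)
```

Because ω enters the optimality conditions of the selection problem, rerunning the sketch on the same data with a fresh draw of ω can change the selected set.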
