Valid and efficient possibilistic structure learning in Gaussian linear regression
Ryan Martin, Naomi Singer, Jonathan Williams
TL;DR
This paper addresses uncertainty in selecting the structure $S$ of a Gaussian linear regression model, i.e., the subset of explanatory variables to include. It extends the possibilistic inferential model (IM) framework to structure learning by incorporating partial prior information about $S$, which yields fully conditional, likelihood-based uncertainty quantification and confidence sets for the unknown structure that are calibrated in finite samples. The authors prove a formal reliability result, check finite-sample validity in simulations, and apply the approach to two real datasets (prostate cancer and world happiness), where the resulting confidence sets align with substantive findings and carry a calibration guarantee that standard Bayesian posterior credible sets lack. The method thus links Bayesian-style prior information with frequentist calibration, and the paper gives guidance on eliciting partial priors and on managing model complexity via a simple exponential penalization scheme. The result is a framework for reliable structure learning in regression that accounts explicitly for partial prior knowledge and finite-sample performance, supporting more trustworthy post-selection inference in applied settings.
Abstract
A crucial step in fitting a regression model to data is determining the model's structure, i.e., the subset of explanatory variables to be included. However, the uncertainty in this step is often overlooked due to a lack of satisfactory methods. Frequentists have no broadly applicable confidence set constructions for a model's structure, and Bayesian posterior credible sets do not achieve the desired finite-sample coverage. In this paper, we propose an extension of the possibility-theoretic inferential model (IM) framework that offers reliable, data-driven uncertainty quantification about the unknown model structure. This particular extension allows for the inclusion of incomplete prior information about the unknown structure that facilitates regularization. We prove that this new, regularized, possibilistic IM's uncertainty quantification is suitably calibrated relative to the set of joint distributions compatible with the data-generating process and assumed partial prior knowledge about the structure. This implies, among other things, that the derived confidence sets for the unknown model structure attain the nominal coverage probability in finite samples. We provide background and guidance on quantifying prior knowledge in this new context and analyze two benchmark data sets, comparing our results to those obtained by existing methods.
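To make the notion of a confidence set for a model structure concrete, the following is a minimal sketch, not the authors' implementation. It assumes the standard reading of a possibilistic IM, in which a possibility contour assigns each candidate structure a value in $[0, 1]$ (with maximum 1) and the $100(1-\alpha)\%$ confidence set is the collection of structures whose contour value exceeds $\alpha$. The function names and the contour values are hypothetical and chosen only for illustration; computing the contour itself is the substance of the paper and is not attempted here.

```python
from itertools import combinations


def all_structures(p):
    """Enumerate every subset of the p predictor indices as a frozenset."""
    return [frozenset(c) for k in range(p + 1) for c in combinations(range(p), k)]


def confidence_set(contour, alpha):
    """Return the structures whose contour value exceeds alpha."""
    return {S for S, value in contour.items() if value > alpha}


# Hypothetical contour values for p = 2 predictors (illustrative numbers only,
# not output of the paper's method). A possibility contour has maximum 1.
contour = {
    frozenset(): 0.02,
    frozenset({0}): 1.00,
    frozenset({1}): 0.04,
    frozenset({0, 1}): 0.35,
}

# Upper level set of the contour at alpha = 0.05.
conf_95 = confidence_set(contour, alpha=0.05)
```

With these illustrative values, the 95% set retains the structures $\{0\}$ and $\{0, 1\}$ and excludes the empty model and $\{1\}$. Enumerating all $2^p$ structures, as `all_structures` does, is only feasible for small $p$.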
