Valid and efficient possibilistic structure learning in Gaussian linear regression

Ryan Martin, Naomi Singer, Jonathan Williams

TL;DR

This paper addresses the problem of uncertainty in selecting the structure $S$ of a Gaussian linear regression model. It extends the possibilistic inferential model (IM) framework to structure learning by incorporating partial prior information about $S$, yielding calibrated, finite-sample confidence sets for the unknown structure and fully conditional, likelihood-based uncertainty quantification. The authors prove a formal reliability result, demonstrate finite-sample validity through simulations, and validate the approach on two real datasets (prostate cancer and world happiness), showing that the resulting confidence sets align with substantive findings while offering principled calibration absent in standard Bayesian posteriors. The method provides a practical, principled bridge between Bayesian prior information and frequentist calibration, with clear guidance on eliciting partial priors and managing model complexity via a simple exponential penalization scheme. Overall, the work offers a robust framework for reliable structure learning in regression that explicitly accounts for partial prior knowledge and finite-sample performance, enabling more trustworthy post-selection inference in applied settings.

Abstract

A crucial step in fitting a regression model to data is determining the model's structure, i.e., the subset of explanatory variables to be included. However, the uncertainty in this step is often overlooked due to a lack of satisfactory methods. Frequentists have no broadly applicable confidence set constructions for a model's structure, and Bayesian posterior credible sets do not achieve the desired finite-sample coverage. In this paper, we propose an extension of the possibility-theoretic inferential model (IM) framework that offers reliable, data-driven uncertainty quantification about the unknown model structure. This particular extension allows for the inclusion of incomplete prior information about the unknown structure that facilitates regularization. We prove that this new, regularized, possibilistic IM's uncertainty quantification is suitably calibrated relative to the set of joint distributions compatible with the data-generating process and assumed partial prior knowledge about the structure. This implies, among other things, that the derived confidence sets for the unknown model structure attain the nominal coverage probability in finite samples. We provide background and guidance on quantifying prior knowledge in this new context and analyze two benchmark data sets, comparing our results to those obtained by existing methods.
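
One way to read the coverage claim is sketched below, assuming the confidence set is the usual upper level set of the IM's possibility contour. The symbol $C_\alpha$ is introduced here only for illustration. $\pi_y$ stands for the marginal possibility contour for the structure $S$, and $\overline{\mathsf{R}}$ for the upper probability over the joint distributions compatible with the model and the partial prior.

$$
C_\alpha(y) = \{ s : \pi_y(s) > \alpha \}, \qquad \overline{\mathsf{R}}\{ C_\alpha(Y) \not\ni S \} \le \alpha, \quad \alpha \in [0,1].
$$

Under this reading, attaining nominal coverage means that the $100(1-\alpha)\%$ set misses the true structure with upper probability at most $\alpha$, at every sample size.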

Paper Structure

This paper contains 24 sections, 4 theorems, 55 equations, 7 figures, 2 tables.

Key Result

Theorem 1

Let $\pi_y$ be the marginal possibility contour for $S$ associated with the partial-prior IM, depending implicitly on the prior contour $q$ that quantifies our a priori uncertainty about the full model parameter $\Theta$. Let $\overline{\mathsf{R}}$ denote the upper joint distribution for $(Y,\Theta)$ ...

Figures (7)

  • Figure 1: Plots of two possibility contours based on the probability-to-possibility transform described in the text. Both plots show the hypothesis $H=[2.5,5]$ in red, with the corresponding $\overline{\Pi}(H)$ defined by optimization of the contour over $H$.
  • Figure 2: The (empirical) distribution function $\alpha \mapsto \mathsf{P}\{ \mathsf{Q}_Y^{\textsc{bayes}}(H) \leq \alpha\}$ of the Bayes posterior probability assigned to the false hypothesis $H$ defined in the main text.
  • Figure 3: Empirical versus target/nominal coverage for Bayes and IM uncertainty quantification for the experiment described in the main text.
  • Figure 4: Plots of the prior upper expected value $\gamma \mapsto \overline{\text{mean}}(\gamma)$ for two different values of $p$.
  • Figure 5: Plots of the marginal IM contour $k \mapsto \pi_y^{\text{marg}}(k)$ for the model complexity $K=|S|$ in the prostate cancer data analysis based on four different values of the hyperparameter $\gamma$. The slightly offset gray lines correspond to the prior contour $k \mapsto q_K(k)$.
  • ...and 2 more figures
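
The Figure 1 caption refers to a probability-to-possibility transform and to an upper possibility $\overline{\Pi}(H)$ obtained by optimizing the contour over $H$. The sketch below illustrates that idea for a discrete distribution, which is the relevant case when the unknown is a model structure. It assumes the commonly used form of the transform, $\pi(x) = \mathsf{P}\{p(X) \le p(x)\}$, which may differ in detail from the version in the paper. The function names `prob_to_poss` and `upper_poss` and the toy masses are hypothetical.

```python
import numpy as np


def prob_to_poss(p):
    """Assumed transform: contour(x) = total mass of points no more probable than x."""
    p = np.asarray(p, dtype=float)
    # For each point x, add up p(x') over all x' with p(x') <= p(x).
    return np.array([p[p <= px].sum() for px in p])


def upper_poss(contour, in_H):
    """Upper possibility of a hypothesis H: the largest contour value inside H."""
    in_H = np.asarray(in_H, dtype=bool)
    if not in_H.any():
        return 0.0  # convention used here: the empty hypothesis gets possibility 0
    return float(contour[in_H].max())


# Toy example: three candidate values with made-up probabilities.
p = np.array([0.5, 0.3, 0.2])
contour = prob_to_poss(p)

# Hypothesis H made up of the second and third candidates.
in_H = np.array([False, True, True])
print("contour:", contour)
print("upper possibility of H:", upper_poss(contour, in_H))
print("probability of H:", p[in_H].sum())
```

In this toy case the upper possibility of $H$ is no smaller than its probability, which is the sense in which the resulting possibility measure is a conservative summary of the original distribution.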

Theorems & Definitions (10)

  • Theorem 1
  • proof
  • Corollary 1
  • proof
  • Corollary 2
  • proof
  • Corollary 3
  • proof
  • proof : Proof of Theorem \ref{thm:valid}
  • proof : Proof of Corollary \ref{cor:uvalid}