An exploration of sequential Bayesian variable selection -- A comment on García-Donato et al. (2025). "Model uncertainty and missing data: An objective Bayesian perspective"

Sebastian Arnold, Alexander Ly

TL;DR

The paper investigates sequential Bayesian variable selection under missing data by extending GCCQF, the method of García-Donato et al. (2025), with Sequential Model Confidence Sets (SMCS) to monitor evidence as data accumulate. It formalizes SMCS as a time-dependent collection of models whose coverage is guaranteed via a bound on model-wins, and derives SMCS-based inclusion probabilities for covariates, bridging frequentist sequential guarantees and Bayesian inclusion concepts. In simulations from a linear-regression data-generating process, SMCS stabilizes posterior inclusion behavior and reduces fluctuations, especially when combined with GCCQF (the mixed approach), at the cost of a somewhat higher risk of misclassifying inactive covariates late in the sequence. The work highlights the potential of safe sequential inference for Bayesian variable selection and outlines avenues for tuning, adaptivity, and theoretical connections to Bayesian decision-making.

Abstract

Our comment on García-Donato et al. (2025), "Model uncertainty and missing data: An objective Bayesian perspective", explores a further extension of the proposed methodology. Specifically, we consider the sequential setting where (potentially missing) data accumulate over time, with the goal of continuously monitoring statistical evidence, as opposed to assessing it only once data collection terminates. We explore a new variable selection method based on sequential model confidence sets, as proposed by Arnold et al. (2024), and show that it can help stabilise the inference of García-Donato et al. (2025). To be published as "Invited discussion" in Bayesian Analysis.

Paper Structure

This paper contains 4 sections, 6 equations, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Posterior inclusion probabilities of the active (green) and inactive (brown) covariates over time for two runs of the simulation, under the sequential application of GCCQF (top), the mixture approach (middle), and the SMCS-based approach (bottom), where $\alpha = 0.1$ and $\lambda = 1/(8 \varsigma^{2}) \approx 0.3$, for $\varsigma=0.65\leq \sigma$. Thick lines indicate the active covariates $x_{2}$ and $x_{7}$ with larger regression coefficients.
  • Figure 2: The averaged total number of crossings through the critical value $0.5$ for the posterior inclusion probabilities (red) and the mixture approach (blue).
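
The quantity summarised in Figure 2 can be illustrated with a minimal sketch. It assumes a "crossing" means that an inclusion-probability trajectory moves from one side of the critical value $0.5$ to the other between consecutive time points; the paper's exact counting convention is not given here. The function names and the toy trajectories are hypothetical and serve only as illustration.

```python
def count_crossings(trajectory, threshold=0.5):
    """Count how often a trajectory switches sides of `threshold`.

    Assumption: a value exactly equal to the threshold is treated as
    "not above"; the paper may handle ties differently.
    """
    above = [p > threshold for p in trajectory]
    # A crossing occurs whenever two consecutive points lie on different sides.
    return sum(a != b for a, b in zip(above, above[1:]))


def mean_crossings(trajectories, threshold=0.5):
    """Average the crossing count over several trajectories (e.g. simulation runs)."""
    counts = [count_crossings(t, threshold) for t in trajectories]
    return sum(counts) / len(counts)


# Toy inclusion-probability paths (made up, not taken from the paper).
unstable = [0.2, 0.6, 0.4, 0.7, 0.8]   # switches sides three times
stable = [0.2, 0.3, 0.6, 0.7, 0.8]     # switches sides once

print(count_crossings(unstable))
print(count_crossings(stable))
print(mean_crossings([unstable, stable]))
```

Under this convention, a smaller average count corresponds to a more stable classification of a covariate as active or inactive over time.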