Spike-and-Slab Posterior Sampling in High Dimensions

Syamantak Kumar; Purnamrita Sarkar; Kevin Tian; Yusong Zhu

TL;DR

The paper tackles provable, scalable posterior sampling for spike-and-slab Bayesian sparse linear regression in high dimensions with underdetermined measurements. For Gaussian diffuse densities and measurement matrices satisfying the restricted isometry property (RIP), it gives a polynomial-time sampler and a near-linear-time sampler, each achieving total-variation accuracy $D_{TV}(\pi',\pi(\cdot \mid X,y)) \le \delta$, where $\pi'$ is the sampler's output distribution, with sublinear sample requirements $n \ge k^3\,\mathrm{polylog}(d)$ and $n \ge k^5\,\mathrm{polylog}(d)$, respectively. It also extends to Laplace diffuse densities via annealing. The approach combines denoising via a frequentist estimate, centered rejection sampling with a product-like proposal, and conditional Poisson sampling to explore small supports, with rigorous guarantees on sparsity and posterior concentration. This yields provable Bayesian variable selection in regimes previously out of reach for spike-and-slab methods, including settings where $n=o(d)$ and the SNR can be intermediate. The framework provides a foundation for reliable uncertainty quantification in high-dimensional sparse regression.

Abstract

Posterior sampling with the spike-and-slab prior [MB88], a popular multimodal distribution used to model uncertainty in variable selection, is considered the theoretical gold standard method for Bayesian sparse linear regression [CPS09, Roc18]. However, designing provable algorithms for performing this sampling task is notoriously challenging. Existing posterior samplers for Bayesian sparse variable selection tasks either require strong assumptions about the signal-to-noise ratio (SNR) [YWJ16], only work when the measurement count grows at least linearly in the dimension [MW24], or rely on heuristic approximations to the posterior. We give the first provable algorithms for spike-and-slab posterior sampling that apply for any SNR, and use a measurement count sublinear in the problem dimension. Concretely, assume we are given a measurement matrix $\mathbf{X} \in \mathbb{R}^{n\times d}$ and noisy observations $\mathbf{y} = \mathbf{X}\boldsymbol{\theta}^\star + \boldsymbol{\xi}$ of a signal $\boldsymbol{\theta}^\star$ drawn from a spike-and-slab prior $\pi$ with a Gaussian diffuse density and expected sparsity $k$, where $\boldsymbol{\xi} \sim \mathcal{N}(\mathbf{0}_n, \sigma^2\mathbf{I}_n)$. We give a polynomial-time high-accuracy sampler for the posterior $\pi(\cdot \mid \mathbf{X}, \mathbf{y})$, for any SNR $\sigma^{-1} > 0$, as long as $n \geq k^3 \cdot \text{polylog}(d)$ and $\mathbf{X}$ is drawn from a matrix ensemble satisfying the restricted isometry property. We further give a sampler that runs in near-linear time $\approx nd$ in the same setting, as long as $n \geq k^5 \cdot \text{polylog}(d)$. To demonstrate the flexibility of our framework, we extend our result to spike-and-slab posterior sampling with Laplace diffuse densities, achieving similar guarantees when the noise level is bounded as $\sigma = O(\frac{1}{k})$.
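To make the observation model above concrete, the following is a minimal sketch, not the paper's algorithm. It draws an instance $(\mathbf{X}, \mathbf{y}, \boldsymbol{\theta}^\star)$ and computes the exact posterior over supports by brute-force enumeration, which is only feasible for tiny $d$ and illustrates why efficient samplers are needed. Several details are assumptions made for illustration and are not stated in the text: each coordinate is active independently with probability $k/d$, the Gaussian slab has unit variance, and $\mathbf{X}$ has i.i.d. $\mathcal{N}(0, 1/n)$ entries. The function names `sample_instance` and `support_posterior` are hypothetical.

```python
import itertools
import numpy as np


def sample_instance(n, d, k, sigma, rng):
    """Draw (X, y, theta) from an assumed spike-and-slab linear model."""
    # Assumption: each coordinate is active independently with probability
    # k/d, so the expected number of nonzeros is k.
    support = rng.random(d) < k / d
    # Assumption: the slab (diffuse density) is a standard Gaussian.
    theta = np.where(support, rng.standard_normal(d), 0.0)
    # Assumption: i.i.d. N(0, 1/n) entries, a common RIP matrix ensemble.
    X = rng.standard_normal((n, d)) / np.sqrt(n)
    # Observations y = X theta + xi with xi ~ N(0, sigma^2 I_n).
    y = X @ theta + sigma * rng.standard_normal(n)
    return X, y, theta


def support_posterior(X, y, k, sigma):
    """Exact posterior over all 2^d supports (exponential cost, tiny d only)."""
    n, d = X.shape
    q = k / d
    supports, log_weights = [], []
    for bits in itertools.product([False, True], repeat=d):
        mask = np.array(bits)
        XS = X[:, mask]
        # Given support S and a unit-variance Gaussian slab, integrating out
        # theta_S gives y ~ N(0, sigma^2 I + X_S X_S^T).
        cov = sigma**2 * np.eye(n) + XS @ XS.T
        _, logdet = np.linalg.slogdet(cov)
        log_lik = -0.5 * (logdet + y @ np.linalg.solve(cov, y))
        size = mask.sum()
        log_prior = size * np.log(q) + (d - size) * np.log1p(-q)
        supports.append(mask)
        log_weights.append(log_lik + log_prior)
    log_weights = np.array(log_weights)
    # Normalize in log space for numerical stability.
    probs = np.exp(log_weights - log_weights.max())
    return supports, probs / probs.sum()


rng = np.random.default_rng(0)
X, y, theta = sample_instance(n=6, d=8, k=2, sigma=0.1, rng=rng)
supports, probs = support_posterior(X, y, k=2, sigma=0.1)
print("true support:", np.flatnonzero(theta))
print("most likely support:", np.flatnonzero(supports[int(np.argmax(probs))]))
```

The enumeration visits $2^d$ supports, so it cannot scale beyond very small $d$; it serves only as a reference point for the posterior $\pi(\cdot \mid \mathbf{X}, \mathbf{y})$ that the samplers described above target.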
