Bayesian inference in high-dimensional models

Sayantan Banerjee, Ismaël Castillo, Subhashis Ghosal

TL;DR

This survey synthesizes Bayesian approaches for high-dimensional problems where sparsity and structure are essential. It reviews priors that encode sparsity (spike-and-slab and continuous shrinkage) across sequence models, regression, and graphical models, and analyzes posterior contraction and uncertainty quantification under these priors. It then covers learning structural relationships among many variables, including covariance/precision estimation, graphical models (Ising, Poisson, nonparanormal), discriminant analysis, and matrix models (stochastic block models, matrix completion), together with computational tools (MCMC, variational Bayes, Laplace approximations) and modern extensions such as links between variational and PAC-Bayes methods and sparse projection posteriors. The work highlights both theoretical guarantees (contraction, selection consistency, credible set coverage) and practical algorithms for scalable Bayesian inference in high-dimensional settings, with applications to genomics, networks, and complex structured data.

Abstract

Models whose dimension exceeds the available sample size are now commonly used in various applications. Sensible inference is possible by exploiting a lower-dimensional structure. In regression problems with a large number of predictors, the model is often assumed to be sparse, with only a few predictors active. Interdependence among a large number of variables is succinctly described by a graphical model, where variables are represented by nodes on a graph and an edge between two nodes indicates their conditional dependence given the other variables. Many procedures for making inferences in the high-dimensional setting have been developed, typically using penalty functions to induce sparsity in the solution obtained by minimizing a loss function. Bayesian methods have been proposed for such problems more recently, where the prior takes care of the sparsity structure. These methods also have the natural ability to quantify the uncertainty of the inference automatically through the posterior distribution. Theoretical studies of Bayesian procedures in high dimensions have been carried out recently. The questions that arise are whether the posterior distribution contracts near the true value of the parameter at the minimax optimal rate, whether the correct lower-dimensional structure is discovered with high posterior probability, and whether a credible region has adequate frequentist coverage. In this paper, we review these properties of Bayesian and related methods for several high-dimensional models, such as the many normal means problem, linear regression, generalized linear models, and Gaussian and non-Gaussian graphical models. Effective computational approaches are also discussed.
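
To make the idea of a sparsity-encoding prior concrete, the following is a minimal sketch for the many normal means problem mentioned in the abstract. It is an illustration under stated assumptions, not the authors' procedure: observations are taken as x_i ~ N(theta_i, 1), the prior is a spike-and-slab mixture (1 - w) delta_0 + w N(0, tau2), and the weight `w` and slab variance `tau2` are treated as known. The function names and the simulated data are hypothetical.

```python
import numpy as np


def normal_pdf(x, var):
    """Density of N(0, var) evaluated at x."""
    return np.exp(-0.5 * x**2 / var) / np.sqrt(2.0 * np.pi * var)


def spike_slab_posterior(x, w, tau2):
    """Posterior summaries for x_i ~ N(theta_i, 1) under the assumed prior
    theta_i ~ (1 - w) * delta_0 + w * N(0, tau2), with w and tau2 fixed.

    Returns the posterior inclusion probability P(theta_i != 0 | x_i)
    and the posterior mean E[theta_i | x_i] for each coordinate.
    """
    x = np.asarray(x, dtype=float)
    # Marginal density of x_i under the slab is N(0, 1 + tau2);
    # under the spike (theta_i = 0) it is N(0, 1).
    slab = w * normal_pdf(x, 1.0 + tau2)
    spike = (1.0 - w) * normal_pdf(x, 1.0)
    incl = slab / (slab + spike)
    # Given inclusion, theta_i | x_i is normal with mean x_i * tau2 / (1 + tau2).
    post_mean = incl * (tau2 / (1.0 + tau2)) * x
    return incl, post_mean


# Simulated sparse signal (assumed values, for illustration only).
rng = np.random.default_rng(0)
n, s = 200, 10
theta = np.zeros(n)
theta[:s] = 6.0
x = theta + rng.standard_normal(n)

incl, post_mean = spike_slab_posterior(x, w=s / n, tau2=25.0)
selected = incl > 0.5  # coordinates with posterior inclusion probability above 1/2
print("number selected:", int(selected.sum()))
```

Thresholding the inclusion probability at 1/2 is one common selection rule; in practice `w` is usually given its own prior or estimated from the data rather than fixed, which this sketch does not attempt.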

Paper Structure

This paper contains 40 sections, 45 equations.