Bayesian Analysis for Over-parameterized Linear Model via Effective Spectra

Tomoya Wakayama, Masaaki Imaizumi

TL;DR

The paper tackles high-dimensional linear regression with $p \gg n$ under non-sparse $\boldsymbol{\theta}^*$ by introducing a data-adaptive Gaussian prior concentrated on the leading eigen-directions of the covariate covariance. A hierarchical prior selects the effective rank $k$, and a Bernstein–von Mises-type truncation yields a Gaussian approximation to the posterior that facilitates uncertainty quantification with reduced computation. Theoretical contributions include posterior contraction rates tied to spectral quantities (effective ranks) and a robust Gaussian-approximation result that holds beyond sub-Gaussian covariates. Empirically, simulations and a real-data analysis demonstrate accurate prediction and well-calibrated uncertainty, illustrating the practical value of leveraging spectral information in non-sparse Bayesian high-dimensional settings.

Abstract

In high-dimensional Bayesian statistics, various methods have been developed, including prior distributions that induce parameter sparsity to handle many parameters. Yet, these approaches often overlook the rich spectral structure of the covariate matrix, which can be crucial when true signals are not sparse. To address this gap, we introduce a data-adaptive Gaussian prior whose covariance is aligned with the leading eigenvectors of the sample covariance. This prior design targets the data's intrinsic complexity rather than its ambient dimension by concentrating the parameter search along principal data directions. We establish contraction rates of the corresponding posterior distribution, which reveal how the mass in the spectrum affects the prediction error bounds. Furthermore, we derive a truncated Gaussian approximation to the posterior (i.e., a Bernstein-von Mises-type result), which allows for uncertainty quantification with a reduced computational burden. Our findings demonstrate that Bayesian methods leveraging spectral information of the data are effective for estimation in non-sparse, high-dimensional settings.
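The prior construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure. It assumes a fixed rank `k` (the paper places a hierarchical prior on it), a known noise variance `noise_var` (the paper also infers $\sigma^2$), prior weights equal to the leading eigenvalues of the uncentered sample covariance, and no cross-fitting or truncation. The function name `spectral_prior_posterior` and all parameter names are hypothetical.

```python
import numpy as np

def spectral_prior_posterior(X, y, k, noise_var=1.0):
    """Conjugate posterior under a Gaussian prior supported on the top-k
    eigenvectors of the sample covariance X^T X / n (illustrative only)."""
    n, p = X.shape
    # The right singular vectors of X are the eigenvectors of X^T X / n,
    # and the eigenvalues are s**2 / n (singular values come sorted descending).
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    V = Vt[:k].T                 # p x k, leading eigen-directions
    lam = s[:k] ** 2 / n         # leading eigenvalues (assumed prior variances)
    # Write theta = V a with a ~ N(0, diag(lam)); then y = (X V) a + noise.
    Z = X @ V
    precision = Z.T @ Z / noise_var + np.diag(1.0 / lam)
    cov_a = np.linalg.inv(precision)
    mean_a = cov_a @ Z.T @ y / noise_var
    return V @ mean_a, V, cov_a  # posterior mean of theta, basis, covariance of a

# Usage on synthetic data with p much larger than n.
rng = np.random.default_rng(0)
n, p, k = 50, 200, 5
X = rng.normal(size=(n, p))
theta_star = rng.normal(size=p) / np.sqrt(p)   # dense (non-sparse) signal
y = X @ theta_star + rng.normal(size=n)
theta_mean, V, cov_a = spectral_prior_posterior(X, y, k)
```

Because the posterior lives in a `k`-dimensional subspace, the linear algebra after the SVD involves only `k x k` matrices, which reflects the idea of targeting intrinsic complexity rather than the ambient dimension `p`.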

Paper Structure

This paper contains 35 sections, 15 theorems, 122 equations, and 6 figures.

Key Result

Theorem 1

Consider the regression model (eqn-model) and the posterior distributions of $\bm{\theta}$ and $\sigma^2$ with $R\le\infty$. Suppose that Assumptions (ass:DGP) and (ass:trace) hold and that $R$ satisfies $\| \bm{\theta}^* \|_{\Sigma} < R/2$ and $\| \bm{\theta}^* \|_2 < \infty$. For any sequence $\{\varepsilon_{\ldots}\}$ ... we have the following as $n \to \infty$, for some constant $C>0$: ... Additionally, for the posterior ...

Figures (6)

  • Figure 1: Left: 3D isotropic normal distribution. Right: Proposed distribution. Red, blue, and green arrows represent the first to third principal components of the black data points in 3D space. The proposed distribution assigns weights along the principal component directions proportional to the eigenvalues.
  • Figure 2: An illustration of the support space $S_{\mathcal{D}_1,k}$ of the prior distribution and an ellipse centred at $\bm{\beta}^*$, in the $p$-dimensional space $\mathcal{P}_n^1$ equipped with the metric induced by the norm $\|\cdot\|_{\mathrm{diag}(\widetilde{\lambda}_1,\ldots,\widetilde{\lambda}_p)}$.
  • Figure 3: Left and right figures show the histograms of the predictive risks for $n=50,\ 100, \ 200, \ 400,$ and $800$ in scenarios (i) and (ii), respectively.
  • Figure 4: Circles show samples from the Gibbs sampler and triangles show samples from the approximate distributions, for the two components of $\bm{\theta}$ with the largest variances. The top row corresponds to Scenario (i) and the bottom row to Scenario (ii), for $n=50,\ 100,\ 200,\ 400,$ and $800$. The solid curve is the density estimate for the circle samples, and the dashed curve is the density estimate for the triangle samples.
  • Figure 5: Plot of the posterior predictive mean (blue points), 95% posterior predictive interval (bars), and true values (red points) for 100 randomly selected samples from the test data.
  • ...and 1 more figure

Theorems & Definitions (34)

  • Remark 1: Cross-fitting
  • Definition 1: Effective Ranks (Def. 3 of bartlett2020benign)
  • Theorem 1
  • Example 1
  • Example 2
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Example 3
  • Example 4
  • ...and 24 more
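
Definition 1 in the list above cites the effective ranks of Bartlett et al. (2020). The listing does not reproduce the formulas, so the sketch below uses the definitions from that cited source: for eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots$ of $\Sigma$, $r_k(\Sigma) = \sum_{i>k}\lambda_i / \lambda_{k+1}$ and $R_k(\Sigma) = (\sum_{i>k}\lambda_i)^2 / \sum_{i>k}\lambda_i^2$. Whether the paper uses exactly this normalization is an assumption, and the function name `effective_ranks` is hypothetical.

```python
import numpy as np

def effective_ranks(eigvals, k):
    """Return (r_k, R_k) computed from the eigenvalues beyond the k-th largest."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]  # descending order
    tail = lam[k:]                    # lambda_{k+1}, lambda_{k+2}, ...
    tail_sum = tail.sum()
    r_k = tail_sum / tail[0]          # tail mass relative to lambda_{k+1}
    R_k = tail_sum ** 2 / np.sum(tail ** 2)
    return r_k, R_k

# Isotropic spectrum in 10 dimensions: both ranks equal the dimension at k = 0.
r0, R0 = effective_ranks(np.ones(10), k=0)

# Decaying spectrum: tail after k = 1 is [2, 1, 1].
r1, R1 = effective_ranks([4.0, 2.0, 1.0, 1.0], k=1)
```

For the second spectrum the tail sums to 4, so `r1` is 4/2 = 2 and `R1` is 16/6, showing how both quantities measure how many directions carry the remaining spectral mass.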