Bayesian Analysis for Over-parameterized Linear Model via Effective Spectra

Tomoya Wakayama; Masaaki Imaizumi

TL;DR

The paper tackles high-dimensional linear regression with $p \gg n$ under non-sparse $\boldsymbol{\theta}^*$ by introducing a data-adaptive Gaussian prior concentrated on the leading eigen-directions of the covariate covariance. A hierarchical prior selects the effective rank $k$, and a Bernstein–von Mises-type truncation yields a Gaussian approximation to the posterior that facilitates uncertainty quantification with reduced computation. Theoretical contributions include posterior contraction rates tied to spectral quantities (effective ranks) and a robust Gaussian-approximation result that holds beyond sub-Gaussian covariates. Empirically, simulations and a real-data analysis demonstrate accurate prediction and well-calibrated uncertainty, illustrating the practical value of leveraging spectral information in non-sparse Bayesian high-dimensional settings.

Abstract

In high-dimensional Bayesian statistics, various methods have been developed, including prior distributions that induce parameter sparsity to handle many parameters. Yet, these approaches often overlook the rich spectral structure of the covariate matrix, which can be crucial when true signals are not sparse. To address this gap, we introduce a data-adaptive Gaussian prior whose covariance is aligned with the leading eigenvectors of the sample covariance. This prior design targets the data's intrinsic complexity rather than its ambient dimension by concentrating the parameter search along principal data directions. We establish contraction rates of the corresponding posterior distribution, which reveal how the mass in the spectrum affects the prediction error bounds. Furthermore, we derive a truncated Gaussian approximation to the posterior (i.e., a Bernstein-von Mises-type result), which allows for uncertainty quantification with a reduced computational burden. Our findings demonstrate that Bayesian methods leveraging spectral information of the data are effective for estimation in non-sparse, high-dimensional settings.
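
As a rough illustration of the prior design described above, the following NumPy sketch builds a Gaussian prior supported on the top-$k$ eigenvectors of a sample covariance and computes the resulting conjugate posterior. It is a simplified sketch, not the paper's procedure. The assumptions are that the covariates are zero-mean, the noise variance `sigma2` is known, the rank `k` is fixed by hand rather than drawn from a hierarchical prior, and the prior variance along each retained direction equals the corresponding sample eigenvalue. The function name `spectral_prior_posterior`, the argument names, and the simulated data are hypothetical. The covariance is estimated on a separate half of the data, in the spirit of the sample-splitting notation $\mathcal{D}_1$ used in the figures.

```python
import numpy as np


def spectral_prior_posterior(X_cov, X, y, k, sigma2=1.0):
    """Posterior of theta under a Gaussian prior supported on the top-k
    eigen-directions of the sample covariance of X_cov.

    Prior (illustrative): theta = V_k @ beta, beta ~ N(0, diag(lam_1..lam_k)).
    Likelihood: y ~ N(X @ theta, sigma2 * I), with sigma2 treated as known.
    """
    n_cov = X_cov.shape[0]
    # Eigen-decomposition of X_cov^T X_cov / n_cov through the SVD of X_cov,
    # which avoids forming the p x p covariance matrix.
    _, s, Vt = np.linalg.svd(X_cov, full_matrices=False)
    if k > s.size or s[k - 1] <= 0:
        raise ValueError("k must not exceed the numerical rank of X_cov")
    lam = s[:k] ** 2 / n_cov          # top-k sample eigenvalues
    V_k = Vt[:k].T                    # p x k matrix of top-k eigenvectors

    Z = X @ V_k                       # covariates projected onto the subspace
    precision = Z.T @ Z / sigma2 + np.diag(1.0 / lam)
    cov_beta = np.linalg.inv(precision)
    mean_beta = cov_beta @ (Z.T @ y) / sigma2

    theta_mean = V_k @ mean_beta          # posterior mean in R^p
    theta_cov = V_k @ cov_beta @ V_k.T    # rank-k posterior covariance
    return theta_mean, theta_cov, V_k


# Simulated over-parameterized data (p > n) with a dense, non-sparse signal.
rng = np.random.default_rng(0)
n, p, k = 100, 300, 10
scales = 1.0 / np.arange(1, p + 1)    # decaying standard deviations
X_all = rng.standard_normal((2 * n, p)) * scales
theta_star = rng.standard_normal(p)
y_all = X_all @ theta_star + 0.5 * rng.standard_normal(2 * n)

# First half estimates the covariance, second half forms the likelihood.
theta_mean, theta_cov, V_k = spectral_prior_posterior(
    X_all[:n], X_all[n:], y_all[n:], k=k, sigma2=0.25
)
print(theta_mean.shape, np.linalg.matrix_rank(theta_cov))
```

Because the prior puts no mass outside the span of `V_k`, the posterior mean lies in that $k$-dimensional subspace and the linear algebra costs scale with $k$ rather than $p$, which is the sense in which the search targets the data's intrinsic complexity rather than its ambient dimension.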

Paper Structure (35 sections, 15 theorems, 122 equations, 6 figures)

Introduction
Overview
Related Works
Notation
Setup and Proposed Method
Setup
Preparation: Empirical Estimation of Covariance Matrix
Prior Design
Posterior
Theoretical Guarantee
Baseline Case: sub-Gauss and trace-class assumptions
Posterior Contraction in Baseline Case
Advanced Case: Beyond sub-Gauss and trace-class assumptions
Posterior Contraction in Advanced Case
Approximation for Posterior Distribution
...and 20 more sections

Key Result

Theorem 1

Consider the regression model (eqn-model) and the posterior distributions of $\bm{\theta}$ and $\sigma^2$ with $R\le\infty$. Suppose that Assumptions ass:DGP and ass:trace hold and that $R$ satisfies $\| \bm{\theta}^* \|_{\Sigma} < R/2$ and $\| \bm{\theta}^* \|_2 < \infty$. For any sequence $\{\varepsilon_{\,\cdot\,}\}$ […] we have the following as $n \to \infty$, for some constant $C>0$: […] Additionally, for the posterior […]

Figures (6)

Figure 1: Left: 3D isotropic normal distribution. Right: Proposed distribution. Red, blue, and green arrows represent the first to third principal components of the black data points in 3D space. The proposed distribution assigns weights along the principal component directions proportional to the eigenvalues.
Figure 2: An illustration of the support space $S_{\mathcal{D}_1,k}$ of the prior distribution and an ellipse centred at $\bm{\beta}^*$, in the $p$-dimensional space $\mathcal{P}_n^1$ equipped with the metric induced by the norm $\|\cdot\|_{\mathrm{diag}(\widetilde{\lambda}_1,\ldots,\widetilde{\lambda}_p)}$.
Figure 3: Histograms of the predictive risks for $n=50,\ 100,\ 200,\ 400,$ and $800$. Left: Scenario (i). Right: Scenario (ii).
Figure 4: Samples of the two components of $\bm{\theta}$ with the largest variances for $n=50,\ 100,\ 200,\ 400,$ and $800$. Top row: Scenario (i). Bottom row: Scenario (ii). Circles are draws from the Gibbs sampler and triangles are draws from the approximate distribution. The solid curve is the density estimate for the circle samples, and the dashed curve is the density estimate for the triangle samples.
Figure 5: Plot of the posterior predictive mean (blue points), 95% posterior predictive interval (bars), and true values (red points) for 100 randomly selected samples from the test data.
...and 1 more figure

Theorems & Definitions (34)

Remark 1: Cross-fitting
Definition 1: Effective Ranks (Def. 3 of bartlett2020benign)
Theorem 1
Example 1
Example 2
Theorem 2
Theorem 3
Theorem 4
Example 3
Example 4
...and 24 more
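
Definition 1 refers to the effective ranks of Definition 3 in bartlett2020benign. As a reminder of what those quantities measure, the sketch below computes them from a list of eigenvalues, assuming the standard form $r_k(\Sigma) = \sum_{i>k}\lambda_i / \lambda_{k+1}$ and $R_k(\Sigma) = (\sum_{i>k}\lambda_i)^2 / \sum_{i>k}\lambda_i^2$ for eigenvalues sorted in decreasing order. The function name `effective_ranks` is hypothetical, and the paper's own notation or normalization may differ.

```python
import numpy as np


def effective_ranks(eigvals, k):
    """Return (r_k, R_k) for the given eigenvalues.

    Assumed definitions, with lam_1 >= lam_2 >= ... the sorted eigenvalues:
      r_k = sum_{i>k} lam_i / lam_{k+1}
      R_k = (sum_{i>k} lam_i)^2 / sum_{i>k} lam_i^2
    """
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    tail = lam[k:]                      # lam_{k+1}, lam_{k+2}, ...
    if tail.size == 0 or tail[0] <= 0:
        raise ValueError("need a positive eigenvalue beyond index k")
    tail_sum = tail.sum()
    r_k = tail_sum / tail[0]
    R_k = tail_sum ** 2 / (tail ** 2).sum()
    return r_k, R_k


# Isotropic spectrum: both effective ranks equal the number of tail directions.
print(effective_ranks(np.ones(10), k=0))   # r_0 = 10, R_0 = 10

# Fast-decaying spectrum: the effective ranks stay small even when p is large.
decaying = 1.0 / np.arange(1, 1001) ** 2
print(effective_ranks(decaying, k=0))
```

A flat spectrum gives effective ranks equal to the ambient dimension of the tail, while a rapidly decaying spectrum gives small values regardless of $p$. This is why rates expressed through effective ranks can remain informative when $p \gg n$.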
