Incorporating priors in learning: a random matrix study under a teacher-student framework

Malik Tiomoko, Ekkehard Schnoor

TL;DR

The paper studies how informative Gaussian priors affect generalization in high-dimensional MAP regression under proportional asymptotics. Using random matrix theory, it derives exact asymptotic risk formulas that reveal a bias–variance–prior tradeoff, explain double descent, and quantify prior mismatch. It gives a closed-form minimizer of the test risk in the identity-covariance case and extends to general covariance, where the optimal regularization can be computed numerically. The results offer theoretical clarity and practical guidance for leveraging domain knowledge in high-dimensional learning, with implications for transfer learning and time-series forecasting. Overall, the work connects Bayesian priors, classical regularization, and modern asymptotics, and yields a simple estimator of the optimal regularization parameter.

Abstract

Regularized linear regression is central to machine learning, yet its high-dimensional behavior with informative priors remains poorly understood. We provide the first exact asymptotic characterization of training and test risks for maximum a posteriori (MAP) regression with Gaussian priors centered at a domain-informed initialization. Our framework unifies ridge regression, least squares, and prior-informed estimators, and -- using random matrix theory -- yields closed-form risk formulas that expose the bias-variance-prior tradeoff, explain double descent, and quantify prior mismatch. We also identify a closed-form minimizer of test risk, enabling a simple estimator of the optimal regularization parameter. Simulations confirm the theory with high accuracy. By connecting Bayesian priors, classical regularization, and modern asymptotics, our results provide both conceptual clarity and practical guidance for learning with structured prior knowledge.
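The unification mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not taken from the paper: the objective scaling $\frac{1}{n}\|Y - X\Theta\|_F^2 + \lambda\|\Theta - \Theta_0\|_F^2$, the function name `map_estimator`, and the synthetic data are all assumptions made for illustration. Under that objective, the Gaussian-prior MAP estimator has a closed form that reduces to ridge regression when the prior center is zero, approaches least squares as $\lambda \to 0$, and collapses onto the prior center as $\lambda \to \infty$.

```python
import numpy as np

def map_estimator(X, Y, theta0, lam):
    """MAP estimate under a Gaussian prior centered at theta0 (assumed scaling).

    Minimizes (1/n) * ||Y - X @ Theta||_F^2 + lam * ||Theta - theta0||_F^2.
    """
    n, d = X.shape
    gram = X.T @ X / n + lam * np.eye(d)      # regularized sample covariance
    rhs = X.T @ (Y - X @ theta0) / n          # correlation with the prior's residual
    return theta0 + np.linalg.solve(gram, rhs)

# Synthetic teacher-student data (sizes borrowed from the Figure 1 caption).
rng = np.random.default_rng(0)
n, d, q = 200, 100, 10
X = rng.standard_normal((n, d))
theta_star = rng.standard_normal((d, q)) / np.sqrt(d)   # hypothetical teacher weights
Y = X @ theta_star + np.sqrt(0.5) * rng.standard_normal((n, q))

# Reference estimators for the special cases.
zero_prior = np.zeros((d, q))
ridge = np.linalg.solve(X.T @ X / n + 0.1 * np.eye(d), X.T @ Y / n)
ols = np.linalg.lstsq(X, Y, rcond=None)[0]

print(np.allclose(map_estimator(X, Y, zero_prior, 0.1), ridge))
print(np.allclose(map_estimator(X, Y, zero_prior, 1e-10), ols, atol=1e-6))
```

Writing the solution as a correction to `theta0` makes the role of the prior explicit: the data only update the part of the target that the prior center fails to explain.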

Paper Structure

This paper contains 6 sections, 2 theorems, 20 equations, 4 figures.

Key Result

Theorem 1

Let $(\mu_i, {\mathbf{v}}_i)$, $i=1, \dots, d$, be the eigenpairs of the covariance ${\mathbf{\Sigma }} \in \mathbb{R}^{d \times d}$ from eq:Sigma, and let $\delta$ be defined implicitly by the fixed-point equation $f(\delta) = \delta$. Then, under Assumptions assum:concentration and assum:asymptotic, the following risk expressions hold, where $\kappa= \lambda(1+\delta)$.

Figures (4)

  • Figure 1: Test (left) and training (right) risks over $\lambda$. Input $d=100$, output $q=10$, training $n=200$, test $N_\text{test}=10\,000$, $\sigma^2=0.5$, ${\mathbf{\Sigma }}={\mathbf{I}}_d$. Both with and without prior on weights.
  • Figure 2: Contour plots of $R_{\operatorname{test}}$ (left) and $R_{\operatorname{train}}$ (right) versus prior mismatch $\|{\mathbf{\Theta }}_\star - {\mathbf{\Theta }}_0\|$ ($x$-axis) and regularization $\lambda$ ($y$-axis, $\log$-scale); white dashed line: $\lambda$ minimizing $R_{\operatorname{test}}$; $d=100$, $q=10$, $n=200$, $N_\text{test}=10\,000$, $\sigma^2=0.5$, ${\mathbf{\Sigma }}={\mathbf{I}}_d$.
  • Figure 3: Test/training risk (left/right) as a function of $c = d/n$ for different priors and regularization strengths; note the double descent phenomenon around the interpolation threshold.
  • Figure 4: Left: estimation of $\sigma^2$ and $S$ versus $c=d/n$. Right: test risk as a function of $\lambda$, showing both theoretical and empirical curves along with the optimal values of $\lambda^\star$ and $\widehat{\lambda}^\star$.
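The kind of experiment summarized in the Figure 1 caption can be approximated with a short simulation. This is a sketch under stated assumptions, not the authors' code: the objective scaling, the helper names `fit` and `empirical_test_risk`, the teacher normalization, the mean-squared-error risk, and the size of the prior perturbation (0.1) are all hypothetical choices. Only the dimensions, noise level, and identity covariance come from the caption.

```python
import numpy as np

def fit(X, Y, theta0, lam):
    """Prior-centered ridge: argmin (1/n)||Y - X T||_F^2 + lam ||T - theta0||_F^2."""
    n, d = X.shape
    gram = X.T @ X / n + lam * np.eye(d)
    return theta0 + np.linalg.solve(gram, X.T @ (Y - X @ theta0) / n)

def empirical_test_risk(theta_hat, theta_star, sigma2, n_test, rng):
    """Mean squared prediction error on fresh samples with identity covariance."""
    d, q = theta_star.shape
    X_test = rng.standard_normal((n_test, d))
    Y_test = X_test @ theta_star + np.sqrt(sigma2) * rng.standard_normal((n_test, q))
    return float(np.mean((Y_test - X_test @ theta_hat) ** 2))

rng = np.random.default_rng(1)
d, q, n, n_test, sigma2 = 100, 10, 200, 10_000, 0.5   # values from the caption

theta_star = rng.standard_normal((d, q)) / np.sqrt(d)                 # teacher (assumed scale)
theta0 = theta_star + 0.1 * rng.standard_normal((d, q)) / np.sqrt(d)  # hypothetical informative prior
X = rng.standard_normal((n, d))
Y = X @ theta_star + np.sqrt(sigma2) * rng.standard_normal((n, q))

lambdas = np.logspace(-3, 2, 20)
risk_prior = [empirical_test_risk(fit(X, Y, theta0, lam), theta_star, sigma2, n_test, rng)
              for lam in lambdas]
risk_ridge = [empirical_test_risk(fit(X, Y, np.zeros((d, q)), lam), theta_star, sigma2, n_test, rng)
              for lam in lambdas]

for lam, rp, rr in zip(lambdas, risk_prior, risk_ridge):
    print(f"lambda={lam:9.3e}  with prior={rp:.3f}  without prior={rr:.3f}")
```

With a prior center close to the teacher, heavy regularization pulls the estimate toward a good solution, whereas the zero-centered estimator is pulled toward zero; plotting the two lists against `lambdas` on a log axis gives a rough empirical counterpart to the test-risk panel.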

Theorems & Definitions (4)

  • Definition 1: Training and Out-of-Sample Risk
  • Theorem 1: Asymptotic Risks
  • proof
  • Corollary 1