Bayesian Inference for Consistent Predictions in Overparameterized Nonlinear Regression

Tomoya Wakayama

TL;DR

This study explores the predictive properties of overparameterized nonlinear regression within the Bayesian framework, extending the adaptive-prior methodology that accounts for the intrinsic spectral structure of the data.

Abstract

The remarkable generalization performance of large-scale models challenges the conventional wisdom of statistical learning theory. Although recent theoretical studies have shed light on this behavior in linear models and nonlinear classifiers, a comprehensive understanding of overparameterization in nonlinear regression models is still lacking. This study explores the predictive properties of overparameterized nonlinear regression within the Bayesian framework, extending the adaptive-prior methodology that accounts for the intrinsic spectral structure of the data. Posterior contraction is established for generalized linear models and single-neuron models with Lipschitz continuous activation functions, demonstrating that the predictions of the proposed approach are consistent. Moreover, the Bayesian framework enables uncertainty estimation for the predictions. The proposed method is validated via numerical simulations and a real-data application, which show that it achieves accurate predictions and reliable uncertainty estimates. This work provides a theoretical understanding of the advantages of overparameterization and a principled Bayesian approach to large nonlinear models.

Paper Structure

This paper contains 25 sections, 2 theorems, 34 equations, 2 figures, and 1 table.

Key Result

Theorem 1

Consider the GLM (eq:model-expfam) with a covariate distribution satisfying Assumptions ass:emp through ass:KLKV, and let the prior distribution on $\bm{\beta}$ be given by prior:beta. Then, for a sufficiently large constant $M$, the posterior distribution contracts around the true parameter $\bm{\beta}^*$ as Furthermore, if the eigenvalues of $\Sigma$ satisfy the additional condition $(nk)^{-1} \gtrsim (\rho_n +\lambda_

Figures (2)

  • Figure 1: Change in 0-1 loss, AUC (area under the ROC curve), UM (proportion of unconfident predictions among misclassifications), and CC (proportion of accurate predictions among confident ones) for the proposed method as the sample size increases, for logistic regression with Gaussian covariates (left) and Laplace covariates (right). The gray lines show the results of principal component regression. The shaded bands represent the 90% interval.
  • Figure 2: Change in RMSE (root mean squared error), CP (coverage probability of the 95% prediction interval), and AL (average length of the 95% prediction interval) as the sample size increases, for nonlinear Gaussian regression with Gaussian covariates (left) and Laplace covariates (right).
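
The evaluation metrics named in the two captions can be illustrated with a short Python sketch. It is not the paper's evaluation code: the function names and the `confident` mask are hypothetical, and the rule that decides when a prediction counts as confident is not stated in the captions, so it is left as an input here.

```python
import numpy as np


def regression_metrics(y_true, y_pred, lower, upper):
    """Return (RMSE, CP, AL) for point predictions and prediction intervals.

    lower/upper are the interval endpoints (e.g. of a 95% prediction
    interval); how they are obtained is outside this sketch.
    """
    y_true, y_pred, lower, upper = (
        np.asarray(a, dtype=float) for a in (y_true, y_pred, lower, upper)
    )
    # RMSE: root mean squared error of the point predictions.
    rmse = float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
    # CP: fraction of true responses that fall inside their interval.
    cp = float(np.mean((lower <= y_true) & (y_true <= upper)))
    # AL: average interval length.
    al = float(np.mean(upper - lower))
    return rmse, cp, al


def classification_metrics(y_true, y_hat, confident):
    """Return (UM, CC) given predicted labels and a boolean confidence mask.

    `confident` is a hypothetical input: True where the prediction is
    treated as confident, by whatever rule the analyst chooses.
    """
    y_true = np.asarray(y_true)
    y_hat = np.asarray(y_hat)
    confident = np.asarray(confident, dtype=bool)
    wrong = y_hat != y_true
    # UM: proportion of unconfident predictions among misclassifications.
    um = float(np.mean(~confident[wrong])) if wrong.any() else float("nan")
    # CC: proportion of accurate predictions among confident ones.
    cc = float(np.mean(~wrong[confident])) if confident.any() else float("nan")
    return um, cc
```

Both functions return NaN-safe scalars, so they can be applied at each sample size and averaged over replications to produce curves like those described in the figures.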

Theorems & Definitions (6)

  • Definition 1: Prior Distribution with Effective Spectra
  • Definition 2: Posterior Distribution
  • Example 1
  • Theorem 1
  • Theorem 2
  • Example 2