Variational Gaussian Approximation in Replica Analysis of Parametric Models

Takashi Takahashi

TL;DR

This work extends the replica method to parametric inference with unknown data distributions by introducing a grand canonical replicated system and a variational Gaussian approximation (VGA) that adaptively tunes a quadratic trial Hamiltonian per dataset. The method yields tractable stationarity conditions for the VGA parameters $\bm{m}$, $q$, and $\chi$, decomposes estimator fluctuations into quenched and thermal components, and connects to information criteria such as PCIC/WAIC through a controlled expansion. Applied to linear regression, the approach produces learning curves and generalization predictions that remain accurate on synthetic teacher–student data as well as real-world data (Year Prediction MSD), even when $p_{\rm data}$ is unknown. Overall, the GC-VGA framework enables finite-size, data-distribution-agnostic analysis of parametric models and offers a path toward analyzing more complex models while preserving links to classical statistical criteria.

Abstract

We revisit the replica method for analyzing inference and learning in parametric models, considering situations where the data-generating distribution is unknown or analytically intractable. Instead of assuming idealized distributions to carry out quenched averages analytically, we use a variational Gaussian approximation for the replicated system in grand canonical formalism in which the data average can be deferred and replaced by empirical averages, leading to stationarity conditions that adaptively determine the parameters of the trial Hamiltonian for each dataset. This approach clarifies how fluctuations affect information extraction and connects directly with the results of mathematical statistics or learning theory such as information criteria. As a concrete application, we analyze linear regression and derive learning curves. This includes cases with real-world datasets, where exact replica calculations are not feasible.

Paper Structure

This paper contains 21 sections, 55 equations, 3 figures.

Figures (3)

  • Figure 1: Verification of the variational Gaussian approximation (VGA) in linear regression. Both generalization and training errors are shown. (a)-(l): synthetic data. (m)-(p): real-world data. Markers represent the true generalization and training errors, evaluated on a large dataset of size $n_0=10^4$. Lines are the VGA predictions.
  • Figure 2: Dependence of the learning curves for the YP MSD dataset on $n_0$. (a)-(h): generalization error $\bar{\epsilon}_{\mathrm{pred}}$. (i)-(p): training error $\bar{\epsilon}_{\rm tr}$. The average $\mathbb{E}_{z\sim p_{\rm{data}}}$ is approximated using a dataset of size $n_0$. Blue lines: VGA predictions. Red circles: experiments. Cyan dashed line: $n_0/d$.
  • Figure 3: Gradient fields of the free energy ${\mathcal{F}} \equiv \lim_{\beta\to\infty, r\to0}{\mathcal{F}}_{r,{\rm GC}}^{\beta,n}$. (a): the steepest descent direction of ${\mathcal{F}}$. (b): the case where the gradient with respect to $q$ is inverted.
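
The empirical markers described for Figure 1 (training error, plus a generalization error estimated on a large reference set of size $n_0$) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the ridge estimator and its penalty `lam`, Gaussian inputs, the noise level, the dimension `d`, and the function names. The sketch produces only the empirical side of such a learning curve; it does not implement the VGA prediction.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge estimator: solve (X^T X + lam * I) w = X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def mse(X, y, w):
    """Mean squared error of the linear predictor X @ w."""
    return float(np.mean((y - X @ w) ** 2))

def learning_curve(ns, d=20, n0=10_000, lam=1e-2, noise=0.1, seed=0):
    """Empirical learning curve on synthetic teacher-student data.

    All defaults are illustrative assumptions, not values from the paper.
    Returns a list of (n, training_error, generalization_error) tuples.
    """
    rng = np.random.default_rng(seed)
    w_teacher = rng.standard_normal(d) / np.sqrt(d)

    # Large reference set of size n0, standing in for the data average.
    X0 = rng.standard_normal((n0, d))
    y0 = X0 @ w_teacher + noise * rng.standard_normal(n0)

    curve = []
    for n in ns:
        # Fresh training set of size n from the same teacher.
        X = rng.standard_normal((n, d))
        y = X @ w_teacher + noise * rng.standard_normal(n)
        w = ridge_fit(X, y, lam)
        curve.append((n, mse(X, y, w), mse(X0, y0, w)))
    return curve

if __name__ == "__main__":
    for n, e_tr, e_gen in learning_curve([40, 100, 400]):
        print(f"n={n:4d}  train={e_tr:.4f}  gen={e_gen:.4f}")
```

Using one fixed reference set for every training size mirrors the role of $n_0$ in the figure captions: the held-out average is a proxy for $\mathbb{E}_{z\sim p_{\rm data}}$, and its accuracy depends on $n_0$.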