
Dynamical mean-field analysis of adaptive Langevin diffusions: Replica-symmetric fixed point and empirical Bayes

Zhou Fan, Justin Ko, Bruno Loureiro, Yue M. Lu, Yandi Shen

TL;DR

This work analyzes the dynamical mean-field theory (DMFT) limit of high-dimensional adaptive Langevin sampling in Bayesian linear regression, where the prior adapts along the Langevin trajectory; the DMFT limit itself is established in a companion paper. Under approximate time-translation-invariance (TTI), which holds in particular when the posterior satisfies a uniform log-Sobolev inequality (LSI), the authors prove that the adaptive Langevin dynamics converge to an equilibrium described by scalar fixed-point equations that match replica-symmetric (RS) predictions for the mean-squared error and free energy. They also establish convergence, on a dimension-independent time horizon, of the empirical Bayes prior parameter to critical points of the RS free energy, and give a dynamical proof that the RS free energy is exact in settings with a possibly misspecified prior. The results connect the long-time behavior of adaptive diffusions to RS fixed points and rigorously characterize the learning dynamics of empirical Bayes priors in high dimensions. The analysis relies on careful DMFT constructions, approximate-TTI regimes, and coupling arguments that control auxiliary processes and quantify convergence in Wasserstein distance and KL divergence.
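As background for the LSI assumption, the following standard computation (for the non-adaptive Langevin diffusion with a fixed target $\mu$; it is not a result of this paper) shows why an LSI yields exponential convergence in KL divergence. Here $\nu_t$ denotes the law of $\boldsymbol{\theta}^t$ under $d\boldsymbol{\theta}^t = \nabla \log \mu(\boldsymbol{\theta}^t)\,dt + \sqrt{2}\,d\mathbf{B}^t$, and the normalization of the constant $C$ is one of several conventions in use.

```latex
\begin{aligned}
&\text{LSI}(C):\quad \mathrm{KL}(\nu\,\|\,\mu) \le \tfrac{C}{2}\, I(\nu\,\|\,\mu),
\qquad I(\nu\,\|\,\mu) = \mathbb{E}_{\nu}\Big\|\nabla \log \tfrac{d\nu}{d\mu}\Big\|_2^2,\\
&\tfrac{d}{dt}\,\mathrm{KL}(\nu_t\,\|\,\mu) = -I(\nu_t\,\|\,\mu) \le -\tfrac{2}{C}\,\mathrm{KL}(\nu_t\,\|\,\mu)
\;\Longrightarrow\; \mathrm{KL}(\nu_t\,\|\,\mu) \le e^{-2t/C}\,\mathrm{KL}(\nu_0\,\|\,\mu).
\end{aligned}
```

When $C$ does not grow with $d$, the normalized divergence $\frac{1}{d}\mathrm{KL}(\nu_t\,\|\,\mu)$ contracts at a dimension-independent rate. This is intuition only: in the adaptive dynamics the target itself moves with $\widehat{\alpha}^t$, so this direct argument does not apply as stated.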

Abstract

In many applications of statistical estimation via sampling, one may wish to sample from a high-dimensional target distribution that is adaptively evolving to the samples already seen. We study an example of such dynamics, given by a Langevin diffusion for posterior sampling in a Bayesian linear regression model with i.i.d. regression design, whose prior continuously adapts to the Langevin trajectory via a maximum marginal-likelihood scheme. Results of dynamical mean-field theory (DMFT) developed in our companion paper establish a precise high-dimensional asymptotic limit for the joint evolution of the prior parameter and law of the Langevin sample. In this work, we carry out an analysis of the equations that describe this DMFT limit, under conditions of approximate time-translation-invariance which include, in particular, settings where the posterior law satisfies a log-Sobolev inequality. In such settings, we show that this adaptive Langevin trajectory converges on a dimension-independent time horizon to an equilibrium state that is characterized by a system of scalar fixed-point equations, and the associated prior parameter converges to a critical point of a replica-symmetric limit for the model free energy. As a by-product of our analyses, we obtain a new dynamical proof that this replica-symmetric limit for the free energy is exact, in models having a possibly misspecified prior and where a log-Sobolev inequality holds for the posterior law.
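The abstract describes the adaptive dynamics only in words. The following is a minimal sketch under stated assumptions, not the authors' scheme: an Euler–Maruyama discretization of Langevin dynamics for the posterior of $\boldsymbol{\theta}$, coupled with a gradient step for the prior parameter $\alpha$ driven by the current sample. It assumes a hypothetical Gaussian prior $\theta_j \sim \mathcal{N}(\alpha,1)$, a design $\mathbf{X}$ with i.i.d. $\mathcal{N}(0,1/d)$ entries, and an $\alpha$-drift equal to $\frac{1}{d}\sum_j \partial_\alpha \log g(\theta_j,\alpha)$; the function name `eb_langevin_gaussian_mean`, the step size, and the problem sizes are illustrative. The drift choice is motivated by Fisher's identity: the gradient of the log marginal likelihood equals the posterior expectation of $\sum_j \partial_\alpha \log g(\theta_j,\alpha)$, so one Langevin sample gives a noisy estimate of it.

```python
import numpy as np


def eb_langevin_gaussian_mean(X, y, sigma2, eta=0.01, n_steps=2000, alpha0=0.0, seed=0):
    """Hypothetical Euler-Maruyama sketch of empirical Bayes Langevin dynamics.

    Assumed prior: theta_j ~ N(alpha, 1) i.i.d. (illustration only).
    theta: Langevin step on the log posterior.
    alpha: gradient step on (1/d) * sum_j log g(theta_j, alpha), a one-sample
    estimate of the (normalized) marginal-likelihood gradient.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    XtX, Xty = X.T @ X, X.T @ y        # precomputed so each step costs O(d^2)
    theta = rng.standard_normal(d)     # initialization theta_j^0 ~ N(0, 1)
    alpha = float(alpha0)
    alphas = np.empty(n_steps)
    for k in range(n_steps):
        # Gradient of the Gaussian log-likelihood: X^T (y - X theta) / sigma^2
        grad_lik = -(XtX @ theta - Xty) / sigma2
        # Gradient in theta of sum_j log N(theta_j; alpha, 1)
        grad_prior = -(theta - alpha)
        # Gradient in alpha of (1/d) sum_j log N(theta_j; alpha, 1), at the current state
        grad_alpha = np.mean(theta - alpha)
        theta = theta + eta * (grad_lik + grad_prior) + np.sqrt(2.0 * eta) * rng.standard_normal(d)
        alpha = alpha + eta * grad_alpha
        alphas[k] = alpha
    return theta, alphas


# Usage on synthetic data (all settings are assumptions for illustration).
rng = np.random.default_rng(1)
d, delta, s = 200, 2.0, 0.5
n = int(delta * d)
sigma2 = delta * s**2                            # noise variance sigma^2 = delta * s^2
X = rng.standard_normal((n, d)) / np.sqrt(d)     # i.i.d. N(0, 1/d) design
theta_star = 1.0 + rng.standard_normal(d)        # true prior N(1, 1)
y = X @ theta_star + np.sqrt(sigma2) * rng.standard_normal(n)

theta, alphas = eb_langevin_gaussian_mean(X, y, sigma2)
alpha_hat = alphas[len(alphas) // 2:].mean()     # average over the second half of the run
mse = np.mean((theta - theta_star) ** 2)         # (1/d) ||theta - theta*||_2^2
print(f"alpha_hat={alpha_hat:.3f}  mse={mse:.3f}")
```

Averaging $\widehat{\alpha}^t$ over the second half of the run damps the fluctuation contributed by the single Langevin sample; in this Gaussian toy model the averaged estimate should land near the true prior mean 1, and the final error reflects one posterior sample rather than the posterior mean.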


Paper Structure

This paper contains 35 sections, 35 theorems, 509 equations, 2 figures.

Key Result

theorem 1

Figures (2)

  • Figure 1: Simulations for the Gaussian mixture prior model $\frac{1}{2}\mathcal{N}(\alpha_1,1)+\frac{1}{2}\mathcal{N}(\alpha_2,0.25)$ of the Gaussian mean-mixture example, with true mixture means $\alpha^*=(-1,1)$ and linear model noise variance $\sigma^2=\delta s^2$ for $s=0.5$. Empirical Bayes Langevin dynamics is run for a single instance $(\mathbf{X},\mathbf{y})$ with $\max(n,d)=5000$, initialization $\theta_j^0 \overset{iid}{\sim} \mathcal{N}(0,1)$, and an Euler–Maruyama discretization of the dynamics. (a–e) Landscape of the replica-symmetric free energy $F(\alpha)$, plotted for visual clarity as $\log(F(\alpha)-F(\alpha^*)+10^{-3})$, for $\delta \in \{4,2,1,0.5,0.25\}$. Two stable fixed points of $\nabla F(\alpha)=0$ are depicted in red: the star indicates the true parameter $\alpha^*=(-1,1)$, and the circle indicates a second fixed point $\alpha^\dagger$ near $(1,-1)$. Sample paths $\{\widehat{\alpha}^t\}_{t \geq 0}$ from two different initial states $\widehat{\alpha}^0$ are shown in blue and green. (f) Mean-squared error $\frac{1}{d}\|\boldsymbol{\theta}^t-\boldsymbol{\theta}^*\|_2^2$ across iterations for these same two initial states, at $\delta=1$. The predicted value for a posterior sample $\boldsymbol{\theta} \sim \mathsf{P}_{g(\cdot,\alpha)}(\cdot \mid \mathbf{X},\mathbf{y})$ is $\frac{1}{d}\|\boldsymbol{\theta}-\boldsymbol{\theta}^*\|_2^2 \approx \mathrm{mse}(\alpha)+\mathrm{mse}_*(\alpha)$, depicted by dashed lines for $\alpha \in \{\alpha^\dagger,\alpha^*\}$.
  • Figure 2: Simulations for the Gaussian mixture prior model $p_1(\alpha)\mathcal{N}(0,0.04)+p_2(\alpha)\mathcal{N}(0,1)+p_3(\alpha)\mathcal{N}(0,25)$ of the Gaussian weight-mixture example, with true weights $p(\alpha^*)=(0.6,0.2,0.2)$ and linear model noise variance $\sigma^2=\delta s^2$ for $s=0.2$. Empirical Bayes Langevin dynamics are run for two initializations $\widehat{\alpha}^0$ with random $\theta_j^0 \overset{iid}{\sim} \mathcal{N}(0,1)$ (black and blue), and an initialization $\widehat{\alpha}^0$ near $\alpha^*$ with ground truth $\theta_j^0=\theta_j^*$ (green). The remaining setup is the same as in Figure 1. (a–e) Landscape of the replica-symmetric free energy $F(\alpha)$ for $\delta \in \{4,2,1,0.75,0.5\}$, plotted as $\log(F(\alpha)-F(\alpha^*)+10^{-3})$ in the coordinates $p(\alpha)$ on the simplex. The unique critical point $p(\alpha^*)$ is depicted as the red star. Sample paths of $\{p(\widehat{\alpha}^t)\}_{t \geq 0}$ are shown in green, black, and blue. (f) Mean-squared error $\frac{1}{d}\|\boldsymbol{\theta}^t-\boldsymbol{\theta}^*\|_2^2$ across iterations for these same three initial states, at $\delta=0.75$. The predicted value $\mathrm{mse}(\alpha^*)+\mathrm{mse}_*(\alpha^*)$ for a posterior sample is depicted by the dashed line.
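For the mean-mixture prior of Figure 1, the prior-dependent ingredients of the Euler–Maruyama discretization are the score $\partial_\theta \log g(\theta,\alpha)$, which enters the Langevin drift, and the gradient $\nabla_\alpha \log g(\theta,\alpha)$, which enters a maximum marginal-likelihood update. A minimal sketch computing both via component responsibilities and a numerically stable log-sum-exp follows; the name `mixture_scores` is hypothetical, and how the paper scales or combines these gradients in its $\widehat{\alpha}^t$ update is not specified here.

```python
import numpy as np

# Fixed component weights and variances of 0.5 N(alpha_1, 1) + 0.5 N(alpha_2, 0.25).
WEIGHTS = np.array([0.5, 0.5])
VARS = np.array([1.0, 0.25])


def mixture_scores(theta, alpha):
    """Return (log g, d/dtheta log g, grad_alpha log g), elementwise over theta.

    theta: array of shape (d,); alpha: the two component means, shape (2,).
    grad_alpha log g has shape (d, 2): column k is d/dalpha_k log g(theta_j, alpha).
    """
    theta = np.asarray(theta, dtype=float)[:, None]   # shape (d, 1)
    alpha = np.asarray(alpha, dtype=float)[None, :]   # shape (1, 2)
    # log(weight_k * N(theta; alpha_k, var_k)) for each coordinate and component
    log_comp = (np.log(WEIGHTS) - 0.5 * np.log(2.0 * np.pi * VARS)
                - 0.5 * (theta - alpha) ** 2 / VARS)
    log_g = np.logaddexp.reduce(log_comp, axis=1)      # stable log-sum-exp over components
    resp = np.exp(log_comp - log_g[:, None])           # component responsibilities
    # d/dtheta log g = sum_k resp_k * (alpha_k - theta) / var_k
    score_theta = np.sum(resp * (alpha - theta) / VARS, axis=1)
    # d/dalpha_k log g = resp_k * (theta - alpha_k) / var_k
    score_alpha = resp * (theta - alpha) / VARS
    return log_g, score_theta, score_alpha


theta = np.array([-1.3, 0.2, 0.9])
log_g, score_theta, score_alpha = mixture_scores(theta, np.array([-1.0, 1.0]))
```

Averaging `score_alpha` over coordinates gives the per-coordinate marginal-likelihood gradient estimate from one sample, analogous to the single-Gaussian case.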

Theorems & Definitions (75)

  • theorem 1
  • definition 1
  • theorem 2
  • remark 1
  • proposition 1
  • theorem 3
  • corollary 1
  • lemma 1
  • theorem 4
  • proposition 2
  • ...and 65 more