Local convexity of the TAP free energy and AMP convergence for Z2-synchronization

Michael Celentano, Zhou Fan, Song Mei

TL;DR

The paper proves that for Z2-synchronization at any signal strength $\lambda>1$, the TAP free energy ${\mathcal F}_{\mathrm{TAP}}$ has a Bayes-optimal local minimizer ${\boldsymbol m}_{\star}$ near the Bayes posterior mean, and that ${\mathcal F}_{\mathrm{TAP}}$ is strongly convex in a $\sqrt{\varepsilon n}$-neighborhood of ${\boldsymbol m}_{\star}$. It then shows that natural gradient descent (NGD) reliably converges linearly to ${\boldsymbol m}_{\star}$ from a local initialization, which itself can be obtained by a finite number of AMP iterations; moreover, the AMP map is locally stable at ${\boldsymbol m}_{\star}$, enabling finite-$n$ convergence guarantees. In the large-$\lambda$ regime, either AMP or NGD from a spectral initialization converges linearly to ${\boldsymbol m}_{\star}$, and the global TAP landscape aligns with a unique global minimizer in a broad region. The proofs combine Kac-Rice localization and Sudakov-Fernique Gaussian comparison to control critical points and local convexity, with state-evolution results informing the AMP dynamics. Overall, the work provides a rigorous, algorithmically tractable foundation for TAP-based variational inference in high dimensions and clarifies the landscape and convergence of AMP/NGD in Z2-synchronization.

Abstract

We study mean-field variational Bayesian inference using the TAP approach, for Z2-synchronization as a prototypical example of a high-dimensional Bayesian model. We show that for any signal strength $\lambda > 1$ (the weak-recovery threshold), there exists a unique local minimizer of the TAP free energy functional near the mean of the Bayes posterior law. Furthermore, the TAP free energy in a local neighborhood of this minimizer is strongly convex. Consequently, a natural-gradient/mirror-descent algorithm achieves linear convergence to this minimizer from a local initialization, which may be obtained by a constant number of iterates of Approximate Message Passing (AMP). This provides a rigorous foundation for variational inference in high dimensions via minimization of the TAP free energy. We also analyze the finite-sample convergence of AMP, showing that AMP is asymptotically stable at the TAP minimizer for any $\lambda > 1$, and is linearly convergent to this minimizer from a spectral initialization for sufficiently large $\lambda$. Such a guarantee is stronger than results obtainable by state evolution analyses, which only describe a fixed number of AMP iterations in the infinite-sample limit. Our proofs combine the Kac-Rice formula and Sudakov-Fernique Gaussian comparison inequality to analyze the complexity of critical points that satisfy strong convexity and stability conditions within their local neighborhoods.
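The AMP iteration from a spectral initialization mentioned in the abstract can be illustrated with a minimal sketch. The formulas below are not stated on this page and are assumptions based on the form commonly used for Z2-synchronization: the observation model $Y = (\lambda/n)\,{\boldsymbol x}{\boldsymbol x}^\top + W$ with GOE noise $W$, and the update ${\boldsymbol m}^{k+1} = \tanh(\lambda Y {\boldsymbol m}^k - \lambda^2 [1 - Q({\boldsymbol m}^k)]\, {\boldsymbol m}^{k-1})$ with $Q({\boldsymbol m}) = \|{\boldsymbol m}\|_2^2/n$. The function names, the initialization scale `scale=0.5`, and the choice ${\boldsymbol m}^{-1} = 0$ are hypothetical choices made for illustration only.

```python
import numpy as np

def z2_sync_instance(n, lam, rng):
    """Draw x in {-1,+1}^n and Y = (lam/n) x x^T + W (assumed model)."""
    x = rng.choice([-1.0, 1.0], size=n)
    g = rng.normal(size=(n, n))
    w = (g + g.T) / np.sqrt(2 * n)  # symmetric noise, off-diagonal variance 1/n
    y = (lam / n) * np.outer(x, x) + w
    return x, y

def spectral_init(y, scale=0.5):
    """Top eigenvector of Y, rescaled so ||m0||^2 / n = scale^2 (hypothetical scale)."""
    _, vecs = np.linalg.eigh(y)  # eigenvalues are returned in ascending order
    v = vecs[:, -1]              # unit-norm eigenvector of the largest eigenvalue
    return scale * np.sqrt(v.size) * v

def amp(y, lam, m0, n_iter=50):
    """Assumed AMP update: m_next = tanh(lam*Y m - lam^2 (1 - Q(m)) m_prev)."""
    m_prev = np.zeros_like(m0)   # assumption: previous iterate starts at zero
    m = m0.copy()
    for _ in range(n_iter):
        q = np.mean(m ** 2)
        h = lam * (y @ m) - lam ** 2 * (1.0 - q) * m_prev
        m_prev, m = m, np.tanh(h)
    return m

# Usage: a single instance at a fairly large signal strength.
rng = np.random.default_rng(0)
n, lam = 400, 2.5
x, y = z2_sync_instance(n, lam, rng)
m = amp(y, lam, spectral_init(y))
overlap = abs(m @ x) / n  # absolute value: x is identifiable only up to sign
```

Under the assumed form of the TAP free energy, a fixed point of this update satisfies $\operatorname{arctanh}({\boldsymbol m}) = \lambda Y {\boldsymbol m} - \lambda^2 [1 - Q({\boldsymbol m})]\, {\boldsymbol m}$, which is the stationarity condition that links AMP fixed points to TAP critical points.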

Paper Structure

This paper contains 45 sections, 31 theorems, 310 equations, 5 figures.

Key Result

Theorem 2.1

Fix any $\lambda>1$. There exist $\lambda$-dependent constants $\varepsilon, t > 0$ and $r \in (0, 1)$ such that for any fixed $\iota>0$, with probability approaching 1 as $n \to \infty$, the following all occur.

Figures (5)

  • Figure 1: Convergence of AMP and NGD from a spectral initialization. Left: Residual squared error $\min\{ \| {\boldsymbol m}^{k} - {\boldsymbol m}_\star \|_2^2 / n, \| {\boldsymbol m}^{k} + {\boldsymbol m}_\star \|_2^2 / n\}$ versus number of iterations $k$ (both on a log-scale), for signal-to-noise ratio $\lambda=1.5$. The mean curve is averaged over $10$ independent instances, and the error bars report $1/\sqrt{10}$ times the standard deviation across instances. Right: Success probability of NGD for convergence to ${\boldsymbol m}_\star$, for varying signal-to-noise ratios $\lambda$ and step sizes $\eta$. In both panels, $n = 500$.
  • Figure 2: Universality with respect to the noise distribution. Left: Estimation mean squared error $\min\{\| {\boldsymbol m}_\star - {\boldsymbol x} \|_2^2 / n, \| {\boldsymbol m}_\star + {\boldsymbol x} \|_2^2 / n \}$ versus the signal-to-noise ratio $\lambda$, for different noise ensembles. The mean curve is averaged over 10 independent instances, and the error bars report $1/\sqrt{10}$ times the standard deviation across instances. Right: Residual squared error $\min\{ \| {\boldsymbol m}^{k} - {\boldsymbol m}_\star \|_2^2 / n, \| {\boldsymbol m}^{k} + {\boldsymbol m}_\star \|_2^2 / n\}$ versus the number of iterations $k$, for different noise ensembles and signal-to-noise ratio $\lambda=1.5$. In both panels, $n=500$.
  • Figure 3: Comparison of TAP with mean-field VB. The plot shows mean squared errors of the TAP and VB minimizers in both a correctly specified and a misspecified model, for signal-to-noise ratio $\lambda \in [1, 2]$ and $n=500$. The mean curve is averaged over 10 independent instances, and the error bars report $1/\sqrt{10}$ times the standard deviation across instances.
  • Figure 4: Contour plot of the function $\bar{E}_\lambda(q,\varphi)$ as defined in Eq. (\ref{eqn:bar_E_q_phi}). Here we take $\lambda = 1.1, 1.2, 1.5$. The global minimum is at $(q, \varphi) = (q_\star(\lambda), q_\star(\lambda))$ where $q_\star(1.1) \approx 0.1917$, $q_\star(1.2) \approx 0.3577$, $q_\star(1.5) \approx 0.6923$. The dashed line is $q = \varphi$.
  • Figure 5: The scatter plot of eigenvalues of the linearized AMP operator ${\rm d} T_\mathsf{AMP}({\boldsymbol m}_\star,{\boldsymbol m}_\star)$. We choose $n = 500$ and $\lambda = 1.5$.
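
The error metrics in the captions of Figures 1 and 2 take a minimum over a global sign flip, since in Z2-synchronization the signal is identifiable only up to sign. A small sketch of that metric follows; the function name `sign_invariant_mse` is hypothetical and not taken from the paper.

```python
import numpy as np

def sign_invariant_mse(a, b):
    """Return min(||a - b||^2, ||a + b||^2) / n for two length-n vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = a.size
    return min(np.sum((a - b) ** 2), np.sum((a + b) ** 2)) / n

# Usage: a vector and its negation are at distance zero under this metric.
err_flipped = sign_invariant_mse([1.0, -1.0], [-1.0, 1.0])
err_mixed = sign_invariant_mse([1.0, 1.0], [1.0, -1.0])
```

The same function covers both the residual error between an iterate ${\boldsymbol m}^k$ and ${\boldsymbol m}_\star$ and the estimation error between ${\boldsymbol m}_\star$ and ${\boldsymbol x}$.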

Theorems & Definitions (69)

  • Theorem 2.1: Local convexity and AMP stability
  • Corollary 2.2: Global landscape for large $\lambda$
  • Theorem 2.3: Computation of Bayes-optimal TAP minimizer
  • Theorem 2.4: Convergence of AMP and NGD for large $\lambda$
  • Remark 2.5
  • Remark 2.6
  • Lemma 4.1
  • Lemma 4.2
  • Lemma 4.3
  • Corollary 4.4
  • ...and 59 more