Learning with Restricted Boltzmann Machines: Asymptotics of AMP and GD in High Dimensions

Yizhou Xu, Florent Krzakala, Lenka Zdeborová

TL;DR

This paper analyzes Restricted Boltzmann Machines (RBMs) in the high-dimensional regime where $n,d\to\infty$, $n/d=\alpha=\Theta(1)$ and the number of hidden units $k$ remains fixed. It derives an exact reduction of the RBM likelihood to an effective unsupervised multi-index objective with a non-separable regularization, enabling rigorous AMP state evolution (SE) and dynamical mean-field theory (DMFT) analyses of training dynamics. By mapping data from the spiked covariance model to a teacher RBM, the authors prove that RBMs achieve the BBP weak recovery threshold and provide sharp, high-dimensional asymptotics for both AMP and gradient-descent training. The results establish a principled bridge between unsupervised RBM learning and high-dimensional inference techniques, offering precise predictions for optimization and dynamics and guiding future extensions to more complex generative architectures.

Abstract

The Restricted Boltzmann Machine (RBM) is one of the simplest generative neural networks capable of learning input distributions. Despite its simplicity, the analysis of its performance in learning from the training data is only well understood in cases that essentially reduce to singular value decomposition of the data. Here, we consider the limit of a large dimension of the input space and a constant number of hidden units. In this limit, we simplify the standard RBM training objective into a form that is equivalent to the multi-index model with non-separable regularization. This opens a path to analyze training of the RBM using methods that are established for multi-index models, such as Approximate Message Passing (AMP) and its state evolution, and the analysis of Gradient Descent (GD) via the dynamical mean-field theory. We then give rigorous asymptotics of the training dynamics of RBM on data generated by the spiked covariance model as a prototype of a structure suitable for unsupervised learning. We show in particular that RBM reaches the optimal computational weak recovery threshold, aligning with the BBP transition, in the spiked covariance model.
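To make the BBP transition mentioned above concrete, here is a minimal NumPy sketch of a rank-one spiked covariance model with a Rademacher spike. It compares the overlap of the top principal component with the classical BBP prediction. It illustrates the PCA baseline only, not the paper's AMP-RBM or GD procedures. The normalization (population covariance $I_d+\theta\,vv^\top$ with $v=w/\sqrt{d}$), the function names, and the parameter `theta` are assumptions made for this illustration and may differ from the paper's parameterization of $\Lambda$.

```python
import numpy as np


def spiked_covariance_sample(n, d, theta, rng):
    """Draw n samples in R^d with population covariance I + theta * v v^T.

    Assumed normalization (not taken from the paper): v = w / sqrt(d),
    where w has i.i.d. Rademacher (+1/-1) entries.
    """
    w = rng.choice([-1.0, 1.0], size=d)   # Rademacher spike
    u = rng.standard_normal(n)            # one latent Gaussian factor per sample
    z = rng.standard_normal((n, d))       # isotropic Gaussian noise
    x = np.sqrt(theta / d) * np.outer(u, w) + z
    return x, w


def pca_overlap_sq(x, w):
    """Squared cosine between the top sample eigenvector and the spike w."""
    n = x.shape[0]
    cov = x.T @ x / n
    _, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    v_top = eigvecs[:, -1]                # unit-norm top eigenvector
    return float((v_top @ w) ** 2 / (w @ w))


def bbp_overlap_sq(theta, alpha):
    """Classical asymptotic squared overlap of PCA, with alpha = n / d.

    It vanishes below the BBP threshold theta = 1 / sqrt(alpha).
    """
    gamma = 1.0 / alpha
    if theta <= np.sqrt(gamma):
        return 0.0
    return (1.0 - gamma / theta**2) / (1.0 + gamma / theta)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 1000, 500                      # alpha = 2, smaller than the paper's sizes
    for theta in (0.3, 1.0, 2.0):
        x, w = spiked_covariance_sample(n, d, theta, rng)
        print(theta, pca_overlap_sq(x, w), bbp_overlap_sq(theta, n / d))
```

With $\alpha=n/d=2$, the threshold in this parameterization sits at $\theta=1/\sqrt{2}$. Below it, the empirical overlap should stay close to zero. Above it, the overlap should approach the predicted value as $n$ and $d$ grow.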

Paper Structure

This paper contains 31 sections, 14 theorems, 136 equations, 7 figures, 1 algorithm.

Key Result

Theorem 1

Under Assumption assum:RBM, we have ... where ... is the effective log-likelihood function, with $\eta_1:\mathbb{R}^k\times\mathbb{R}\times\mathbb{R}^k\to\mathbb{R}$ defined as ... and $\eta_2:\mathbb{R}^{k\times k}\times\mathbb{R}^k\times\mathbb{R}\times\mathbb{R}^k\to\mathbb{R}$ defined as ... The limit in eq:loglikelihood holds for any sequence $\boldsymbol{X}(n)\in\mathbb{R}^{n\times d}$, and ...

Figures (7)

  • Figure 1: Left: Iteration curves of AMP-RBM for $r=k=2$, $\Lambda=1.4I_2$, so the overlap is a $2\times2$ matrix with entries $\zeta_{11},\zeta_{12},\zeta_{21},\zeta_{22}$. Lines denote the state evolution. Right: The performance of AMP-RBM, GD (over \ref{eq:opt}) and the Bayes-optimal estimator [lesieur2017constrained] for $r=k=2$ and $\Lambda=\text{diag}(\lambda,0.5\lambda)$, where AMP-RBM and GD use random initialization. The dashed blue line represents the BBP transition. The purple and yellow lines represent the SE of AMP-RBM and GD, which almost overlap. We use the Rademacher prior and $n=8000$, $d=4000$. Our implementation of all experiments is available at https://github.com/SPOC-group/RBM_asymptotics.
  • Figure 2: Iteration curves of GD, where the lines denote its asymptotics. The setting is the same as in Figure \ref{fig:rank1}.
  • Figure 3: Reconstruction of occluded (Upper) and noisy (Lower) Fashion MNIST figures by RBMs trained with CD and GD. The RBMs have $10$ hidden units.
  • Figure 4: Iteration curves of CD. The setting is the same as in Figure \ref{fig:rank1}.
  • Figure 5: Comparison between the standard RBM (trained with CD) and other methods for $r=k=2$ (Left) and $r=k=10$ (Right), with the Rademacher prior. The red line represents the overlap of the Bayes-optimal estimator. The purple lines represent the asymptotics of stationary points (Theorem \ref{theo:main}). The yellow lines represent the asymptotics of the fixed points of DMFT (Theorem \ref{theo:GD}), which overlap with the purple lines. Note that the red, yellow and purple lines are all for rank 1. The green lines represent the asymptotics of SVD. Dashed blue lines represent the BBP threshold.
  • ...and 2 more figures

Theorems & Definitions (21)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Corollary 1
  • Theorem 5
  • proof
  • Corollary 2
  • Lemma 1
  • proof
  • ...and 11 more