Exploring the Energy Landscape of RBMs: Reciprocal Space Insights into Bosons, Hierarchical Learning and Symmetry Breaking

J. Quetzalcóatl Toledo-Marin, Anindita Maiti, Geoffrey C. Fox, Roger G. Melko

TL;DR

The paper introduces a reciprocal-space formulation of Restricted Boltzmann Machines (RBMs) to illuminate connections between RBMs, diffusion processes, and systems of coupled Bosons. It shows that at initialization RBMs exhibit a saddle-point energy landscape with rotational symmetry in the singular-value spectrum governed by the Marčenko–Pastur distribution, and that training induces hierarchical learning that breaks this symmetry in a Landau-like fashion. In the infinite-size limit, reciprocal variables become Gaussian, leading to a decoupled or partially decoupled Bosonic picture and potential diffusion non-convergence for some modes. Through MNIST experiments with replicas of RBMs, the work demonstrates how singular-value structure and symmetry breaking relate to feature hierarchies and learning dynamics, offering a unifying perspective across generative model frameworks and suggesting new avenues for quantum-inspired and diffusion-based learning approaches.

Abstract

Deep generative models have become ubiquitous due to their ability to learn and sample from complex distributions. Despite the proliferation of various frameworks, the relationships among these models remain largely unexplored, a gap that hinders the development of a unified theory of AI learning. We address two central challenges: clarifying the connections between different deep generative models and deepening our understanding of their learning mechanisms. We focus on Restricted Boltzmann Machines (RBMs), known for their universal approximation capabilities for discrete distributions. By introducing a reciprocal space formulation, we reveal a connection between RBMs, diffusion processes, and coupled Bosons. We show that at initialization, the RBM operates at a saddle point, where the local curvature is determined by the singular values, whose distribution follows the Marčenko–Pastur law and exhibits rotational symmetry. During training, this rotational symmetry is broken due to hierarchical learning, where different degrees of freedom progressively capture features at multiple levels of abstraction. This leads to a symmetry breaking in the energy landscape, reminiscent of Landau theory, which is characterized by the singular values and the eigenvector matrix of the weight matrix. We derive the corresponding free energy in a mean-field approximation. We show that in the limit of an infinite-size RBM, the reciprocal variables are Gaussian distributed. Our findings indicate that in this regime, there will be some modes for which the diffusion process will not converge to the Boltzmann distribution. To illustrate our results, we trained replicas of RBMs with different hidden layer sizes using the MNIST dataset. Our findings bridge the gap between disparate generative frameworks and also shed light on the processes underpinning learning in generative models.
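
The claim that the singular values of a randomly initialized weight matrix follow the Marčenko–Pastur law can be checked numerically. The sketch below is an illustration, not the authors' code: it assumes i.i.d. zero-mean Gaussian weights with a hypothetical standard deviation `sigma=0.01` (the initialization scale is not given here), and the function names `mp_edges`, `mp_density` and `random_rbm_spectrum` are made up for this example. It uses the standard form of the law for the eigenvalues $s_i^2/n$ of the rescaled Gram matrix, where $s_i$ are the singular values and $n$ is the larger of the two layer sizes.

```python
import numpy as np

def mp_edges(p, n, sigma=1.0):
    """Support edges of the Marchenko-Pastur law for aspect ratio p/n <= 1."""
    lam = p / n
    lo = sigma**2 * (1.0 - np.sqrt(lam)) ** 2
    hi = sigma**2 * (1.0 + np.sqrt(lam)) ** 2
    return lo, hi

def mp_density(x, p, n, sigma=1.0):
    """Marchenko-Pastur density of the eigenvalues of (1/n) X X^T, X of shape (p, n)."""
    lam = p / n
    lo, hi = mp_edges(p, n, sigma)
    x = np.asarray(x, dtype=float)
    dens = np.zeros_like(x)
    inside = (x > lo) & (x < hi)  # density vanishes outside the support
    xi = x[inside]
    dens[inside] = np.sqrt((hi - xi) * (xi - lo)) / (2.0 * np.pi * sigma**2 * lam * xi)
    return dens

def random_rbm_spectrum(n_visible=784, n_hidden=500, sigma=0.01, seed=0):
    """Rescaled squared singular values of a random Gaussian weight matrix.

    Assumption: weights are i.i.d. N(0, sigma^2); sigma is a hypothetical choice.
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, sigma, size=(n_visible, n_hidden))
    s = np.linalg.svd(W, compute_uv=False)  # min(n_visible, n_hidden) singular values
    p, n = min(W.shape), max(W.shape)
    return s**2 / n, p, n

if __name__ == "__main__":
    eig, p, n = random_rbm_spectrum()
    lo, hi = mp_edges(p, n, sigma=0.01)
    print("empirical range:", eig.min(), eig.max())
    print("Marchenko-Pastur support:", lo, hi)
```

A histogram of `eig` (for example with `np.histogram(eig, bins=40, density=True)`) can then be compared against `mp_density` evaluated at the bin centers. At finite size the extreme eigenvalues fluctuate around the theoretical edges, so agreement is only approximate.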

Paper Structure

This paper contains 12 sections, 55 equations, 8 figures.

Figures (8)

  • Figure 1: Left panel) Log-likelihood vs epochs for RBMs with hidden layer sizes $M=500,784,1200,3000$. The partition function was estimated using annealed importance sampling (AIS) and reverse AIS \cite{salakhutdinov2008learning,burda2015accurate}. Each data point corresponds to the average over five replicas and the error bars correspond to the standard deviation. Right panel) Image of number $3$ generated from a trained RBM and, via the transformations in Eq. \ref{eq:transformations}, projected onto the energy landscapes (clockwise starting at the upper left corner) $1,2,3,4,20,30,50,100,200,300,498$ and $500$. The magenta star marks the saddle point whereas the black pentagon corresponds to the image projection to reciprocal space.
  • Figure 2: Each subpanel shows the $xy$-plane contour energy $E_i$ for the singular value index 1,2,3 and 498. The colored data points correspond to MNIST test data projected onto the reciprocal space. The gray hexagons correspond to Gibbs sampled data. The magenta star marks the saddle point. Left panel) Randomly initialized RBM. Right panel) Trained RBM.
  • Figure 3: Probability density function of the singular values for a random RBM with 784 visible nodes and with a) 500, b) 784, c) 1200 and d) 3000 hidden nodes. e) Kurtosis of the reciprocal variable $y$ for hidden layer sizes $M=500,700,784,1200,2000$ for randomly initialized RBMs with $784$ visible nodes. Each data point was obtained by averaging the kurtosis over the hidden nodes with non-zero singular values. The kurtosis for each node was computed by projecting $7\cdot 10^6$ binary vector samples onto reciprocal space. The ribbon corresponds to the standard deviation over the hidden layer. The dashed line marks the target for Gaussianity. The blue curve is a linear fit extrapolated to infinite hidden layer size, where the reciprocal variable becomes Gaussian. The residual of the linear fit is on the order of $\mathcal{O}(10^{-4})$ and has been included. f) Distribution of singular values for trained RBMs with different hidden layer sizes and different training methods.
  • Figure 4: Jensen divergence between rotated and non-rotated weight matrix for RBMs with $M=500,784,1200,3000$. Each point corresponds to an average over five replicas. The dashed line corresponds to a non-trained RBM with $M=500$. PDF of RBM weight matrix before and after random rotations for one of the replicas is shown.
  • Figure 5: Probability density function of the reciprocal variables for various energy landscapes for trained and non-trained RBMs. The samples were generated from a multivariate $1/2$-Bernoulli distribution projected onto reciprocal space. The continuous curves correspond to normal distributions centered at $\mu_x^{(i)}$ and $\mu_y^{(i)}$ with standard deviations $\sigma_x^{(i)}$ and $\sigma_y^{(i)}$ given by Eq. \ref{eq:muAndSigma}.
  • ...and 3 more figures
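
The rotation test described in the Figure 4 caption can be sketched as follows. This is a rough illustration under several assumptions that the captions do not confirm: the divergence is taken to be the Jensen-Shannon divergence between histograms of weight-matrix entries, a "rotation" is taken to be left-multiplication by a Haar-random orthogonal matrix, and the helper names `random_orthogonal`, `js_divergence` and `rotation_js` are hypothetical.

```python
import numpy as np

def random_orthogonal(n, rng):
    """Haar-distributed orthogonal matrix from the QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(n, n)))
    return q * np.sign(np.diag(r))  # fix column signs so the distribution is uniform

def js_divergence(p, q):
    """Jensen-Shannon divergence (in nats) between two histograms."""
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # convention: 0 * log(0) = 0
        return np.sum(a[mask] * np.log(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def rotation_js(W, rng, bins=50):
    """JS divergence between entry histograms of W and a randomly rotated copy O @ W."""
    O = random_orthogonal(W.shape[0], rng)
    W_rot = O @ W
    lo = min(W.min(), W_rot.min())
    hi = max(W.max(), W_rot.max())
    edges = np.linspace(lo, hi, bins + 1)  # shared bins for both histograms
    p, _ = np.histogram(W, bins=edges)
    q, _ = np.histogram(W_rot, bins=edges)
    return js_divergence(p.astype(float), q.astype(float))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W_random = rng.normal(0.0, 0.01, size=(200, 100))  # stand-in for an untrained RBM
    W_structured = np.eye(200)[:, :100]                # stand-in for a non-symmetric matrix
    print("random:", rotation_js(W_random, rng))
    print("structured:", rotation_js(W_structured, rng))
```

For an i.i.d. Gaussian matrix the entry distribution is statistically unchanged by the rotation, so the divergence stays near zero, while a matrix with structure in a preferred basis gives a clearly larger value. The singular values are identical before and after rotation in both cases, which is why the comparison is made on the entries rather than on the spectrum.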