On the role of non-linear latent features in bipartite generative neural networks

Tony Bonnaire, Giovanni Catania, Aurélien Decelle, Beatriz Seoane

TL;DR

This work analyzes how hidden-unit priors in Restricted Boltzmann Machines shape their phase diagrams and memory retrieval capabilities. Using replica theory and finite-size Monte Carlo simulations, it shows that binary hidden units severely limit recall in the high-load regime, while enriching the hidden prior with ternary or ReLU-like activations and incorporating local biases restores retrieval and can suppress spin-glass phases at finite temperature. The results demonstrate that hidden-unit design—not just visible units or learning dynamics—critically governs the expressive power and memory capabilities of bipartite generative networks, with implications for generation quality and data-driven modeling. Overall, the paper elucidates how architectural choices modulate higher-order interactions in RBMs and their connection to classical associative memory models like Hopfield networks.

Abstract

We investigate the phase diagram and memory retrieval capabilities of bipartite energy-based neural networks, namely Restricted Boltzmann Machines (RBMs), as a function of the prior distribution imposed on their hidden units - including binary, multi-state, and ReLU-like activations. Drawing connections to the Hopfield model and employing analytical tools from statistical physics of disordered systems, we explore how the architectural choices and activation functions shape the thermodynamic properties of these models. Our analysis reveals that standard RBMs with binary hidden nodes and extensive connectivity suffer from reduced critical capacity, limiting their effectiveness as associative memories. To address this, we examine several modifications, such as introducing local biases and adopting richer hidden unit priors. These adjustments restore ordered retrieval phases and markedly improve recall performance, even at finite temperatures. Our theoretical findings, supported by finite-size Monte Carlo simulations, highlight the importance of hidden unit design in enhancing the expressive power of RBMs.
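
The connection to the Hopfield model mentioned above can be illustrated with a standard textbook computation, sketched here under assumed notation that does not come from this page: binary visible units $\sigma_i = \pm 1$, hidden units $\tau_\mu$ with a standard Gaussian prior, weights $\xi_i^\mu$, and a coupling scaled as $\sqrt{\beta / N_{\mathrm{v}}}$. Integrating out the Gaussian hidden units leaves a pairwise Hopfield-type weight on the visible units; non-Gaussian priors (binary, ternary, ReLU-like) would presumably produce a different effective interaction, which is the kind of modification the paper studies.

```latex
% Assumed notation (not from the source page):
%   \sigma_i = \pm 1 : visible units, i = 1..N_v
%   \tau_\mu         : hidden units with standard Gaussian prior, \mu = 1..P
%   \xi_i^\mu        : weights, playing the role of stored patterns
\begin{align}
P(\sigma, \tau) &\propto \exp\!\Big( -\tfrac{1}{2} \sum_{\mu=1}^{P} \tau_\mu^2
   + \sqrt{\tfrac{\beta}{N_{\mathrm{v}}}} \sum_{i=1}^{N_{\mathrm{v}}} \sum_{\mu=1}^{P} \xi_i^\mu \sigma_i \tau_\mu \Big), \\
P(\sigma) &\propto \prod_{\mu=1}^{P} \int \frac{d\tau_\mu}{\sqrt{2\pi}}\,
   \exp\!\Big( -\tfrac{1}{2}\tau_\mu^2
   + \sqrt{\tfrac{\beta}{N_{\mathrm{v}}}}\, \tau_\mu \sum_{i} \xi_i^\mu \sigma_i \Big)
 = \exp\!\Big( \frac{\beta}{2 N_{\mathrm{v}}} \sum_{\mu=1}^{P} \Big( \sum_{i} \xi_i^\mu \sigma_i \Big)^{2} \Big).
\end{align}
% The last expression is exp(-\beta H) with the Hopfield energy
% H(\sigma) = -\frac{1}{2 N_v} \sum_\mu ( \sum_i \xi_i^\mu \sigma_i )^2 .
```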

Paper Structure

This paper contains 27 sections, 75 equations, 5 figures.

Figures (5)

  • Figure 1: Phase diagram of the Hopfield model as a function of the temperature $T$ and the density of patterns $\alpha$. P stands for Paramagnetic, SG for Spin Glass, R for Recall, (m)R for Metastable-Recall. On top of the recall phase for pure states, we show in shades of green the regions where mixed states with $L$ patterns (for odd $L$ in the range $L \in \left\{ {3,5,7,9} \right\}$ from light to dark green) exist as stable local minima of the free energy.
  • Figure 2: Illustration of the early learning dynamics of a binary-binary RBM trained on the binarized MNIST dataset. (a) Columns of the weight matrix after 100 gradient updates, displaying a subset of 70 out of 100 hidden units plotted as $28\times 28$ images. (b) Samples generated by the same model as in (a), evaluated at different effective inverse temperatures $\beta$. For each temperature, the histogram shows the magnetization projected along the first principal direction $\bm{u}$ of the weight matrix; insets display representative samples generated by the model. (c) Same as (a), but after 1000 gradient updates. The weight matrix now encodes a greater diversity of patterns. (d) Samples generated by the model after 1000 updates, as in (c), showing increased digit variety but limited intra-class variability. (e) Histogram of overlaps between the generated samples (after 1000 updates) and the first three principal components $\alpha$ of the weight matrix, indicating dominant alignment along a few learned directions.
  • Figure 3: Left: Recall phase of the binary-binary RBM with an extensive number of patterns, as defined in Eq. \ref{eq:Hamiltonian_BinaryBinary_HighLoad_RepeatdHidden}. Colored lines indicate the phase boundary of the (metastable) recall phase in the $(\alpha, T / \sqrt{\alpha_\mathrm{H}})$ plane for various values of $\alpha_\mathrm{H}$. Diamond-shaped scatter points represent numerical estimates obtained via Monte Carlo simulations for a system of size $N_{\mathrm{v}} = 100$. Inset: Critical capacity computed in the limit $\alpha_\mathrm{H} \to \infty$, and plotted in the plane $\left( {\alpha, \beta} \right)$. In this limit, the model recovers the Hopfield value $\alpha_\mathrm{c} \approx 0.14$ when $\beta \to 0$. Right: Phase diagram of the model with sparse hidden activations, shown for $\alpha_\mathrm{H} = 2$. As the bias $\theta$ increases, the model approaches the standard binary case; in contrast, decreasing $\theta$ silences most hidden nodes by favoring $\tau = 0$, effectively reducing their contribution. Interestingly, increasing sparsity (i.e., larger $\theta$) enhances the model’s capacity, as more selective hidden activation reduces interference between patterns. This trend is corroborated by numerical simulations (in colored diamonds).
  • Figure 4: Left: Phase diagram of the ReLU RBM as a function of temperature. The retrieval (R) and metastable-recall (m)R phases are significantly larger than in the Hopfield model, indicating enhanced memory capacity. A spin-glass phase is also present, yet the line is more difficult to identify analytically since the overlap parameter $q$ is always non-zero due to the non-symmetric nature of the hidden variables. Center: Transition lines between the (m)R and spin-glass phases are shown for various values of the hidden bias $\eta$. As $\eta$ becomes more and more negative, the phase boundary shifts, enlarging the retrieval region. The light green curve corresponds to the case $\eta = 0$ (and thus to the boundary in the left panel). Right: Critical capacity of the model as a function of the hidden bias $\eta$, shown for the retrieval of $L$ memory states.
  • Figure 5: Left: Critical lines in the Binary-ReLU RBM with an external field, plotted in the plane $\left( {\eta,\alpha} \right)$. The blue line represents the spinodal of the ferromagnetic solution, to the right of which no retrieval state is present. The red line represents the spinodal of the spin-glass solution, to the left of which no such fixed point exists. Finally, the green curve represents the first-order phase transition line between the two equilibrium states. Right: The overlap between a given pattern and the sampled configuration. In blue, an annealing from $\eta=-2$ toward higher values; in orange, an annealing starting from $\eta=-0.5$ toward higher values. We can clearly identify the two spinodal points. Here $N_{\mathrm{v}}=10000$, $\alpha=0.5$, and at each annealing step we perform $t=10^4$ MC steps.
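
Several captions above refer to Monte Carlo estimates of the overlap between a stored pattern and a sampled configuration. As a rough illustration only, the following sketch measures that overlap in a small Hopfield model with Hebbian couplings and single-spin Metropolis updates. It is not the authors' RBM simulation: the function name `hopfield_overlap_after_mc`, the system size, the number of sweeps, and the choice of Metropolis dynamics are all assumptions made for this example.

```python
import numpy as np


def hopfield_overlap_after_mc(n_spins=200, n_patterns=2, temperature=0.1,
                              n_sweeps=20, seed=0):
    """Overlap with pattern 0 after Metropolis sweeps (hypothetical helper).

    Assumes temperature > 0. The chain starts exactly on pattern 0, so the
    returned value indicates whether that pattern is (meta)stable.
    """
    rng = np.random.default_rng(seed)

    # Random binary patterns xi^mu_i in {-1, +1}.
    patterns = rng.choice([-1, 1], size=(n_patterns, n_spins))

    # Hebbian couplings J_ij = (1/N) sum_mu xi^mu_i xi^mu_j, no self-coupling.
    couplings = patterns.T @ patterns / n_spins
    np.fill_diagonal(couplings, 0.0)

    spins = patterns[0].copy()
    for _ in range(n_sweeps):
        for i in rng.permutation(n_spins):
            local_field = couplings[i] @ spins
            # Energy change of flipping spin i for H = -1/2 sum_ij J_ij s_i s_j.
            delta_e = 2.0 * spins[i] * local_field
            if delta_e <= 0 or rng.random() < np.exp(-delta_e / temperature):
                spins[i] = -spins[i]

    # Overlap m = (1/N) sum_i xi^0_i s_i.
    return float(patterns[0] @ spins / n_spins)


if __name__ == "__main__":
    print("low T :", hopfield_overlap_after_mc(temperature=0.1))
    print("high T:", hopfield_overlap_after_mc(temperature=5.0))
```

At low temperature and low load the overlap should stay close to 1, while at high temperature it should decay toward 0. Scanning the temperature and the ratio `n_patterns / n_spins` in such a loop is one simple way to locate a recall boundary numerically, although finite-size effects are strong at these sizes.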